Working Paper Series No. 66
This paper reports the results of research and analysis undertaken by Census Bureau staff. It has undergone a more limited review than official Census Bureau publications. This report is released to inform interested parties of research and to encourage discussion.
On March 1, 2001, the U.S. Census Bureau issued the recommendation of the Executive Steering Committee for A.C.E. Policy (ESCAP) that the Census 2000 Redistricting Data not be adjusted based on the Accuracy and Coverage Evaluation (A.C.E.). By mid-October 2001, the Census Bureau had to recommend whether Census 2000 data should be adjusted for future uses, such as the census long form data products, post-censal population estimates, and demographic survey controls. In order to inform that decision, the ESCAP requested that further research be conducted.
Between March and September 2001, the Demographic Analysis-Population Estimates (DAPE) research project addressed the discrepancy between the demographic analysis data and the A.C.E. adjusted estimates of the population. Specifically, the research examined the historical levels of the components of population change to address the possibility that the 1990 Demographic Analysis understated the national population and assessed whether demographic analysis had not captured the full population growth between 1990 and 2000. Assumptions regarding the components of international migration (specifically, emigration, temporary migration, legal migration, and unauthorized migration) contain the largest uncertainty in the demographic analysis estimates. Therefore, evaluating the components of international migration was a critical activity in the DAPE project.
This report focuses on the consistency of the data sources related to the foreign-born population. Specifically, the analysis examines the comparability and consistency of data from three different data sources collected in 2000: the March 2000 Current Population Survey (original and reweighted); the Census 2000 Supplementary Survey; and a provisional Census 2000 nativity data file. We examine differences in estimates among survey/census items specific to the foreign-born population - citizenship, place of birth and year of entry - as well as by general population characteristics, such as age, sex, race and Hispanic origin.
Our evaluation reveals that there is no significant difference in the total foreign born estimated among the data sources (when controlling for differences in coverage and methodologies). The provisional Census 2000 estimate of 30.6 million foreign born does not differ significantly from the Census 2000 Supplementary Survey estimate of 30.5 million foreign born. Further, both of these figures fall within the 90-percent confidence interval of the (reweighted) March 2000 Current Population Survey estimate of 30.1 million foreign born. Additional detailed comparisons show similar results, a general consistency in nativity data across data sources.
Table of Contents
Mode of collection
Question position, format and skip patterns
Total foreign born
List of Tables
Table 1. Summary of Coverage Among Data Sources: 2000
Table 2. Census 2000 Supplementary Survey Population by Nativity and Mode of Data Collection
Table 3. Item Sequence, by Data Source: 2000
Table 4. Nativity Item Response Rates, by Data Source: 2000
Table 5. Nativity Status of the Population, by Data Source: 2000
Table 6. Citizenship Status, Place of Birth and Year of Entry of the Foreign-Born Population, by Data Source: 2000
Table 7. Age and Sex Distributions of the Foreign-Born Population, by Data Source: 2000
Table 8. Hispanic Origin and Race Distributions of the Foreign-Born Population, by Data Source: 2000
Table 9. Foreign-Born Population Entering Before 1990, by Data Source: 2000
Table 10. Foreign-Born Population Entering 1990 to 1999, by Data Source: 2000
Table 11. Foreign-Born Population Born in Europe, by Data Source: 2000
Table 12. Foreign-Born Population Born in Asia, by Data Source: 2000
Table 13. Foreign-Born Population Born in Mexico, by Data Source: 2000
Table 14. Foreign-Born Population Born in Other Latin America: 2000
List of Figures
Figure 1. Place-of-Birth Items, by Data Source
Figure 2. Citizenship Items, by Data Source
Figure 3. Year-of-Entry Items, by Data Source
Appendix A. Citizenship Status of the Population, by Data Source: 2000
Appendix B. General Characteristics of the Foreign-Born Population, by Data Source: 2000
Appendix C. DAPE Census 2000 Data File Compilation
Appendix D. Demographic Analysis-Population Estimates (DAPE) Research Project Reports Related to Evaluating Components of International Migration
Population Division Working Paper Series
Evaluating Components of International Migration:
Consistency of 2000 Nativity Data
We examine the consistency of the data sources related to the foreign-born population used for several components of the Demographic Analysis-Population Estimates (DAPE) project. Initial examinations of the Demographic Analysis (DA) assumptions relied heavily on tabulations from the March 2000 Current Population Survey (CPS). Since that time, however, other data sources have been introduced, including the provisional Census 2000 nativity data, the March 2000 CPS reweighted to results from Census 2000, and the Census 2000 Supplementary Survey (C2SS). Consequently, this report examines the consistency of the data associated with the foreign-born population across all data products currently under scrutiny, and addresses the potential implications of significant differences.
Using SAS, we examine distributions of the nativity data from each data source across various characteristics intrinsic to the Demographic Analysis estimates. We conduct Sigma tests to assess statistically significant differences among the estimates produced.
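The text does not spell out the test statistic. A common form of such a comparison, assuming the two estimates are independent and their standard errors are known, divides the difference by its standard error and compares the result with the two-sided critical value for the 90-percent confidence level used throughout this report (about 1.645). The sketch below also computes a percent difference relative to a base estimate; which estimate serves as the base in the tables is an assumption here. Function names and the illustrative numbers are hypothetical and are not taken from the tables.

```python
import math

# Two-sided standard normal critical value at the 90-percent confidence level.
Z_90 = 1.645


def differ_significantly(est_a: float, se_a: float,
                         est_b: float, se_b: float,
                         z_crit: float = Z_90) -> bool:
    """Return True if two independent estimates differ at the chosen level."""
    # Standard error of a difference between independent estimates.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_diff == 0:
        return est_a != est_b
    return abs(est_a - est_b) / se_diff > z_crit


def percent_difference(est: float, base: float) -> float:
    """Percent difference of `est` relative to `base` (hypothetical convention)."""
    return 100.0 * (est - base) / base


# Illustrative values only, not figures from the report.
print(differ_significantly(100.0, 2.0, 95.0, 2.0))  # z is about 1.77
print(round(percent_difference(100.0, 95.0), 1))
```

Assuming independence is a simplification: estimates drawn from overlapping frames or shared controls can be correlated, which would change the standard error of the difference.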
We analyze the March 2000 CPS in both its original form (based on distributions from the 1990 Census adjusted for underenumeration and carried forward to March 2000) and its reweighted form (based on results from Census 2000). For a detailed description of the CPS, see "Technical Paper 63: Current Population Survey Design and Methodology."
The provisional Census 2000 file was developed jointly by the Population Division, the Decennial Systems and Contracts Management Office (DSCMO) and the Decennial Statistical Studies Division (DSSD) to best approximate the 100-percent Census 2000 data file. (See Appendix C for a detailed explanation of the data compilation process.) For the purposes of this examination, we analyze Census 2000 data limited to households only (i.e., excluding group quarters), which more closely reflect the respondent base of the two comparison data sources: the March 2000 CPS and the C2SS.
The C2SS file was last updated on 9 July 2001 and serves as the third data source included in this comparison. For a detailed explanation of these data, see "Accuracy of the Data (2000)" (Demographic Surveys Division 2001).
The March CPS, Census 2000 and C2SS contain three nativity-specific items: Place of birth, Citizenship, and Year of entry.
The Place-of-birth questions identify where each respondent was born: name of state for those born within the United States; or, country name for those born elsewhere. The Citizenship questions separate respondents into various citizenship categories based on how citizenship was obtained, or into a residual non-citizen group. The Year-of-entry questions identify the year during which each respondent born outside the United States came to the United States.
These three data sources contain no other direct measures of nativity, although the March 2000 CPS also contains information on Place of birth for the respondent's mother and father.
In order to adequately examine and contrast figures from the various data sources, one must consider the overall comparability of the data in question. Because each data set analyzed here is unique, based on differing census/survey designs and methodologies, several important factors have the potential to significantly affect counts of the foreign-born population. The following sections address various areas of comparability.
The March 2000 CPS and the C2SS are surveys. As such, they produce only samples upon which statistical inferences can be made for the total population, as measured by Census 2000.
The March 2000 CPS is based on a sample of 50,000 households within 754 primary sampling units throughout the United States (Technical Paper 63). Households are inducted into the sample based on the 1990 Census address file and its subsequent updates throughout the decade. The sample is restricted to the "civilian, non-institutionalized" population. Consequently, individuals residing in many group quarters, such as army barracks, hospitals and prisons, are excluded. A sample household participates in the survey over a 16-month period: it is interviewed during the first four months, dropped for the following eight months, and then rotated back in for the remaining four months. Because households answer the nativity questions only once, upon entering the survey, a value for a nativity item in the March 2000 Current Population Survey may reflect a response received as many as 15 months earlier. Any subsequent change to a respondent's citizenship status would not be captured.
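The 15-month figure follows from the 4-8-4 rotation pattern. A minimal sketch, assuming the 16 months of a household's cycle are numbered from its first interview (month 1) and that the nativity items are asked only in that month; the names are illustrative.

```python
# CPS 4-8-4 rotation: interviewed in months 1-4, out for months 5-12,
# interviewed again in months 13-16 of the household's 16-month cycle.
IN_SAMPLE_MONTHS = list(range(1, 5)) + list(range(13, 17))


def months_since_nativity_interview(cycle_month: int) -> int:
    """Months elapsed since the first (nativity) interview for a household
    interviewed in the given month of its 16-month cycle."""
    if cycle_month not in IN_SAMPLE_MONTHS:
        raise ValueError("household is not interviewed in this month")
    return cycle_month - 1


lags = [months_since_nativity_interview(m) for m in IN_SAMPLE_MONTHS]
print(lags)       # [0, 1, 2, 3, 12, 13, 14, 15]
print(max(lags))  # 15
```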
Like the CPS, the C2SS is a national sample of 700,000 households and excludes group quarters. Unlike the CPS, however, the C2SS uses a sampling frame based on the 2000 Master Address File (MAF), the same source used for Census 2000. The Census 2000 Supplementary Survey was conducted over a time period from January through December, 2000.
The population universe for Census 2000 is the resident population of the United States as of 1 April 2000. All housing units are identified and enumerated using the 2000 MAF. Nativity questions, however, are included only on the long form, a sample of the U.S. population. Because the Census seeks to count the entire U.S. resident population, the data include individuals living in group quarters. For the purposes of this comparison, however, the provisional Census 2000 data are divided into household and group quarters populations.
Table 1 summarizes these basic differences in coverage across the three data sources. Differences in sampling frames/address files, respondent pools, and especially survey/census durations provide possible explanations for differences in nativity estimates across the data sources under analysis.
With respect to overall coverage of the data sources in question, there exists a mixture of census and survey methodologies used to obtain data. The following sections address some of these differences.
Mode of collection
The nativity data collected in the CPS are acquired in the initial interview of the sample household. That is, questions that concern place of birth, citizenship status and year of entry are asked by a CPS interviewer at the time the household enters the survey. Consequently, nearly all CPS nativity data come from person-to-person interactions, where probing can produce more reliable responses.
The Census 2000 data rely heavily on self-administered mail-back questionnaires from long-form recipients. Enumerators usually contact individuals only if no questionnaire is received by the Census Bureau after a reasonable duration. The General Accounting Office report, "Status of Nonresponse Follow-up and Key Operations" (2000), reports a long-form mail-return rate of 54.1 percent. Measures to acquire missing information include Nonresponse Follow-up (NRFU), Coverage Edit Follow-up (CEFU) and the Coverage Improvement Follow-up (CIFU), which involve enumerator interaction. (These response rates refer to completion of the actual questionnaire; individual item response rates are discussed and presented below.)
The C2SS data were collected by means of mail-back questionnaires, follow-up telephone calls and follow-up personal visits. All sample households received an announcement of their selection as part of the survey, followed by a questionnaire. If the questionnaire was not returned within the time frame specified, a second questionnaire was delivered to the household. Households that did not return questionnaires were subject to Computer Assisted Telephone Interview (CATI) completion of the survey. One of every three remaining non-response households was visited by field interviewers for Computer Assisted Personal Interview (CAPI) completion of the survey. Table 2 presents the mode of data collection for the Census 2000 Supplementary Survey, by nativity. In brief, these figures show that data for the foreign born were more likely to be collected via personal interview - the third and final data collection attempt - than by other methods: 48.4 percent of foreign-born respondents received CAPI data collection, while only 31.4 percent of natives were surveyed in such a manner.
The varying modes of data collection described above, especially concerning potentially sensitive questions of nativity, may contribute to observed differences in estimates.
Question position, format and skip patterns
The sequence of the nativity items (i.e., the order and position of the questions within each questionnaire), the manner in which they are stated, and the skip patterns they contain may influence certain response outcomes. As shown in Table 3, although the order of the nativity items relative to one another never changes, their location within the overall questionnaire shifts across all three instruments. Further, differences exist with respect to question and response wording as well as with certain skip patterns.
Figure 1 presents the actual place-of-birth questions used in each of the three data sources. For Census 2000 and C2SS, the place-of-birth items are nearly identical: "Where was this person born?" followed by two check boxes and fields (or delimited spaces) for the name of the state of birth (if in the United States) or the country of birth (if elsewhere). The March 2000 CPS, however, phrases the question somewhat differently - "In what country were/was __________ born?" - followed by a field in which the interviewer must enter the appropriate country code (derived from adjacent computer screens).
Figure 2 shows that the citizenship item, including question and response options, is identical for Census 2000 and the C2SS: "Is this person a citizen of the United States?" followed by four citizenship categories and a residual non-citizen category. Again, however, the Current Population Survey differs. It uses a skip pattern based on the responses to the place-of-birth questions, obviating the citizenship question altogether if the respondent or either parent reports the United States as the place of birth. Only those individuals who report a foreign country of birth for self and both parents are asked about citizenship, through as many as three questions: "(Are/Is) ... a citizen of the United States?," "(Were/was) ... born a citizen of the United States?," and "Did ... become a citizen of the United States through naturalization?"
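The CPS routing described above can be sketched as a small function. This is a hypothetical simplification of the paragraph's description, not the actual CPS instrument logic: the function name, argument names and question labels are assumptions, and so is the rule that each follow-up question is reached only after the preceding answer ("citizen" before "born a citizen," "not born a citizen" before "naturalized").

```python
from typing import List, Optional


def cps_citizenship_questions(self_us_born: bool,
                              mother_us_born: bool,
                              father_us_born: bool,
                              is_citizen: Optional[bool] = None,
                              born_citizen: Optional[bool] = None) -> List[str]:
    """Return the citizenship questions reached under the routing described
    in the text (hypothetical simplification, not the CPS instrument)."""
    # Skip the whole citizenship block if the respondent or either parent
    # was reported as born in the United States.
    if self_us_born or mother_us_born or father_us_born:
        return []
    asked = ["citizen"]            # "(Are/Is) ... a citizen of the United States?"
    if not is_citizen:
        return asked
    asked.append("born_citizen")   # "(Were/was) ... born a citizen ...?"
    if born_citizen:
        return asked
    asked.append("naturalized")    # "Did ... become a citizen ... through naturalization?"
    return asked


print(cps_citizenship_questions(True, False, False))
print(cps_citizenship_questions(False, False, False, is_citizen=True, born_citizen=False))
```

The first call returns an empty list (the block is skipped); the second reaches all three questions.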
Finally, as shown in Figure 3, all three data products share nearly identical year-of-entry items. Posed to all respondents who report a non-U.S. place of birth, the question asks, "When did (this person/_________) come to live in the United States?" followed by spaces for a four-digit year response.
In sum, although Census 2000 and C2SS closely resemble each other concerning nativity item placement and formats, the CPS differs sufficiently, especially with respect to the citizenship item, to warrant caution when making comparisons.
In addition to the mode of overall data collection, individuals may return census and survey questionnaires without responding to certain items. Differences in item response rates could contribute to data incomparability.
Although all three data sources contain similar questions regarding nativity, Table 4 shows that response rates for the nativity items vary across data sources and population sub-groups. Rates for both the entire population and the foreign born only are shown. Excluding group quarters, response rates among the total population range from 91.7 percent (Census 2000) to 99.0 percent (CPS) for the place-of-birth item, a difference of 7.3 percentage points. Response rates for the citizenship item range from 95.7 percent (Census 2000) to 98.8 percent (CPS) among the total population (excluding group quarters). Finally, response rates for the year-of-entry item among the foreign born range from 88.0 percent (Census 2000) to 91.9 percent (C2SS).
In sum, among the civilian, non-institutionalized population, response rates appear to be slightly lower for the provisional Census 2000 data than for CPS or C2SS.
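An item response rate of the kind reported in Table 4 can be read as the share of an item's universe with a reported (rather than imputed) value. A minimal sketch, assuming each record carries a weight and a hypothetical flag marking an allocated value; the record layout is an assumption, and the text does not say whether the published rates are weighted. The last lines reproduce the place-of-birth range quoted above in percentage points.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Record:
    weight: float    # survey or sample weight (hypothetical field)
    allocated: bool  # True if the item value was imputed rather than reported


def item_response_rate(records: Iterable[Record]) -> float:
    """Weighted percentage of the item universe with a reported value."""
    recs = list(records)
    total = sum(r.weight for r in recs)
    reported = sum(r.weight for r in recs if not r.allocated)
    return 100.0 * reported / total


# Place-of-birth response rates quoted in the text (percent).
census_pob, cps_pob = 91.7, 99.0
print(round(cps_pob - census_pob, 1))  # 7.3 percentage points
```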
Each of the three data sources is maintained by a separate group within the Census Bureau (and, in the case of CPS, the Bureau of Labor Statistics). As a result, decisions regarding data imputation and edits ultimately rest with different decision makers. Owing to the complexity of the edit and imputation processes adopted by each group, this report acknowledges that differences do exist, not only in nativity item specifications, but in other items (race, Hispanic origin, sex) that contribute to the overall DA estimates. Specifically, an examination of the Census 2000 and C2SS edit specifications by the Ethnic and Hispanic Statistics Branch reveals little difference among nativity items. However, owing to the unique skip pattern of the CPS mentioned earlier, the imputation/edit specifications differ significantly from those of the other two data sources (Costanzo, Davis and Malone 2001; Hansen 1994). These differences in data evaluation and reconstruction introduce another source of potential lack of comparability.
Weights are assigned to each record to estimate the number of individuals in the total population represented by the sample data. Differences in weighting techniques may introduce potential sources of incomparability.
Owing to differences in sampling between the CPS and C2SS, their weighting schemes also differ. Further, as mentioned earlier, the provisional Census 2000 data are a sample of the enumerated population and thus are weighted as well. Details of the weighting process for the provisional Census 2000 data are contained in Appendix C.
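In the simplest terms, each weight is the number of people a record represents, and the estimate for a group (for example, the nativity groups of Table 5) is the sum of the weights of its records. A minimal sketch with a hypothetical record layout and made-up weights; the actual CPS, C2SS and Census 2000 weighting adjustments are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def weighted_totals(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum record weights within each category (e.g., native / foreign born)."""
    totals: Dict[str, float] = defaultdict(float)
    for category, weight in records:
        totals[category] += weight
    return dict(totals)


# Hypothetical sample: each record stands for `weight` people.
sample = [("native", 1500.0), ("foreign born", 1200.0), ("native", 1800.0)]
print(weighted_totals(sample))  # {'native': 3300.0, 'foreign born': 1200.0}
```

Because each source applies its own weighting scheme to its own sample, identical tabulation code can still yield different totals across sources.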
Total foreign born
Overall, there is no significant difference in the total foreign born estimated from the three data sources. Table 5 shows a provisional Census 2000 estimate of 30.6 million and a C2SS estimate of 30.5 million, an apparent difference of only 0.1 percent, but not significantly different. Both these figures fall within the 90-percent confidence interval of the March 2000 CPS estimate of 30.1 million.
Although significant differences do exist among counts/estimates of the native population, these figures vary by less than two percent.
Table 6 presents a more detailed look at the three questionnaire items that address the foreign-born population: citizenship, place of birth, and year of entry. Estimates of citizenship status, broad periods of entry and general regions of birth show mixed results. Citizenship status among the foreign born is limited to two groups: those naturalized as U.S. citizens, and those who are not citizens. While figures for the total foreign born do not vary significantly, some differences do exist when disaggregated into these two categories. Specifically, while the Census 2000 and C2SS figures do not differ significantly from each other, they do differ from those of the March 2000 CPS for each citizenship group: a 10.3 percent to 11.3 percent difference among "naturalized" figures; and a 3.7 percent to 4.1 percent difference among "non-citizen" figures.
Table 6 shows also estimates for five general regions of birth for foreign-born individuals: Europe, Asia, Mexico, Other Latin America and (residual) Other Regions. Again, little difference exists between Census 2000 and Census 2000 Supplementary Survey figures, with the exception of Asia - a significant difference of approximately 3.1 percent. A comparison of Census 2000 counts and C2SS estimates with March 2000 CPS shows approximately one-half million more foreign born from Europe, representing significant differences of 11.4 percent and 9.6 percent respectively. Both Census 2000 and C2SS show roughly one-half million fewer foreign born from the residual category, Other Regions, than the March 2000 CPS, significant differences of slightly more than 20 percent. The Census 2000 count of foreign born from Mexico differs significantly from the March 2000 CPS estimate, although the difference of 6.5 percent is just outside the CPS 90-percent confidence interval. Figures for Asia and Other Latin America do not differ significantly between the March 2000 CPS and the other two data sources.
Finally, Table 6 presents broad periods of entry for all foreign born. Most notable among these figures are the large discrepancies among estimates for entries in the year 2000. Recall that each data source involved differing time frames of data collection (see "Limitations: Coverage" above). Differences in year-2000 entries range from 56.1 percent (Census 2000 - March 2000 CPS) to 184.3 percent (C2SS - March 2000 CPS). Among other periods of entry, however, there are no significant differences between Census 2000 counts and March 2000 CPS estimates. C2SS estimates differ significantly from March 2000 CPS figures for the periods 1980-1989 and 1990-1999, although the difference never exceeds 6.3 percent. Significant differences exist between Census 2000 and C2SS for each of the periods of entry examined. However, with the exception of year-2000 entries, these differences never exceed 4.5 percent and are likely due to the small standard errors of the C2SS data.
In sum, there is general consistency among estimates of the total foreign-born population across the three data sources, with some exceptions. The agreement between Census 2000 and C2SS estimates of citizenship status translates into similar differences from March 2000 CPS estimates: about 10 percent more naturalized citizens and about 4 percent fewer non-citizens. Likewise, the agreement between Census 2000 and C2SS estimates of region of birth translates into more individuals born in Europe (roughly one-half million) and fewer born in Other Regions (again, roughly one-half million) than the March 2000 CPS shows. Finally, with the exception of year-2000 entries, the broad period-of-entry figures never vary by more than 6.3 percent.
Among the various components that contribute to the overall Demographic Analysis, age, sex, Hispanic origin and race constitute the basis for many estimates of specific foreign-born population types: unauthorized, temporary, legal resident, etc. Therefore, we produce tabulations with respect to these four subject areas.
Table 7 presents the broad age distribution of the foreign-born population for each of the data sources under investigation. Ages are distributed in the form of four broad age categories (as used in the DA estimation process): younger than 18 years old; 18 to 29 years old; 30 to 49 years old; and 50 years and older. No significant differences occur among any of the four age groupings. Furthermore, differences in counts/estimates never exceed 5.4 percent.
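The four DA age categories amount to a simple recode of single years of age. A minimal sketch, assuming ages in completed years; the function name and labels are illustrative.

```python
def da_age_group(age: int) -> str:
    """Map an age in completed years to the four broad DA age categories."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 18:
        return "Under 18"
    if age < 30:
        return "18 to 29"
    if age < 50:
        return "30 to 49"
    return "50 and over"


print([da_age_group(a) for a in (5, 18, 45, 67)])
# ['Under 18', '18 to 29', '30 to 49', '50 and over']
```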
Table 7 also shows sex distributions of the foreign-born population by data source. There are no significant differences among the data sets with the exception of the Census 2000 count and the March 2000 CPS estimate of foreign-born females: a difference of 3.5 percent.
Because age and sex are combined in certain elements of the DA estimation process, combined tabulations are also shown in Table 7. There are no significant differences among the various age-sex categories across the three data sources.
We present the distributions of Hispanic origin among the foreign born in Table 8. While no percentage difference among data source counts/estimates exceeds 3.9 percent, the differences between the Census 2000 Supplementary Survey and the March 2000 CPS estimates are significant. Furthermore, the difference between the Census 2000 count and the C2SS estimate of the foreign-born Hispanic population is significant, although the percentage difference is less than 1.0 percent.
Although Census 2000 and the Census 2000 Supplementary Survey share similar questions, response options and edit specifications for the race item, those of the March 2000 CPS differ considerably. Some differences include: (1) Census 2000 and C2SS allow respondents to report multiple races, whereas the CPS records only one race per respondent; (2) Census 2000 and C2SS separate "Asian" from "Pacific Islander," while the March 2000 CPS combines the two categories; and (3) Census 2000 and C2SS provide "Other race" as a response option, whereas the CPS does so only after extensive interviewer probing and data edits are exhausted. To help overcome these obstacles, we construct a single race/Hispanic origin variable from the two items in each data product to permit reasonable comparisons.
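One way to build such a combined variable is sketched below. The recode rules are assumptions for illustration, since the text does not give the authors' exact specification. Hispanic origin takes precedence over race. A single White, Black or Asian response maps to the matching non-Hispanic group, and the combined CPS "Asian or Pacific Islander" category is treated as Asian. Everything else, including multiple races and a separately reported Pacific Islander race in Census 2000 or C2SS, falls into Non-Hispanic Other. The function name and category strings are hypothetical.

```python
from typing import Sequence

# Single-race responses that map to a named group; anything else is "Other".
# "Asian or Pacific Islander" stands in for the combined March 2000 CPS category.
RACE_GROUPS = {
    "White": "Non-Hispanic White",
    "Black": "Non-Hispanic Black",
    "Asian": "Non-Hispanic Asian",
    "Asian or Pacific Islander": "Non-Hispanic Asian",
}


def race_hispanic_group(hispanic: bool, races: Sequence[str]) -> str:
    """Combine Hispanic origin and race into five groups (hypothetical rules)."""
    if not races:
        raise ValueError("at least one race response is required")
    if hispanic:
        return "Hispanic"  # Hispanic origin takes precedence over race
    if len(races) > 1:
        return "Non-Hispanic Other"  # multiple races (Census 2000 and C2SS only)
    return RACE_GROUPS.get(races[0], "Non-Hispanic Other")


print(race_hispanic_group(False, ["Asian or Pacific Islander"]))  # Non-Hispanic Asian
print(race_hispanic_group(False, ["White", "Asian"]))             # Non-Hispanic Other
```

Under rules like these, the "Non-Hispanic Other" group absorbs most of the structural differences between the instruments, so its size is expected to vary by source.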
Table 8 shows distributions of race/Hispanic origin among the foreign born across the three data sources. As anticipated, large (significant) differences exist among all three "Non-Hispanic Other" figures. Exclusive of this race category, other significant differences are also apparent: Census 2000 and C2SS record nearly 20 percent more "Non-Hispanic Black" respondents than the March 2000 CPS. Further, the foreign-born Non-Hispanic Asian figure for Census 2000 is significantly lower than that of the March 2000 CPS (8.4 percent), possibly owing to the inclusion of Pacific Islanders in the latter estimate. Finally, although largely similar in construction of the race item, Census 2000 and C2SS differ significantly for three of the five race categories: Non-Hispanic White, Non-Hispanic Asian, and Non-Hispanic Other.
Given the size and frequency of these differences, we suggest caution with respect to analyses using the race variable as an estimation component.
Because DA estimates are contingent on more than one dimension of foreign-born characteristics, we conduct additional tabulations controlling for selected nativity indicators: the foreign born entering before 1990; those entering in the 1990s; those born in Europe; those born in Asia; those born in Mexico; and those born in Other Latin America.
Table 9 presents tabulations of the foreign born who entered before 1990, while Table 10 shows the same distributions for the foreign born who entered from 1990 to 1999, for each data source. (Data for those who entered in 2000 are not presented owing to the substantial discrepancies in coverage that affect the counts/estimates.) These data reflect many of the same patterns seen for the total foreign-born population: overall agreement among estimates, with the exceptions of the "Other Regions" place-of-birth category, problematic race classifications, and certain younger age groups.
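The split between Tables 9 and 10 is a recode of the year-of-entry item. A minimal sketch, assuming a four-digit year no later than 2000 (the collection year); the labels are illustrative, and year-2000 entrants are kept in their own group so they can be set aside as the text does.

```python
def entry_period(year_of_entry: int) -> str:
    """Classify a four-digit year of entry into the periods of Tables 9 and 10."""
    if year_of_entry > 2000:
        raise ValueError("data were collected in 2000")
    if year_of_entry < 1990:
        return "Before 1990"
    if year_of_entry <= 1999:
        return "1990 to 1999"
    return "2000"  # kept separate: not shown in Tables 9 and 10


print([entry_period(y) for y in (1975, 1990, 1999, 2000)])
# ['Before 1990', '1990 to 1999', '1990 to 1999', '2000']
```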
Table 11 reports general statistics for the foreign-born population who report countries in Europe as the place of birth; Table 12 provides the same data for those born in Asian countries; Table 13 for those born in Mexico; and Table 14 for those born in Other Latin American countries. (Data for "Other Regions" are not presented here.) Again, comparisons across data sources mirror those of the total foreign-born population: general conformity in all measures with the exception of year-2000 entries and race classifications.
In general, the (reweighted) March 2000 CPS nativity data are consistent with the new comparison data sources: Census 2000 and the C2SS. Whether examined in total, or across various sub-groups, the tabulations of the foreign-born population generally do not differ significantly. When significant differences exist, they are often minor in magnitude (less than 5 percent difference).
In spite of the general agreement in counts and estimates, caution is advised in certain respects.
As mentioned earlier, year-of-entry data for the year 2000 are highly inconsistent, differing substantially across data sources and within subgroups. We attribute these large discrepancies to the variation in coverage periods among the data products.
Further, racial classifications differ across the three data sources, and these incongruities may prove problematic; they should be considered when producing estimates that require race components.
Moreover, significant differences consistently appear when controlling for the residual place-of-birth category, Other Regions. These incongruities may be the result of differing modes of data acquisition or edit/imputation techniques among the data sources presented.
Finally, although the findings here reflect statistical comparisons of estimates, one should bear in mind that the DA estimation procedure will not use lower and upper bounds as inputs. Instead, point estimates of various foreign-born sub-groups will form the basis of many of the components being developed, and those point estimates differ across data sources even where the differences are not statistically significant. We therefore advise adopting a uniform data source throughout the DA estimation process.