U.S. Census Bureau

Evaluating Components of International Migration:
Quality of Foreign-Born and Hispanic Population Data

Arthur Cresce
Roberto Ramirez
Gregory Spencer

Population Division
U.S. Bureau of the Census
Washington, D.C. 20233

December 2001

Population Division Working Paper No. 65

DISCLAIMER:

This paper reports the results of research and analysis undertaken by Census Bureau Staff. It has undergone a more limited review than official Census Bureau publications. This report is released to inform interested parties of research and to encourage discussion.


Synopsis

On March 1, 2001, the U.S. Census Bureau issued the recommendation of the Executive Steering Committee for A.C.E. Policy (ESCAP) that the Census 2000 Redistricting Data not be adjusted based on the Accuracy and Coverage Evaluation (A.C.E.). By mid-October 2001, the Census Bureau had to recommend whether Census 2000 data should be adjusted for future uses, such as the census long form data products, post-censal population estimates, and demographic survey controls. In order to inform that decision, the ESCAP requested that further research be conducted.

Between March and September 2001, the Demographic Analysis-Population Estimates (DAPE) research project addressed the discrepancy between the demographic analysis data and the A.C.E. adjusted estimates of the population. Specifically, the research examined the historical levels of the components of population change to address the possibility that the 1990 Demographic Analysis understated the national population and assessed whether demographic analysis had not captured the full population growth between 1990 and 2000. Assumptions regarding the components of international migration (specifically, emigration, temporary migration, legal migration, and unauthorized migration) contain the largest uncertainty in the demographic analysis estimates. Therefore, evaluating the components of international migration was a critical activity in the DAPE project.

This report addressed the question: "How do edit and imputation procedures affect the consistency of foreign-born and Hispanic populations?" Comparisons were made between the edit and imputation specifications for the 1990 census and Census 2000 for the questions on place of birth and Hispanic origin to determine what impact, if any, such differences might have had on comparisons of numbers between the censuses. There were few significant differences in the specifications for the question on place of birth. The most significant difference - "hot deck" imputation of specific countries of birth in Census 2000 but not in 1990 - did not affect the overall total of foreign-born people. Regarding the specifications for the Hispanic question, several important differences were noted, the most important of which was the use of surname-assisted "hot decks" in assigning an origin. Overall, the Census 2000 edit and imputation procedures seemed to be more accurate than the 1990 procedures in assigning an origin. The improvement in assigning an origin was assisted by a substantial decline between 1990 and 2000 in the level of nonresponse to the question on Hispanic origin.


Table of Contents

  1. Introduction
  2. Executive Summary
  3. Philosophy of Edit and Imputation Procedures
  4. Comparison of Edit and Imputation Procedures for Place of Birth
    1. Changes to the Hot Deck
    2. Separate Procedures for Group Quarters (GQ)
    3. Availability of Information for Adopted Children
  5. Comparison of Edit and Imputation Procedures for Hispanic Origin
    1. Summary of Differences
    2. Context for Comparing Edit and Imputation Procedures
    3. Impact of Editing on Hispanic Origin Population in 1990
    4. Impact of Editing on Hispanic Origin Population in 2000
  6. Conclusion
  7. Bibliography
  8. Table A and Detailed Tables



Consistency of Edit and Imputation Procedures for the Place of Birth and
Hispanic Origin Questions: 1990 and 2000

  1. Introduction
    The purpose of the Task 11 Team was to answer the following question: "How do edit and imputation procedures affect the consistency of foreign-born and Hispanic populations' data?" We analyzed the edit and imputation procedures from the 1990 census and Census 2000 to answer this question.

  2. Executive Summary
    1. Foreign-Born Population - A comparison of the key differences between the 1990 census and Census 2000 edit specifications for the place of birth question reveals one significant difference. (See Table A.) In 1990, 808,158 people who were imputed as foreign born were not assigned a specific country of birth. Instead, these people were assigned the generic code for "Country of birth not reported." By contrast, people imputed as foreign born in Census 2000 will be assigned a specific country of birth. While this difference does not have an impact on the total foreign-born population, it has a significant impact on comparisons of country of birth totals between 1990 and 2000. Task Team 5 is investigating how people in the "Country of birth not reported" category in 1990 were allocated after the fact to a specific country of birth for the purpose of developing population estimates. Other differences in the edit and imputation procedures do not appear to be of sufficient magnitude to warrant further quantitative analysis.

    2. Hispanic Population - Comparison of the 100-percent edit and imputation procedures for the 1990 census and Census 2000 reveals differences between the two procedures. (See Table A.) In general, the 1990 procedures were not as rigorous as the Census 2000 procedures in assigning an origin. One significant difference between the two sets of specifications is the use of surname-assisted hot decks in Census 2000.

      An extremely important context for understanding the impact of these differences is the fact that the number of allocations for the origin question dropped by 34 percent between 1990 and 2000. This translated into a drop from 25.5 million allocations in 1990 to 16.8 million allocations in 2000. In addition to the drop in overall allocations, there was a fundamental shift in the type of allocation made. In 1990, 75.6 percent of allocations occurred through the "hot deck" (nearest neighbor) method. By contrast, only 41.2 percent of allocations required hot deck allocation in Census 2000. This is an important point because, of the three techniques used (imputation based on other information provided by the respondent, allocation from other household members, and hot deck allocation), hot deck allocation is the least reliable. We can attribute this improvement, in large part, to moving the question on origin before the question on race.

      There is strong evidence that the less restrictive 1990 edit and imputation procedures and greater reliance on hot deck allocation, combined with a much higher level of nonresponse to the Hispanic origin question in 1990, may have resulted in "over-editing" at least 161,000 people as Hispanic. Although we did not attempt to run the Census 2000 edit and imputation program on 1990 data, we believe the Census 2000 program would have imputed fewer people as Hispanic than did the 1990 program.

  3. Philosophy of Edit and Imputation Procedures1
    In any imputation scheme, imputed values may differ (sometimes significantly) from what would have been obtained had the information been reported by the respondent. Edit and imputation techniques are designed to make the best possible estimate of the probable response given the best information available. For example, if the respondent did not provide an origin, the procedure first checked to determine if the person indicated that he (or she) was Hispanic in the question on race (close to half of Hispanics provided an Hispanic ethnicity in the race question). If an origin could not be obtained from race, then the procedures attempted to allocate an origin from other people in the household (according to a hierarchy of household relationship) under the assumption that people living in the same household would tend to have the same origin. If an origin could not be obtained from within the household, as a last resort, an origin was assigned by hot deck allocation under the assumption that people of the same origin tend to live in close proximity to each other. To the extent that these assumptions do not hold for a given person or household, allocated values might differ from what would have been obtained had the information been obtained directly from the respondent.
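    The fallback order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the function name, and the treatment of the household list as already sorted by the relationship hierarchy are all hypothetical, and the sketch is not the Census Bureau's edit program.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Person:
    # Hypothetical field names; these are not census variable names.
    reported_origin: Optional[str] = None   # answer to the origin question, if any
    origin_from_race: Optional[str] = None  # Hispanic entry given in the race question


def assign_origin(
    person: Person,
    household: List[Person],
    hot_deck_donor: Callable[[Person], str],
) -> Tuple[str, str]:
    """Return (origin, source), trying the most trusted source first."""
    # A reported answer is kept as is.
    if person.reported_origin is not None:
        return person.reported_origin, "reported"
    # Step 1: other information from the same respondent (the race question).
    if person.origin_from_race is not None:
        return person.origin_from_race, "race question"
    # Step 2: other household members. The list order stands in for the
    # hierarchy of household relationship (an assumption of this sketch).
    for member in household:
        if member is not person and member.reported_origin is not None:
            return member.reported_origin, "within household"
    # Step 3: last resort, a donor from the hot deck.
    return hot_deck_donor(person), "hot deck"


head = Person(reported_origin="Hispanic")
child = Person()  # no origin reported anywhere on this person's record
print(assign_origin(child, [head, child], lambda p: "Not Hispanic"))
```

    Here the child receives an origin from the householder, so the hot deck is never consulted. Each later step is used only when every earlier step fails, which is why a high nonresponse rate pushes more cases toward the least reliable source.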

    Edit and imputation procedures attempt to rely as much as possible on sources of information about which there is the most confidence (other information provided by the respondent or responses of other household members) and to rely less on last resort procedures such as hot deck allocation. Even with hot decks, efforts are made to improve the accuracy of allocation by matching donors and donees according to one or more key characteristics. For example, in the 1990 census, origin hot decks used race as a matching variable for donors and donees. In contrast, Census 2000 used not only race, but also age and whether the surname was Spanish or not Spanish, as matching variables. We believe these additional variables improved the accuracy of origin allocation from the hot deck.
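    The effect of adding matching variables can be illustrated with a toy sequential hot deck. The sketch below is a simplification under assumptions: the record layout, the age-group label, the default starting value, and the class and function names are hypothetical. Only the choice of matching variables (race alone for 1990; race, age group, and Spanish surname for Census 2000) comes from the text.

```python
from typing import Callable, Dict, List, Optional, Tuple


class HotDeck:
    """Keeps the most recently reported origin for each matching cell."""

    def __init__(self, default: str) -> None:
        self.default = default            # starting value for empty cells (assumption)
        self.cells: Dict[Tuple, str] = {}

    def update(self, key: Tuple, origin: str) -> None:
        self.cells[key] = origin          # the newest donor replaces the previous one

    def donate(self, key: Tuple) -> str:
        return self.cells.get(key, self.default)


def key_1990(rec: dict) -> Tuple:
    return (rec["race"],)


def key_2000(rec: dict) -> Tuple:
    return (rec["race"], rec["age_group"], rec["spanish_surname"])


def run(records: List[dict], key_fn: Callable[[dict], Tuple],
        default: str = "Not Hispanic") -> List[str]:
    """Process records in order; donors update the deck, donees draw from it."""
    deck = HotDeck(default)
    result: List[Optional[str]] = []
    for rec in records:
        key = key_fn(rec)
        if rec["origin"] is None:
            result.append(deck.donate(key))
        else:
            deck.update(key, rec["origin"])
            result.append(rec["origin"])
    return result


records = [
    {"race": "White", "age_group": "25-44", "spanish_surname": False, "origin": "Not Hispanic"},
    {"race": "White", "age_group": "25-44", "spanish_surname": True, "origin": "Hispanic"},
    {"race": "White", "age_group": "25-44", "spanish_surname": False, "origin": None},
]
print(run(records, key_1990)[-1])  # donor chosen on race alone
print(run(records, key_2000)[-1])  # donor must also match age group and surname flag
```

    With race as the only matching variable, the third person inherits the origin of the nearest preceding White donor, who happens to have a Spanish surname. With the additional variables, the donor is the nearest preceding person in the same race, age, and surname cell.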

  4. Comparison of Edit and Imputation Procedures for Place of Birth
    Table A provides a summary of the differences between the 1990 and the Census 2000 edit and imputation procedures for the question on place of birth. An analysis of the differences noted indicates that none of the changes should have significantly affected comparisons in the overall number of native and foreign-born people between 1990 and 2000. The following changes were deemed to have had only a minor impact (if any) on the totals:

    1. Changes to the hot deck
      The age/race/Hispanic controls for the hot deck were revised for the two main hot decks by combining race and Hispanic categories (Hispanic; non-Hispanic White; non-Hispanic Black; non-Hispanic Asian; non-Hispanic Other) rather than a cross tabulation of race and Hispanic origin (Hispanic White; Hispanic Black; Hispanic Other; non-Hispanic White; non-Hispanic Black; non-Hispanic Other). In addition, more hot decks with limited universes (e.g. Puerto Rico and outlying areas only) were used.

    2. Separate procedures for group quarters (GQ)
      The 1990 edit and imputation procedures included the GQ population within the same edit and imputation procedures as those used for the household population. First, the portion of the procedures that attempted to assign a state or country of birth from other information provided by the respondent was embedded within the procedures used for the household population. Furthermore, people in GQ's needing a state or country of birth from the hot deck obtained one from the same hot deck as that used by the rest of the population. By contrast, in Census 2000, the portion of the procedures that attempted to assign a state or country of birth was entirely separate from that for the household population. Furthermore, the Census 2000 procedures used separate hot decks for the GQ population controlled by age and 6 GQ types (correctional institutions, nursing homes, college quarters, military quarters, other institutions, and all other GQ's).

    3. Availability of information for adopted children
      The 1990 edit and imputation procedures distinguished between "natural born or adopted sons or daughters" and "stepsons/stepdaughters" in assigning a state or country of birth. The Census 2000 procedures combined these categories into one category "son/daughter" in attempting to assign a state or country of birth.

    While these changes reflect an attempt to provide a more precise allocation of state or country of birth, they do not appear to be of sufficient importance to affect adversely comparisons of levels of foreign born compared with natives between the two censuses. It is unclear, however, how differences between the edit and imputation procedures may have affected comparisons between specific states or countries of birth for the two censuses. We will need to evaluate these issues when we obtain the long form data in Spring 2002.

    The use of a native or foreign-born check box in the question may have had some impact for prompting people to report a place of birth. However, because the question relies primarily on a write-in entry for appropriate classification as native or foreign born (in fact, the write-in entry takes precedence over the check box), it is not clear that we would have obtained different results because of the check box categories. The check box categories played a role in the edit and imputation procedures when no write-in response was provided, but this role was a rather limited one. When there was no write-in response, a citizenship response, in some instances, was actually given higher weight in assigning a place of birth than the check box response.

    The most important difference between the 1990 and 2000 edit and imputation procedures was in the assignment of a specific country of birth for people not reporting a place of birth who were assigned as foreign born. In 1990, people who were assigned as foreign born were not assigned a specific country of birth. Instead, these people were classified as "Area not reported." By contrast, the edit and imputation procedures for Census 2000 will assign a specific country of birth. While this difference does not affect comparisons of the total foreign born between the two censuses, it does affect any comparison by country of birth. In fact, we had to distribute the "Area not reported" population among countries of birth for intercensal estimates that required detailed country of birth data. Another DAPE task team is analyzing how these distributions were made, so they will not be discussed further in this report.2

    The allocation rate for the place of birth question in 1990 was 5.4 percent. By contrast, the rate for Census 2000 was 9.0 percent.3 The difference in the level of nonresponse between the two censuses can be explained partially by the fact that the 1990 census had a content edit follow-up operation that attempted to obtain answers from census forms that had more than a pre-specified threshold of questions with no answers. Census 2000 did not implement a content edit follow-up operation. The increased level of nonresponse, however, does not necessarily imply that comparisons of data on specific countries of birth between 1990 and 2000 would be adversely affected, especially given the improvements in Census 2000 edit and imputation procedures and the fact that specific country of birth was not assigned in the 1990 census procedures.

  5. Comparison of Edit and Imputation Procedures for Hispanic Origin

    1. Summary of Differences

      Table A summarizes the key differences between the edit and imputation procedures for the Hispanic origin question in 1990 and 2000. First, while multiple responses were not allowed in either census, Census 2000 allowed for the data capture of more than one response and the edit and imputation procedures assigned one origin. In the case of multiple non-Hispanic or multiple Hispanic responses, a respondent remained non-Hispanic or Hispanic, respectively. However, in the case of a conflicting Hispanic/non-Hispanic response, an attempt was made to resolve this conflict by using other information provided by the respondent (for example, an Hispanic response in the race question), responses of other people in the household, or responses of people living nearby who are of the same race.

      Census 2000 edit and imputation procedures also differed from the 1990 procedures in how origin could be assigned from other people in the household. In 1990, anyone in the household could donate an origin regardless of their race. By contrast, Census 2000 rules only allowed other household members to "donate" an origin if the person needing an origin and the donor had the same race.

      One of the most important differences between the two procedures was how "hot deck" allocation was implemented.4 In 1990, hot deck values were stored and assigned by the race of the "donor" and "donee." In Census 2000, the hot decks also were controlled by the race of the donor. However, Census 2000 hot decks also were controlled by four broad age groups.

      More importantly, Census 2000 origin hot decks were further differentiated by whether the donor (and donee) had a Spanish or non-Spanish surname. Use of surname in storing and assigning an origin was one of the most important innovations implemented in Census 2000 in that it allowed a much more precise method for assigning an origin from a hot deck. This innovation was cited in a recent evaluation as having a "profound" impact on the assignment of origin.5

      Finally, if both race and Hispanic origin were not reported, the Census 2000 edit attempted to assign both a race and an origin together from another donor (in both within-household imputation and hot deck allocation). The 1990 procedures assigned race and origin independently of each other, thus increasing the possibility of creating race/origin combinations that were not that common in the population.

    2. Context for Comparing Edit and Imputation Procedures

      Before assessing the impact of these differences on the Hispanic origin population, it is important to understand the differing contexts within which each edit operated. One of the hallmarks of the Hispanic origin question in 1990 was the relatively high level of nonresponse. Table 1 compares the allocation rates6 for Census 2000 and the 1990 census. It is clear from this table that the allocation rate for this question was almost twice as high in 1990 as it was in 2000 (10.4 percent versus 5.6 percent). What is striking is that the range of allocation rates by region narrowed considerably from 1990 to 2000. In 1990, the rates ranged from 7.2 percent in the West to 11.8 percent in the Northeast - a difference of 4.6 percentage points. Among states and the District of Columbia, the range was even wider with Idaho having the lowest percent (4.2 percent) and the District of Columbia having the highest (18.3 percent) - a difference of 14.1 percentage points. In Census 2000, by contrast, the range by region was much narrower, with the Midwest having the lowest rate (4.7 percent) and the South having the highest rate (6.0 percent) - a difference of only 1.3 percentage points. By state, Minnesota had the lowest rate in Census 2000 (4.0 percent), while the District of Columbia had the highest rate (11.0 percent) - a difference of 7.0 percentage points. It is clear that the biggest improvement in these rates occurred for states that had high allocation rates in 1990. This dramatic improvement in response can be attributed in large part to the placement of the Hispanic question before the question on race in Census 2000.
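      The ranges quoted above can be checked with a short calculation. The sketch below uses only the rates stated in the paragraph; the dictionary layout and variable names are illustrative.

```python
# Lowest and highest allocation rates (percent) quoted in the text.
rates = {
    ("1990", "region"): (7.2, 11.8),   # West, Northeast
    ("1990", "state"): (4.2, 18.3),    # Idaho, District of Columbia
    ("2000", "region"): (4.7, 6.0),    # Midwest, South
    ("2000", "state"): (4.0, 11.0),    # Minnesota, District of Columbia
}

# Range in percentage points, rounded to one decimal like the source figures.
spread = {key: round(high - low, 1) for key, (low, high) in rates.items()}

# Ratio of the national allocation rates, 1990 to 2000.
national_ratio = round(10.4 / 5.6, 2)

for key, points in spread.items():
    print(key, points)
print("national ratio 1990/2000:", national_ratio)
```

      The regional range shrinks from 4.6 to 1.3 percentage points and the state range from 14.1 to 7.0, while the national rate in 1990 is a little under twice the Census 2000 rate.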

      Tables 2-7 show the impact of the higher level of nonresponse to the origin question in the 1990 census.7 Table 2 shows that at the national level, hot deck allocation was the largest source of origin response after "reported origin." This means that for a significant proportion of the population (8.5 percent), no one in the household answered the Hispanic origin question. This relationship held for all states.

      Table 3 shows that for the 1990 Hispanic population alone, there was about equal reliance on "within household" and "hot deck" allocation, with some regions and states having a higher proportion of within-household allocation. This is not surprising since the question is primarily oriented to the Hispanic population. Table 4, by contrast, shows that for non-Hispanics, the proportion of responses coming from hot deck allocation was much higher than that from within household allocation. Tables 5-7 show the distribution of allocated responses by source of allocation and support the same conclusions but from a slightly different perspective.

      One of the most important changes made to the Hispanic origin question in Census 2000 to address the problem of nonresponse was to shift the order of the Hispanic origin and race questions. In the 1990 census, the race question appeared first and the Hispanic origin question appeared several questions later. It seems clear that after answering the question on race, many people felt that the Hispanic origin question did not apply and simply skipped the question. Shifting the order of the questions in tests conducted before Census 2000 seemed to improve overall response to the Hispanic origin question with some increased nonresponse to the question on race.

      Table 1 and Tables 8-13 show very clearly that not only the level of nonresponse was reduced but also that the relative contribution of within household and hot deck allocation was much more balanced for non-Hispanics in Census 2000 than in the 1990 census. More importantly, allocation from surname-assisted hot decks overall was greater than allocation from non-surname-assisted hot decks (Tables 8-10). Table 10, in particular, shows that for non-Hispanics, allocation from surname-assisted hot decks was about three times the level of allocation from non-surname-assisted hot decks (2.0 percent compared to 0.6 percent).

      The impact of surname-assisted programs is clearly more dramatic when observing the source of allocations in Tables 11-13. Overall, surname-assisted hot decks represented 31.4 percent of all allocations, while non-surname-assisted hot decks accounted for only 9.6 percent of all allocations. For Hispanic allocations, surname-assisted hot decks overall represented 8.1 percent of all allocations while non-surname-assisted hot decks represented about 4.0 percent. For non-Hispanics, surname-assisted hot decks provided 36.9 percent of all allocations, while non-surname-assisted hot decks provided only 10.9 percent of all allocations (Table 13). In some states where the proportion of Hispanics is very low (such as Alabama, Georgia, Mississippi, North Carolina, South Carolina, and West Virginia), the proportion of people receiving an origin from a surname-assisted hot deck is five times the proportion receiving an origin from a non-surname-assisted hot deck.

      It is clear from Tables 2-13 that there was a significant increase in Census 2000 in the level of substitution, from 0.7 percent of the population in households in 1990 to 1.2 percent of the total population in Census 2000 (Tables 2 and 8). Substitution occurs when there are no data for anyone in the housing unit, and we use data from a neighboring household of similar size, using the hot deck method, to allocate characteristics for the people in that housing unit. Given that the same basic method was used in both censuses, there is no reason to believe that the procedure itself created any upward or downward bias in assigning origin in 1990 and 2000.
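      Whole-household substitution can be sketched as follows. This is a minimal illustration, assuming units are processed in geographic order and that "similar size" means the nearest available household size; the record layout and function name are hypothetical, and the sketch copies the donor household as is even when its size differs from that of the unit being filled.

```python
from typing import Dict, List, Optional


def substitute(units: List[dict]) -> List[List[dict]]:
    """units: dicts with 'size' (person count) and 'people' (list of dicts, or None if no data)."""
    donors: Dict[int, List[dict]] = {}  # most recent data-defined household, by size
    filled: List[List[dict]] = []
    for unit in units:
        people: Optional[List[dict]] = unit["people"]
        if people is not None:
            # A household with data becomes the current donor for its size.
            donors[unit["size"]] = people
            filled.append([dict(p, substituted=False) for p in people])
        elif donors:
            # No data for anyone: copy the donor household closest in size.
            nearest = min(donors, key=lambda size: abs(size - unit["size"]))
            filled.append([dict(p, substituted=True) for p in donors[nearest]])
        else:
            # No donor seen yet; left empty in this sketch.
            filled.append([])
    return filled


units = [
    {"size": 2, "people": [{"origin": "Hispanic"}, {"origin": "Hispanic"}]},
    {"size": 2, "people": None},  # occupied unit with no data for anyone
]
print(substitute(units)[1])
```

      Because every characteristic, including origin, is copied from the donor household, substituted people carry whatever origin their neighbors reported.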

      Tables 9 and 10 show that the percent substituted is slightly higher for the Hispanic population (1.6 percent) than for the non-Hispanic population (1.2 percent). A similar pattern held in 1990, but at a lower level: Tables 3 and 4 show that in 1990 the percent substituted for the Hispanic population (0.9 percent) was again slightly higher than that for the non-Hispanic population (0.6 percent). In addition, it is also clear that substitution played a much larger role in the source of allocation of origin in 2000, with substitution constituting about 20 percent of allocations overall. Interestingly, as shown in Tables 12 and 13, the share of substitution was higher for the non-Hispanic population (21.1 percent) than for the Hispanic population (17.5 percent). By contrast, Tables 6 and 7 show that in 1990 the share of substitution in total allocations was much higher for Hispanics (11.0 percent) than for non-Hispanics (5.9 percent). The reasons for the increase in substitution will be part of the Census Bureau's evaluation of Census 2000.

      Finally, to put all these results in a broader perspective by including the results from the Census 2000 Supplemental Survey (C2SS), Tables 14-16 show that the trend toward improved response to the origin question is continuing. Editing procedures were basically the same for Census 2000 and the C2SS, except that there was no substitution in the C2SS. Table 14, in particular, shows that allocation rates are lower for the total population and for the Hispanic and non-Hispanic populations in the C2SS than in Census 2000 and in 1990. Table 15 shows an even greater reliance on surname-assisted hot decks in the C2SS, with Table 16 showing a much greater reliance on surname-assisted hot decks for the non-Hispanic population than for the Hispanic population. It should be noted, however, that the level of response in C2SS was improved through the use of field follow-up procedures for people who did not fully answer the questions on the questionnaire, a procedure that was not used in Census 2000.

    3. Impact of Editing on Hispanic Origin Population in 1990

      In the 1990 census, there was an unusually high level of dependence on hot deck allocation because many of the people needing an imputed origin had no reported origin for anyone in the household. This greater reliance on hot deck allocation, combined with a relatively high level of nonresponse, meant that most allocations came from the hot deck, especially for the non-Hispanic population. For example, 75.6 percent of non-Hispanic allocations came from a hot deck, excluding substitutions. By contrast, only 29.9 percent of Hispanic allocations came from a hot deck (Tables 5 and 6), again excluding substitutions. This reflects the fact that the 1990 census hot decks matched donors and donees by their race, but did not match by age and by whether the donee had a Spanish or non-Spanish surname as did Census 2000 origin hot decks.

      Concerns about the impact of 1990 edit and imputation procedures emerged when the results of the sample data processing, including a separate edit and imputation for sample questionnaires, became available. The Hispanic origin question on the sample form was edited in sample processing independent of the 100-percent edit and imputation program. Although the basic structure of the two procedures was the same, the edit and imputation procedures for the Hispanic origin question during sample processing differed in a very important way from those used in 100-percent processing. Unlike the 100-percent procedures, sample procedures made use of the rich source of ethnic-related questions from the sample form (ancestry, place of birth, language spoken at home) that could assist in imputing for nonresponse. The use of ethnic-related information, combined with a higher response rate for the Hispanic origin question on the sample form, meant a much lower dependence on hot deck allocation.

      The estimate of the Hispanic origin population that resulted from sample processing was about 454,000 below the total of Hispanics obtained from 100-percent processing, with the 100-percent total exceeding the sample estimate for most states. This difference existed despite the fact that sample estimates were controlled to 100-percent totals, including race and Hispanic origin.8

      Thompson (1991) addressed this difference and the difference between 100-percent totals and sample estimates for the American Indian population. He noted that the difference for the Hispanic population could be attributed to three factors: 1) weighting procedures; 2) a form of allocation bias; and 3) sample processing. Thompson attributed the difference between 100-percent totals and sample estimates primarily to undersampling of Hispanics and to a form of allocation bias. He also attributed part of the difference to different data processing procedures.9 His analysis, however, did not quantify how much each factor contributed to this difference.

      The "allocation bias" to which Thompson's analysis refers is directly related to the focus of this analysis. Thompson noted that the nonresponse for the Hispanic question on the short form was 10 percent while the nonresponse rate for the same question on the sample form was only 4 percent. This difference was due partly to the fact that during data collection all sample forms were subject to content edit follow-up (field follow-up of cases where the number of non-reported items exceeded a certain threshold). By contrast, only 10 percent of short forms were subject to content edit follow-up.

      Thompson reasoned that Hispanics were more likely to answer the Hispanic origin question than were non-Hispanics, making the donor pool more heavily Hispanic than it would have been had both Hispanics and non-Hispanics reported. If the nonresponse rate for the Hispanic question was high, there was an increased risk that an Hispanic origin would be disproportionately assigned. Evidence of this comes from Del Pinal (1994) who noted that the 1990 edit and imputation procedures tended to increase the overlap between various racial groups and the Hispanic population. For example, although there were very few Black Mexican origin persons, about 62 percent of Black Mexicans were created by the edit and imputation procedures.10 Not surprisingly, the Black population had a much higher nonresponse rate (18.4 percent) in the Hispanic origin question than did the White population (9.6 percent). (See Table 17.) The corresponding nonresponse rates for American Indians and Alaska Natives and Asians and Pacific Islanders were 10.2 percent and 9.7 percent, respectively. All these rates were still much higher than the nonresponse rates for other 100-percent questions such as race, age, gender and household relationship - all of which had nonresponse rates below 3 percent - and increased the possibility of a misallocation of respondents as Hispanic. To give a sense of the potential impact on the data, a net misallocation of only 1 percent of nonresponses as Hispanic out of a total of 24 million needing an origin would result in a net increase of 240,000 Hispanics.

      To attempt to quantify at some minimal level the impact of the potential misallocation of responses as Hispanic, we obtained records from the sample edited detailed file (SEDF) for 1990. On these records, we had not only the origin value from sample processing (along with its allocation flag to indicate whether the value was reported or imputed) but also the origin value from 100-percent processing along with its corresponding allocation flag. In particular, we were interested in determining how people who received an allocated origin in the 100-percent edit had their origin allocated in the sample edit. For the purposes of this analysis, the results of the sample edit are considered the standard for accuracy because sample editing procedures made use of data from additional ethnic-related questions (ancestry, place of birth, and language spoken at home) not available on the short form.

      Table 18 shows that, overall, the 100-percent edit produced a net of about 181,000 more Hispanics than did the sample edit when origin was allocated both in 100-percent and sample editing procedures. This net difference in edit outcomes represented only 2.1 percent of the 8.6 million people for whom origin was allocated in both 100-percent and sample processing.

      If we take into consideration also the situations in which we imputed a value in the 100-percent procedures but did not impute a value in the sample procedures, the 100-percent edit produced a net overall of about 161,000 more Hispanics than did the sample procedures.11 Assuming that the sample edit and imputation process is more accurate, the 100-percent edit appears to have imputed as Hispanic a net total of 161,000 people who were probably not Hispanic. However, this number represents only 1.8 percent of all people whose origin was imputed. It is also important to keep in mind that both edit procedures agreed on the edit outcome 96 percent of the time.

      It is clear from Table 18 that the impact of this potential misallocation differs by race. The apparent degree of over-editing of Hispanics (as measured by taking the ratio of "Hispanic - 100%; Not Hispanic - Sample" to "Not Hispanic - 100%; Hispanic - Sample") appeared to be much greater for Blacks (10.0) and Asians and Pacific Islanders (13.1) than for Whites (4.4). Analysis of the unweighted data shows the same pattern, but with slightly lower ratios for each group. This finding is consistent with Del Pinal's finding that certain race/Hispanic combinations were more strongly affected by the editing procedures.
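
      The net difference, the over-editing ratio, and the agreement rate can all be derived from a two-by-two cross-classification of edit outcomes. The sketch below assumes a simple dictionary layout and uses hypothetical cell counts; it does not reproduce Table 18.

```python
def compare_edit_outcomes(cells):
    """Summarize a 2x2 table of (100-percent outcome, sample outcome) counts.

    cells maps pairs such as ("H", "N") to counts, where "H" means Hispanic,
    "N" means not Hispanic, and the first element is the 100-percent outcome.
    """
    hisp_100_only = cells[("H", "N")]     # Hispanic in 100-percent, not in sample
    hisp_sample_only = cells[("N", "H")]  # Hispanic in sample, not in 100-percent
    total = sum(cells.values())
    return {
        "net_excess_hispanic": hisp_100_only - hisp_sample_only,
        "over_edit_ratio": hisp_100_only / hisp_sample_only,
        "agreement_rate": (cells[("H", "H")] + cells[("N", "N")]) / total,
    }


# Hypothetical counts for one race group, for illustration only.
example = {("H", "H"): 1000, ("H", "N"): 200, ("N", "H"): 50, ("N", "N"): 3750}
summary = compare_edit_outcomes(example)
print(summary)
```

      A ratio above 1 indicates that the 100-percent edit assigned Hispanic origin more often than the sample edit did among the cases where the two disagreed.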

      It is important to keep in mind that the estimate of 161,000 is probably a lower bound because these data were obtained from sample forms that had a lower nonresponse rate and had much more ethnic-related information than did short form questionnaires. It is possible that the level of misallocation would be higher among the population that received only the short form, which experienced a higher nonresponse rate for origin than did the sample form. However, it is unlikely that the upper bound would be as high as the difference between the 100-percent and sample totals (454,000) because: 1) sample processing changed about 262,000 responses from "Other Spanish/Hispanic" to not Hispanic12 and 2) to an unknown degree there was undersampling of Hispanics for which the sample weighting procedures did not compensate.

      It is also very important to keep in mind that the impact on the overall total Hispanic population was very small. Overall, this net difference (161,000) represented only 0.7 percent of the total Hispanic population.

    4. Impact of Edit and Imputation Procedures on Hispanic Origin Population in Census 2000

      There are no comparable data available at this time from Census 2000 to perform the same type of analysis that was conducted on the 1990 census edit and imputation procedures. However, it is very clear that the Census 2000 procedures operated in an environment that was profoundly different from that in which the 1990 procedures operated. Significantly reduced nonresponse to the question, combined with more restrictions on the conditions under which origin could be assigned to an individual, probably has led to a much lower level of erroneous imputations as Hispanic (or non-Hispanic).13 At the same time, innovations such as the surname-assisted hot deck have improved the accuracy and, therefore, the quality of data from the Hispanic origin question.
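
      To make the idea of a surname-assisted hot deck concrete, here is a minimal sketch. It is not the Census 2000 production procedure: the record layout, the Boolean surname flag, and the rule of keeping one stored donor value per surname class are all simplifying assumptions made for illustration. It follows only the general description of a hot deck as a set of stored values that is updated as each record is processed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Person:
    surname_is_spanish: bool   # hypothetical flag, e.g. from a surname list
    origin: Optional[str]      # "Hispanic", "Not Hispanic", or None if missing
    allocated: bool = False    # set to True when the origin is imputed


def surname_assisted_hot_deck(
    people: List[Person], starting_values: Dict[bool, str]
) -> List[Person]:
    """Impute missing origin from the most recent donor in the same surname class."""
    deck = dict(starting_values)  # one stored value per surname class
    for person in people:         # records are processed in file order
        if person.origin is not None:
            # A reported value becomes the new donor for its surname class.
            deck[person.surname_is_spanish] = person.origin
        else:
            # A missing value is filled from the matching deck and flagged.
            person.origin = deck[person.surname_is_spanish]
            person.allocated = True
    return people


records = [
    Person(True, "Hispanic"),
    Person(False, "Not Hispanic"),
    Person(True, None),
    Person(False, None),
]
result = surname_assisted_hot_deck(
    records, {True: "Hispanic", False: "Not Hispanic"}
)
for person in result:
    print(person)
```

      Keeping separate decks by surname class means that a missing value is never filled from a donor in the other class, which is one way to read the greater precision attributed to surname assistance.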

  6. Conclusion
    From the information provided above we have come to the following conclusions:
    1. There is no evidence based on a comparison of edit and imputation procedures from Census 2000 and the 1990 census for the place of birth question to conclude that differences in the procedures would have explained differences in the overall total of the foreign born population in 2000 and in 1990. There were some changes in the edit and imputation procedures between the two censuses, but none of these would have had any significant impact on the overall total of the foreign born.
    2. There were some significant differences in the edit and imputation procedures between the two censuses for the Hispanic origin question. The most important of these was the use of surname-assisted hot decks in Census 2000. These hot decks allowed for much greater precision in assigning an origin from neighboring housing units when no one in the household answered the question. Furthermore, there was a dramatic improvement in response to the Hispanic question in Census 2000, thus reducing the need (relative to 1990) for providing a response through edit and imputation procedures. In fact, there is evidence from 1990 that the combination of higher nonresponse, greater use of hot deck procedures, and lack of the benefit of surname-assisted hot deck procedures (surname capture was not done in 1990 for all census forms) led to some over-editing of people as Hispanic.

      We will continue our analysis of the quality of Census 2000 origin data as sample data and data from other evaluation studies become available.


1 In this report, "edit" refers to revising or imputing a response based on information provided by the respondent himself or herself. "Imputation," also used interchangeably with the term "allocation," refers to imputing a response based on the response of other people in the same household or the response of people in neighboring households.
2 This issue is one of the topics being analyzed by Task Team 5.
3 This percent is based on a file containing the results of automated coding (excluding any manual coding) of place of birth responses and using modified editing and weighting procedures to obtain a preliminary estimate of the native and foreign born populations. Official sample data will be available in Spring 2002.
4 "Hot deck" allocation involves the assignment of values from a set of stored values that are constantly updated as each person's data record is processed. A hot deck is usually the procedure of last resort when a value cannot be assigned either from information provided by the person or from other people in the household. In the case of race and origin, hot deck imputation is used most often when no one in the household has provided a response to a particular question.
5 Summary provided by Yves Thibbaudeau, Statistical Research Division, March 31, 1999 concerning evaluation of editing of origin in the 1998 Census Dress Rehearsal.
6 Allocation rates represent the rate at which responses were imputed based on responses of others within the household or from people living nearby (also called "hot deck" imputation).
7 The universe for these tables is the population in housing units and excludes the population in group quarters.
8 Although efforts are made to control the weighting by race and Hispanic origin in each weighting area, there is no guarantee that these weighting control totals can be maintained in each area because each control total in the weighting matrix had to meet a certain minimum threshold. Those totals not meeting the threshold were merged with other totals according to a pre-determined collapsing sequence.
9 In 1990 processing for the Hispanic origin question, only optical marks were captured; write-in responses were not. Thus, people who provided a write-in response but did not fill the "Other Hispanic" circle were treated as nonrespondents in the 100-percent edit and could have been assigned either as Hispanic or not Hispanic. People who provided a write-in response and marked the "Other Spanish/Hispanic" circle would have been identified as "Other Spanish/Hispanic" in the 100-percent edit and then either as Hispanic or not Hispanic in the sample edit depending on whether the write-in response was Hispanic or not Hispanic in sample coding operations.
10 The percentages and rates in this paragraph were derived from special 1990 files that contain only household records and exclude records from the group quarters population (such as college dorms, prisons, military bases, and nursing homes).
11 This was possible because we only captured optical marks in the 100-percent data processing and a person could have written in a response without marking any circles. Although the write-in entry could have been either an Hispanic or a non-Hispanic entry, most of the time the entry was Hispanic.
12 Based on the fact that the respondent provided a non-Hispanic response in the write-in space.
13 Another example of this is how we handled situations in which a respondent indicated that he or she was Hispanic and non-Hispanic. This situation occurred about 700,000 times nationally. Instead of simply assuming that all such people should be Hispanic, we looked at information provided by the respondent (such as the reporting of an Hispanic origin in race), information provided by others in the household, and ultimately by the hot deck, to adjudicate these situations. As it turned out, about half of the people were assigned as Hispanic and half were assigned as not Hispanic.


  1. Bibliography

    Sources cited in report:

    Del Pinal, Jorge. "Social Science Principles: Forming Race-Ethnic Categories for Policy Analysis." Paper presented at the "Workshop on Race and Ethnicity Classification: An Assessment of the Federal Standard for Race and Ethnicity Classification," National Research Council, Commission on Behavioral and Social Sciences and Education, Committee on National Statistics, February 18, 1994.

    Thompson, John H. "Difference Between Complete Count Figures and Sample Estimates by Race/Origin." Memorandum from John H. Thompson, Chief, Statistical Support Division to Charles D. Jones, Associate Director, Decennial Census, December 19, 1991.

    Demographic Analysis-Population Estimates (DAPE) Research Project Reports Related to Evaluating Components of International Migration (in order of Working Paper Series Number):

    Deardorff, K. and L. Blumerman. 2001. Evaluating Components of International Migration: Estimates of the Foreign-Born Population by Migrant Status: 2000. (Population Division Working Paper #58) (December 2001) U.S. Census Bureau.

    Perry, M., B. Van der Vate, L. Auman, and K. Morris. 2001. Evaluating Components of International Migration: Legal Migrants. (Population Division Working Paper #59) (December 2001) U.S. Census Bureau.

    Cassidy, R. and L. Pearson. 2001. Evaluating Components of International Migration: Temporary (Legal) Migrants. (Population Division Working Paper #60) (December 2001) U.S. Census Bureau.

    Costanzo, J., C. Davis, C. Irazi, D. Goodkind, R. Ramirez. 2001. Evaluating Components of International Migration: The Residual Foreign Born. (Population Division Working Paper #61) (December 2001) U.S. Census Bureau.

    Mulder, T., B. Guzmán, and A. Brittingham. 2001. Evaluating Components of International Migration: Foreign-Born Emigrants. (Population Division Working Paper #62) (December 2001) U.S. Census Bureau.

    Gibbs, J., G. Harper, M. Rubin, H. Shin. 2001. Evaluating Components of International Migration: Native Emigrants. (Population Division Working Paper #63) (December 2001) U.S. Census Bureau.

    Christenson, M. 2001. Evaluating Components of International Migration: Migration Between Puerto Rico and the United States. (Population Division Working Paper #64) (December 2001) U.S. Census Bureau.

    Cresce, A., R. Ramirez, and G. Spencer. 2001. Evaluating Components of International Migration: Quality of Foreign-Born and Hispanic Population Data. (Population Division Working Paper #65) (December 2001) U.S. Census Bureau.

    Malone, N. 2001. Evaluating Components of International Migration: Consistency of 2000 Nativity Data. (Population Division Working Paper #66) (December 2001) U.S. Census Bureau.

  2. Table A and Detailed Tables

    Table 1. Total Allocation Rates for the Hispanic Question for the United States, Regions, and States: 1990 and 2000.
    Table 2. Total Household Population for the Hispanic Origin Question by Allocation Status and Type of Allocation Flag for the United States, Regions, and States: 1990.
    Table 3. Total Hispanic Household Population for the Hispanic Origin Question by Allocation Status and Type of Allocation Flag for the United States, Regions, and States: 1990.
    Table 4. Total Non-Hispanic Household Population for the Hispanic Origin Question by Allocation Status and Type of Allocation Flag for the United States, Regions, and States: 1990.
    Table 5. Total Allocation Counts for the Hispanic Origin Question by Type of Allocation Flag for the United States, Regions, and States: 1990.
    Table 6. Total Allocation Counts for Hispanics by Type of Allocation Flag for the United States, Regions, and States: 1990.
    Table 7. Total Allocation Counts for Non-Hispanics by Type of Allocation Flag for the United States, Regions, and States: 1990.
    Table 8. Total Population for the Hispanic Origin Question by Allocation Status and Type of Allocation Flag for the United States, Regions, and States: 2000.
    Table 9. Total Hispanic Population for the Hispanic Origin Question by Allocation Status and Type of Allocation Flag for the United States, Regions, and States: 2000.
    Table 10. Total Non-Hispanic Population for the Hispanic Origin Question by Allocation Status and Type of Allocation Flag for the United States, Regions, and States: 2000.
    Table 11. Total Allocation Counts for the Hispanic Origin Question by Type of Allocation Flag for the United States, Regions, and States: 2000.
    Table 12. Total Allocation Counts for Hispanics by Type of Allocation Flag for the United States, Regions, and States: 2000.
    Table 13. Total Allocation Counts for Non-Hispanics by Type of Allocation Flag for the United States, Regions, and States: 2000.
    Table 14. Allocation Rates by Type of Hispanic Origin for Census 2000, Census 1990, and Census 2000 Supplemental Survey, for the United States.
    Table 15. Total Edit and Allocation Counts by Type of Allocation Flag for Census 2000, Census 1990, and Census 2000 Supplemental Survey, for the United States.
    Table 16. Total Edit and Allocation Counts by Type of Hispanic Origin and by Type of Allocation Flag for Census 2000, Census 1990, and Census 2000 Supplemental Survey, for the United States.
    Table 17. Allocation Rates for the Hispanic Origin Question by Race for the United States: 1990 Census.
    Table 18. Allocation of Origin - 100% Edit Outcome vs. Sample Edit Outcome for the United States: 1990 Census.
    Table A. Differences Between Census 2000 and 1990 Census Edit and Imputation Procedures for the Questions on Place of Birth and Hispanic Origin.




Source: U.S. Census Bureau, Population Division
Authors: Arthur Cresce, Roberto Ramirez, Gregory Spencer

Created: July 1, 2002
Last Revised: October 31, 2011 at 10:03:08 PM

