Tracking Hispanic Ethnicity: Evaluation of Current Population Survey Data Quality for the Question on Hispanic Origin, 1971 to 2004
Dianne Schmidley and Arthur Cresce
In 1992, the United States Census Bureau and Statistics Canada held a joint conference on the topic of ethnicity in Ottawa, Canada. Among other aspects of ethnicity, the speakers attending the conference shared national experiences with conceptualizing and measuring ethnicity, assessed the need for ethnicity data in their respective countries, and described the social and political contexts associated with the collection and use of ethnicity data.
For several decades, the Census Bureau has attempted to operationally define and measure the population referred to as Spanish, Hispanic or Latino, an ethnic category first recognized officially by the Office of Management and Budget of the U.S. government in the 1970s.  This paper describes an evaluation of the Hispanic data reported in the Current Population Survey (CPS) over the past thirty-five years (1969 to 2004), while focusing in particular on events transpiring during and since 2000. The paper also complements a discussion the authors initiated concerning the Census Bureau’s experience with Hispanic origin data reported in the decennial censuses. 
1. Overview of our approach
In the next sections, we provide a brief overview of the mechanics of the CPS, which has measured Hispanic ethnicity since 1969. In this part of our paper we focus on aspects of the survey that directly affect Hispanic data results. We then present an overview of the Census Bureau experience with CPS Hispanic origin data collected between 1969, when the CPS first asked an ethnic origin or descent question, and 2002, when the Census Bureau introduced specific Hispanic origin questions designed to approximate the Hispanic origin question asked in Census 2000. In this part of our paper we refer to the shifts that took place in the CPS Hispanic data over the thirty-five-year period as the Census Bureau calibrated CPS data to match the census results, which reflected Hispanic population growth decade by decade. We continue the paper by comparing CPS Hispanic origin data with results from other sources. In the last segments of the paper we focus on how the CPS has changed since 2002 following the introduction of the new CPS Hispanic origin question.
Specifically, in this paper we:
- Assess the internal consistency of CPS reporting;
- Compare CPS data with data from other sources such as Census 2000. 
2. Data sources
We used the following resources to carry out our research (goal in parentheses):
- CPS annual demographic supplement files as well as printed reports and tables, 1969 to 2004; decennial census results; immigration data (examine change over time);
- A specially prepared file containing matched cases showing Hispanic origin data collected in Census 2000 and CPS information collected February through May of 2000 (gauge effects associated with differing data collection modes and question design);
- A file containing CPS cases interviewed in a special supplement fielded in May 2002 that included both the old (pre-2002) CPS ethnic origin question and the new (2002) CPS Hispanic origin questions (specifically gauge question effect);
- 2000 CPS March supplement cases (ASEC) weighted with 1990-based and 2000-based second stage weights (assess Census 2000 weighting changes); and
- 2003 and 2004 Basic CPS and ASEC data.
3. What is ethnicity?
For over a century, social scientists have struggled to define and measure the phenomenon called ethnicity. Speakers attending the Joint Canada-United States Conference on the Measurement of Ethnicity in April 1992 observed that anthropologists, sociologists, historians, demographers, and other researchers have all viewed the concept of ethnicity somewhat differently. While social psychologists and anthropologists have attempted to measure ethnicity using attitudinal scales and other instruments designed to assess mental processes associated with self-concept, demographers have been more inclined to use: 1) external empirical criteria shown to be linked to the concept of ethnicity, such as place of birth, birthplace of parents (parental nativity), language spoken, or surname, as well as 2) self-reported information. In this paper, we provide data on ethnicity based primarily on self-reported information.
4. How does the Current Population Survey work? 
Before we begin the discussion of our findings and conclusions regarding CPS Hispanic origin data, it is useful to review some characteristics of the CPS that play a role in the development of Hispanic data.
4.1. CPS sample universe.
The Bureau of Labor Statistics (BLS) is the primary sponsor of the CPS and refers to it as the 'Household Survey' in publications such as Employment and Earnings. In published reports, the Census Bureau states that the CPS universe is the civilian noninstitutionalized population. The Current Population Survey Annual Social and Economic Supplement (ASEC) universe also includes households with military members who live off post or on post with their families as long as one civilian adult lives in the housing unit.  Although it is probably easier for the lay person to think of the CPS as a household survey as opposed to a survey of the civilian noninstitutionalized population, there are a few caveats associated with either classification.
Residents of the United States live in either households or group quarters (GQ).  The GQ population can be categorized as institutionalized or noninstitutionalized and civilian or military. People living in relatively homogeneous group quarters circumstances, such as soldiers in military barracks, patients in nursing homes, and incarcerated prisoners, are relatively easy to exclude from the civilian noninstitutionalized population. However, other population groups such as households with military members, college students in dorm rooms whose usual place of residence is a parental home, or the staff of prisons and hospitals who live in census defined special places are more difficult to classify.
Estimates of the Hispanic population reflected in the CPS somewhat understate the resident population reflected in censuses, since the CPS does not include people living in institutional group quarters such as nursing homes and correctional institutions.
4.2. CPS sample design - Selected aspects. 
The CPS sample design is fully described elsewhere.  However, it is important to note a few aspects of the design that affect CPS Hispanic data (particular effects are discussed in more detail below):
- The CPS is a multistage probability sample of housing units in the U.S. The Annual Social and Economic Supplement (ASEC) includes additional sample to increase the precision of derived estimates associated with the Hispanic origin population. Neither the basic (monthly) sample nor the ASEC sample specifically targets detailed Hispanic groups such as Mexican, Puerto Rican, Cuban, etc.
- CPS sample data are weighted to universe levels through a multistage process. The initial stage is based on the inverse of the sampling probabilities. The last stage involves a ratio adjustment process where survey estimates are controlled to independent demographic estimates based on selected characteristics such as age, sex, race, and Hispanic origin. The last stage weights are re-calibrated after every decennial census.
- CPS sample frame and stratification levels are based on the geographic distribution of the population as well as socioeconomic data drawn from the last census. Groups such as the total Hispanic origin population are targeted in the sample design strata and are therefore well represented from month to month although they are relatively non-randomly distributed across the United States.
- On the other hand, depending on the size and distribution of their populations, the samples of detailed Hispanic groups may fluctuate more than the total Hispanic figures owing to the rotation of CPS sample panels (Table 1, Table 2; Figure 1 and Figure 2).
- Each monthly CPS sample contains eight rotation panels, and every household in the survey is assigned to a specific panel. Each panel is rotated in for 4 consecutive months, out for 8 months, and back in for 4 months over a 16-month period, and is then replaced. In any given month, one of the household panels is interviewed for the first time, one for the second time, and so on, up to the eighth time. The CPS design includes a 75 percent overlap in the sample addresses from month to month and a 50 percent overlap from year to year for the same month, a feature that reduces sampling error for month-to-month and year-to-year comparisons.
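The 4-8-4 rotation pattern described in the last item can be checked with a short sketch. This is an illustrative model only, not Census Bureau code: it assumes that one new panel enters every month and that months are labeled by consecutive integers, and the function names are our own.

```python
# Months since a panel's first interview at which the panel is in sample:
# 4 months in (offsets 0-3), 8 months out, 4 months in again (offsets 12-15).
IN_SAMPLE_OFFSETS = frozenset(range(0, 4)) | frozenset(range(12, 16))


def panels_in_sample(month):
    """Return the entry months of the panels interviewed in `month`.

    Assumption for illustration: exactly one new panel enters each month,
    so a panel can be identified by the month of its first interview.
    """
    return {month - offset for offset in IN_SAMPLE_OFFSETS}


def month_in_sample(entry_month, month):
    """Return 1-8 for the panel's interview number, or None if not in sample."""
    offset = month - entry_month
    if offset not in IN_SAMPLE_OFFSETS:
        return None
    return offset + 1 if offset < 4 else offset - 7


def overlap_share(month_a, month_b):
    """Share of month_a's panels that are also interviewed in month_b."""
    a = panels_in_sample(month_a)
    b = panels_in_sample(month_b)
    return len(a & b) / len(a)


month = 100  # arbitrary month index
print(len(panels_in_sample(month)))       # 8 panels in any month
print(overlap_share(month, month + 1))    # 0.75 month to month
print(overlap_share(month, month + 12))   # 0.5 year to year
```

Under these assumptions, the eight panels interviewed in a month are in their first through eighth month in sample, and the 75 percent and 50 percent overlaps follow directly from the pattern.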
4.3. CPS sample weighting.
The CPS is a "controlled" survey through which the Census Bureau transforms sample counts into national population totals in several stages.  The initial stage of weighting is done at the household level when base-weights are assigned to sample cases (a weight equal to the inverse of the case's probability of selection). The next major step in the primary stage of weighting the sample data is to inflate the base-weights by an average of about 6 to 7 percent to account for non-interview households (units eligible for interview but not actually interviewed).
The second stage of weighting involves individual person cases. This step is designed to compensate for deficiencies resulting from survey under-coverage of the sample frame by controlling the first-stage weighted sample data to demographic estimates derived from combining census and administrative records data. Second-stage weights are based on three distributions derived independently of the CPS:
- State of residence;
- Age, sex, and Hispanic origin; and
- Age, sex, and race.
The independent values from the demographic estimates used to weight the survey are benchmarked to the previous census. 
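The weighting stages described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the production CPS procedure: the function names, the four-person sample, and all control totals are hypothetical, and only two margins (Hispanic origin and sex) are used in place of the three distributions listed above.

```python
from collections import defaultdict


def base_weight(selection_probability):
    """First-stage base weight: inverse of the probability of selection."""
    return 1.0 / selection_probability


def noninterview_factor(eligible_weights, interviewed_weights):
    """Factor that inflates interviewed units to represent all eligible units."""
    return sum(eligible_weights) / sum(interviewed_weights)


def weighted_totals(weights, cells):
    """Sum of weights within each cell (for example, each origin category)."""
    totals = defaultdict(float)
    for weight, cell in zip(weights, cells):
        totals[cell] += weight
    return dict(totals)


def ratio_adjust(weights, cells, controls):
    """Scale weights so that each cell's weighted total equals its control."""
    totals = weighted_totals(weights, cells)
    return [w * controls[c] / totals[c] for w, c in zip(weights, cells)]


def rake(weights, margins, iterations=50):
    """Apply ratio adjustment to each margin in turn, repeatedly."""
    for _ in range(iterations):
        for cells, controls in margins:
            weights = ratio_adjust(weights, cells, controls)
    return weights


# Hypothetical example: 107 eligible households, 100 interviewed,
# each selected with probability 1 in 1,000.
w = base_weight(0.001)
factor = noninterview_factor([w] * 107, [w] * 100)   # about 1.07

# Hypothetical person-level weights after the first stage.
person_weights = [1000.0, 1100.0, 900.0, 1000.0]
origin = ["Hispanic", "Hispanic", "Not Hispanic", "Not Hispanic"]
sex = ["Male", "Female", "Male", "Female"]

# Hypothetical independent control totals (both margins sum to 5,000).
origin_controls = {"Hispanic": 1500.0, "Not Hispanic": 3500.0}
sex_controls = {"Male": 2400.0, "Female": 2600.0}

final_weights = rake(person_weights, [(origin, origin_controls), (sex, sex_controls)])
print(weighted_totals(final_weights, origin))
print(weighted_totals(final_weights, sex))
```

The point of the sketch is that the weighted total for Hispanic origin is forced to equal the independent control, which is why the CPS total Hispanic estimate shifts whenever the controls are re-benchmarked to a new census.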
5. How has the Census Bureau conceptualized and measured ethnicity using the CPS?
5.1. Hispanic origin questions.
The influx of refugees from Cuba beginning in the early 1960s, as well as subsequent changes in U.S. immigration law in 1965, substantially changed the composition of the U.S. foreign-born population.  In the late 1960s, it became apparent to demographers reviewing administrative data, such as vital statistics and immigration data, that the volume of residents with a non-European background had shifted dramatically. After examining their findings, OMB advised the Census Bureau to use the CPS to pilot test a subjective origin or "descent" question designed to measure the ethnic composition of the U.S. household population (Question examples in Appendix A).  To obtain a basis for comparing the validity of the “Origin” question, the CPS also asked questions about demographic characteristics known to be highly correlated with ethnic identity such as birthplace, parental birthplace, and language.
After the initial attempt to identify and measure the ethnic composition of the population using the November 1969 CPS, the Census Bureau decided to add a specific "Spanish origin" or "Descent" question to the 1970 decennial census questionnaire (Appendix A). As a result, the 1970 Census provided Spanish/Hispanic population data from several sources, including: (1) a language question; (2) an origin or descent question; (3) Spanish surnames from a surname code list; (4) a birthplace question; and (5) parental birthplace questions.
Following the census of 1970, the Census Bureau continued to use the CPS origin/descent question fielded in 1969 to collect Spanish data; however, the birthplace questions were not included again in the CPS until 1994, when "place of birth" as well as "mother's place of birth" and "father's place of birth" questions were added to the core or basic CPS questionnaire.
5.2. CPS Hispanic data changes since 2000.
5.2.1. The new Hispanic origin question
In January 2003, the CPS began to produce results from a set of new Hispanic origin questions added to the CPS in 2002 (Appendix A). Prior to this, CPS Hispanic data had been derived from the "origin or descent" question described above. That is, during the years 1971 to 2002, Hispanic data were not produced by a direct question about Hispanic ethnicity, but rather by combining selected responses to a more general ethnic question.  On the other hand, the new Hispanic question(s) specifically asks “Are you/Is....Spanish, Hispanic, or Latino?” Persons responding “yes” are then asked a subsequent question, “Are you/Is....Mexican, Mexican-American, Chicano, Puerto Rican, Cuban, Cuban-American, or some other Spanish, Hispanic, or Latino group?” thus naming the groups identified in the old descent question. A probe question is used to elicit more specific information about people responding affirmatively to the “Other” category.  The interviewer asks the probe question using a flash card containing a listing of 42 possible responses (Appendix A).
The Hispanic detailed groups historically listed in CPS data products from the Census Bureau have included census categories in use since the 1970 Census: Mexican, Puerto Rican, Cuban, Central or South American, and Other Hispanic (census data in Table A; CPS detail in Table 1; see also Figures 1 and 2). Beginning January 2003, the new Hispanic origin questions included in the CPS made the addition of more specific Hispanic categories in Census Bureau Current Population Reports feasible.
Table A. Detailed Hispanic Categories from the Census: 1970 to 1990
|Census year|Mexican|Cuban|Puerto Rican|Total Hispanic|
|---|---|---|---|---|
|1990|13.4 million|1.1 million|2.6 million|21.9 million|
|1980|8.7 million|803,226|2.0 million|14.6 million|
|1970|4.5 million|544,600|1.4 million|9.1 million|
Source: Census Bureau (2002), Working Paper No. 56 (Gibson and Jung)
As a result, it is now possible to show the population totals as well as social and economic information for additional detailed Hispanic categories such as "Dominican", "Salvadoran", "Other Central American", and "South American" in CPS products (Table 2).
5.2.2. Results from Changes in Hispanic Question Wording tied to Nativity.
5.2.2.1. Natives more likely than foreign born to change reporting behavior.
A research file from the May 2002 CPS supplement, containing responses to the old "origin or descent" question along with responses to the new Hispanic origin question(s) for the same person, reveals that the new question elicits more reports of Hispanic origin (37.3 million) than does the old question (35.5 million), while the reverse appears to be true for "Other Hispanic", which is higher for the old question (2.4 million) than for the new question (2.2 million) (Table 3-A).
A more in-depth analysis of data from this matched file indicates that the increase in the number of Hispanic responses appears to come from the native population: about 22.8 million for the new question versus 21.0 million for the old question, a difference of about two million, compared with 14.6 million versus 14.5 million, respectively, for the foreign-born population (Table 3-B and Table 3-C).
Although the questions seem to have produced no meaningful difference in the total number of Hispanics for the foreign-born, further investigation shows that the proportion of the foreign-born Hispanics who reported "Other Hispanic" in response to the old question declined from 4.3 percent to 1.9 percent in response to the new question. In comparison, native Hispanics reported more similar proportions (6.4 percent versus 5.9 percent).
The 2002 file also indicates that among the 2.2 million persons reporting "Other Hispanic" in response to the new CPS question, about 31.8 percent previously had reported "Not Hispanic" to the old CPS question. Most of the respondents (89.8 percent) who switched from "Not Hispanic" to "Other Hispanic" were native. In fact, 84.3 percent of the people who reported "Other Hispanic" to the new question were native.
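The magnitudes implied by these percentages can be worked out directly. The sketch below uses only the rounded figures quoted in this section, so the derived values are approximations and not published estimates; the variable names are our own.

```python
# All population figures are in millions and are the rounded values
# quoted in the text (May 2002 matched file); results are approximate.
total_new, total_old = 37.3, 35.5
native_new, native_old = 22.8, 21.0
foreign_new, foreign_old = 14.6, 14.5

total_gain = total_new - total_old        # about 1.8
native_gain = native_new - native_old     # about 1.8
foreign_gain = foreign_new - foreign_old  # about 0.1

other_hispanic_new = 2.2
# 31.8 percent of "Other Hispanic" (new question) had reported "Not Hispanic".
switched_from_not_hispanic = other_hispanic_new * 0.318
# 89.8 percent of those who switched were native.
switched_native = switched_from_not_hispanic * 0.898
# 84.3 percent of all "Other Hispanic" (new question) were native.
other_hispanic_native = other_hispanic_new * 0.843

print(round(total_gain, 1), round(native_gain, 1), round(foreign_gain, 1))
print(round(switched_from_not_hispanic, 2))  # about 0.70 million
print(round(switched_native, 2))             # about 0.63 million
print(round(other_hispanic_native, 2))       # about 1.85 million
```

On these rounded inputs, nearly all of the net gain in Hispanic reporting is accounted for by natives, and roughly 0.7 million of the new "Other Hispanic" responses came from people previously reported as not Hispanic.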
5.2.2.2. Detailed Hispanic responses increased among the foreign born.
The new Hispanic question elicits a higher degree of reporting of specific Hispanic groups than does the "old" origin or descent question.  It also seems as though the new question format may have been more likely to increase detailed origin reporting among foreign-born Hispanics.
Given the shift described in Section 5.2.2.1 concerning the decrease in people reporting "Other Hispanic", evidence from the matched file also shows that the new Hispanic question(s) led more people to report a specific Hispanic origin group than did the old origin question. Many of those reporting "Other Hispanic" in response to the old question provided a more detailed response to the second of the new questions, which allows the respondent 42 choices, several of which are non-Hispanic. For example, Table 3-A shows that among those who said they were "Dominican" in response to the new question, 44.8 percent had provided the more general "Other Hispanic" response to the old question. This shift was more pronounced among the foreign born, where 48.7 percent of the Dominicans in the new question had responded "Other Hispanic" to the old question (Table 3-B), compared with the natives, where only 38.4 percent of those identified as Dominican by the new question had responded "Other Hispanic" to the old question (Table 3-C).
The data in these tables support the conclusion that those who provided detail for the old question continued to provide detail for the new question, although consistency appears to be better for the foreign-born population. For example, the percent reporting Mexican origin consistently between the new and old questions was 96.1 percent for the foreign-born population and 90.1 percent for the native population. The corresponding percentages for Cubans were 91.7 percent and 81.5 percent, respectively. Puerto Ricans are not foreign born, so no comparison can be made along the nativity dimension; however, the consistency for all Puerto Ricans was 85.7 percent.
5.2.2.3. Response validity and the new question.
Traditionally, statistical validity has referred to a measurement that is representative of, or an actual gauge for an observed phenomenon. By providing respondents greater choice, the new Hispanic origin question seems to improve the validity of responses by allowing interviewed subjects to better approximate their detailed Hispanic origin answers than did the old question. However, some caveats remain.
Table 4-A reveals that long-standing detailed Hispanic groups such as Mexican, Cuban, and Puerto Rican show high consistency in reporting birthplace and detailed Hispanic origin. In 2003, among those born in Mexico, 98.5 percent say they are Mexican in response to the new Hispanic question. Comparable figures are 93.4 percent for those born in Cuba and 92.8 percent for those born in Puerto Rico. However, not all the newly identified groups display comparable levels of consistency between birthplace and detailed Hispanic response. For example, among those born in the Dominican Republic, 88.7 percent report Dominican, while in contrast, among those born in El Salvador, a figure of 62.4 percent emerges.  The relatively lower proportion of foreign-born from El Salvador responding as Salvadoran raises questions about the responses for this Hispanic category.
5.2.2.4. Rethinking Hispanic.
A number of people reported as Hispanic in the "Yes-No" question (first in the series of new Hispanic questions shown in Appendix A), but in a follow-up question, some of those reporting Hispanic also indicated they were Portuguese, Haitian, Brazilian, or of some other group not traditionally identified by the Office of Management and Budget as an Hispanic category. Using unedited data, about 180,000 people reported in this manner in 2003. In all these cases, the response to the “Yes-No” question was changed in the edit to “No.” Using edited data, there were also about 287,000 people in 2003 who provided responses not listed in the Hispanic code list for CPS, such as “Mestizo,” “Raza,” or “Mixed,” and were therefore coded as “Other Other.” (Table 5)
These responses raise the issue of self-concept. While it was possible that some of the CPS respondents misunderstood the question, because a Census Bureau field representative conducted the CPS interview and many of these interviewers spoke the respondents’ language, confusion about the questions should have been minimized compared with the mail-out, mail-back census form. Furthermore, our research shows respondents’ or respondents’ parental birthplace may have led them to believe the terms “Hispanic” and/or “Latino” applied to them. Table 5 shows that among the 287,000 “Other Other” Hispanics in 2003, about 158,000 (55 percent) were born in the United States and about 129,000 (45 percent) were born elsewhere (primarily in Spanish-speaking countries). For 2004, among the 306,000 “Other Other” Hispanics, about 200,000 (65 percent) were born in the United States and about 106,000 (35 percent) were born elsewhere (again, primarily in Spanish-speaking countries).
Additional research needs to be conducted to understand why some respondents indicated they were Hispanic in response to the “Yes-No” question and then gave an explicitly non-Hispanic group in the follow-up question. One possible explanation is that this is the only way these respondents can report a mixed ancestry. We might want to look at parental birthplace to see if one or both parents were born in a Spanish-speaking country, thus allowing for the possibility that the respondent wanted to express a multiple response. Regarding the “Other Other” responses, we have no additional data from the CPS, even with the parental birthplace data, to determine whether the respondent is Hispanic. Overall, however, these responses represent a relatively small share of the Hispanic population.
5.3. CPS 2001 sample expansion.
Following Census 2000, the Census Bureau began testing an expanded CPS monthly or basic sample. The primary goal of the ASEC expansion is the production of more precise as well as reliable state estimates of low-income children without health insurance (State Children’s Health Insurance Program or SCHIP). In July 2001, the Bureau of Labor Statistics (BLS) officially included the expanded sample in its labor force statistics.  The Census Bureau also increased the ASEC sample for minorities, and households with children living with a White householder.  The expanded ASEC sample in 2001 consisted of 78,000 interviewed households. Although the SCHIP sample expansion was specifically designed to improve state-based estimates of children’s health insurance status, other estimates have been improved as a result of the additional sample (Table 7 discussed below). 
5.4. The Census 2000 benchmark
As we noted above, the Census Bureau uses independent demographic estimates to develop the CPS second stage weights, and these demographic estimates are benchmarked to the most recent census. Table 1 shows the history of the CPS Hispanic origin series, for the total as well as the detailed groups, from 1971 to 2004. Note that the CPS Hispanic origin estimates were benchmarked to the censuses of 1980, 1990, and 2000, as reflected in the jumps in the plotted trend lines in Figure 1.
The 1990 census total shown earlier in Table A above represents the official census number. Demographic estimates used to develop second-stage weights benchmarked to 1990 were derived from a modified census base, sometimes called MARS for the “Modified Age-Race-Sex-Hispanic origin” distribution, where the category “Other” race has been proportionally distributed to four major race groups.  There was no immediate requirement for a fully developed MARS file for Census 2000.  Demographic estimates benchmarked to Census 2000 reflect change for five race groups: White; Black or African American; American Indian or Alaska Native; Asian; and Native Hawaiian or Other Pacific Islander. Prior to 2000, the Asian and Pacific Islander groups were combined.
In 2001, the Census Bureau introduced a new set of demographic estimates benchmarked to Census 2000. These new estimates currently form the basis of the CPS controls or second stage weights as described above. For evaluative purposes, the Census Bureau retrofitted the April 2000 census-based weights to basic survey data from October 1999 forward.  Monthly or basic CPS data weighted to population controls benchmarked to Census 2000 and earlier censuses are shown in Figure 1 and Figure 2.  The introduction of the 2000 controls increased the stated value of the basic March 2000 Hispanic population from 32.6 million (weights based on estimates benchmarked to 1990) to 34.7 million (weights based on estimates benchmarked to 2000), for a difference of about 2.1 million.
The introduction of the 2000 controls also resulted in an increase of 2.2 million Hispanics in the 2001 ASEC, as shown in Table 7, columns 1-3. Furthermore, the application of the new population controls introduced small changes in some of the stated sizes and proportions of selected characteristic-based subgroups found in the ASEC, as well as in some of the statistics derived from those numbers, as can be seen in Table 7, columns 1-3.
5.5 Summary: What has happened to CPS Hispanic data over time?
Table 1 and Figures 1 and 2 show the CPS Hispanic population estimates 1971 to 2004. The final column in Table 1 shows important milestones in the CPS series over the period, up to and including the switch to the new Hispanic origin question in 2002.  Although Table 1 shows the years when census data were collected, Figure 1 graphically illustrates when the effects of updated weights based on census-based estimates were applied each decade.  Although the jumps in 1983 and 1993 are noticeable, the trend line is relatively smooth.  This smoothness reflects the application of the annual updated census-based population weights during the years between censuses. In the early years of the survey, the CPS Hispanic numbers were much more volatile. The application of census-based weights to the CPS estimates led to “control” of radical annual and monthly fluctuations as well as more precise estimates of the total Hispanic population. 
On the other hand, Figure 2 shows a somewhat different picture. Because the detailed Hispanic group samples are smaller than the total Hispanic sample and they are not controlled to census-based weights, they are much more prone to sampling variability.  Owing to the fact that detailed census information is only collected every 10 years, the Census Bureau has not attempted to develop and apply detailed Hispanic group census-based controls to CPS.
Table 6 reveals the precision of Hispanic group categories was improved, which may allow analysts to examine various characteristics of these groups and consider adding new groups to the core list. Using reporting of Hispanic origin and associated birth place, we saw in Table 5 that those born in "Mexico" reported "Mexican" 98.5 percent of the time in 2003 and 98.9 percent in 2004; those born in "Cuba", reported "Cuban" 93.4 percent in 2003 and 95.7 percent in 2004; people born in "Puerto Rico" reported "Puerto Rican" 92.8 percent and 95.8 percent in 2003 and 2004, respectively. 
Table 2 revealed that each of the "old" groups (Mexican, Cuban, Puerto Rican) consistently exhibits a population in excess of 1 million people. Six months of data from the CPS Basic survey also show that other groups such as Dominican, Central American, and South American have shown populations above one million in recent years. The category Salvadoran has not consistently shown a population of one million and, as we noted above, people born in El Salvador do not identify themselves as Salvadoran with the same level of consistency as some other groups.
As a result of our findings, we recommend the following for CPS/ASEC:
- The new Hispanic/Latino categories list should include (new items are indicated with an asterisk): Mexican, Puerto Rican, Cuban, Dominican*, Central American*, South American*, and Other Hispanic. The social, demographic, and economic characteristics for the new groups are sufficiently different from each other and from the other Hispanic groups that they merit being shown separately.
- “Salvadoran” should not be shown separately at this time or until we understand better why so many people born in El Salvador do not report Salvadoran origin.
- Additional Hispanic groups should be shown separately when they exceed the 1 million threshold for at least 6 consecutive months (Basic or monthly CPS) and two CPS ASEC and ACS cycles, as well as exhibit internal consistency reflected in the correlations between place of birth and detailed identification.
- We recommend more research into the responses where: 1) people reported they were Hispanic in the “Yes-No” question and then reported a non-Hispanic response and 2) people reported they were Hispanic in the “Yes-No” question and then essentially reported a response that was uncodeable. Although the number of these responses is small, we need to understand better why respondents report in this manner. This is especially important for other data collection efforts such as the decennial census and the American Community Survey where the primary method of data collection is through self-enumeration via a mail-out/mail-back form and the respondents who choose to respond by mail do not have the benefit of an experienced field representative to help answer the question.
- Because the CPS includes information on the second generation not available from other data sources, it affords a unique opportunity to report on population trends and should be used as a basis for analysis. We know from past research on immigration the importance of tracking how well succeeding generations have fared in making their way in society. We see the tremendous value of showing data for the above-mentioned groups by birthplace and parental birthplace, which currently can only be obtained from the CPS files. We propose periodic reports showing social, demographic, and economic characteristics for the Hispanic population by these detailed groups and by generation (first, second, and third), where available and where sample size permits. The resulting reports, we believe, will be received with great interest by our data users.
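The monthly part of the size criterion recommended above (a population above 1 million for at least 6 consecutive months of the Basic CPS), together with the birthplace consistency measure, can be expressed as a simple check. This is a minimal sketch: the function names and the monthly series are hypothetical, the ASEC and ACS cycle conditions are not modeled, and the recommendation does not state a numeric cutoff for consistency.

```python
def longest_run_above(monthly_estimates, threshold):
    """Length of the longest run of consecutive months above the threshold."""
    longest = current = 0
    for estimate in monthly_estimates:
        current = current + 1 if estimate > threshold else 0
        longest = max(longest, current)
    return longest


def meets_monthly_threshold(monthly_estimates, threshold=1_000_000, months_required=6):
    """True if the series exceeds the threshold for enough consecutive months."""
    return longest_run_above(monthly_estimates, threshold) >= months_required


def consistency_rate(matching_reports, born_in_place):
    """Percent of people born in a place who report the corresponding origin."""
    return 100.0 * matching_reports / born_in_place


# Hypothetical monthly population estimates for two illustrative groups.
group_a = [1.02e6, 1.05e6, 1.01e6, 1.08e6, 1.10e6, 1.04e6]
group_b = [1.02e6, 0.97e6, 1.01e6, 1.08e6, 1.10e6, 1.04e6, 1.03e6]

print(meets_monthly_threshold(group_a))  # True: six consecutive months above 1 million
print(meets_monthly_threshold(group_b))  # False: longest run is five months

# Hypothetical counts chosen to reproduce a rate like the 62.4 percent
# reported above for people born in El Salvador.
print(consistency_rate(624, 1000))
```

A group such as the hypothetical group_b, which dips below the threshold once, would not yet qualify under this reading of the rule even though most of its monthly estimates exceed 1 million.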
We plan to continue our research to demonstrate the quality of CPS estimates for the selected Hispanic groups we propose for inclusion in Current Population Reports beginning with the CPS 2006 Annual Social and Economic Supplement (ASEC) products.