U.S. Bureau of the Census
Washington, D.C. 20233-8800
POPULATION DIVISION TECHNICAL WORKING PAPER NO. 31
An earlier version of this paper was presented at the American Community Survey Symposium, Census Bureau, Suitland, MD, March 25, 1998 and at the annual meeting of the Population Association of America, Chicago, IL, April 2-4, 1998.
This paper reports the results of research and analysis undertaken by Census Bureau staff. It has undergone a more limited review than official Census Bureau publications. This report is released to inform interested parties of research and to encourage discussion.
The Bureau of the Census is expanding two of its primary programs -- the sample survey measurement program and the intercensal population estimates program. On the survey front, testing for the American Community Survey (ACS) is currently underway. Intercensal estimates activities include the production of annual population estimates of counties by age, sex, race and Hispanic origin. Traditionally, the Census Bureau integrates survey results with estimates by introducing the population estimates as independent controls to the sample survey results.
This paper identifies many of the issues involved in integrating survey results and population estimates, including methodological differences in: (1) temporal concepts; (2) residence concepts; and, (3) race and ethnic definitions. This paper also examines concrete examples of traditional weighting issues. Using results from the four 1996 ACS test sites, this paper compares the first stage ACS data to independent population estimates, yielding useful insights into the degree of ACS weighting needed. This paper concludes by summarizing the crossroads between the ACS and intercensal population estimates, and discusses some alternatives and enhancements to the integration of these two programs.
In response to the growing demands for current, continuous, and timely demographic measures for small areas, the Bureau of the Census is expanding two of its primary programs--the sample survey measurement program and the intercensal population estimates program. Presently, testing is underway for the American Community Survey (ACS), a monthly household survey designed to provide continuous demographic characteristics for counties, places, and other small political areas. Recent intercensal population estimates program activities include the addition of annual population estimates of counties with age, sex, race, and Hispanic origin detail.
Traditionally, the Census Bureau integrates survey results with estimates by introducing the population estimates as independent controls to the sample survey results. Many of the Census Bureau surveys, for instance the Current Population Survey (CPS) and the Survey of Income and Program Participation (SIPP), are controlled to match independently derived intercensal population estimates.
These recent activities introduce a number of significant challenges and issues for both Census Bureau programs. This paper will identify many of the issues involved in integrating the ACS data and intercensal population estimates considering the existing methodological differences in: (1) temporal concepts; (2) residence concepts; and, (3) race and ethnic definitions.
This paper will examine concrete examples of traditional weighting issues. Using results of the 1996 test of the ACS conducted in Multnomah County, OR; Rockland County, NY; Brevard County, FL; and Fulton County, PA, this paper will compare the first stage ACS data to independent population estimates. Examining results first for the total population, and then disaggregated by age, sex, race, and Hispanic origin, may yield useful insights into the degree of weighting needed in the ACS data.
Finally, this paper will conclude by discussing some alternatives and enhancements to the integration of the American Community Survey with the intercensal population estimates program.
This paper is organized into a number of sections:
Section II - Provides an overview of the design, implementation, and goals of the American Community Survey.
Section III - Summarizes the methodology used to produce the 1996 county estimates with age, sex, race, and Hispanic origin detail.
Section IV - Describes the procedure of controlling the ACS results to the independent population estimates. To illustrate this procedure, Table 1 presents the ACS results after both weighting stages compared to the population estimates.
Section V - Outlines issues regarding integration of survey data and population estimates. Specifically: (1) residence concepts; (2) temporal concepts; and, (3) race and ethnicity rules are explored in detail.
Section VI - Presents tables comparing the ACS results and population estimates for each county with age, sex, race, and Hispanic origin detail.
Section VII - Summarizes the findings and closes with a discussion of the crossroads, alternatives, and enhancements to the integration of the ACS and population estimates.
II. THE AMERICAN COMMUNITY SURVEY
As the newest development in the sample survey measurement program, the ACS is part of the Continuous Measurement System. The general idea for a program of "continuous measurement" goes back to the suggestion for an "annual sample survey" proposed by Census Bureau employee Philip Hauser in 1941. In the late 1980s suggestions for "continuous measurement" resurfaced and this time were incorporated as one of the options researched by the 2000 Census Research Staff. In its earliest stages there were no design details, only a general idea to replace the long form with an intercensal data collection program "spread out in some fashion over the decade." 1 Today, the Census Bureau describes Continuous Measurement (CM) as the reengineering of the method for collecting the housing and socioeconomic data traditionally obtained from the decennial census "long form." The Continuous Measurement program includes the use of the ACS and intercensal estimates. 2
Testing of the American Community Survey is currently underway, and implementation is being conducted in four phases:
Once testing is complete, the American Community Survey is designed to produce estimates of housing, social, and economic characteristics every year for all states, as well as for all cities, counties, metropolitan areas, and population groups of 65,000 persons or more. For smaller areas, it will take two-to-five years to sample the same number of households as sampled in the decennial census. For example, in rural areas, city neighborhoods, or in places with population groups of less than 15,000, it will take five years to accumulate a sample the size of the decennial census. Eventually, the multi-year estimates of characteristics will be updated annually for every government unit, for components of the population, and for census tracts and block groups. The ACS is proposed to replace the census "long form" in the 2010 decennial census.
The 1996 American Community Survey
Results from the 1996 ACS represent the focus of this research. The scope of the ACS in this phase was limited to housing units (i.e., excluding group quarters populations), either occupied or vacant, in four sites:
Brevard County, Florida
Multnomah County, Oregon
A large metropolitan county that is part of a multiple county PMSA. Available data are organized by county and by entire site. The entire site includes all of Portland city, which is located primarily in Multnomah County but also extends into Washington and Clackamas Counties. Conversely, data for the county do not include the parts of Portland city which are outside the Multnomah County borders. 3
Rockland County, New York
Fulton County, Pennsylvania
Responding to inquiries concerning how the four test sites were chosen, the Census Bureau stated that it looked for sites that addressed critical operational issues and met the evaluation objectives of the demonstration period. Geographic balance was cited as an important consideration so that each of the Regional Offices would acquire experience with the ACS. The Census Bureau considered the current survey workload in a particular site to see if existing field representatives could be used rather than having to hire new ones. The percentage of city-style addresses in potential sites, as well as the percentage of post office boxes and the sites' land area, were also considered. Finally, the availability of local experts who were willing to use the data and assist the Census Bureau in comparing the ACS and Census 2000 data for each site was a factor. 4
In 1999-2001, the number of county sites in the sample will be increased to thirty-seven comparison sites and eight phase-in sites in 1999 only. The purpose of this comparison phase is to collect several kinds of information via the ACS to understand the differences between the 1999-2001 ACS and the 2000 census long form. Comparison counties have been identified which are believed to include various situations in which differences may be prominent. To explain, sites were selected to have at least one location in each of twenty-four strata representing combinations of county population counts, difficulty of enumeration, and 1990-1995 population growth. The selection also attempted to balance areas by region of the country, and sought to include several sites representing different characteristics of interest such as:
The purpose of the comparison counties is to give a good tract-by-tract comparison between the 1999-2001 ACS cumulated estimates and the Census 2000 long-form estimates, and to use these comparisons to identify both the causes of differences and "diagnostic variables" that tend to predict certain kinds of differences.
National Comparison Sample
In 2000-2002, pending Congressional approval of funding, plans are to add a national sample of 700,000 housing units per year to the ACS. This will allow the Census Bureau to provide estimates for all states and for geographic areas or population groups of 250,000 persons or more. From the national sample, it will be possible to deliver direct comparison information to show how data from the American Community Survey compare with the data from the census long form for all states, large cities, and large sub-state areas. For areas with fewer people, such as small counties, small towns, or census tracts, statistical modeling will give indirect information telling how the ACS would typically compare to the census long form "for an area like this." The model-based comparison will use information from both the national sample and the comparison counties, rather than just from the sample from each small area.
Finally, in 2003 the American Community Survey will be implemented in every county of the United States with an annual sample of three million housing units. Once the survey is in full operation, ACS data will be available every year for areas and population groups of 65,000 or more beginning in 2004. For small areas and population groups of 15,000 or less, it will take five years to accumulate information to provide accurate estimates, meaning that updated information for areas such as neighborhoods will be available starting in 2008 and every year thereafter. 5
The American Community Survey meets the needs of data users for timely data that provide consistent measures for all areas. For many geographic areas decennial sample data are out-of-date almost as soon as they are published, about two years after the census is taken, and their usefulness declines every year thereafter. Yet, billions of government and business dollars are divided among jurisdictions and population groups each year based on their social and economic profiles in the decennial census. The American Community Survey can solve this problem as it is formulated to identify rapid changes in an area's population and give an up-to-date statistical picture when data users need it, as opposed to once every decade. Some of the proposed applications for ACS data include the ability to: track the well-being of children, families and the elderly; determine where to locate new highways, schools, and hospitals; show a large corporation that a town has the workforce the company needs; evaluate programs such as welfare and workforce diversification; and, monitor and publicize program results. For more information about the American Community Survey visit the ACS website at http://www.census.gov/cms/www/
III. POPULATION ESTIMATES PROGRAM
The Census Bureau estimates program, on the other hand, produces population estimates for the nation, states, counties, and places as part of its ongoing program to quantify changes in population size and distribution since the last census. Central to the purpose of this research are the population estimates produced for counties. The recent release of 1996 county level population estimates with age, sex, race, and Hispanic origin detail is part of a newly developed project which is now in an intermediate stage. Below I provide an outline of the methodology used to derive the intercensal population estimates.
County Estimates Methodology
Most recently the Census Bureau has released estimates of the resident population of the counties in the United States by age (ages 0 to 84; 85 and over), sex (male; female), race (White; Black; American Indian, Eskimo and Aleut; Asian and Pacific Islander), and Hispanic origin (Hispanic or non-Hispanic) for July 1st of each year from 1990 to 1996. These estimates are consistent with:
The county estimates are developed in a two-step process. First, state level estimates with age, sex, race, and Hispanic origin detail are developed using the cohort-component method, whereby each component of population change (births, deaths, domestic migration, and international migration) is estimated separately for each birth cohort by sex, race, and Hispanic origin. The cohort-component method is based on the traditional demographic accounting system:
P1 = P0 + B - D + NDM + NMA, where:
P1= population at the end of the period
P0 = population at the beginning of the period
B = births during the period
D = deaths during the period
NDM = net domestic migration during the period
NMA = net migration from abroad during the period. 6
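The accounting identity above can be illustrated with a minimal Python sketch. The function name and all figures are hypothetical and chosen only to show the arithmetic for a single population group; they are not taken from the estimates program.

```python
def project_population(p0, births, deaths, net_domestic_migration,
                       net_migration_abroad):
    """Apply P1 = P0 + B - D + NDM + NMA for one population group."""
    return (p0 + births - deaths
            + net_domestic_migration + net_migration_abroad)


# Hypothetical figures for one group over one period (illustration only).
p1 = project_population(
    p0=10_000,
    births=150,
    deaths=90,
    net_domestic_migration=-40,   # more people moved out than in
    net_migration_abroad=25,
)
print(p1)
```

In the cohort-component method this calculation would be repeated for each birth cohort by sex, race, and Hispanic origin, with births entering only the youngest cohort.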
In the second stage, county estimates are then derived from the state estimates using a two-step mathematical technique commonly referred to as the ratio method. The 1990 census data for each county by age, sex, race, and Hispanic origin are the starting point. First, these 1990 data are aggregated to county totals and adjusted to agree with the updated (in this case 1996) county estimates. Second, the adjusted county data by age, sex, race, and Hispanic origin are aggregated to state totals by age, sex, race, and Hispanic origin and adjusted to agree with the state estimates for these groups. Applying the ratio method in this manner is often referred to as "raking" the data.
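A minimal Python sketch of the two-step ratio adjustment described above may help fix ideas. The data layout, the function name, and every number below are assumptions made for illustration; the production procedure works with full age, sex, race, and Hispanic origin detail.

```python
def rake_counties(base, county_totals, state_cell_totals):
    """Two-step ratio adjustment.

    base: {county: {cell: 1990 census count}}
    county_totals: {county: updated county total estimate}
    state_cell_totals: {cell: state estimate for that cell}
    """
    # Step 1: scale each county's cells so they sum to the updated county total.
    step1 = {}
    for county, cells in base.items():
        factor = county_totals[county] / sum(cells.values())
        step1[county] = {cell: n * factor for cell, n in cells.items()}

    # Step 2: sum the adjusted cells across counties, then scale each cell
    # so that the state-level sum agrees with the state estimate.
    cell_sums = {}
    for cells in step1.values():
        for cell, n in cells.items():
            cell_sums[cell] = cell_sums.get(cell, 0.0) + n
    return {
        county: {cell: n * state_cell_totals[cell] / cell_sums[cell]
                 for cell, n in cells.items()}
        for county, cells in step1.items()
    }


# Hypothetical two-county state with two demographic cells.
base = {"A": {"x": 60, "y": 40}, "B": {"x": 30, "y": 70}}
county_totals = {"A": 110, "B": 120}
state_cell_totals = {"x": 105, "y": 125}
raked = rake_counties(base, county_totals, state_cell_totals)
```

After a single pass the cells agree exactly with the state controls, while the county totals agree only approximately, because the second step disturbs the first.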
An additional refinement added to the production of this set of estimates is the separate estimation of the group quarters (GQ) population. The GQ population consists of those individuals residing in non-standard living arrangements such as correctional institutions and nursing homes. Their demographic characteristics are often very different from the rest of the population of the county in which they reside, which is why it is useful to estimate them separately. Unless otherwise noted, any county population estimates incorporated in this paper will not include the group quarters population. 7
This summary is meant to serve only as a brief description of the recent developments in the sample survey program and the intercensal estimates program. Additional information can be gleaned from the Census Bureau's population estimates page at http://www.census.gov/population/www/estimates/popest.html.
An overview of both the ACS and population estimates is necessary before proceeding to the next section which provides a description of how and why these two programs are integrated.
IV. INTEGRATION: POPULATION ESTIMATES AS ACS CONTROLS
One of the most obvious ACS and population estimates crossroads is the Census Bureau's tradition of introducing population estimates as controls to sample survey results. This process is often referred to as "calibrating," "weighting," or, in this case, "controlling" the ACS results for counties to independently derived county population estimates. In addition to the ACS, many other Census Bureau surveys, for instance the Current Population Survey (CPS) and the Survey of Income and Program Participation (SIPP), are calibrated to national population estimates by age, sex, race, and Hispanic origin and to estimates of the population aged sixteen and over for states and for New York City and Los Angeles.
The process of weighting the ACS results is complex, so only a brief summary is provided here. Two sets of weights were assigned: (1) a weight to each sample housing unit record; and, (2) a weight to each sample person record. Estimates of person characteristics are based on the person weight; estimates of family, household or housing unit characteristics are based on the housing unit weight. Characteristic estimates are made by summing the weights assigned to the persons, households, and families or housing units possessing the characteristics in the tabulation area. Initially, each person in an occupied housing unit received the housing unit weight as their person weight; at this point everyone in the household had the same weight. Person weights were then individually adjusted based on each person's age, sex, race, and Hispanic origin to match county population controls by age, sex, race, and Hispanic origin independently derived by the Population Estimates program. The estimation procedure used to assign the weights was performed independently for each of the 1996 ACS sites. 8
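The second stage adjustment can be sketched in Python as a simple ratio adjustment within demographic cells. This is an assumption made for illustration: the summary above does not specify the exact adjustment formula, and the record layout, cell labels, and numbers are hypothetical.

```python
from collections import defaultdict


def adjust_person_weights(persons, controls):
    """Scale person weights so each cell sums to its population control.

    persons: list of dicts with keys 'cell' and 'hu_weight'
             (the housing unit weight each person starts with).
    controls: {cell: independent population estimate for that cell}
    """
    # First stage total for each cell: the sum of housing unit weights.
    weighted = defaultdict(float)
    for p in persons:
        weighted[p["cell"]] += p["hu_weight"]
    # Ratio-adjust every person in a cell by control / first stage total.
    return [p["hu_weight"] * controls[p["cell"]] / weighted[p["cell"]]
            for p in persons]


# Hypothetical sample persons and controls (illustration only).
persons = [
    {"cell": "female 0-17", "hu_weight": 100.0},
    {"cell": "female 0-17", "hu_weight": 100.0},
    {"cell": "male 18+", "hu_weight": 120.0},
]
controls = {"female 0-17": 210.0, "male 18+": 126.0}
person_weights = adjust_person_weights(persons, controls)
```

Summing the adjusted weights within a cell reproduces the control for that cell, which is the sense in which the survey results are "controlled" to the estimates.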
It is the aim of this paper to explore concrete examples of traditional weighting issues. Comparing the ACS results and population estimates may yield useful insight into the degree of weighting needed in the ACS data (see Table 1). To ascertain the degree of weighting needed, an index is calculated as the ratio of the population estimates to the ACS results (after housing unit weighting but before second stage weighting), multiplied by 100. A value over 100 indicates that the population estimate is higher than the ACS estimate; conversely, a value under 100 indicates that the ACS estimate is higher than the population estimate.
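The index can be sketched in Python as follows. The direction of the ratio is inferred from the interpretation given in the text (values over 100 mean the population estimate is the larger figure), and the function name and numbers are hypothetical.

```python
def weighting_index(population_estimate, acs_first_stage):
    """Index of the population estimate relative to the first stage ACS result.

    Assumption: the population estimate is the numerator, so that a value
    over 100 means the population estimate exceeds the ACS estimate.
    """
    return 100.0 * population_estimate / acs_first_stage


# Hypothetical county figures (not taken from Table 1).
index = weighting_index(population_estimate=103_000, acs_first_stage=100_000)
print(index)
```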
Indexes over one hundred consistently show that the ACS housing weighted estimates are below the population estimates (see Table 1). If sampling error is ruled out as the primary explanation, one possibility is that the universe of housing units the ACS results are weighted to in the first stage does not represent the true universe of housing units in each county. Two factors may explain why this is the case. First, the county Master Address File (MAF) from which the ACS sample was selected may not represent the true universe of units which actually exist in each county. Second, the MAF may represent the true universe of units, but people within the units were unknowingly missed; this outcome may suggest response bias resulting in under- or overcoverage and will be addressed later.
Each month a sample was selected from the national Master Address File (MAF). The MAF was initially constructed by a computer match of the U.S. Postal Service (USPS) Delivery Sequence File (DSF), the 1990 Census Address Control File (ACF), and the Topologically Integrated Geographic Encoding and Referencing (TIGER) files. The MAF can be created in an automated fashion for all areas that have a city-style address system where the mail is delivered using these addresses. For areas that do not have a city-style address system, such as Fulton County, the Census Bureau will create a MAF by conducting an address listing operation.
While no statistical testing has been undertaken, the wide variability in indexes between counties (ranging from 101 to 106) may indicate that a combination of the above explanations is appropriate (see Table 1). In other words, in each county, housing units may have been missed and persons within captured housing units may also have been inadvertently missed. Another explanation may be that the quality of the MAF varies by locale. Additionally, it is important to recognize that thus far this analysis has assumed that the population estimates represent "truth" for each county, which is clearly a dangerous assumption. Below, ACS results and population estimates with age, sex, race, and Hispanic origin will be compared county by county. Perhaps these characteristics will suggest explanations, in addition to those outlined here, which will account for this disparity.
In this paper Fulton County is considered to be a special case. Due to its size, and the fact that no county-wide address system exists, it may be dangerous to place a high level of confidence in the ACS results. Throughout, Fulton County data will only be presented for the reader's information.
Table 1. First and second stage weighted 1996 ACS results 1 and 1996 population estimates 2 by county: Total population
|County|American Community Survey: Housing Unit Record Weighting|American Community Survey: Person Record Weighting|Population Estimate|Index 3|
V. INTEGRATION ISSUES AND CHALLENGES
A number of challenges and issues have accompanied the Bureau's expansion of both the sample survey measurement program and the intercensal population estimates program. Specifically, integration of the ACS results and independent population estimates reveals methodological differences pertaining to: (1) residence concepts; (2) temporal concepts; and, (3) race and ethnic definitions. Since the primary purpose of this paper is to assess the comparability of the ACS data and intercensal population estimates, it is important to highlight these differences now.
First consider residence concepts as they are defined in each data source. The ACS as it was implemented in the four 1996 test sites relies on a "current residence" concept. Although this is very close to a pure de facto, "who slept here last night" concept, there is one main difference: the ACS aims to include everyone who is currently living or staying at a unit, with the exception of people who "usually" or "really" live somewhere else (i.e., their de jure residence) and are gone from that usual residence for two months or less. To introduce consistency in the assignment of persons to a residence, the ACS established the "two-month" rule.
The ACS two-month rule defines the current residence of all persons, with only three apparent exceptions. First, children below the college level away at school are considered residents of their parental home. College students' current residence is established by the two-month rule. Second, children who live under joint custody agreements and move often between the separate residences of their parents are considered to be current residents of the sample unit if they are staying there when the contact with the unit is made. Finally, in the instance of commuter workers, persons who stay in a residence close to their work and return regularly to another residence, usually with family, are considered to be current residents of the family residence, not the work-related one. 9
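As a rough illustration, the two-month rule and its three exceptions can be written as a small decision function in Python. The record layout and field names are hypothetical, and the sketch covers only persons found staying at a sample unit when contact is made; it is a reading of the summary above, not the survey's actual processing rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stayer:
    """A person staying at a sample unit at the time of contact (hypothetical)."""
    # Months away from a usual residence elsewhere; None if there is none.
    months_away_from_usual_residence: Optional[float] = None
    precollege_student_at_school: bool = False  # child below college level, away at school
    joint_custody_child: bool = False           # moves often between parents' residences
    commuter_at_work_residence: bool = False    # stays near work, returns to family home


def is_current_resident(p: Stayer) -> bool:
    """Return True if the unit where the person is staying is the current residence."""
    if p.precollege_student_at_school:
        return False  # counted at the parental home instead
    if p.joint_custody_child:
        return True   # counted where staying when contact is made
    if p.commuter_at_work_residence:
        return False  # counted at the family residence instead
    away = p.months_away_from_usual_residence
    if away is not None and away <= 2:
        return False  # short stay: still belongs to the usual residence
    return True       # includes college students, who follow the two-month rule


short_visit = is_current_resident(Stayer(months_away_from_usual_residence=1))
long_stay = is_current_resident(Stayer(months_away_from_usual_residence=5))
```

Under this reading, a person five months into a seasonal stay is a current resident of the seasonal unit, which is the situation taken up in the discussion of "snowbirds."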
On the other hand, intercensal population estimates rely on the residence concept of the decennial census, where persons are required to have a de jure "usual residence." This is defined as the place they live and sleep most of the time or the place they consider to be their usual home. Although implied rather than explicitly stated on the census form, "usual residence" is assumed to be the place a person spends six months or more of the year. 10 Since the 1950 census, college students have been enumerated at the place of their college rather than where their parents live and where they may return to during holidays and summer. 11
For the majority of the population, their de facto and de jure residence will be the same. There are, however, certain segments of the population where the de facto and the de jure residence will be different. Most notable are the "snowbirds." A snowbird is a generic label for a person who lives in one place for an extended portion of the year and another for the remainder. For all intents and purposes a snowbird is a person who resides at a seasonal residence for at least some portion of the year. The decennial census has typically handled the "snowbird situation" by determining which residence is the "usual" address as of the census date and assigning the household to that location, even if they are staying at the "nonusual" location at the time of contact. The ACS, on the other hand, will count the snowbirds where they are found.
Although the number of "snowbirds" is not great, the phenomenon is geographically specific and presents a challenge in integrating ACS and population estimates at the county level. The ACS does recognize that appreciable differences may exist for areas where large numbers of people spend several months of the year in units that are not their primary residence, for example Florida, Arizona, and in beach or mountain vacation areas. A working group within the Population Division at the Census Bureau is currently investigating this possibility.
Another methodological variation between the ACS and intercensal population estimates that warrants discussion is the reference period applied to characteristics. To illustrate, the ACS annual survey results constitute "average" annual estimates. On the other hand, intercensal population estimates are based on the characteristics at one particular point in time.
The annual ACS results represent the cumulative results of the twelve month interview cycle. For areas with a seasonal component, variations are included in the annual ACS picture. The intercensal population estimates represent a picture at one point in time. Although the intercensal estimates use July 1st as their reference point, the snapshot date is not exact.
Going back to the estimates methodology, the intercensal estimates begin with the decennial census population. For the county totals and state estimates, the annual components of population change are added and subtracted. Because the population estimates are heavily grounded in the decennial census, the estimate begins with the snapshot of the usual residence on census day, April 1st. For 1990, the population is moved forward to July 1st using estimates of the components for the April-July period. Once the population is moved forward to July 1st, the remaining subnational estimates are produced for each July 1st date using annual estimates of the components of population change.
Although the components of birth, death and international migration represent the actual July to July period, the estimates of internal migration are not as neatly tied to the actual July 1st to July 1st period. Because the internal migration component is an estimate heavily based on tax return information received from January through August, the reference period is not as straightforward. However, in the intercensal estimates environment we assume these annual estimates of internal migration apply to the July 1st to July 1st period.
In sum, likening the ACS to a video and intercensal estimates to a still photograph simplifies the explanation of reference period differences. The ACS results can be thought of as a video that runs over the course of a twelve month period and displays the demographic, social, and household characteristics of the population. As a sample survey running in different phases, the ACS results are an "average" of the population's characteristics over a twelve month period. Intercensal estimates, on the other hand, are analogous to a still picture taken at one particular point in time, July 1st. In theory the estimates derived by both sources should be similar, since both display characteristics of the population over approximately the same period in the past. The purpose of this paper is to test this relationship using the 1996 ACS test site data and the 1996 intercensal population estimates.
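The difference between an annual average and a point-in-time snapshot can be shown with a short Python sketch. The monthly figures describe a hypothetical seasonal county (in thousands of persons present) and are invented purely for illustration; the sketch isolates the temporal difference and ignores sampling and residence rules.

```python
# Hypothetical persons present each month, January through December (thousands).
monthly_population = [130, 130, 125, 110, 100, 100,
                      100, 100, 100, 105, 120, 130]

# "Video": the average over the twelve month interview cycle.
annual_average = sum(monthly_population) / len(monthly_population)

# "Still photograph": the population at a single reference point in July.
july_snapshot = monthly_population[6]

print(annual_average, july_snapshot)
```

In this invented case the annual average exceeds the July figure because the winter months carry a larger population, which is the kind of seasonal effect the two reference periods treat differently.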
Race and Ethnicity Definitions
Finally, understanding the crossroads in terms of the definitions of racial and ethnic characteristics is necessary to the purpose of this research. Examination of both data sources reveals similarities as well as differences in definitions.
In general, the racial and ethnic definitions utilized by the Bureau of the Census reflect self-identification, and therefore cannot be assumed to represent any clear-cut scientific definition of biological stock. The data represent self-classification by people according to the categories with which they most closely identify. On the decennial census form, and during sample surveys, persons are instructed to select the one response category which best describes their racial identity and ethnic origin group. The Census Bureau recognizes that the categories include both racial and national origin or socio-cultural groups.
The racial and ethnic category classifications used by the Census Bureau generally adhere to the "Race and Ethnic Standards for Federal Statistics and Administrative Reporting" set forth in Statistical Policy Directive No. 15, issued by the Office of Management and Budget (OMB). This Directive provides common language, and promotes uniformity and comparability of race and ethnicity data. 12 The minimum race and ethnic categories designated by OMB are:
White includes persons who indicated their race as "White" or reported entries such as Canadian, German, Italian, Lebanese, Near Easterner, Arab, or Polish.
Black includes persons who indicated their race as "Black or Negro" or reported entries such as African American, Afro-American, Black Puerto Rican, Jamaican, Nigerian, West Indian or Haitian.
American Indian, Eskimo or Aleut includes persons who classified themselves as:
American Indian includes persons who indicated their race as "American Indian," entered the name of an Indian tribe, or reported such entries as Canadian Indian, French-American Indian, or Spanish-American Indian. Information on tribe is based on self-identification and therefore does not reflect enrollment in or any designation of Federally- or State-recognized tribe.
Eskimo includes persons who indicated their race as "Eskimo" or reported entries such as Arctic Slope, Inupiat, and Yupik.
Aleut includes persons who indicated their race as "Aleut" or reported entries such as Alutiiq, Egegik, and Pribilovian.
Asian and Pacific Islander includes persons who reported in one of the Asian or Pacific Islander groups listed on the questionnaire or who provided write-in responses such as Thai, Nepali, or Tongan.
Asian includes "Chinese," "Filipino," "Japanese," "Asian Indian," "Korean," "Vietnamese," and "Other Asian."
Chinese includes persons who indicated their race as "Chinese" or who identified themselves as Cantonese, Tibetan, or Chinese American. In standard census reports, persons who reported as "Taiwanese" or "Formosan" are included here with Chinese.
Filipino includes persons who indicated their race as "Filipino" or reported entries such as Philipino, Philipine, or Filipino American.
Japanese includes persons who indicated their race as "Japanese" and persons who identified themselves as Nipponese or Japanese American.
Asian Indian includes persons who indicated their race as "Asian Indian" and persons who identified themselves as Bengalese, Bharat, Dravidian, East Indian or Goanese.
Korean includes persons who indicated their race as "Korean" and persons who identified themselves as Korean American.
Vietnamese includes persons who indicated their race as "Vietnamese" and persons who identified themselves as Vietnamese American.
Other Asian includes persons who provided a write-in response such as Bangladeshi, Cambodian, Indonesian, Laotian, Pakistani, Sri Lankan, Amerasian, or Eurasian.
Pacific Islander includes persons who indicated their race as "Pacific Islander" by classifying themselves into one of the following groups or identifying themselves as one of the Pacific Islander cultural groups of Polynesian, Micronesian, or Melanesian.
Hawaiian includes persons who indicated their race as "Hawaiian" as well as persons who identified themselves as Part Hawaiian or Native Hawaiian.
Samoan includes persons who indicated their race as "Samoan" or persons who identified themselves as American Samoan or Western Samoan.
Guamanian includes persons who indicated their race as "Guamanian" or persons who identified themselves as Chamorro or Guam.
Other race includes all other persons not included in the "White," "Black," "American Indian, Eskimo, or Aleut," and the "Asian or Pacific Islander" race categories described above.
The race categories outlined above are largely consistent across the 1990 decennial census form and the questionnaire used in the four 1996 test sites. The ACS questionnaire had one important difference. In addition to the racial categories included in the decennial census, respondents in the ACS were given the opportunity to designate themselves as "multiracial." Multiracial persons were also asked to supply a write-in response, similar to the procedure for choosing "other" race. 13
Although the intercensal estimates are strongly tied to the decennial data, for the intercensal population estimates program some modifications were introduced to the age and race data. The construction of the modified age, race, sex (MARS) file was necessary to ensure comparability to other data sources. From the decennial census data the race statistics were collapsed into four categories (white; black; American Indian, Eskimo and Aleut; and, Asian and Pacific Islander). The two ethnicity groups remain constant, Hispanic and non-Hispanic. The age data were modified to correspond with the April 1, 1990 census date. Overall the "modified" data counts remain consistent with the 1990 counts of the census as enumerated, and are used as the base to construct annual intercensal population estimates. The following paragraphs describe the construction and demand for this file.
In the 1990 census there were just under ten million reports of "other race" which needed to be assigned to one of the four race categories when collapsed in the MARS file. Indicating other race meant that these people were not included in one of the fifteen racial categories listed on the census form and therefore could not be collapsed into one of the four categories. The existence of this group is inconsistent with the race categories defined by OMB in Directive No. 15. Such "non-specified" race persons are not found in data sources other than the census. In order to serve the needs of some portions of the user community, and to construct intercensal estimates, it is necessary to assign each of these persons to a specified race. The methods for doing this in the MARS file are outlined below.
After evaluating many alternatives, the following race assignment rule was adopted: assign each "other race" person the specified race reported by a nearby person with an identical response to the Hispanic origin question. Specifications of this race assignment rule include:
First, the specific Hispanic origin of each "other race" person in the 1990 census was taken into account when assigning them to a specified race. This was considered appropriate because over ninety-five percent of the "other race" persons were of Hispanic origin. Their Hispanic origin response was used, whether or not it had been allocated, in order to preserve the race distribution within each type of origin. The specific Hispanic origin responses were "not Spanish/Hispanic," "Mexican," "Puerto Rican," "Cuban," and "other Spanish/Hispanic."
Second, virtually every person who reported both a specified race and an origin was included in the "donor pool" of eligible persons. The sole exception was the exclusion of several non-specific American Indian codes from the donor pool since: (1) preliminary 1990 research suggested questionable reporting in the American Indian category; and, (2) previous research showed that such persons were much less likely to be American Indians than those who actually provided a specific tribe response as instructed on the census form. These codes were: 548-American White; 549-American Black; 597-American Indian (no tribe reported); 598-American Indian (tribal responses not elsewhere classified); and 973-FOSDIC circle with no write-in response. These were excluded because of evidence from the 1980 census that misreporting of race was much higher in these codes than it was in codes representing specific American Indian tribes. Consistent with advisory committee recommendations, any person assigned to the American Indian race through allocation was given code 973 rather than a specific tribal code.
Third, the assignment of a specified race was made on an individual basis. That is, no effort was made to minimize racial heterogeneity within households. Any such attempt would have made it difficult to assign race in a manner which approximated the specified-race distribution reported by persons with the same Hispanic origin response.
Fourth, the race, origin, or sex of some persons also changed as a result of the assignment of a different age to them during the application of the age modification procedures. Their changed age sometimes caused the person to be allocated a different relationship and/or sex which resulted in the person receiving their race or origin from a different person in the household, since those items were allocated according to a hierarchy of relationships.
Fifth, the results of the race modification procedures were overridden in four counties where the American Indian population grew by more than 100 percent and also became at least one percentage point more of the county's population: Adams county, WA; Harmon county, OK; Clark county, and Washington county, ID.
Finally, in most census allocation procedures, acceptable data from eligible persons (donors) are far more common than are the cases where the value is assigned to persons without the characteristics (the donees). This means information from any given donor is rarely used more than once. Such large donor-to-donee ratios were not unusual here. However, there were a number of occasions where those needing a specified race outnumbered those who reported the same origin as well as a specified race. 14
ACS
In terms of comparability then, the 1990 decennial census had a response category termed "other race." In the ACS this category is instead termed "some other race." As part of the effort to provide research useful to the OMB's review of Statistical Directive No. 15, the ACS also included a "multiracial" category on the race question. For weighting purposes we collapsed the full ACS race categories into the common four categories, white; black; American Indian, Eskimo, and Aleut (AIEA); and Asian and Pacific Islander (API), to be consistent with the MARS file. For simplicity, and for weighting purposes only, we chose to collapse the "some other race" and "multiracial" categories in the ACS into the white category. Results may show that this technique will need to be explored more fully in the future.
In sum, it is clear that there are important methodological differences between the American Community Survey, as implemented in the four 1996 test sites, and the intercensal population estimates which rely largely on the 1990 decennial census. In terms of temporal and residence rules the Census Bureau maintains that few substantial variations should appear in the two data sources as a result of these concept variations.
In addition to the differences in residence, temporal, and race/ethnicity concepts, we need to remember that the ACS results are products of a sample survey. Because they rest on sampling methodology, the data in the ACS products are estimates of the actual figures that would have been obtained by interviewing the entire population. The estimates from the chosen sample also differ from those of other possible samples of housing units and persons within those housing units. The possibility of sampling error arises from the use of probability sampling, which is necessary to ensure the integrity and representativeness of sample survey results.
In addition to sampling error, other types of errors may appear during any of the various complex operations used to collect and process survey data. For example, operations such as editing, reviewing, or keying data from questionnaires may introduce error into the estimates. These and other sources of error contribute to the nonsampling error component of the total error of survey estimates. Nonsampling errors may affect the data in two ways. Errors that are introduced randomly increase the variability of the data. Systematic errors that are consistent in one direction introduce bias into the results of a sample survey. The Census Bureau protects against the effect of systematic errors on survey estimates by conducting extensive research and evaluation programs on sampling techniques, questionnaire design, and data collection and processing procedures. In addition, an important goal of the American Community Survey is to minimize the amount of nonsampling error arising from nonresponse for sample housing units. One way of accomplishing this is by following up on mail nonrespondents during the CATI and CAPI phases.
Standard error is a measure of the deviation of a sample estimate from the average of all possible samples. Sampling errors and some types of nonsampling errors are estimated by the standard error. The sample estimate and its estimated standard error permit the construction of interval estimates with a prescribed confidence that the interval includes the average result of all possible samples. Direct estimates of the standard errors were calculated for all estimates provided below. The standard errors, in most cases, are calculated using standard variance estimation software with a methodology that takes into account the sample design and estimation procedures.
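As a rough illustration of how an estimate and its standard error yield an interval estimate, consider the minimal sketch below. The function name, the input values, and the multiplier of 1.645 (a conventional choice for a 90 percent interval under a normal approximation) are assumptions made for illustration; the paper does not state which confidence level was used.

```python
def confidence_interval(estimate, standard_error, z=1.645):
    """Return (lower, upper) bounds: estimate +/- z * standard_error.

    z=1.645 is an assumed multiplier (roughly a 90 percent interval
    under a normal approximation); it is not taken from the paper.
    """
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical values for illustration only (not drawn from the tables).
lower, upper = confidence_interval(estimate=5.3, standard_error=0.4)
print(f"{lower:.2f} to {upper:.2f}")
```

A wider interval (larger standard error or larger multiplier) makes it harder to conclude that an ACS result and a population estimate truly differ.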
In the next section the question of comparability will be raised; first stage weighted ACS results (after household weighting but before person weighting) for the four 1996 test sites will be compared to corresponding intercensal population estimates. Age, sex, race, and Hispanic origin characteristics will be explored to understand differences.
Crossroads between the American Community Survey and intercensal population estimates can best be understood by comparing actual results. ACS results and intercensal population estimates by county for the total population were compared above (see Table 1). That analysis showed us the process of weighting, as well as the total amount of weighting needed by locale. Now, the ACS results and population estimates will be further disaggregated by age, sex, race, and Hispanic origin. This analysis may provide insight into the degree of weighting needed by characteristic, interweaving the methodological differences discussed above.
Table 2 compares the 1996 ACS results and population estimates by age (five year age groups) and gender. Although statistical testing of ACS results and population estimates to establish significant differences within counties has not yet been undertaken, it is worthwhile to explore logical explanations for these findings. It is believed that the majority of the differences apparent in Table 2 can be explained by the nonresponse of certain groups in surveys and by differences in residence and temporal concepts.
Nonresponse bias may be an important factor in at least Brevard and Rockland counties. Providing insight to this issue is Chakrabarty's (1994) research exploring the effects of nonresponse bias in the coverage of the April 1990 Current Population Survey and the 1990 census. Chakrabarty's results indicate that:
Returning to Table 2, and concentrating on the percent distributions of the population by age, Brevard county shows possible survey nonresponse for males at ages 20-24 (ACS 4.0% v. estimate 5.3%); ages 25-29 (ACS 5.3% v. estimate 6.8%); and ages 30-34 (ACS 7.7% v. estimate 8.4%). Possible nonresponse is also apparent for females ages 20-24 (ACS 4.0% v. estimate 5.0%) and ages 25-29 (ACS 5.0% v. estimate 6.4%). These findings are consistent with Chakrabarty's research in terms of age and gender characteristics.
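The gaps quoted for Brevard county can be restated as percentage-point differences. The short sketch below uses only the figures given in the paragraph above; the variable and function names are hypothetical.

```python
# (ACS percent, population-estimate percent) as quoted in the text for Brevard county.
brevard = {
    ("male", "20-24"): (4.0, 5.3),
    ("male", "25-29"): (5.3, 6.8),
    ("male", "30-34"): (7.7, 8.4),
    ("female", "20-24"): (4.0, 5.0),
    ("female", "25-29"): (5.0, 6.4),
}


def gaps(shares):
    """Percentage-point gap (ACS minus estimate) for each sex/age group."""
    return {group: round(acs - est, 1) for group, (acs, est) in shares.items()}


brevard_gaps = gaps(brevard)
for (sex, age), gap in brevard_gaps.items():
    print(f"{sex:6s} {age}: {gap:+.1f} points")
```

Every gap is negative, meaning the ACS share falls below the population estimate share in each of these young adult groups, with the largest shortfall for males ages 25-29.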
Although not to the same magnitude, nonresponse is perhaps also a factor in Rockland county. For males ages 20-24 and 25-29 the ACS estimates are below the population estimates. Results show that similar problems do not arise for women. Part of Chakrabarty's research suggests undercoverage for males, especially black males, in MSAs (versus nonMSAs). As a metropolitan county that is part of the New York, NY Primary Metropolitan Statistical Area this may be an important factor.
The other phenomenon which may explain ACS and population estimate differences, especially for the population sixty-five and over, is the difference in methodologies as they pertain to residence and temporal concepts. Table 2 shows that in Brevard county the difference in the percent distribution for persons sixty-five and over is nearly three percent. Overrepresentation of women in the ACS compared to the population estimate is also apparent, beginning even earlier than for men, at age sixty. Recognition of differences in residence and temporal concepts becomes important when trying to capture residents such as the elderly who may live most of the year in one residence and winter in another. This group was previously referred to as "snowbirds". Application of a two month rule (ACS) versus a usual residence/6 month rule (population estimates) may yield different results. Research suggests that determinants of migration (either seasonal or permanent) for the elderly population are noneconomic, primarily focusing on the opportunity for recreational development and other amenities, as well as mild temperatures (Heaton et al. 1981; Mueser and Graves 1995; Murdock et al. 1984). Based on these criteria Brevard county may be especially attractive to persons sixty-five and over at least sometime during the year.
Similarly, higher percentages of persons sixty-five and over are captured by the ACS in Rockland county. The percent differences in the distribution for both males and females follow the same patterns as in Brevard county. Clearly Rockland county, NY does not fit the normal characteristics of a locality with amenities and a mild temperature where a large segment of the elderly population might want to winter. This finding cannot be easily explained.
Table 3 compares the ACS results and population estimates by race. In general, the coverage of all racial groups except blacks is considered to be good. Survey undercoverage similar to that described by Chakrabarty (1994) may be responsible for the smaller percentages of blacks found in all counties except Rockland.
Unlike any other county, Rockland shows an ACS result for the black population 1.1% higher than the population estimate. Considering the research outlined above (Chakrabarty 1994), an overcoverage of blacks is not typical. Salvo and Dahl 16 have undertaken some research which helps to explain this finding. Tabulations of the ACS data not included here show that Rockland has relatively more persons who checked the "other race" or "multiracial" checkbox and wrote in a Caribbean response, which was coded black. Considering that Rockland is a large metropolitan area near New York City, it has been hypothesized that perhaps these persons are immigrants from the Caribbean. This is a likely hypothesis considering that the ACS does not have a specific race code to classify these persons. If this is in fact the case and these persons were classified as black, they may inflate the number of blacks in the survey relative to the population estimate.
Table 3 shows the ACS "other" race category disaggregated into those indicating they were "multiracial" and the remainder of the category. In each county the percentage of the population in the "other" category is very small, ranging from 2.2% in Rockland to 1.1% in Brevard. Again it is important to review the allocation procedures applied to both the ACS results and the population estimates and consider their impact on these distributions.
As discussed above, the population estimates use as their base the MARS file. In this file the race statistics for all the possible race categories were collapsed into four categories (white; black; American Indian, Eskimo and Aleut; and, Asian and Pacific Islander). Each person who indicated "other" race in the 1990 census was reassigned a race after taking into consideration their Hispanic origin. This procedure was designed so that, based on the person's Hispanic origin (either Hispanic or non-Hispanic), they were reassigned the race of a person close to them. Ultimately this procedure reassigned these persons' race to match the race distribution of persons who reported their race, crossed by Hispanic origin. Assigning a person a race based on their closest neighbor is usually referred to as "hotdecking."
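The "hotdecking" idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the MARS production procedure: the record layout, the function name, and the reading of "close" as nearest position in the file (with ties going to the earlier record) are all assumptions, and the donor-pool exclusions for certain American Indian codes are omitted.

```python
def hot_deck_race(people):
    """Assign each "other" race person the race of the nearest donor.

    `people` is a list of dicts in file order with keys "race" and "origin".
    A donor is anyone with a specified race and the same Hispanic origin
    response; "nearest" means closest list position (an assumption).
    """
    assigned = []
    for i, person in enumerate(people):
        if person["race"] != "other":
            assigned.append(dict(person))
            continue
        # Candidate donors: reported (not assigned) races with matching origin.
        donors = [
            (abs(i - j), j)
            for j, other in enumerate(people)
            if other["race"] != "other" and other["origin"] == person["origin"]
        ]
        if not donors:
            assigned.append(dict(person))  # no eligible donor: leave unchanged
            continue
        _, nearest = min(donors)  # smallest distance, then earliest record
        assigned.append({**person, "race": people[nearest]["race"]})
    return assigned


# Hypothetical records for illustration only.
records = [
    {"race": "white", "origin": "Mexican"},
    {"race": "other", "origin": "Mexican"},
    {"race": "black", "origin": "not Spanish/Hispanic"},
    {"race": "other", "origin": "not Spanish/Hispanic"},
]
result = hot_deck_race(records)
print([p["race"] for p in result])
```

Because donors are drawn only from persons sharing the same Hispanic origin response, the assigned races tend to follow the reported race distribution within each origin group, which is the property the text describes.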
For weighting purposes only, the ACS data had to be collapsed into the four racial categories. The allocation procedure used to reassign persons indicating "other" race and "multiracial" was somewhat different in the ACS. Examination of some of the initial returns indicated that the percentage of the population indicating "other" and "multiracial" was minimal. Nevertheless, these racial groups had to be reassigned to one of the four race groups to ensure data comparability, namely with the population estimates. For these reasons, all persons in the ACS "other" race category, including those indicating "multiracial," were reassigned to the white race category.
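By contrast with a donor-based assignment, the ACS collapse described above amounts to a fixed lookup. The sketch below shows that mapping; the category labels, the dictionary name, and the sample responses are hypothetical stand-ins for the actual ACS codes.

```python
from collections import Counter

# Hypothetical labels; the fixed rule sends "some other race" and
# "multiracial" to white, for weighting purposes only.
ACS_TO_FOUR_GROUPS = {
    "white": "white",
    "black": "black",
    "American Indian, Eskimo, or Aleut": "AIEA",
    "Asian or Pacific Islander": "API",
    "some other race": "white",
    "multiracial": "white",
}


def collapse_race(acs_race):
    """Map a full ACS race response to one of the four weighting groups."""
    return ACS_TO_FOUR_GROUPS[acs_race]


# Hypothetical responses for illustration only.
responses = ["white", "multiracial", "black", "some other race",
             "Asian or Pacific Islander"]
collapsed = Counter(collapse_race(r) for r in responses)
print(dict(collapsed))
```

The rule is simple to apply, but unlike hotdecking it ignores the local race distribution, so its effect grows with the share of persons reporting "some other race" or "multiracial."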
Table 4 shows the crosstabulation of race and Hispanic origin. The purpose of showing these data is to investigate the allocation procedure applied to the ACS results. In Brevard and Multnomah counties reassigning the other group to the white category instead of initiating a technique similar to the "hotdecking" procedure used in the MARS file would change the distribution little. The reason for this is that the percentage of the population that is white of either Hispanic or non-Hispanic origin is rather large.
The racial distribution indicates that Rockland county is slightly more diverse than the other counties. Blacks and Asian and Pacific Islanders represent a larger percentage of the racial distribution than in any other county. Additionally, Rockland had the largest percentage of individuals indicating "other" race of any county; this may be further evidence of greater racial diversity. These findings may indicate that this allocation procedure is doing "injustice" to the apparent racial diversity in Rockland.
VII. CONCLUSIONS AND DISCUSSION
Implementation of the new American Community Survey, along with recent advancements in the population estimates program, has raised a significant number of challenges and issues for the Census Bureau. The purpose of this paper was to address a number of these challenges and issues through investigation of methodology and analysis of actual data. In the end, the goal of this paper is to discuss the crossroads between the ACS and the population estimates program as a context for suggestions about alternatives and enhancements for their integration.
Integration of the ACS and population estimates stems from the Census Bureau's tradition of applying independently derived population estimates as controls to sample survey data. In this case, the ACS results, in the last stage of weighting, were controlled to equal the population estimates. This procedure relies on the notion that population estimates are the best source of "truth" about population counts during intercensal periods. With this assumption in mind, one aim of this paper was to examine actual results from the 1996 ACS test sites compared to 1996 population estimates to draw conclusions about the need for weighting.
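Controlling survey results to an independent estimate can be pictured as a single ratio adjustment. The sketch below is a simplified, one-cell illustration and not the ACS weighting system itself: the function name and the numbers are hypothetical, and in practice such controls would be applied within cells defined by characteristics such as age, sex, race, and Hispanic origin.

```python
def control_to_estimate(weights, population_estimate):
    """Scale person weights so that they sum to the independent estimate.

    Returns the adjustment factor and the adjusted weights. A factor
    above 1 means the weighted survey total fell short of the estimate.
    """
    survey_total = sum(weights)
    factor = population_estimate / survey_total
    return factor, [w * factor for w in weights]


# Hypothetical first-stage weights and control total, for illustration only.
first_stage_weights = [40.0, 35.0, 25.0]
factor, final_weights = control_to_estimate(first_stage_weights, 110.0)
print(round(factor, 3), round(sum(final_weights), 1))
```

The size of the factor is a convenient summary of "the degree of weighting needed": the further it departs from 1, the larger the gap between the first stage ACS result and the population estimate.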
Results for the total population indicate that weighting is needed across all counties to increase the ACS results (after household weighting) to equal the population estimates. Some explanations for the survey bias relate to the accuracy of the MAF and the undercoverage of persons. The accuracy of the MAF is open to question. Since the MAF is constructed from a number of sources, including information from the U.S. Postal Service (USPS), the 1990 Census Address Control File (ACF), and the Topologically Integrated Geographic Encoding and Referencing (TIGER) files, the quality of the MAF is only as good as the weakest link. Response bias is another possible explanation. Although the housing unit may be included in the MAF and be sent a questionnaire, we could miss some persons in the housing unit. Not everyone will complete or be included on the returned questionnaire.
The accuracy of the population estimates is another source of possible differences. Since the analysis thus far is based on the idea that the population estimates represent "truth," the idea that the quality of the estimate may vary by locale has not been raised. Taken together, it is believed that these factors explain why the ACS results need weighting. However, the population estimates are subject to error. Results from a 1990 evaluation show that at the end of a ten year estimation period there is an average error of 3.6% in the county population estimates (see Davis).
Comparison of the ACS results and the population estimates with age, sex, race and Hispanic origin detail provides insight into weighting issues by characteristic. This analysis is essential to ascertain whether the degree of weighting needed in the ACS data differs by any of the characteristics. Data with age and sex detail show that common nonresponse bias, as well as residence and temporal differences applying to the population sixty and over, account for the largest amount of ACS/population estimate discrepancy.
Turning to the race and Hispanic origin characteristics, unusual patterns are found for blacks. In Brevard and Multnomah counties the undercoverage of blacks is consistent with previous research. Rockland county shows the opposite; the ACS results show an overcoverage of blacks compared to the population estimate. This finding is related to differences in ACS and population estimate racial definitions. This outcome may suggest the need for definition comparability between the two.
The overrepresentation of blacks according to the ACS in Rockland county introduces another issue. It is hypothesized that the blacks overrepresented in the ACS are actually black immigrants from the Caribbean. As a sample survey the ACS would capture these persons as long as they immigrated into Rockland county over the last twelve month period. The population estimate would capture these persons slightly differently due to its methodology. Remember that county population estimates by age, sex, race, and Hispanic origin are developed with a ratio approach, calibrating the initial 1990 age, sex, race distribution of the county to updated count tabulations and state estimates derived with components of change for age, sex, race, and Hispanic origin. Thus, the influx of immigrants from the Caribbean to Rockland county would affect the New York state estimates.
Information on migration flows at the state level is currently provided through a project which uses administrative records supplied by the Internal Revenue Service. State flows are disaggregated to show net domestic and net international migration by race and Hispanic origin. The Census Bureau is currently exploring whether this data source can be used to supplement the already existing county migration flows with race and Hispanic origin characteristics. In this case the ACS data were able to inform the population estimates about a migration flow with a unique impact for the black population. This is one example of how the ACS might enhance population estimates.
Also pertaining to Rockland county, another important discovery of this research concerns the allocation procedures used to reassign the race of persons who indicated themselves as "other" race or "multiracial" on their ACS questionnaire. Reassignment of race to one of the four racial categories (white; black; American Indian, Eskimo, Aleut; Asian or Pacific Islander) is necessary for comparability with the population estimates. As described above, the population estimates use as their base the MARS file. A "hotdecking" procedure reassigns persons to the race of another person close to them based on the Hispanic origin response they gave. Simply put, persons will be redistributed based on being either Hispanic or non-Hispanic according to the racial distribution which already exists. On the other hand, the ACS allocation procedure used to reassign persons indicating "other" race and "multiracial" was to put them all into the white category, again based on their Hispanic origin response; this is done for weighting purposes. This procedure was adopted because the number of such persons is extremely small and because in all instances whites represent the majority category.
In Rockland, blacks and Asian and Pacific Islanders represent a larger percentage of the racial distribution than in any other county, and Rockland has the largest percentage of individuals indicating "other" race of any county. While the number reassigned to white is small, the question arises concerning how much "injustice," if any, is done to the other racial groups; it is important to note that the reassignment is used for weighting purposes only. However, remember the ACS also collected data on the social characteristics of the people. If the number of persons reassigned to white became large, the social characteristics the survey reports may become distorted. The integration of the ACS and population estimates, operating from different methodologies, may suggest that instead of relying on the allocation procedure currently used in the ACS a "hotdecking" procedure similar to that used in the population estimates may be appropriate.
Overall, this paper has succeeded in highlighting crossroads between the ACS and population estimates as a result of their integration by the Census Bureau. Examining methodologies and weighting procedures, no major obstacles were found in the roadway which connects the two. As integral components of the Census Bureau's Continuous Measurement System both will continue to enhance one another as the ACS adds comparison sites in 1999-2001 and moves to full implementation in 2003 and beyond.
However, the above analysis does indicate a few bumps in the roadway which connects the ACS and population estimates which the Census Bureau may look to reconcile in the future. The ability of the population estimates to reflect migration flows apparent in the American Community Survey data may indicate one way the ACS can enhance at least the migration component of the population estimates. The future application of research currently being conducted with administrative records such as the IRS data may smooth some of these bumps.
Finally, since the ACS implemented a race question similar to that which will be used in the 2000 census, the results are a wealth of information. The above analysis suggests some bumps in the roadway due to the differential allocation of persons indicating "other" race and, in the case of the ACS, "multiracial". Currently, these differences appear to be minor; the percentage of persons reassigned to white in the ACS is most likely not large enough to distort the overall social characteristics by race. However, if the number of persons classifying themselves in this way were to grow, this problem may expand from a bump to a major pothole. These issues of reconciliation by race are important and are a major topic for the preparation of population estimates during the next year.
Chakrabarty, Rameswar P. 1994. Coverage of the Current Population Survey Relative to the 1990 Census. Unpublished paper from the Statistical Research Division. U.S. Bureau of the Census.
Heaton, Tim, William Clifford and Glenn Fuguitt. 1981. Temporal Shifts in the Determinants of Young and Elderly Migration in Nonmetropolitan Areas. Social Forces, 60(1) September:41-60.
Hough, George and David A. Swanson. 1997. Toward an Assessment of Small Area CM Data: An Initial Comparison of 1996 ACS Returns with 1990 Long Form Returns for Tracts in the Portland Test Site. Paper presented in the invited session "The American Community Survey - Uses and Issues" at the Annual Meeting of the American Statistical Association, Anaheim, California.
Mueser, Peter and Philip Graves. 1995. Examining the Role of Economic Opportunity and Amenities in Explaining Population Redistribution. Journal of Urban Economics, 37:176-200.
Murdock, Steve, Banoo Parpia, Sean-Shong Hwang, and Rita Hamm. 1984. The Relative Effects of Economic and Noneconomic Factors on Age-Specific Migration, 1960-1980. Rural Sociology, 49(2):309-318.
Salvo, Joseph J. and Arun Peter Lobo. 1997. The American Community Survey: Nonresponse Follow-Up In the Rockland County Test Site. Paper presented at the Annual Meeting of the American Statistical Association, Anaheim, California.
Table 2. Comparison of 1996 ACS results and 1996 population estimates for Brevard County, FL; Multnomah County, OR; Rockland County, NY; Fulton County, PA: Five-year age groups by gender
Table 3. Comparison of 1996 ACS results and 1996 population estimates for Brevard County, FL; Multnomah County, OR; Rockland County, NY; Fulton County, PA: Race
Table 4. Comparison of 1996 ACS results and 1996 population estimates for Brevard County, FL; Multnomah County, OR; Rockland County, NY; Fulton County, PA: Race by Hispanic Origin