
1993 Research Conference on Undercounted Ethnic Populations


May 5-7, 1993
Radisson Hotel
555 East Canal Street
Richmond, Virginia 23219

Proceedings


U.S. Department of Commerce
Economics and Statistics Administration
Bureau of the Census
Washington, DC 20230


WHAT THE CENSUS BUREAU'S COVERAGE EVALUATION
PROGRAMS TELL US ABOUT
DIFFERENTIAL UNDERCOUNT

BY

HOWARD HOGAN AND GREGG ROBINSON1


1. INTRODUCTION

When one considers the enormous size and diversity of the United States population, the 1990 census achieved a remarkably low undercount. The national census count differed from the true population by less than two percent. Nor was this result an anomaly. As measured by the net undercount, census taking accuracy has shown an historic improvement from over 4 percent in 1950, to near 3 percent in 1960 and 1970 to a level of below 2 percent for the 1980 and 1990 censuses. Indeed, before World War II, undercounts of 6 and 7 percent were not uncommon.

However, underlying the steady improvement in the national average undercount is a persistent differential undercount. The undercount of Black Americans has been more than 3 percent higher than the national average for every census since World War II. The undercount of Black males has been 5 or more percentage points higher than the national undercount for these four censuses.

Other differentials in coverage have been shown to exist. Hispanic Americans, Native Americans, and Asian and Pacific Islanders are all counted less well than non-Hispanic Whites. These differentials are related to differential undercounts by social and geographic groups.

Almost everything known about the size of undercount and the differential undercount comes from the Census Bureau's own program of coverage evaluation. The Census Bureau has historically used two approaches to measuring the undercount. One method uses birth and death records, immigration records and previous censuses to estimate the true population. This estimate is compared to the census count to measure the difference. This method is called Demographic Analysis. It is described in Section 2.

The Census Bureau also conducts special surveys to measure the undercount. A scientific sample of census blocks is reinterviewed independently of the census enumeration. The results of these interviews are checked against the census records on an individual basis to see who was missed and who was counted in error. The method the Census Bureau uses is called the Post-Enumeration Survey, and is described in Section 3.

It is useful at this point to define some of the concepts and terms we use to describe the undercount. We can define for any population group:

Consider, for example, the population 65 years and older. Of those truly 65 and over, some will be missed by the census. This error lowers the census count. On the other hand, some people who died before Census Day may be included, and thus counted in error. Other people who are actually 65 and over may be counted in two places, perhaps once at their winter home and once at their summer house. These erroneous inclusions tend to inflate the census count. The difference between those missed and those counted in error is called the net coverage error.

Net coverage error is not the only error which can cause the census count to differ from the true population. In our example, some people who actually are over 65 will be counted in the census, but tabulated as under 65. This will reduce the census count for those over 65 in exactly the same way as if the people were missed. However, some people who are actually under 65 may report their age as over 65. This error will inflate the census count for the group 65 and over. Errors in reporting characteristics for those who are rightfully included in the census, just not placed in the correct category, are called content error. The difference between those misclassified in and those misclassified out is called net content error.

The difference between the true population (P) and the census count (C) is called the net undercount (u). Traditionally, the net undercount is divided by the (estimated) true population and multiplied by 100 to form the percent net undercount, r:

  u = P - C (1)
  r = (u/P)*100 (2)

Of course, it is possible that the census count is larger than the true population. In this case there is a net overcount. Because of the way it is calculated, a net overcount is usually expressed as a negative net undercount. Even professional statisticians find this a bit confusing and annoying at times.
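Equations (1) and (2) can be sketched in a few lines of Python. The function names and the population figures are illustrative assumptions, not Census Bureau values. The second call shows how a net overcount appears as a negative percent net undercount.

```python
def net_undercount(true_pop, census_count):
    """Equation (1): u = P - C."""
    return true_pop - census_count


def percent_net_undercount(true_pop, census_count):
    """Equation (2): r = (u / P) * 100."""
    return net_undercount(true_pop, census_count) / true_pop * 100


# Hypothetical group whose census count falls short of the true population.
undercounted = percent_net_undercount(1_000_000, 982_000)

# Hypothetical group whose census count exceeds the true population:
# the result is negative, i.e. a net overcount.
overcounted = percent_net_undercount(500_000, 505_000)

print(f"undercounted group: r = {undercounted:.1f} percent")
print(f"overcounted group:  r = {overcounted:.1f} percent")
```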

In the example above, the group was defined by age. The group could be defined by sex, by race, by ethnic group, by census block, city or state, or by any other characteristic measured by the census.

A few things should be clear. Unless no one (or, at least, very few people) were erroneously included, it is inaccurate to equate the net undercount with the number of persons missed by the census. The fact that the net undercount was less than two percent does not mean that the census included 98 percent of the people and missed less than two percent of the people.

Secondly, any person misclassified out of one group is misclassified into another group. So, for the nation as a whole, there is no content error: net undercount equals net coverage error. Thirdly, for groups defined with few classification errors (male and female offer a good example), it is fair to ignore classification error. For other groups (farm workers may be one example), one must clearly take both coverage and content into account when discussing the undercount.
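A small worked example may make these points concrete. The sketch below uses two hypothetical groups (under 65, and 65 and over) with made-up counts; the class and field names are assumptions introduced only for illustration. It shows that content error cancels when the groups are added together, and that the number of people missed can be much larger than the net undercount.

```python
from dataclasses import dataclass


@dataclass
class Group:
    true_pop: int        # P: people who truly belong to the group
    omitted: int         # group members missed by the census
    erroneous: int       # erroneous inclusions (duplicates, deceased, ...)
    classified_out: int  # group members tabulated in another group
    classified_in: int   # members of another group tabulated here

    def census_count(self):
        return (self.true_pop - self.omitted + self.erroneous
                - self.classified_out + self.classified_in)

    def net_coverage_error(self):
        return self.omitted - self.erroneous

    def net_content_error(self):
        return self.classified_out - self.classified_in

    def net_undercount(self):
        return self.true_pop - self.census_count()


# Hypothetical counts. Everyone classified out of one group is
# classified into the other, so the two columns mirror each other.
under_65 = Group(true_pop=900, omitted=40, erroneous=25,
                 classified_out=10, classified_in=4)
over_65 = Group(true_pop=100, omitted=3, erroneous=2,
                classified_out=4, classified_in=10)

groups = [under_65, over_65]
national_undercount = sum(g.net_undercount() for g in groups)
national_coverage = sum(g.net_coverage_error() for g in groups)
national_content = sum(g.net_content_error() for g in groups)
national_omitted = sum(g.omitted for g in groups)

print(under_65.net_undercount(), over_65.net_undercount())
print(national_undercount, national_coverage, national_content)
print(national_omitted)
```

In this made-up case the group 65 and over shows a net overcount even though some of its members were missed, because net content error outweighs net coverage error.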

The differential undercount is defined in one of two ways. It is sometimes measured as the difference between the undercount for a group and the undercount for some other group or for all other groups. Thus, one may talk of the differential between Blacks and Nonblacks or between Blacks and, say, non-Hispanic Whites. Another way to measure the differential is as the difference between the undercount for the group and the net national undercount, including that group. For relatively small groups, for example, Native Americans or Asian and Pacific Islanders, the difference between the two measures is small. For relatively large groups, for example all males, the measures will produce strikingly different results. We use both approaches in this paper, but endeavor to make clear which one is used in any particular discussion.
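The two definitions can be compared with a short sketch. The population shares and undercount rates below are hypothetical, chosen only to contrast a small group with a large one; the national rate is computed as the average of the group rates weighted by true population.

```python
def combined_rate(groups):
    """Percent net undercount for several groups taken together.

    groups: list of (true_population, percent_net_undercount) pairs.
    """
    total_pop = sum(pop for pop, _ in groups)
    return sum(pop * rate for pop, rate in groups) / total_pop


def differentials(group, rest):
    """Return (group minus all others, group minus national total)."""
    national = combined_rate([group, rest])
    return group[1] - rest[1], group[1] - national


# Hypothetical small group: 1 percent of the true population.
small = differentials((1.0, 5.0), (99.0, 1.0))

# Hypothetical large group: half of the true population.
large = differentials((50.0, 3.0), (50.0, 1.0))

print("small group:", small)
print("large group:", large)
```

For the small group the two measures nearly coincide, while for the group making up half the population the second measure is only half the first.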

The effects on the data of content error and coverage error are the same. However, the distinction has important implications when discussing causes and thus solutions. Content error must be addressed by developing the right questions, the right categories, the right questionnaires, and the correct coding and imputation methods. Coverage error occurs, on the one hand, because whole occupied housing units are missed, or people are missed within enumerated housing units, or because people are missed who are not attached to any housing unit. On the other hand, duplicate or fictitious occupied housing units, persons within housing units, or people not attached to housing units can be included in the census. Different solutions are needed.

The next section discusses one of the principal methods of measuring the undercount, demographic analysis.

2. DEMOGRAPHIC ANALYSIS

2.1. Description of the Method

In general, the demographic method of estimating coverage involves developing estimates for the population at the census date by the analysis of various types of demographic data essentially independent of the census, such as birth, death, and immigration statistics, as well as emigration estimates, and Medicare data. The difference between the estimated population (P) and the census count (C) measures the net census undercount, u, and net undercount rate, r (see equations 1 and 2). Demographic analysis represents a macro-level approach to measuring coverage, where analytic estimates of net undercount are derived by comparing aggregate sets of data or counts. This approach differs fundamentally from the PES, which represents a micro-level approach where estimates of coverage are based on case-by-case matching with census records for a sample of the population.

2.1.1. Component Age Groups

The particular analytic procedure used to estimate coverage nationally for the various demographic subgroups depends primarily on the nature and availability of the required demographic data. Two principal demographic techniques were used to produce the demographic analysis estimates for 1990, one for the population under age 65 and another for the population 65 and over.

(1) Ages under 65. The demographic analysis estimates for the population below age 65 in 1990 are based on the compilation of historical estimates of the components of population change: births (B), deaths (D), immigration (I), and emigration (E). Presuming that the components are accurately measured, the population estimates (P) are derived by the basic demographic accounting equation applied to each cohort:

  P_{0-64} = B - D + I - E (3)

The actual calculations are carried out for single-year age cohorts. For example, the estimate of the population age 40 on April 1, 1990 is based on births from April 1949 to March 1950 (adjusted for underregistration), reduced by deaths to the cohort in each year between 1950 and 1990, and incremented by estimated immigration and emigration of the cohort over the 40-year period.
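The accounting in equation (3) can be sketched for a single birth cohort as follows. All of the figures, the function name, and the form of the underregistration adjustment (dividing registered births by an assumed completeness factor) are illustrative assumptions; the Bureau's actual corrections are more detailed than this.

```python
def cohort_estimate(registered_births, birth_completeness,
                    deaths, immigrants, emigrants):
    """Equation (3) for one cohort: P = B - D + I - E.

    registered_births   births recorded for the cohort
    birth_completeness  assumed share of births that were registered
    deaths, immigrants, emigrants
                        per-period totals for the cohort, summed over
                        the years between birth and the census date
    """
    births = registered_births / birth_completeness  # B, adjusted
    return births - sum(deaths) + sum(immigrants) - sum(emigrants)


# Hypothetical cohort, with the intervening years collapsed into a
# few periods to keep the example short.
estimate = cohort_estimate(
    registered_births=3_420_000,
    birth_completeness=0.95,
    deaths=[120_000, 30_000, 50_000],
    immigrants=[40_000, 60_000],
    emigrants=[10_000, 15_000],
)
print(f"estimated cohort population: {estimate:,.0f}")
```

Comparing such an estimate with the census count for the same cohort then gives u and r as in equations (1) and (2).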

(2) Age 65 and Over. Administrative data on aggregate Medicare enrollments were used to estimate the population age 65 and over (P) in 1990,

  P_{65+} = M + m, (4)

where M is the aggregate Medicare enrollment and m is the estimate of underenrollment. Although Medicare enrollment is generally presumed to be quite complete, adjustments to the basic data must be used to account for groups known or suspected to be omitted.

2.1.2. Development of Historical Estimates for Multiple Censuses

The foundation of the demographic method is the logical consistency and relation of the underlying demographic data. With the use of components of change (births, deaths, net immigration) the estimated population for a birth cohort can be carried forward through time to derive estimates of net undercount in a series of censuses as the cohort ages (e.g., age 0-4 in 1950, age 10-14 in 1960, 20-24 in 1970, 30-34 in 1980, and 40-44 in 1990). Similarly, an older cohort in 1990 based on Medicare data can be carried backward in time to derive estimates for the cohort at younger ages (e.g., 65-69 in 1990, 55-59 in 1980, 45-49 in 1970, 35-39 in 1960, etc.). In this way, consistent estimates of net undercount for 1940 to 1990 based on demographic analysis are produced.

These multiple series of net undercount estimates for cohorts across censuses are linked through the components of population change. This linkage of the estimates provides a consistent basis to judge changes in patterns of coverage over time and to assess the plausibility of the demographic estimates themselves (see Passel, 1991).

2.1.3. Limitations of the Demographic Estimates

The aggregate administrative data and estimates that are incorporated in equations 3 and 4 are corrected for various types of errors. Many assumptions go into this estimation process, some of which can be validated and some which cannot.

The overall accuracy of the demographic estimates depends on the quality of the demographic data and corrections. Research has been conducted in the past few years to develop methods for assessing the uncertainty of the demographic coverage estimates (see Robinson et al., 1991). This work demonstrates that the estimates of net undercount for particular race, sex, or age groups based on demographic analysis may be subject to considerable uncertainty for measuring the exact levels. But they are subject to less variability in terms of measuring differences in undercount according to age, sex, and race and measuring changes in net undercount between censuses. This greater confidence in statements describing differences in undercount between groups is important to note, because coverage differentials are the focus of this paper.

Finally, it should be noted that the principal demographic estimates for race, sex, and age groups measure net undercount in the census. They don't tell us about the separate effects of net coverage error (omissions, erroneous inclusions) or net content error.

2.2. Historical Trends, 1940-1990

Table 1 presents historically-consistent estimates of percent net undercount for the decennial censuses from 1940 to 1990. Two significant observations stand out. First, the demographic estimates document the long term decline in net census undercounts over the last 50 years. The net undercount in the 1990 census is estimated to have been under 2 percent, well below the estimated 5.4 percent in 1940. The estimated undercount has declined for both Blacks (from 8.4 in 1940 to 5.7 percent in 1990) and Nonblacks (5.0 to 1.3 percent). For all groups, the net undercount in 1990 was higher than in 1980, but below 1970 levels.

Table 1. Demographic Analysis Estimates of Percent Net Undercount, by Race: 1940-1990

                                 1940   1950   1960   1970   1980   1990
Total                             5.4    4.1    3.1    2.7    1.2    1.8

Black                             8.4    7.5    6.6    6.5    4.5    5.7
Nonblack                          5.0    3.8    2.7    2.2    0.8    1.3

Difference (Black - Nonblack)     3.4    3.6    3.9    4.3    3.7    4.4

The second observation is that despite the overall declines in net undercount, the undercount rate of Blacks has remained persistently higher than the rate of Nonblacks in each census between 1940 and 1990. In fact, the excess of the net undercount rate of Blacks has hovered in the range of 3.4 to 4.4 percentage points over the last six censuses (see last row of above table). Another way to view the differential undercount is in terms of amount of net undercount. Although the Black population comprises less than 13 percent of the total population, it accounts for almost 40 percent of the total net undercount. As we will show, the differential undercount is most acute for Black adult men and Black children.
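The relation between a group's share of the population and its share of the net undercount can be checked with simple arithmetic. In the sketch below, the 1990 rates come from Table 1, while the 12.5 percent population share is an assumed round figure consistent with "less than 13 percent"; it is not a published estimate. The share is taken as a share of the true population, since that is the denominator of the undercount rate.

```python
def share_of_net_undercount(pop_share, group_rate, total_rate):
    """Fraction of the total net undercount contributed by one group.

    pop_share   group's share of the true population (0 to 1)
    group_rate  percent net undercount for the group
    total_rate  percent net undercount for the whole population
    """
    return pop_share * group_rate / total_rate


black_share = share_of_net_undercount(
    pop_share=0.125,   # assumed, illustrative
    group_rate=5.7,    # Table 1, Black, 1990
    total_rate=1.8,    # Table 1, Total, 1990
)
print(f"share of total net undercount: {black_share:.1%}")
```

Because the published rates are rounded to one decimal place, a calculation of this kind is only approximate.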

2.3. Age, Sex, and Race Patterns of Undercount

Appendix Table 1 and Figure 1 display the detailed estimates of percent net undercount for race, sex, and age groups in the 1990 census. Figure 2 compares the 1990 coverage patterns with estimates for 1980 and 1970 (Black males only). In terms of level of percent undercount, the most notable pattern is the consistently high levels of undercount for Black men between the ages of 25 and 64, where the estimated net undercount has ranged between 10 and 15 percent. For both Black males and females, the undercount rates for ages 0-4 and 5-9 are relatively high - about 8 percent - though not as high as the rates for Black adult men. In contrast to the high net undercount estimates for Black adult men are the relatively low undercount rates of Black adult women. In fact, a net "overcount" of almost 8 percent is measured for Black women aged 65-69. This reflects a case where net content error (persons of other ages reporting into age 65-69) has as much of an effect as coverage error on the observed net undercount.

For Nonblack males and females, the net coverage patterns exhibit relatively low levels of net undercount (Figure 1). In fact, the percent net undercount estimates for Nonblack females straddle the zero undercount line for most age groups.

In considering the estimates of net coverage for race, sex, and age groups that have been discussed, attention must be given to the fact that the demographic estimates are really approximations of the exact level of net undercount for a given group. There is considerable uncertainty in the detailed estimates by race, sex, and age (see Robinson et al., 1993). For Black age groups in particular, there is a wide range within which the "true" net undercount rate may fall. Nonetheless, it is clear from the alternative range of estimates for 1990 that the demographic estimates of percent undercount for Black adult males remain relatively high (over 6 percent) under any reasonable "uncertainty" assumption. With the exception of Black males and females under age 10, net undercounts consistently above even 2 percent are not found for any other race-sex-age group when the total range of uncertainty in the estimates is taken into account.

Table 2 gives a perspective of how the relatively high net undercounts of Black adult men and Black children contribute disproportionately to the well-known differential undercount of all Blacks. In 1990, the Black undercount of 5.7 percent was really a weighted average of two very different net undercount levels: the estimated 11.2 percent net undercount of Black adult men and 8.0 percent undercount of Black children, compared to 2.0 percent for other Black groups (males 10-19, 65+, females over age 9). Contrasted to the total undercount rate of 1.8 percent in 1990, the undercount of Black adult men and Black children was very disproportionate (9.4 and 6.2 percentage points higher, respectively), while the net undercount of other Blacks did not exhibit any appreciable differential.

Table 2. Demographic Analysis Estimates of Percent Net Undercount, by Race, Sex, and Age: 1960-1990

Race, sex, age                 1960   1970   1980   1990
Total                           3.1    2.7    1.2    1.8

Black                           6.6    6.5    4.5    5.7
Black males 20-64              13.4   13.1   11.3   11.2
Black 0-9 (male & female)       5.4    8.1    7.0    8.0
Other Black                     4.4    2.9    0.4    2.0

Difference from total

Black                           3.5    3.8    3.3    3.9
Black males 20-64              10.3   10.4   10.1    9.4
Black 0-9 (male & female)       2.3    5.4    5.8    6.2
Other Black                     1.3    0.2   -0.8    0.2
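The "Difference from total" rows of Table 2 are each group's rate minus the total rate for the same census year. A brief sketch that recomputes them from the published rates is given below; the variable and function names are illustrative.

```python
YEARS = (1960, 1970, 1980, 1990)

# Percent net undercount, as published in Table 2.
TOTAL = (3.1, 2.7, 1.2, 1.8)
RATES = {
    "Black": (6.6, 6.5, 4.5, 5.7),
    "Black males 20-64": (13.4, 13.1, 11.3, 11.2),
    "Black 0-9 (male & female)": (5.4, 8.1, 7.0, 8.0),
    "Other Black": (4.4, 2.9, 0.4, 2.0),
}


def difference_from_total(rates, total):
    """Group rate minus total rate, rounded to one decimal place."""
    return tuple(round(r - t, 1) for r, t in zip(rates, total))


differences = {group: difference_from_total(rates, TOTAL)
               for group, rates in RATES.items()}

print(f"{'':28s}", *(f"{year:>6d}" for year in YEARS))
for group, diffs in differences.items():
    print(f"{group:28s}", *(f"{d:6.1f}" for d in diffs))
```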

These dichotomous patterns of relatively "high" and "low" undercounts among Black subgroups in 1990 repeat the pattern from 1960-1980. Indeed, crude historical estimates of percent net undercount that extend back to 1880 demonstrate the intractable nature of the undercount for certain groups, especially adult Black men (Robinson and Hogan, 1990).

2.4. Why Demographic Analysis Cannot Produce Estimates for Hispanics, Asians, or American Indians

The demographic analysis method provides a useful record of trends in undercoverage over time and differentials between specific subgroups (e.g., differences by age, males versus females, Blacks versus Nonblacks). The method is limited, however, in that it cannot tell us how coverage varies for other key demographic groups, such as Hispanics, Asians, or American Indians. The reasons are twofold.

First, rather long historical sets of administrative and demographic data on births, deaths, and international migration are needed (e.g., 1935-1990) to develop the demographic coverage estimates (see equation 3). These data sets essentially do not exist for groups other than Blacks, Whites, and "All other races" combined.

Second, the interpretation of any demographic estimates that can be derived for these groups can be affected by inconsistencies in the reporting of race/origin in the administrative data and in the census. For example, Passel (1992) produced apparent large net "overcounts" for American Indians in the 1980 census using demographic techniques; he attributes these spurious results to classification errors arising from the tendency of many persons to report a race in the census (e.g., American Indian) that differed from the race assigned to them in the administrative birth data (e.g., White). In this case, the net content errors are so large that they "swamp" the effect of net coverage errors. For Blacks and Nonblacks--the principal groups discussed in this paper--the data inconsistencies are not large enough to alter the interpretations regarding differential undercounts (though the exact magnitude of the differentials could be affected).

3. THE POST-ENUMERATION SURVEY

3.1. Overview of the PES Approach

The 1990 PES consisted of two parts. The first part was a sample of the population, known as the P sample. The proportion of the P sample that was included in the census is an estimate of the proportion of the total population that was included in the census. The second part consisted of a sample of the census enumerations used to estimate the proportion of erroneous census enumerations. This sample is known as the E sample. These enumerations were checked against the census itself to determine the extent of duplication. They were also checked in the field to determine the extent of fictitious enumerations, inclusions by the census of people born after the census reference day, and the extent to which people were counted in the wrong location.

3.1.1. Stratification

The population was divided into poststrata based on geography, race, origin, housing tenure, age and sex. The poststrata were based (roughly) on the following hierarchy:

Race (4)
Black, Non-Black Hispanic, Asian and Pacific Islander, and Non-Hispanic White and Other
Housing Tenure (2)
Owner, Non-owner
Urbanization (3)
Urbanized areas with population greater than 250,000; other urbanized and urban areas; rural
Region (4)
Northeast, South, Midwest, West

A separate group for American Indians on reservations was created.

Each poststratum group was divided by age and sex into estimation poststrata. Research suggested an age grouping: 0-17, 18-29, 30-49, 50 and over. Finally, there seemed to be no reason to calculate separate estimates for girls and boys, 0-17. Demographic analysis had never shown a sex difference for this group, and earlier PES estimates had shown little difference in undercount between these groups. Therefore, each poststratum group has only seven age-sex categories, giving a total of 357 cells for which direct estimates were made.

Because of operational and other constraints, the PES excluded some population groups. These included the population in institutions, those living on the street or in shelters, military personnel in group quarters, and the population in remote and rural Alaska.

3.1.2. The Dual System Model

The PES was based on the so-called dual-system model. The dual-system model used to estimate the true population classifies each person as being either included or not in the census enumeration, as well as being either included or not in the PES:

                 CENSUS ENUMERATION
PES          In       Out      Total
In           N11      N12      N1+
Out          N21      N22      N2+
Total        N+1      N+2      N++

All cells are, in theory, observable except for those people missed by both systems (N22). The model assumes independence between inclusion in the census and inclusion in the PES. If the PES is a representative sample of the total population, then the proportion of PES people who are census enumerated (N11 / N1+) should equal the proportion of all people who are enumerated (N+1 / N++). Then by simple algebra, we can estimate

N++ = (N+1) (N1+) / N11

This is called the dual-system estimator (DSE).
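The dual-system estimator is the product of the two marginal totals divided by the matched cell, N++ = (N+1)(N1+) / N11. A minimal sketch with hypothetical counts follows. It ignores sampling weights, missing-data adjustment, and the removal of erroneous enumerations, all of which the actual PES estimation required; the function names are assumptions.

```python
def dual_system_estimate(census_in, pes_in, matched):
    """Estimate N++ as (N+1)(N1+) / N11 under the independence assumption.

    census_in  N+1: people included in the census enumeration
    pes_in     N1+: people included in the PES
    matched    N11: people included in both
    """
    if matched <= 0:
        raise ValueError("at least one matched person is required")
    return census_in * pes_in / matched


def percent_net_undercount(estimated_pop, census_in):
    """r = (P - C) / P * 100, with the estimate standing in for P."""
    return (estimated_pop - census_in) / estimated_pop * 100


# Hypothetical counts for a single poststratum.
n_plus_1 = 9_500   # N+1, census enumerations
n_1_plus = 1_000   # N1+, PES persons
n_11 = 950         # N11, PES persons matched to the census

n_total = dual_system_estimate(n_plus_1, n_1_plus, n_11)
n_12 = n_1_plus - n_11                  # in the PES, missed by the census
n_21 = n_plus_1 - n_11                  # in the census, not in the PES
n_22 = n_total - n_11 - n_12 - n_21     # implied number missed by both

print(f"estimated population N++ = {n_total:.0f}")
print(f"implied N22 = {n_22:.0f}")
print(f"percent net undercount = {percent_net_undercount(n_total, n_plus_1):.1f}")
```

The implied N22 equals N12 x N21 / N11, which is what independence between the two systems requires.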

In order to estimate the cells of the dual-system model, the PES conducted an independent listing of each sample block, an initial interview, an initial match to the census, a follow-up interview of problem cases, and a final match. The estimation steps included missing-data adjustment, weighting and dual-system estimation. These steps are discussed in detail in the Technical Appendix.

After computing the dual-system estimates for all poststrata, the estimated population can be compared to the census count.

3.2. Net Undercounts from 1990

The 1990 PES measured a net national undercount of 1.6 percent, which is somewhat lower than that measured by demographic analysis.

Table 3 gives the corrected results by race and tenure. The undercount for Non-Hispanic Whites and Others is relatively low (less than one percent) while the undercount for Blacks and Hispanics is relatively high (4.4 to 5.0 percent). The undercount rate estimated for Asians (2.4 percent) lies in between. A new finding based on the PES is that tenure is as important in explaining undercount as is race. For example, the 4.2 percentage point difference in the net undercount rates of renters (4.3) and owners (0.1) is of the same magnitude as the 3.7 percentage point difference in the undercount rates of Blacks (4.4) and Non-Hispanic Whites (0.7). This result, if it is supported by other research, has important implications for planning the next census. The spread between owners and renters is especially wide for Asians and Hispanics. Asian and Hispanic renters may tend to be disproportionately recent immigrants, while Asian or Hispanic owners may be drawn from more established communities. At this time, one can only speculate whether the difference between tenure groups is because of tenure itself (i.e., renters tend to move more often) or because owners and renters are drawn from different groups.

Table 3. Percent Undercount by Race/Ethnicity and Owner/Renter

(Estimates pertain to the PES universe)

                       Total   Owner   Renter
Total                    1.6     0.1      4.3

Non-Hispanic White       0.7    -0.3      3.1
Black                    4.4     2.3      6.5
Hispanic                 5.0     1.8      7.4
Asian                    2.4    -1.5      7.0
Indian                  12.2     n/a      n/a

Appendix Table 2 gives the new undercount estimates for the poststrata groups, together with their estimated standard error. The patterns by race and tenure are evident there. There is also a regional pattern, with the undercounts for the West and South being somewhat higher than those for the Northeast and Midwest. The Non-Urban areas often have higher estimated undercounts than the urban areas, but they also often have high standard errors which make interpretation difficult.

A few cells are of particular interest. The estimates for non-Hispanic Whites in both large urban and other urban areas in the Northeast are negative, i.e., net overcounts of 2.1 (standard error = 1.1) and 1.1 (.5). These are on the margin of significance at the 5 percent level. These numbers are applied to very large groups which together comprise approximately 20 million people, and produced an estimated overcount of 376,000. Comparing these cells to nearby cells for other regions does not seem to show that these estimates are far out of line.
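The phrase "on the margin of significance" can be illustrated by dividing each estimate by its standard error. The sketch below assumes the estimates are approximately normally distributed, which is an assumption made here for illustration and not a statement about how the Bureau tested them.

```python
import math


def two_sided_p_value(estimate, standard_error):
    """Two-sided p-value for H0: true value = 0, normal approximation."""
    z = estimate / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


# Net undercount estimates (percent) and standard errors quoted in the
# text for Non-Hispanic Whites in the Northeast; negative = overcount.
cells = {
    "large urban": (-2.1, 1.1),
    "other urban": (-1.1, 0.5),
}

p_values = {name: two_sided_p_value(est, se)
            for name, (est, se) in cells.items()}

for name, (est, se) in cells.items():
    print(f"{name}: z = {est / se:.2f}, p = {p_values[name]:.3f}")
```

Under this approximation one cell falls just above and the other just below the 5 percent threshold.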

The PES presents some evidence that the differential undercount for Blacks and Hispanics is not new. The 1980 coverage evaluation program produced estimates of the undercount for Blacks and Hispanics. There were several estimates produced for each group based on a range of assumptions. The Census Bureau was quite concerned about the bias of the estimates. The twelve sets are based on different data and different assumptions in an attempt to show the sensitivity of the estimates to possible violations of assumptions. The Census Bureau decided that none of these was "the best." However, we can look at the differential undercount implied for each set of estimates. Subtracting the estimated national undercount removes any uniform bias from the sets, but will leave any bias that is different between groups.
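The effect of subtracting the national estimate can be shown with a short sketch. The rates and biases below are hypothetical: a bias added equally to the group and national estimates drops out of the differential, while a bias affecting only the group estimate carries through in full.

```python
def differential(group_rate, national_rate):
    """Differential undercount: group rate minus national rate."""
    return group_rate - national_rate


# Hypothetical "true" percent net undercounts.
group_rate, national_rate = 5.0, 1.5

# A uniform bias shifts both estimates by the same amount.
uniform_results = [
    differential(group_rate + bias, national_rate + bias)
    for bias in (-1.0, 0.0, 0.5)
]

# A group-specific bias shifts only the group estimate.
group_specific_result = differential(group_rate + 0.5, national_rate)

print(uniform_results)
print(group_specific_result)
```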

Appendix Table 3 gives the estimated differential undercounts for Hispanics as well as Blacks from the 1980 estimates. A differential similar to that found in 1990 is shown. The 1980 program allowed the production of only the most tentative measures of undercount for Asian and Pacific Islanders and for American Indians. The estimates were, in general, consistent with what was measured in 1990.

3.3. Gross Omissions

The PES was designed to measure the net undercount by group and to provide the data to adjust for that net undercount. It also provides data on the gross census errors: gross omissions and gross erroneous inclusions. However, one must take care in interpreting these data: some of the measures and concepts are appropriate only when considered in terms of the way they produce net estimates. In addition, all of these data are subject to sampling error, which for some groups and categories is quite large.

The PES estimates the proportion of the population not enumerated at their correct census day residence. Table 4 gives the distribution of nonmatches by category (for nonmovers).

Table 4. Types of Nonmatches: Percent of Total Persons

                                Within       Whole household missed     Census
                                household    Address      Address       processing
                                missed       included     missed        error
Total                             1.8          2.5          1.3           .3

Non-Hispanic White and other      1.3          1.9          1.3           .2
Black                             4.3          5.9          1.3           .4
Hispanic                          3.3          4.1          1.6          1.0
Asian & Pac. Islander             2.6          3.8           .6           .5

Several features are interesting. First, the PES nonmatches include a high proportion of within-household omissions. The next feature is the high number of missed households at enumerated addresses (col. 2). A missed household at an enumerated address can happen in different ways. The housing unit could be enumerated as vacant. Another family in the building may have been enumerated in place of the missed household, as sometimes happens in older buildings without clearly marked apartment numbers. The enumerator may have created a fictitious household as a replacement. Another way would be if the enumerator failed to get a complete interview, causing the family to either be imputed in the census or classified as "Unmatchable." Each of the last three ways would create an erroneous enumeration which would, to some extent, offset the omissions.

3.4. Gross Erroneous Enumerations

The revised PES data show some 14 million census erroneous enumerations, which together with 2 million census imputations are subtracted from the census counts before applying the dual-system estimator. How should one interpret this number? The next table gives the weighted distribution of erroneous enumerations by type. Some 28 percent are census duplicates. Under most definitions, these would be considered erroneous. About two and a half percent are estimated fictitious, again clearly erroneous.

Table 5. Measured Erroneous Enumeration by Type

                             Percent     Percent of erroneous
                             of total    enumerations
Total EE                       5.8          100.0

    Duplicate                  1.6           28.2
    Fictitious                  .2            2.6
    Geocoding error             .3            6.0
    Other counting error       2.2           38.0
    Unmatchable                1.2           20.8
    Imputed EE's                .3            4.5

The PES estimated that about 6 percent of the erroneous enumerations were people who were enumerated outside the search area, i.e. two or more blocks away. The block counts are clearly off, but if these persons were missed in the correct block (which we do not know), then as blocks are aggregated, the coverage errors cancel.

Most of the "Other Counting Errors" are enumerations of people who moved into the address after Census Day. If they were missed at the correct location, this may be the only place these people were enumerated. This type of error is often, but not always, paired with census omissions of the actual Census Day residents (and hence tends to cancel at aggregated geographic levels).

The "unmatchable" cases represent census enumerations without names. The PES required sufficient identifying information so that the person could be matched or followed up. Without this information, they were coded "Unmatchable." Many of these enumerations refer to real people who actually lived at the address, although others may be duplicates, fictitious, etc. The PES gives no direct information. Finally, the PES imputed roughly half a million erroneous enumerations. The imputation program only predicts a probability of the enumeration being erroneous. Summing these probabilities gives an estimate of the number, but no indication of the probable cause.
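Summing predicted probabilities to obtain an estimated count can be sketched as follows. The sample weights and probabilities are hypothetical, and the function name is an assumption; the sketch only illustrates why the sum yields a number of erroneous enumerations without identifying which cases are erroneous or why.

```python
def expected_erroneous_enumerations(cases):
    """Weighted sum of the predicted probabilities of being erroneous.

    cases: iterable of (sample_weight, probability_erroneous) pairs for
    enumerations whose status could not be resolved.
    """
    return sum(weight * prob for weight, prob in cases)


# Hypothetical unresolved enumerations: each stands for `weight` census
# records and carries a model-predicted probability of being erroneous.
unresolved = [
    (100.0, 0.25),
    (100.0, 0.5),
    (200.0, 0.125),
]

estimate = expected_erroneous_enumerations(unresolved)
print(f"estimated erroneous enumerations: {estimate:.0f}")
```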

4. CONCLUSION

The demographic analysis approach and the PES method are derived from separate estimation paradigms. These methods differ, as do their data sources and assumptions. Neither measures the undercount perfectly. Each is subject to errors from a number of sources--errors which are different for each method. However, in spite of their different measurement approaches and error structures, the two methods tell essentially the same story with respect to the undercount in general and the differential undercount in particular.

Both methods point to a net national undercount of just under two percent. Both methods show an undercount for the Nonblack population of just over one percent. The PES puts the Black undercount at over 4 percent while the demographic analysis method puts it at almost 6 percent. Thus, while the methods differ as to the full size of the Black-Nonblack differential, both methods demonstrate its existence.

The demographic analysis method helps us focus on the role that age and sex play in the undercount, especially for Blacks. Indeed, most of what demographic analysis measures is concentrated among Black children and adult Black males. Demographic analysis also demonstrates that the problem has existed for every census since 1880.

The PES helps us understand the role of social geography. It shows the Black undercount concentrated in larger urbanized areas and, to a lesser extent, in rural areas. Blacks living in rented housing units are especially poorly counted.

The PES also gives us estimates for other groups. It demonstrates levels and patterns of Hispanic undercount similar to those for Blacks. It shows intermediate levels for Asians and Pacific Islanders, and quite high levels for American Indians living on reservations and tribal lands. The PES also tends to show that these patterns existed in the 1980 census.

Even for the population where the overall measured undercount is low--Non-Hispanic Whites--the PES shows some differentials. In particular, the undercount of Non-Hispanic Whites living in rental units is high relative to those who own their units.

The proof of the existence of a differential undercount is clear. Problems stand out in clear relief. What is not so clear is how this problem can be solved in the future. What changes can we make in the way the census is taken to lessen the differentials? The Census Bureau seeks the insight of scholars, local officials, and community leaders to aid us in this search.



1. Howard Hogan is currently Assistant Division Chief for Research and Methodology, Business Division and Gregg Robinson is Chief, Population Analysis and Evaluation Staff, Population Division, Bureau of the Census. The views expressed are attributed to the authors and do not necessarily reflect those of the Bureau of the Census.


TECHNICAL APPENDIX

ESTIMATION STEPS IN THE PES

The primary sampling unit for the 1990 PES was the block cluster composed of either a block or a collection of blocks. A scientific probability sample of 5290 block clusters was chosen. The same blocks were sampled for both the P sample and the E sample. The P sample consisted of all people living in the sample blocks at the time of the PES interview. The E sample consisted of all census enumerations coded to the sample blocks, whether or not they actually belonged there. The PES sample excluded people living in institutions (jails, nursing homes), military living in barracks or on ships, and people living in homeless shelters or on the street.

PES field work began before Census Day (April 1, 1990) when permanent Census Bureau staff visited each sample block to make a list of all housing units and group quarters. The PES household interviewing was scheduled to start in June. However, census non-response follow-up was still being conducted in many areas. Therefore, the PES interviewing had to be delayed. The end of interviewing was shifted accordingly. PES interviewing was complete in most areas by the end of July and finished everywhere by early September. Interviewing was conducted mainly by temporary employees who had worked on the census enumeration. In order to increase the independence of the PES from the census, they were not allowed to work in areas that they had previously enumerated.

For the purpose of the dual-system estimate, a person was considered enumerated by the census if his or her name was listed on a census record that was included as part of the count of the population. A person was considered omitted from the census if he or she should have been part of that count but was not. The matching rules classified persons as enumerated only if they were counted at the location where they should have been counted, according to the information they provided. For example, if people moved between April 1 and the end of census follow-up, they might be missed at their correct Census Day address but erroneously counted at their new address. The PES design would consider the people as missed by the census. The enumerations at the new address would be classified in the E sample as erroneous. In this example there would be both omissions and erroneous enumerations. If both addresses were in the same poststratum, the errors would tend to cancel.
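
The mover example can be sketched numerically. The block and poststratum names below are hypothetical; the sketch only shows that an omission and an erroneous enumeration offset each other once both addresses are aggregated into one poststratum.

```python
from collections import defaultdict

# Hypothetical records: (block, poststratum, net_error).
# -1 marks a census omission, +1 an erroneous enumeration.
records = [
    ("block_A", "poststratum_1", -1),  # mover missed at the Census Day address
    ("block_B", "poststratum_1", +1),  # same mover counted at the new address
]

def net_error(records, level):
    """Sum net errors by 'block' or by 'poststratum'."""
    index = {"block": 0, "poststratum": 1}[level]
    totals = defaultdict(int)
    for record in records:
        totals[record[index]] += record[2]
    return dict(totals)

print(net_error(records, "block"))        # each block count is off by one
print(net_error(records, "poststratum"))  # the two errors cancel
```

If the two addresses fell in different poststrata, each poststratum would keep its own nonzero error, which is why the text says the errors only "tend to" cancel.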

An exact address match was not required. If a person reported that he lived at a given address, then the matching classified him as correctly enumerated if he was counted anywhere in the block where the address was located. It also classified him as correctly enumerated if he was counted in a ring of surrounding blocks. This ring of blocks whose census records were searched for a match was known as the search area. The search area was limited to one ring of adjacent blocks in urban areas or two rings in more rural areas. If a census operation coded the address outside the correct search area, the matching counted the person as missed by the census. Census enumerations that were outside the search area of the true location were classified as erroneous so that the overall estimate of net undercount would not be inflated.
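
The search-area rule can be sketched as follows. This is an illustration under a strong simplifying assumption: blocks are placed on a hypothetical square grid so that "rings" can be measured with a simple distance, whereas real census blocks are irregular and adjacency was determined from maps. The function names are invented for this sketch.

```python
def ring_distance(block_a, block_b):
    """Rings separating two blocks on a hypothetical grid (Chebyshev distance)."""
    return max(abs(block_a[0] - block_b[0]), abs(block_a[1] - block_b[1]))

def rings_searched(urban):
    """One ring of adjacent blocks in urban areas, two in more rural areas."""
    return 1 if urban else 2

def classify_p_sample_person(true_block, census_block, urban):
    """Return 'enumerated' if a census record lies in the search area, else 'missed'.

    census_block is None when no census record for the person was found.
    """
    if census_block is None:
        return "missed"
    if ring_distance(true_block, census_block) <= rings_searched(urban):
        return "enumerated"
    return "missed"

# A record coded two blocks away counts as a match in a rural area only.
print(classify_p_sample_person((5, 5), (5, 7), urban=True))
print(classify_p_sample_person((5, 5), (5, 7), urban=False))
```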

Some cases lacked sufficient information to determine whether the person was enumerated. These cases were called "Unresolved" and were imputed. Examples of P-sample unresolved cases are records without names or interviews where the Census Day address is not reported.

The E sample measures the proportion of erroneous census enumerations. The design considers an enumeration as correct if it is determined not to be a duplicate and if, according to the information provided, the person should have been counted either in the sample block or in one of the surrounding blocks that make up the search area. Erroneous enumerations include: census duplicates, census fictitious enumerations, people who were born after Census Day or died before Census Day, people counted in the wrong location, and census enumerations with insufficient information to allow both matching and follow-up reinterview.

An important category of erroneous enumerations consisted of people who moved from outside the search area into the sample block after Census Day and were subsequently counted there in the census. All such people were considered to be erroneously enumerated. However, under the search area concept, if they merely moved from one address within the search area to another, they were to be considered correctly enumerated so long as they were counted only once.
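
The E-sample rules described above can be collected into a small classification sketch. The record fields, the order in which the checks are applied, and the returned labels are assumptions made for illustration; the actual PES coding relied on clerical matching and follow-up interviews rather than a simple rule table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CENSUS_DAY = date(1990, 4, 1)

@dataclass
class Enumeration:
    """Hypothetical summary of what is known about one census enumeration."""
    has_sufficient_info: bool = True
    is_duplicate: bool = False
    is_fictitious: bool = False
    born: Optional[date] = None
    died: Optional[date] = None
    census_day_residence_in_search_area: bool = True

def classify_e_sample(e):
    """Return 'correct' or an 'erroneous: ...' label for one enumeration."""
    if not e.has_sufficient_info:
        return "erroneous: insufficient information"
    if e.is_duplicate:
        return "erroneous: duplicate"
    if e.is_fictitious:
        return "erroneous: fictitious"
    if e.born is not None and e.born > CENSUS_DAY:
        return "erroneous: born after Census Day"
    if e.died is not None and e.died < CENSUS_DAY:
        return "erroneous: died before Census Day"
    if not e.census_day_residence_in_search_area:
        # Includes people who moved in from outside the search area.
        return "erroneous: wrong location"
    return "correct"

# A mover from outside the search area is erroneous; a mover within it is not.
print(classify_e_sample(Enumeration(census_day_residence_in_search_area=False)))
print(classify_e_sample(Enumeration(census_day_residence_in_search_area=True)))
```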

Dual-system estimates were made for each of the 357 poststrata, assuming independence of inclusion in the census and PES. Note that in the dual-system model, the marginal total, $N_{+1}$, is the number of distinct and identifiable people in the census. This differs from the official census count which includes duplicates, fictitious cases, and other erroneous inclusions as well as imputations. The proportion of census data-defined cases that are erroneous is measured by the E sample.

$$\hat{N}_{++} = \frac{\hat{N}_p \,(C - II)\left(1 - EE/\hat{N}_e\right)}{M}$$

where

$\hat{N}_{++}$ = Dual-system estimate of population
$\hat{N}_p$ = Weighted P-sample total (= $N_{1+}$)
$C$ = Census count
$II$ = Number of whole-person census imputations
$EE$ = Weighted estimate of E-sample erroneous enumerations
$\hat{N}_e$ = Weighted E-sample total
$M$ = Weighted estimate of P-sample matches (= $N_{11}$)

Note

$$N_{+1} = (C - II)\left(1 - EE/\hat{N}_e\right)$$

The estimated total population $\hat{N}_{++}$ is then compared to the census count to compute the net coverage error.
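
A minimal numerical sketch of this calculation follows. It assumes the usual dual-system (capture-recapture) form implied by the definitions above: the P-sample total times the number of distinct, correctly enumerated census persons, divided by the number of matches. The input values are invented for illustration, and the function and argument names are not from the original.

```python
def dual_system_estimate(n_p, c, ii, ee, n_e, m):
    """Dual-system estimate of population for one poststratum.

    n_p: weighted P-sample total        c:   census count
    ii:  whole-person imputations       ee:  weighted E-sample erroneous enumerations
    n_e: weighted E-sample total        m:   weighted P-sample matches
    """
    # Distinct, correctly enumerated persons in the census.
    census_correct = (c - ii) * (1.0 - ee / n_e)
    return n_p * census_correct / m

def net_undercount_percent(dse, c):
    """Net undercount as a percent of the estimated population."""
    return 100.0 * (dse - c) / dse

# Hypothetical poststratum (not PES data).
dse = dual_system_estimate(n_p=10_000, c=10_000, ii=100, ee=495, n_e=9_900, m=9_000)
print(round(dse))                                     # 10450
print(round(net_undercount_percent(dse, 10_000), 1))  # 4.3
```

The percent uses the estimated population as its base, as stated in the note to Appendix Table 1, so a negative value would denote a net overcount.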


REFERENCES

Hogan, Howard (1993) "The 1990 Post-Enumeration Survey: Operations and Results," Journal of the American Statistical Association, Vol. 88, No. 423, pp. 1047-1060.

Passel, Jeffrey S. (1992) "Beyond Demography: The Growing American Indian Population, 1960-1990," paper presented at the American Statistical Association, Boston.

_____________ (1991) "Age-Period-Cohort Analysis of Census Undercount Rates for Race-Sex Groups, 1940-1980: Implications for the Method of Demographic Analysis," Proceedings of the Social Statistics Section, American Statistical Association, pp. 326-331.

Robinson, J. Gregory; Ahmed, Bashir; Das Gupta, Prithwis; and Woodrow, Karen A. (1993) "Estimation of Population Coverage in the 1990 United States Census Based on Demographic Analysis," Journal of the American Statistical Association, Vol. 88, No. 423, pp. 1061-1077.

Robinson, J. Gregory and Hogan, Howard (1990) "Differential Coverage in the United States Census of Population: An Historical Review," Proceedings of Statistics Canada Symposium 90, pp. 67-78.


APPENDIX TABLE 1


Amount and Percent Net Undercount, by Age, Sex, and Race: 1990

(Numbers in thousands. Base of percents is estimated population. A minus sign denotes a net overcount)

                     Black                         Nonblack
              Male          Female           Male           Female
Age      Amount Percent  Amount Percent  Amount Percent  Amount Percent
All ages  1,338     8.5     498     3.0   2,142     2.0     706     0.6

0-4         140     8.6     129     8.2     224     2.7     224     2.8
5-9         114     7.7     108     7.5     216     2.7     218     2.8
10-14        57     4.1      55     4.0      39     0.5      53     0.7
15-19        -2    -0.2       6     0.4    -176    -2.3    -120    -1.7
20-24        78     5.7      34     2.5     -66    -0.8     -50    -0.6

25-29       192    12.7      75     4.9     444     4.5     199     2.1
30-34       207    14.0      52     3.5     380     3.8      67     0.7
35-39       148    11.9      29     2.2     231     2.6      18     0.2
40-44       103    10.6      15     1.5     113     1.4     -54    -0.7
45-49        87    11.9      19     2.4     169     2.7      40     0.6
50-54        72    12.0      10     1.6     143     2.8      22     0.4

55-59        63    12.1       0     0.0     140     3.0      17     0.4
60-64        48    10.3     -15    -2.9     120     2.6       4     0.1
65-69         8     2.1     -36    -7.7      95     2.2     -48    -0.9
70-74        11     4.1     -14    -3.8      60     1.9      -4    -0.1
75+          11     3.2      30     4.4       7     0.2     120     1.5


APPENDIX TABLE 2

ESTIMATES FOR REVISED POST-STRATA GROUPS


                                      PERCENT UNDERCOUNT                      STANDARD ERRORS
                                  ALL     NE      S     MW      W       ALL     NE      S     MW      W
Non-Hispanic
  White & Other
    Owner
        Large Urbanized Areas          -2.13   0.68  -0.26  -0.34             1.08   0.71   0.39   0.65
        Other Urban                     1.08   0.52  -0.10   0.62             0.49   0.42   0.40   0.58
        Non-Urban                      -0.54   0.18  -0.71   0.29             0.70   0.69   1.18   0.69
    Non-owner
        Large Urbanized Areas           1.16   2.56   2.33   3.18             1.39   1.48   1.61   1.62
        Other Urban                     3.41   3.20   1.23   4.49             1.51   1.74   1.09   1.34
        Non-Urban                       6.52   6.23   2.85   6.08             4.20   1.71   1.51   1.81
  Black
    Owner
        Large Urbanized Areas           1.63   2.16   0.81   6.10             1.91   0.90   0.87   1.91
        Other Urban              1.34                                  0.98
        Non-Urban                3.52                                  1.90
    Non-owner
        Large Urbanized Areas           8.37   6.27   5.99   9.96             1.61   1.90   1.68   2.72
        Other Urban              4.15                                  1.18
        Non-Urban                4.62                                  5.33
Non-Black Hispanic
    Owner
        Large Urbanized Areas           0.67   2.53  -4.33   2.89             4.45   0.90   2.58   0.87
        Other Urban              0.94                                  1.64
        Non-Urban                2.73                                  2.69
    Non-owner
        Large Urbanized Areas           6.72   9.34   6.64   5.91             3.51   2.59   3.26   1.84
        Other Urban              6.60                                  2.74
        Non-Urban               15.80                                  5.01

Asian & Pacific Islander
    Owner                       -1.45                                  1.50
    Non-owner                    6.96                                  2.52

Reservation Indians             12.22                                  4.73
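
One way to read an estimate against its standard error is sketched below. The sketch assumes a normal approximation and a two-sided 90 percent interval (critical value 1.645); the table itself does not state how significance should be assessed, so this is an illustrative convention rather than the authors' procedure. The two rows used are taken from the table above.

```python
Z_90 = 1.645  # two-sided 90 percent critical value under a normal approximation

def interval_90(estimate, std_error):
    """Approximate 90 percent interval for a percent undercount estimate."""
    half_width = Z_90 * std_error
    return (estimate - half_width, estimate + half_width)

def differs_from_zero(estimate, std_error):
    """True if the approximate 90 percent interval excludes zero."""
    low, high = interval_90(estimate, std_error)
    return low > 0 or high < 0

# (percent undercount, standard error) from Appendix Table 2.
rows = {
    "Reservation Indians": (12.22, 4.73),
    "Black, non-owner, non-urban": (4.62, 5.33),
}

for name, (estimate, std_error) in rows.items():
    print(name, differs_from_zero(estimate, std_error))
```

Under these assumptions the reservation estimate is distinguishable from zero, while the Black non-owner, non-urban estimate is not, because its standard error exceeds the estimate itself.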


APPENDIX TABLE 3

Differential Undercount as Estimated
by the 1980 Post-Enumeration Program

(Differential undercount represents the difference between the percent net
undercount estimate for the group and the estimate for the total population)

Estimates set      Black   Non-Black
                            Hispanic
2-8                  5.0         3.6
3-8                  4.7         3.5
2-9                  5.8         4.3
3-9                  5.5         4.2
14-9                 2.8         1.7
2-20                 5.9         4.2
3-20                 5.7         4.2
14-20                3.0         1.7
5-8                  2.8         4.9
10-8                 2.5         3.4
5-9                  3.6         5.7
14-8                 2.1         1.0*
Approx s.e.
  (Sets 2,3,14)      0.6         0.8
  (Sets 5,10)        0.6         1.0

* Not significant at the 10% level.
See Fay, Passel and Robinson (1988) for a description of each set of estimates.


Figure 1.  Percent Net Undercount: 1990 (by Race, Sex, and Age)


Figure 2. Percent Net Undercount Black Males: 1990, 1980, 1970

Authors: Howard Hogan & Gregg Robinson (Population Division)
Created: July 3, 2000