One of the primary methods of evaluating the quality of a census is comparing the results to other population benchmarks. The U.S. Census Bureau has used two key population benchmarks to assess the quality of the 2020 Census results prior to release: the 2020 Demographic Analysis (DA) and the Vintage 2020 Population and Housing Unit Estimates, and we’ve made this information publicly available.
We know that many data users want to know how accurately the 2020 Census counted certain population groups, but it’s still too early to fully answer that question. We will have a better answer after we release results for the Post-Enumeration Survey (PES), along with more detailed DA coverage results.
In this blog, we explain which population comparisons are currently possible, and provide guidance on how to interpret differences between demographic benchmarks and the 2020 Census results.
While the estimates now available offer views on the quality of the 2020 Census, there are limitations to the information they provide and the comparisons that can currently be made. Therefore, data users should exercise caution in how they interpret differences.
As a federal statistical agency, we are committed to data quality and to being transparent about that quality. In other recent blogs, we’ve summarized additional ways that we are measuring the quality of the 2020 Census.
Many people want to evaluate how well specific race groups were counted in the 2020 Census by comparing the race data from the 2020 Census to the DA results or the population estimates. We caution against making these comparisons right now because the available benchmarks do not have the same race categories as the census.
The 2020 Census, like all recent censuses, included an option for people to identify as Some Other Race. However, the Some Other Race category is not included in the administrative records used to produce the DA and population estimates.
To assist with making this comparison, the Census Bureau will reconcile the race categories from the census with those that appear in data from the administrative records that are used to produce the DA and population estimates. From this, we will create a file, called the 2020 modified race file, that will allow for comparisons between the 2020 Census results and the DA or population estimates (the files showing modified race data that we created following the 2010 and 2000 Censuses are available on the Modified Race Data page).
To create the modified race file, we use a statistical process to reclassify the Some Other Race values from the census into the five racial categories included in the 1997 Office of Management and Budget’s standards, either alone or in combination with another race category.
The modified race file will include the following variables:
The modified race file requires demographic characteristics that are not yet ready from the 2020 Census. We tentatively plan to release these characteristics in the Demographic and Housing Characteristics File (DHC) in 2022. Once the DHC is available, we plan to produce and release the 2020 Census modified race file.
With this additional demographic information and the modified race file, we will be able to make more accurate comparisons, resulting in a better picture of 2020 Census quality for certain groups.
Though comparisons by race are not currently possible, there are other comparisons that can be made. We walk through these in the remaining sections of this blog.
DA is one of two programs that the Census Bureau uses to assess the accuracy of the census by estimating coverage error. Coverage error occurs when certain population groups are overcounted (people were counted more than once) or undercounted (people were missed) in the census.
The other coverage check is based on results from the Post-Enumeration Survey, whose first results are planned for release early next year.
For the 2020 DA, we produced national-level estimates of the U.S. population as of April 1, 2020, using current and historical vital records, data on international migration, Medicare enrollment records, and other data. The DA estimates are produced by age, sex, broad race categories, and Hispanic origin.
Hispanic origin information first became available on birth and death records for all states in 1990. Therefore, the Hispanic DA estimates are available only for the population born since then, that is, the population ages 0 to 29 on April 1, 2020.
We produced a range of estimates — low, middle and high — to account for uncertainty in the input data and methods used to produce the final DA estimates. The range was developed by varying the assumptions we used to produce the population components. Each of the series is a reasonable demographic estimate for the population.
The DA estimates are completely independent of the 2020 Census. In fact, we released the estimates on December 15, 2020, while the first census results were not released until April 26, 2021. Now that some census results on age and Hispanic origin are available through the 2020 Census Redistricting Data Summary Files, we can compare them to the 2020 DA estimates.
At this point, there are four groups from the 2020 Census counts that we can evaluate using the DA estimates:
We can make this comparison now because we have comparable data available for each of these groups. As a reminder from the section above, we don’t have comparable data available yet to evaluate the counts for different racial groups — either in total or by age.
Table 1 reports the 2020 Census count and 2020 DA estimates for these populations. Except for the population under 18 years of age, the census count was within the DA range of estimates.
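To make the "within the DA range" check concrete, here is a minimal sketch in Python. The `DARange` class, the `position_in_range` helper, and all of the numbers are hypothetical and used only for illustration; they are not values from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DARange:
    """Low, middle and high DA series for one population group (hypothetical container)."""
    low: float
    middle: float
    high: float


def position_in_range(census_count: float, da: DARange) -> str:
    """Report where a census count falls relative to the DA low-high range."""
    if census_count < da.low:
        return "below range"
    if census_count > da.high:
        return "above range"
    return "within range"


# Illustrative numbers only (in millions); not actual DA estimates or census counts.
example_range = DARange(low=98.0, middle=100.0, high=102.0)
print(position_in_range(99.5, example_range))   # within range
print(position_in_range(97.0, example_range))   # below range
```

A count that falls outside the range is a signal worth examining, not by itself a measure of how many people were missed.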
The DA results can be compared to the census counts to produce estimates of net coverage error. Net coverage error combines both undercounts and overcounts for the same population. This means that if a population had a large undercount and an equally large overcount, we would show that as a net coverage error of zero. However, groups that are consistently undercounted in the census usually do not have large overcounts.
Net coverage error is calculated using the following equation:
Net coverage error = ((census count - DA estimate) / DA estimate) × 100
Net coverage error reports the difference between the census counts and the DA estimates as a percentage of the DA estimate.
Again, this measure includes both overcounts and undercounts for the same population, so a –5 percent net coverage error wouldn’t mean that the 2020 Census missed 5 percent of the population, but that the final census count was 5 percent lower than the DA estimate. (Along with net over- and undercounts, the Post-Enumeration Survey will estimate the proportion of people and housing units potentially missed or counted erroneously in the census.)
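The calculation described above can be sketched in a few lines of Python. The function name `net_coverage_error` and every figure below are hypothetical, chosen only to illustrate the arithmetic and the point that offsetting errors cancel in a net measure.

```python
def net_coverage_error(census_count: float, da_estimate: float) -> float:
    """Difference between the census count and the DA estimate,
    expressed as a percentage of the DA estimate."""
    if da_estimate <= 0:
        raise ValueError("DA estimate must be positive")
    return (census_count - da_estimate) * 100.0 / da_estimate


# Illustrative numbers only; not actual census counts or DA estimates.
da_estimate = 1_000_000

# A census count 5 percent below the DA estimate.
print(net_coverage_error(950_000, da_estimate))  # -5.0

# Offsetting errors: 30,000 people missed and 30,000 counted twice
# leave the count unchanged, so the net coverage error is zero.
missed, duplicated = 30_000, 30_000
census_count = da_estimate - missed + duplicated
print(net_coverage_error(census_count, da_estimate))  # 0.0
```

The second case shows why a net figure near zero does not rule out gross errors in both directions.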
Table 2 reports the DA net coverage errors for the 2020 Census results that have already been released.
When we release the DHC next year, it will contain even more demographic detail from the 2020 Census than has already been released. Using the DHC and the modified race file, we will be able to produce the DA net coverage estimates for 5-year age groups by sex and the DA race categories:
The available DA race categories are limited because DA estimates rely on historical records and measures of race that have changed over time.
The additional age detail in the DHC file will also allow us to calculate the net coverage error for young children. We look forward to examining and sharing those 2020 Census net coverage error results with the public. We plan to release a report on the DA net coverage error estimates when they become available next year.
The Census Bureau also produces annual population estimates for the nation, states, counties, cities and towns; and estimates of housing units for the nation, states and counties. The population estimates are referred to as postcensal estimates because they use the most recent census results as a base, and account for demographic change to that base during the decade. The Vintage 2020 population estimates use the 2010 Census as their base.
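As a rough illustration of what "a base plus demographic change" means, the sketch below applies the standard demographic balancing equation (base population plus births, minus deaths, plus net migration). This is an assumption about the general form of a postcensal estimate, not a description of the actual production methodology, and the function name and all numbers are hypothetical.

```python
def postcensal_estimate(base: int, births: int, deaths: int, net_migration: int) -> int:
    """Carry a census base forward using components of demographic change
    (simplified balancing equation; an illustrative assumption)."""
    return base + births - deaths + net_migration


# Illustrative numbers only for a hypothetical area over the decade.
estimate = postcensal_estimate(base=10_000, births=1_200, deaths=900, net_migration=300)
print(estimate)  # 10600
```

Because the estimate is built on the base, any error in the base count or in a component carries through to the result.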
The official population estimates are used each year to allocate billions of dollars in federal funding, as controls for household surveys collected by the Census Bureau and other federal agencies, and as denominators for vital rates and other population-based statistics.
Increasingly, data users are comparing the population estimates to the results of the 2020 Census to try to understand the quality of the census results. While the annual population estimates can help us understand demographic trends in the 2020 Census data, we advise data users to be cautious when making comparisons.
There are three sources of potential error that may explain the differences between the population estimates and the 2020 Census results: errors in the 2010 Census, which is the base for the Vintage 2020 estimates; errors in the estimation process; and errors in the 2020 Census itself.
When data users compare the population estimates to the 2020 Census counts, it is important to remember that differences could be caused by all three sources of error and not just data quality issues in the 2020 Census.
Figure 1 shows a county map with percent differences for the total population under 18 years of age between the 2020 Census and the Vintage 2020 population estimates. For the counties shaded blue, the census counts were higher than the estimates. For the counties shaded orange, the census counts were lower than the estimates.
The map illustrates some regional patterns. For example, the 2020 Census counts for the population under the age of 18 were higher than the estimates in much of the Northeast and upper Midwest, while the census counts for this population were lower than the estimates in many counties in the South, Southwest and Great Plains.
Figure 2 shows a county map with percent differences for the Hispanic or Latino population under 18 years of age between the 2020 Census and the Vintage 2020 population estimates. We use the same scale as in Figure 1 to allow comparisons between the maps.
The regional patterns are similar to the previous map, but we see more counties with differences exceeding +/- 10 percent between the 2020 Census counts and the population estimates. In general, the percent differences between the census and the population estimates tend to be larger when focusing on specific characteristic groups. We see this when comparing the map of total youth to the map of Hispanic or Latino youth.
Some of the extreme values are in areas with lower Hispanic or Latino populations. For these, relatively small numeric differences can turn into large percent differences because the population used as the denominator to calculate the percent difference is small. To overcome this small population problem, we encourage data users to look at both numeric and percent differences when comparing the 2020 Census to the Vintage 2020 population estimates, and to use their own judgment based on knowledge about historical trends or local conditions when assessing the quality of census data for small population groups.
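The small population problem is easy to see with a short Python sketch. It assumes the percent difference uses the population estimate as the denominator; the `differences` helper and the county figures are hypothetical.

```python
def differences(census_count: float, estimate: float) -> tuple[float, float]:
    """Return the numeric and percent difference between a census count
    and a population estimate (estimate assumed as the denominator)."""
    numeric = census_count - estimate
    percent = numeric * 100.0 / estimate
    return numeric, percent


# Illustrative numbers only: the same 10-person gap in two hypothetical counties.
small_county = differences(census_count=60, estimate=50)
large_county = differences(census_count=50_010, estimate=50_000)

print(small_county)  # numeric difference of 10, percent difference of 20
print(large_county)  # numeric difference of 10, percent difference of 0.02
```

Looking at both values side by side helps separate large percent differences that reflect small populations from those that reflect large numeric gaps.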
Similar comparisons could be made between the population estimates and the census for the population ages 18 years and older, by Hispanic origin and at different levels of geography. For all of these, we advise data users to be cautious when interpreting the results. Differences between the two are not necessarily caused by errors in the census. As we noted, there are three potential sources of error that can cause differences between the census and the estimates. The Census Bureau will carefully examine these differences and explore possible explanations as part of the Estimates Evaluation project, which will inform research and methodological improvements for the estimates over the course of the decade.
As mentioned above, the Census Bureau will also use the PES to estimate coverage error in the 2020 Census. For this approach, we survey a small number of households shortly after the census data collection is completed and match the results to the census to create estimates of over- and undercounts.
The PES provides some additional information about coverage error in the census that DA does not. Both DA and the PES produce estimates of net coverage error, but the PES also produces estimates of the components of coverage error, including omissions and erroneous enumerations. In addition, the PES uses the same race categories as the 2020 Census, so it can produce coverage results for more detailed race groups and does not need to use the modified race file.
Once complete, the PES will provide: (1) national net coverage estimates for specific race groups, and (2) state-level estimates that, along with the DA, will provide a more complete picture of coverage for the 2020 Census.
Data users may want to use survey estimates as demographic benchmarks when evaluating the 2020 Census results. The American Community Survey, Current Population Survey, Survey of Income and Program Participation and other household surveys collected by the Census Bureau are controlled to the population estimates to address coverage issues in the survey data.
The survey estimates will be very close to, if not the same as, the population estimates. Therefore, we advise caution when comparing survey estimates to the 2020 Census results for the same reasons previously discussed in this blog.
Both the 2020 DA estimates and the Vintage 2020 population estimates can be used as demographic benchmarks for evaluating certain aspects of the 2020 Census results. We have presented some of the comparisons that are possible using the data that are currently available to the public. Differences between the Vintage 2020 population estimates and the 2020 Census should not be considered “coverage errors” in the census; they are simply differences between the estimates and the census. DA is an official coverage measure and, although limited in scope, the net coverage errors reported above are on par with what we have seen in previous decades.
Data users should recognize that even though the files contain race data, variations in how race is coded limit evaluations of the 2020 Census using the DA and population estimates. We will not be able to evaluate race in the 2020 Census using the Vintage 2020 estimates or the 2020 DA until the 2020 modified race file is produced.
Further, for the comparisons that are currently possible, data users should be cautious when they interpret differences. Errors in the 2020 Census are only one possible source of difference between the census and the population estimates. Errors from the 2010 Census — the base for the Vintage 2020 estimates — and errors in the estimation process may also contribute to the differences. Similarly, for comparisons using DA, data users should remember that these are intended to measure net coverage error, which includes both over- and undercounts for the same population. Differences here can tell us that the final census count was either higher or lower than the DA estimate, not that the census missed people or counted some people more than once.
At this time, we can only produce DA net coverage error estimates for a limited number of population groups. More comparisons are possible using the Vintage 2020 estimates, but these too are limited by the detail of 2020 Census data currently available. We are looking forward to 2022 when we can release coverage estimates by more demographic detail and expand the range of comparisons against the population estimates.