The release of the first 2020 Census counts is a major and much-anticipated milestone for the U.S. Census Bureau. Yet this initial glimpse at the results of the 2020 Census also inspires thoughtful questions about data quality.
In particular, how do we know these population counts are accurate? What evidence can the Census Bureau provide that these counts are high quality, especially when considering their many significant uses, including determining the representation of our nation’s people in the U.S. House of Representatives?
Data quality is multidimensional — it encompasses completeness, accuracy, reliability, reasonableness, validity, and much more — and so approaching it from multiple angles produces a more insightful and holistic picture of a dataset.
We use many techniques to evaluate the quality of the 2020 Census that fall within two major categories:
In this blog, we will look at just one of these methods: comparing the census counts to other measures of the population. Specifically, we will examine how the 2020 Census results compare to population benchmarks.
We provide even more detail on this topic in the report “A Preliminary Analysis of U.S. and State-Level Results From the 2020 Census” released today.
Comparing the 2020 Census results against other sources of data enables us to analyze differences. It’s possible the differences come from errors either in the census or in the other data sources. However, some differences are simply the result of different ways of collecting or generating the data.
The key is to determine if a difference is expected or plausible, and this conclusion will depend largely on what is different in the data and the specific benchmark under consideration.
We begin these comparisons during data processing, when we assemble a team of experts to analyze the data and examine the findings. With the aid of cutting-edge technology, they apply their extensive knowledge of population characteristics and trends to help us scrutinize differences and assess whether they are reasonable or are outliers warranting further investigation.
Now that we’ve finalized the first results, we’re sharing how the national and state-level totals compare with three benchmarks in particular:
First, let’s examine our nation’s historical population trends and how the 2020 Census results fit in with those trends.
As of April 1, 2020, the number of people living in the United States was 331,449,281. This is the 24th census count, which means we can look across time to see how much we’ve grown and how quickly we’ve done so.
For the purposes of this blog, we’re specifically looking at the U.S. resident population (referred to as “population”), which represents the total number of people living in the 50 states and the District of Columbia.
During our nation’s earlier years, growth from census to census occurred at a faster rate than over subsequent decades. (More information is available in Table 1 of the report.)
Our latest population total represents further growth of 22.7 million, or 7.4%. This is only slightly lower than what we saw from 2000 to 2010. Given the continuation of the demographic trends described above in conjunction with recent decreases in the level of net international migration, this is the type of growth population scientists would expect to see.
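The growth figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the 2020 count and the rounded 22.7 million growth figure quoted in this blog, and backs out the implied 2010 base from them, so the base it prints is an approximation rather than the official 2010 count. The variable and function names are our own.

```python
# Figures quoted in the text above.
CENSUS_2020 = 331_449_281        # resident population, April 1, 2020
GROWTH_SINCE_2010 = 22_700_000   # rounded to the nearest 0.1 million

def percent_growth(current: int, growth: int) -> float:
    """Percent growth relative to the earlier (base) population."""
    base = current - growth      # implied earlier population
    return growth / base * 100

implied_base_2010 = CENSUS_2020 - GROWTH_SINCE_2010
growth_pct = percent_growth(CENSUS_2020, GROWTH_SINCE_2010)

print(f"Implied 2010 base (approximate): {implied_base_2010:,}")
print(f"Growth since 2010: {growth_pct:.1f}%")
```

Note that growth is measured against the earlier population, not the later one, which is why the base is computed before dividing.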
The second population benchmark we’ll examine is the population estimates.
Each year, the Census Bureau’s Population Estimates Program produces the official estimates of the population for many levels of geography, including the nation, states, counties, cities and towns. For the nation, states and counties, we also produce estimates by demographic characteristics: age, sex, race and Hispanic origin.
We typically develop the estimates by starting with the latest decennial census as a base and then using current data on births, deaths and migration to measure population change over time. Each year’s release includes a time series of data starting with the most recent census year and running up to the latest year of data available, also known as the “vintage year.”
For example, we created Vintage 2020 estimates, starting with the 2010 Census as the base and using data on births, deaths and migration to create estimates of the population for 2010 through 2020. This enables us to compare the April 1, 2020, estimates against the 2020 Census counts released today.
It is a longstanding practice to compare the results of the census against the official estimates of the population. However, because the Vintage 2020 estimates use the 2010 Census as a base, they are not independent of the census, which must be considered when using them as a benchmark.
More commonly, differences between the census and the estimates are used to evaluate the estimates methodology and inform improvements and research for the Population Estimates Program over the next decade.
Nevertheless, examining how close the census counts are to the estimates for April 1 can still be an informative exercise, particularly when we consider how the census and the estimates differ from one another over the decades.
In 2020, the difference between the April 1, 2020, population estimate and the census count was 2.1 million, or 0.6%. This is in comparison to the difference of 295,000 (0.1%) in 2010. Although the degree of difference is greater than last decade, it is still quite small.
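As a rough check on the 2020 figures, the sketch below converts the rounded 2.1 million difference into a percent. This blog does not spell out which population serves as the denominator, so the code tries the census count and the two estimate values that a 2.1 million gap would imply in either direction. That set of denominators is our assumption, and the names are illustrative.

```python
CENSUS_2020 = 331_449_281   # census count quoted in this blog
ABS_DIFF_2020 = 2_100_000   # rounded absolute difference quoted above

def percent_difference(abs_diff: float, base: float) -> float:
    """Absolute difference expressed as a percent of the chosen base."""
    return abs_diff / base * 100

# Candidate denominators (an assumption: the text does not specify one).
candidate_bases = {
    "census count": CENSUS_2020,
    "estimate, if below the census": CENSUS_2020 - ABS_DIFF_2020,
    "estimate, if above the census": CENSUS_2020 + ABS_DIFF_2020,
}

results = {
    label: percent_difference(ABS_DIFF_2020, base)
    for label, base in candidate_bases.items()
}

for label, pct in results.items():
    print(f"{label}: {pct:.1f}%")
```

At one decimal place, the choice of denominator does not change the result, because the difference is so small relative to the total population.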
We also see similarities when comparing differences across the decades at the state level. Whereas some states have greater differences for 2020 (e.g., New Jersey, Rhode Island, New York, and Vermont), others have estimates that are closer to the census in 2020 than they were in 2010 (e.g., Wyoming, Georgia, West Virginia, and Nevada). In 2020, the census count for Oregon was just 152 people lower than its April 1, 2020, estimate. That is a difference of only 0.004%, and the smallest numeric or percent difference between the census and estimate for any state in 2020 or 2010.
The overall range of percent difference is also very similar: in 2010, the range extended from -3.9% to 4.9%, and the percent differences between the estimates and the 2020 Census range from -3.3% to 4.5%. Whereas the majority of states in both 2010 and 2020 had census counts within 1% (positive or negative) of their population estimate, we see change at the region level: in 2010, the census counts for all four regions were within 0.5% of their population estimates, whereas in 2020, this was only true for the South and West.
We will take a deeper dive into these variances in much finer detail through a project known as Estimates Evaluation. We will use a range of statistical methods to determine whether specific elements of the estimates methodology — either how we measure the components of change or how we fit the components together to come up with an estimate of the population — are responsible for the differences.
Finally, let’s compare the census results with the 2020 Demographic Analysis (DA) estimates of the population on April 1, 2020.
The DA estimates are similar to the Vintage 2020 estimates in some ways, but they have a key difference: they are developed entirely from administrative and survey records. That is, they are not built from any census data in the way that the Vintage 2020 estimates use the 2010 Census as their base.
This independence makes them a particularly appropriate tool for evaluating the 2020 Census, and in fact, this is the purpose for which the DA estimates were designed.
The 2020 DA estimates are available by age, sex, and broad race and Hispanic origin groups, but only at the national level, because the data and methods used to produce them limit the demographic and geographic detail of these estimates. The available detail still provides valuable insight into the quality of the census.
In particular, using the 2020 Census count of the national population released today, we can estimate net coverage error for the total population. (Later, we will also release estimates from the Post-Enumeration Survey on net coverage error and its components.)
Net coverage error represents the percent difference between the census count and DA estimate, and because it is a net measure, it simultaneously accounts for the influence of overcounts and undercounts. It is calculated as follows:
Net coverage error = (Census count - DA estimate)/DA estimate * 100
The 2020 DA included three separate series of estimates of the resident population on April 1, 2020: low, middle and high. This range of plausible estimates was developed by varying the assumptions about the population components used to produce the estimates so as to reflect any uncertainty in our data sources and methods.
The 2020 Census count of 331,449,281 falls between the low (330.7 million) and middle (332.6 million) estimates, with corresponding estimates of net coverage error for the total population of 0.22% and -0.35%, respectively. This implies an overcount relative to the low series and an undercount relative to the middle series.
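The net coverage error formula above is simple to apply. The following sketch uses the census count and the rounded low and middle DA estimates quoted in this blog. Because those DA figures are rounded to the nearest 0.1 million, the results may differ in the last digit from the published values, which were presumably computed from unrounded estimates. The function and variable names are illustrative.

```python
def net_coverage_error(census_count: float, da_estimate: float) -> float:
    """(Census count - DA estimate) / DA estimate * 100, as defined above."""
    return (census_count - da_estimate) / da_estimate * 100

CENSUS_2020 = 331_449_281

# DA series as quoted in the text, rounded to the nearest 0.1 million.
DA_2020 = {"low": 330_700_000, "middle": 332_600_000}

errors = {
    series: net_coverage_error(CENSUS_2020, estimate)
    for series, estimate in DA_2020.items()
}

for series, err in errors.items():
    if err > 0:
        direction = "net overcount"
    elif err < 0:
        direction = "net undercount"
    else:
        direction = "no net error"
    print(f"{series} series: {err:+.2f}% ({direction})")
```

A positive value means the census count exceeds the DA estimate (a net overcount relative to that series), and a negative value means it falls short (a net undercount).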
In future analyses, we will explore the relationship between the assumptions used to develop each DA series and the resulting estimates of net coverage error.
As a point of comparison, the 2010 Census count fell between the middle and high middle estimates. The net coverage error calculated based on the 2010 DA and the 2010 Census was 0.13% for the revised middle series (indicating a slight overcount) and -0.42% for the high middle series (indicating an undercount).
When additional results of the 2020 Census become available, we will also be able to calculate net coverage error and examine patterns of coverage across demographic groups within the 2020 Census and across multiple censuses over time. The results of these analyses will be released in a report expected in 2022.
From our analyses, we see that the national 2020 Census count is in line with the three sets of benchmarks we’ve examined today:
This analysis comparing 2020 Census results with other measures of the population represents just one of the many efforts underway to evaluate the quality of the 2020 Census.
In fact, today we are also releasing information that examines how we conducted the census through a variety of operational quality metrics. More information is available in the Examining Operational Quality Metrics blog.
The broad collection of quality assessments that have been and will be conducted will enable the public to draw conclusions about the accuracy of the 2020 Census counts and their fitness for use. They will also serve another important purpose by informing the planning for the 2030 Census — which has already begun!