Census Bureau Releases 2020 Census DHC and Demographic Profile

May 25, 2023:  Today, the U.S. Census Bureau released the 2020 Census Demographic and Housing Characteristics File (DHC) and Demographic Profile. These data products provide the next round of data available from the 2020 Census — adding more detail to the population counts and basic demographic and housing statistics previously released for the purposes of congressional apportionment and legislative redistricting. See today’s news release for details. 

Today’s release reflects disclosure avoidance settings developed in close collaboration with the data user community, beginning with the release of the baseline demonstration data in 2019. Data user analysis and feedback at each iterative stage of development have helped ensure the release of data products that balance the nation’s need for useful data and our legal obligation to protect the confidentiality of respondents.

New Metrics on 2020 Census Data

Today’s release also includes a new set of metrics to help data users assess the impact of the Disclosure Avoidance System (DAS) on the 2020 Census data.

With each iteration of 2020 DAS development, we’ve released metrics comparing differences between the published 2010 Census data and the comparable 2010 Census data protected by the 2020 DAS. 

This new set of metrics directly compares the 2020 Census DHC with the 2020 Census Edited File (CEF), which reflects the completed census data prior to application of disclosure avoidance. This provides data users with a direct assessment of the impact of disclosure avoidance on accuracy in the DHC. Two types of metrics are included in this release:

  • Mean absolute error (MAE): MAE gives insight on accuracy. It shows how far, on average, the published data point falls from the enumerated count, regardless of direction.

  • Mean error (ME): ME gives insight on statistical bias. It shows whether the published data point tends to go higher or lower than the enumerated count and by how much.
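
To make the two definitions concrete, here is a minimal sketch of how ME and MAE could be computed from paired counts. The function names and the four sample counts are hypothetical and chosen only for illustration; they are not census data and not the Census Bureau's own code.

```python
from statistics import mean

def mean_error(published, enumerated):
    """Average signed difference (published minus enumerated)."""
    return mean([p - e for p, e in zip(published, enumerated)])

def mean_absolute_error(published, enumerated):
    """Average size of the difference, ignoring its direction."""
    return mean([abs(p - e) for p, e in zip(published, enumerated)])

# Hypothetical counts for four geographies (illustration only).
enumerated = [100, 250, 40, 10]
published = [102, 247, 41, 10]

me = mean_error(published, enumerated)            # differences: +2, -3, +1, 0
mae = mean_absolute_error(published, enumerated)
print(me, mae)  # 0 1.5
```

In this example the errors cancel out, so ME is 0 (no bias), while MAE is 1.5 (the typical distance from the enumerated count). The two metrics answer different questions and are best read together.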

Notable Findings from New Metrics Include:

The most notable finding is that the metrics based on running the 2020 DAS on 2010 data are an excellent proxy for assessing disclosure avoidance variability in the published 2020 data. For example, as shown in the first row of data in Table 1, the average county’s published total population is within 2 persons of the enumerated total population, whether the counts run through the DAS are from the 2010 Census demonstration data or the 2020 data.

Table 1: Mean Absolute Error for Various Characteristics and Geographic Levels

Another notable finding is related to the mean error metric. Table 2 shows Mean Error for County Household Sizes. The first row of data shows that the published count of 1-person households is 0.14 lower on average than the enumerated count. The last row of data shows that the published count of 7-or-more person households is 0.06 higher on average than the enumerated count. Both of these metrics are very close to zero, which shows how little statistical bias is present in these data.

Table 2: Mean Error for County Household Sizes

Note that the Census Bureau doesn’t allow negative numbers in published tables, and we control all totals to the state populations that were used for apportionment (which were not processed using the DAS). These two constraints shape the direction of the small bias that remains. The bias for very small populations is usually in the positive direction (the published count is slightly larger than the enumerated count). The bias is usually in the negative direction for larger populations (the published number is slightly smaller than the enumerated count). This is similar to results from other national statistical offices, such as the United Kingdom’s Office for National Statistics, where the systems they use to add noise before releasing statistics also control counts to fixed population totals.
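
The direction of this bias can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it uses rounded Gaussian noise, clips negative values to zero, and lets the largest cell absorb the adjustment needed to hit a fixed total. The cell counts, the noise scale, and the `publish` function are hypothetical; this is a crude stand-in for illustration, not the TopDown Algorithm or any production post-processing.

```python
import random
from statistics import mean

random.seed(2020)

TRUE_COUNTS = [0, 1, 2, 1000]   # hypothetical cells; the last one is large
TOTAL = sum(TRUE_COUNTS)        # held fixed, like a controlled total
SIGMA = 3.0                     # assumed noise scale, illustration only
TRIALS = 10_000

def publish(true_counts):
    """Add noise to the small cells, clip negatives to zero, restore the total."""
    small = [max(0, round(c + random.gauss(0, SIGMA))) for c in true_counts[:-1]]
    # The largest cell absorbs whatever is needed to hit the fixed total.
    return small + [TOTAL - sum(small)]

errors = [[] for _ in TRUE_COUNTS]
for _ in range(TRIALS):
    for i, (p, e) in enumerate(zip(publish(TRUE_COUNTS), TRUE_COUNTS)):
        errors[i].append(p - e)

mean_errors = [mean(e) for e in errors]
print(mean_errors)
```

Under these assumptions, the cells with very small true counts show a positive mean error, because clipping removes only the negative outcomes, and the large cell shows a negative mean error, because the fixed total forces it to give back what the small cells gained.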

Guidance on Using the Data

When using DHC and Demographic Profile data, the Census Bureau encourages data users to aggregate small populations and geographies to improve accuracy and diminish implausible results. For more information, see our recent blog, What to Expect: Disclosure Avoidance and the Demographic and Housing Characteristics File.

The DHC and Demographic Profile are protected using the differentially private TopDown Algorithm (TDA), the same algorithm designed to protect confidentiality for the redistricting data. The TDA was specifically designed to ensure that the added statistical noise necessary to protect confidentiality shrinks, relative to the size of the count, as smaller geographies are combined to form larger geographies. The same is true for small populations. For example, combining single-year age groups reduces noise. Learn more in Disclosure Avoidance and the 2020 Census: The TopDown Algorithm.
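
A short simulation can show why aggregating helps. This sketch assumes independent Gaussian noise with the same scale on each small unit, which is a simplification and not how the TDA allocates noise across geographic levels. The unit count, true count, and noise scale are hypothetical values chosen for illustration.

```python
import random
from statistics import mean

random.seed(0)

N_UNITS = 100      # hypothetical number of small geographies
TRUE_COUNT = 50    # hypothetical true count in each unit
SIGMA = 5.0        # assumed noise scale per unit, illustration only
TRIALS = 500

unit_rel_errors = []
combined_rel_errors = []
for _ in range(TRIALS):
    noise = [random.gauss(0, SIGMA) for _ in range(N_UNITS)]
    # Typical relative error of one small unit taken on its own.
    unit_rel_errors.append(mean([abs(n) for n in noise]) / TRUE_COUNT)
    # Relative error after summing all units into one larger geography.
    combined_rel_errors.append(abs(sum(noise)) / (N_UNITS * TRUE_COUNT))

unit_rel = mean(unit_rel_errors)
combined_rel = mean(combined_rel_errors)
print(f"single unit: {unit_rel:.2%}  combined: {combined_rel:.2%}")
```

Because independent errors partly cancel when summed, the absolute noise grows more slowly than the count itself, so the relative error of the combined geography is much smaller than that of a typical single unit.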

Visit the Decennial Technical Documentation page for detailed information about these data products, including geographic terms and concepts, definitions of subject characteristics, data collection and processing procedures, Hispanic origin and race codes, and more.

Upcoming Noisy Measurement File Releases 

Noisy Measurement Files (NMFs) are the intermediate output of the DAS TDA prior to the post-processing that ensured internal and hierarchical consistency. They give researchers and data scientists the opportunity to independently process the files, complete analysis and conduct valuable assessments of the confidentiality protections. 

In April, the Census Bureau released the 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (NMF) as demonstration data (access via ICPSR and Harvard Dataverse). We will release the 2010 DHC NMF on June 15.  

We are also releasing NMFs for the 2020 Census. We plan to release the 2020 Census redistricting data NMF with the 2010 DHC NMF on June 15. We are targeting a Fall 2023 release for the 2020 Census DHC NMF.

We’ll provide additional information as plans are finalized.

Still to come: 2020 Census Privacy-Protected Microdata File (PPMF)

The Census Bureau also plans to release a 2020 Census Privacy-Protected Microdata File (PPMF), which replaces the former Public Use Microdata Sample (PUMS). The PPMF presents the data as rows of privacy-protected individual records (called microdata) rather than the aggregated totals found in a table-based format. Unlike the 2010 PUMS, which produced data based on a 10% sample of households, the PPMF produces privacy-protected data for the entire population and all housing units. We will announce specific timing and provide additional guidance once our planning is complete.

June 15 Webinar: Working with Noisy Measurement Files for the Redistricting and DHC Data Products 

We are hosting a webinar on Thursday, June 15, for researchers and others interested in accessing and using the NMFs to generate unbiased estimates and confidence intervals for the effect of disclosure avoidance on census data. Please save the date and login information below:

Log-In Details: 

  • Date: June 15, 2023    
  • Time: 3:00 – 4:30 p.m. ET    
  • WebEx link    
  • Webinar/Access number (if needed): 2763 911 3394  
  • Webinar password (if needed): Census#1  

You can find recordings, transcripts, and slides for all disclosure avoidance webinars on the series webinar page.   

Page Last Revised - May 25, 2023