Developing the DAS: Demonstration Data and Progress Metrics

Customizing protections for each data product is an iterative process that requires data user engagement and feedback. As we produce demonstration data and performance metrics during Disclosure Avoidance System (DAS) development, we’ll post that information here.

Demographic and Housing Characteristics File (DHC) Development

About DHC Development

The census data product formerly known as “Summary File 1” is now the Demographic and Housing Characteristics File (DHC). Please follow our newsletter for updates and to learn how to contribute to its development. Review the 2020 Census Data Product Planning Crosswalk to better understand the proposed changes to this data product. 

A subset of DHC tables was included in early iterations of DAS demonstration data. In the Redistricting Data section below, see:

  • 2010 Demonstration Data Products Baseline 2019-10-29
  • DAS Development Update 2020-05-27

Detailed Demographic and Housing Characteristics File (Detailed DHC)

About Detailed DHC Development

Improvements in the design, processing, and coding of the 2020 Census allow the release of data for almost five times as many detailed race and ethnic groups as was possible in 2010. Development of disclosure protections to accommodate this improvement is underway. See the “Detailed DHC” tabs in the 2020 Census Data Product Planning Crosswalk.

2020 Census Redistricting Data (P.L. 94-171)

Beginning in October 2019, the Census Bureau released a series of demonstration data products that applied iterative development versions of the 2020 Census Disclosure Avoidance System (DAS) to published 2010 Census data. The first two demonstration data sets covered both the redistricting data and the Demographic and Housing Characteristics data (DHC, known in earlier censuses as Summary File 1, or “SF1”). In August 2020, pandemic-triggered operational delays required the Census Bureau to focus development on the redistricting data in an attempt to meet the statutory data release deadline. Demonstration data from September 17, 2020, forward focused solely on the redistricting data. (See Development Timeline.)

The Census Bureau produced Detailed Summary Metrics and Privacy-Protected Microdata Files (PPMFs) to assist with data user analysis. IPUMS NHGIS converted the PPMFs into tabular format for ease of use. Data users evaluated each iteration and provided feedback that helped shape the algorithm and settings throughout the development process. On June 8, 2021, the Census Bureau’s Data Stewardship Executive Policy Committee (DSEP) chose the final settings for production of the redistricting data. The data were released August 12, 2021.

Note that the Privacy-Protected Microdata Files, which are the underlying untabulated microdata used to generate the Detailed Summary Metrics, look like individual records, but every record is privacy-protected through the application of differentially private statistical noise.

On June 8, 2021, the U.S. Census Bureau’s Data Stewardship Executive Policy Committee (DSEP) selected the settings and parameters for the Disclosure Avoidance System (DAS) for the 2020 Census redistricting data (P.L. 94-171).

This is the sixth and final set of Privacy-Protected Microdata Files (PPMFs) for the redistricting data that allow data users to compare the effect of the Disclosure Avoidance System settings on previously published 2010 Census data. These and previous PPMFs are only intended to demonstrate the redistricting data, not the Demographic and Housing Characteristics File (DHC) or other 2020 Census data products.

There are two sets of Privacy-Protected Microdata Files (PPMFs), record layouts, and Detailed Summary Metrics in this release:

  • One set with a global privacy-loss budget ("epsilon") of 12.2 (10.3 for persons and 1.9 for housing units, approximating the anticipated final PLB level). In the FTP directory, those files include “12-2” in the file names (e.g., ppmf_20210428_eps12-2_P.csv and ppmf_20210428_eps12-2_U.csv).
  • A second set with a global privacy-loss budget ("epsilon") of 4.5 (4.0 for persons and 0.5 for housing units, as used for prior demonstration data). In the FTP directory, those files include “4-5” in the file names (e.g., ppmf_20210428_eps4-5_P.csv and ppmf_20210428_eps4-5_U.csv).
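The relationship between the budget figures and the file names in the list above can be sketched in a few lines of Python. This is an illustrative sketch only, not Census Bureau code: the helper names are hypothetical, and it assumes that the “_P” and “_U” suffixes denote the person and housing-unit files and that the file-name tag is the global epsilon with the decimal point replaced by a hyphen.

```python
from decimal import Decimal

# Budget splits quoted in the list above: (persons, housing units).
BUDGETS = {
    "12-2": (Decimal("10.3"), Decimal("1.9")),
    "4-5": (Decimal("4.0"), Decimal("0.5")),
}

# Date stamp that appears in the example file names above.
RELEASE_STAMP = "20210428"


def global_epsilon(tag):
    """Return the global privacy-loss budget as persons + housing units."""
    persons, units = BUDGETS[tag]
    return persons + units


def ppmf_file_names(tag):
    """Build the two file names for a tag (assumption: P = persons, U = units)."""
    return tuple(
        f"ppmf_{RELEASE_STAMP}_eps{tag}_{suffix}.csv" for suffix in ("P", "U")
    )


for tag in BUDGETS:
    total = global_epsilon(tag)
    # Assumed naming rule: the tag is the total with "." replaced by "-".
    assert str(total).replace(".", "-") == tag
    print(tag, total, ppmf_file_names(tag))
```

Decimal arithmetic is used so that the sums match the quoted totals exactly rather than picking up binary floating-point rounding.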

We encourage data users to closely analyze this demonstration data. Feedback received by May 28, 2021, will be considered. Email feedback to:; include “April PPMF” in the subject line.

Particularly useful feedback would describe:

  • Fitness-for-use: Based on your analysis, would the data needed for your applications (redistricting, Voting Rights Act analysis, estimates, projections, funding data sets, etc.) be satisfactory?
    • How did you come to that conclusion?
    • If your analysis found the data to be unsatisfactory, how much would accuracy need to improve for the data to support your required or programmatic use case(s)?
    • Have you identified any improbable results in the data that would be helpful for us to understand?
  • Privacy: Do the proposed products present any confidentiality concerns that we should address in the DAS?
  • Improvements: Are there improvements you’ve identified that you want to make sure we retain in the final design? Be specific about the geography and error metric for the proposed improvement.
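As one way to make feedback specific about geography and error metric, the sketch below computes a mean absolute error between published counts and privacy-protected counts for a set of geographies. It is a minimal illustration under stated assumptions: the text above does not prescribe a metric, mean absolute error is simply one common choice, the function name is hypothetical, and the counts are made-up numbers rather than values from any PPMF or 2010 Census table.

```python
def mean_absolute_error(published, protected):
    """Average absolute difference over the geographies in `published`.

    Both arguments map a geography label to a count. A geography missing
    from `protected` is treated as a count of zero (an assumption).
    """
    if not published:
        raise ValueError("no geographies to compare")
    total = sum(
        abs(protected.get(geo, 0) - count) for geo, count in published.items()
    )
    return total / len(published)


# Made-up illustrative counts for three hypothetical tracts.
published = {"tract A": 4200, "tract B": 3150, "tract C": 980}
protected = {"tract A": 4193, "tract B": 3158, "tract C": 983}

print(mean_absolute_error(published, protected))  # 6.0
```

Reporting a figure like this alongside the geographic level it was computed for (for example, tracts within one county) gives a concrete statement of what was measured and where.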

