New Papers, Code, and Findings for 2020 Census Data Products

October 24, 2024

The recent release of the Supplemental Demographic and Housing Characteristics File (S-DHC) marked the completion of the full slate of 2020 Census data product releases, but our efforts to help data users better understand and work with the data continue.

We’re also looking ahead to the 2030 Census data products. In the coming months, we plan to share more information about our 2030 Census research on disclosure avoidance, data product planning, data tools, and public engagement opportunities. 

In the meantime, below are the latest 2020 Census data resources:

Estimating Confidence Intervals

A new set of resources allows experienced users to estimate the statistical properties and disclosure avoidance-related uncertainty for 2020 Census statistics tabulated from the Privacy-Protected Microdata File. This methodology factors in the impact of the TopDown Algorithm’s post-processing steps. An earlier prototype of the dataset, using 2010 Census data as a proxy, is available on the Registry for Open Data on AWS, a centralized repository of datasets made publicly available through the Amazon Web Services (AWS) cloud platform.

New Scientific Papers on the Detailed DHC-A and Detailed DHC-B Algorithms

Two new papers describing the mathematical algorithms used to protect respondent confidentiality for the Detailed DHC-A (the SafeTab-P algorithm) and Detailed DHC-B (the SafeTab-H algorithm) are now available online. “SafeTab-P: Disclosure Avoidance for the 2020 Census Detailed Demographic and Housing Characteristics File A (Detailed DHC-A)” updates a previously released paper on the Census Bureau’s exploratory research into various algorithm design possibilities. The updated version reflects the final methodology and parameter settings as chosen by the Data Stewardship Executive Policy Committee and implemented for the official production and dissemination of the Detailed DHC-A.

“SafeTab-H: Disclosure Avoidance for the 2020 Census Detailed Demographic and Housing Characteristics File B (Detailed DHC-B)” describes the algorithm and settings used for the Detailed DHC-B data product.

An additional paper on the PHSafe algorithm used to protect the S-DHC will be released in the coming weeks.

Source Code for the Detailed DHC-A and Detailed DHC-B

A unique benefit of a differentially private system is that it can be transparent. It is designed to allow disclosure of the algorithmic code and parameters that underpin the random infusion of noise.

The source code used to produce the Detailed DHC-A (released September 21, 2023) and the Detailed DHC-B (released August 1, 2024) is now available through our GitHub page (See: Detailed DHC-A Code | Detailed DHC-B Code). These releases enable experienced data scientists to analyze the algorithms used to protect those data products.

Understanding of Disclosure Risk and Confidentiality Protection Evolves through Ongoing Research

One of the major advantages of the formal privacy framework used for the 2020 Disclosure Avoidance System (DAS) is its ability to provide mathematically provable guarantees for the “worst-case” disclosure risk of our publicly released statistical products. In the context of the 2020 DAS, that worst-case guarantee is represented by the privacy-loss budget (PLB) accounting parameter rho. The larger the value of rho, the higher the worst-case disclosure risk could be. Lower values of rho, by contrast, indicate stronger confidentiality guarantees.

The overall level of disclosure protection across all of the 2020 Census data products is represented by the combined rhos of each of the individual releases. Differences across the individual algorithmic frameworks used to protect each of these distinct 2020 Census data products, however, can lead to slightly different mathematical definitions of rho’s worst-case guarantee. As such, ensuring a common understanding of rho across these products requires conversion of the individual algorithmic frameworks’ parameters into a uniform measurement scale. 
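As a rough illustration, the combined accounting described above can be sketched in Python. The sketch assumes that each release's rho has already been converted to the common scale and that rho behaves like a zero-concentrated differential privacy (zCDP) parameter, for which budgets add across releases. The product names and values are hypothetical placeholders, not official privacy-loss budgets.

```python
from math import fsum


def combined_rho(release_rhos):
    """Sum per-release rho values (additive composition, assumed zCDP).

    release_rhos maps a release name to its rho on the common scale.
    """
    if any(rho < 0 for rho in release_rhos.values()):
        raise ValueError("rho must be non-negative")
    return fsum(release_rhos.values())


# Hypothetical, made-up budgets for illustration only.
example_budgets = {"product_a": 1.5, "product_b": 0.25, "product_c": 2.0}
total_rho = combined_rho(example_budgets)
print(f"combined rho = {total_rho}")
```

Because the total is a plain sum, lowering the rho attributed to any one release lowers the combined figure by the same amount.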

As the mathematical research behind formal privacy’s confidentiality guarantees evolves, so too does our understanding of how to measure and understand disclosure risk and the parameter rho. Recent advances in this research have determined that the SafeTab-P algorithm used to protect confidentiality in the Detailed DHC-A product, published in September 2023, provided stronger guarantees than the initial conversion of the algorithm-specific privacy-loss parameter into its broader 2020 DAS equivalent suggested.

When the PLB for the Detailed DHC-A was set, and at the time of the product's release, the theoretical derivation of rho based on the parameters used for SafeTab-P to create the Detailed DHC-A was 19.776. Due to the improved privacy-loss accounting, we have been able to revise this value downwards to half of its previous value (9.888). This means that the Detailed DHC-A provided stronger worst-case confidentiality protections, for the same level of accuracy, than was previously thought.
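The revision can be checked with simple arithmetic. The Python sketch below also translates each rho into an approximate epsilon using the standard zCDP bound, epsilon = rho + 2 * sqrt(rho * ln(1/delta)). Both the zCDP interpretation of rho and the delta value are assumptions made for illustration; the official documentation may use a different or tighter conversion.

```python
import math


def zcdp_to_epsilon(rho, delta):
    """Standard upper bound converting rho-zCDP to (epsilon, delta)-DP."""
    if rho < 0:
        raise ValueError("rho must be non-negative")
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


ORIGINAL_RHO = 19.776            # value stated at the time of release
REVISED_RHO = ORIGINAL_RHO / 2   # revised value, 9.888

ASSUMED_DELTA = 1e-10  # illustrative assumption, not taken from the text

for label, rho in (("original", ORIGINAL_RHO), ("revised", REVISED_RHO)):
    eps = zcdp_to_epsilon(rho, ASSUMED_DELTA)
    print(f"{label}: rho = {rho:.3f}, epsilon bound = {eps:.2f}")
```

Under this bound, halving rho reduces the epsilon bound by less than half, because the square-root term shrinks more slowly than the linear term.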

This new finding has no impact on the published data or its underlying accuracy. It is only a reflection of a refined understanding of PLB measurement and conversion between the mathematical definitions used within different formally private algorithms.

To learn more about the PLB and how it was used to allocate protections across data results, refer to the PLB section of the Detailed DHC-A’s technical documentation. For a more technical discussion of the Detailed DHC-A’s PLB, including the mathematical proof behind this finding, refer to the new SafeTab-P paper. The new mathematical proof is captured in “Theorem 2.”

Page Last Revised - October 25, 2024
