Census Bureau Sets Key Parameters to Protect Privacy in 2020 Census DHC

The U.S. Census Bureau’s Data Stewardship Executive Policy Committee (DSEP) has selected the final settings and parameters for the Disclosure Avoidance System (DAS) for the 2020 Demographic and Housing Characteristics File (DHC), the next major release from the 2020 Census. Based on these settings, Census Bureau staff are now preparing the final data product for publication in May 2023. 

The DHC includes, unchanged, the six tables that were released in August 2021 as part of the final Redistricting Data (P.L. 94-171) Summary File. It also includes tables that support the Demographic Profiles and general statistical descriptions of the population and housing in the United States and the Commonwealth of Puerto Rico. 

Feedback-Driven Improvements 

The selections, approved October 20, 2022, reflect feedback from data users on a series of demonstration data products and accompanying metrics that applied iterative versions of the DAS to 2010 Census data. The final settings and parameters reflect notable improvements based on data user feedback on the most recent demonstration data, released August 25, 2022. The feedback included recommendations from our federal advisory committees and from American Indian and Alaska Native tribal consultations, as well as 11 comments received during the public feedback period that ended September 26, 2022. The comments, both in summary and full-text form, are available on the Disclosure Avoidance Modernization page.

Feedback-driven improvements include the following:

  • Improving overall accuracy, with a focus on single year of age and family type for all three types of school districts (elementary, secondary, and unified). The Census Bureau worked throughout DHC development to increase overall accuracy for these data, within the constraints of disclosure protections. 

    By adjusting the algorithm and selectively allocating small amounts of additional privacy-loss budget, the final, DSEP-approved settings and parameters achieved notable gains in some areas, such as age, relationship, tenure by race, and presence of own children under 6.

    Improvements in other areas, such as group quarters population by age, were limited because of parameters previously established for the redistricting data.

  • Improving accuracy for school-age children in elementary, secondary, and unified school districts and tabulation block groups. None of the school district types or tabulation block groups are on the hierarchical “geographic spine.” [1] Accuracy was improved by adding them to the list of geography types that the DAS spine optimization algorithm brings closer to the spine. Being adjacent to the spine allows those geographies to benefit from the additional accuracy they inherit from it.

  • Improving accuracy for cross-tabulated data on age for minors and relationship to householder at the county and tract levels. Data users cited a need for improved accuracy for these data, which we addressed through a combination of adjustments to the DAS algorithm and increases to the privacy-loss budget allocations for related query groups.

  • Improving accuracy for cross tabulations at the county and tract levels that include householder race, same-sex married and unmarried partners, or own children by age category. This was addressed by increasing the privacy-loss budget allocations of all three query groups. Additional changes to the algorithm improved accuracy further for “own children by age category” data.

  • Reducing inconsistencies between person and housing unit tables. Data users generally acknowledged that inconsistencies between the person and housing unit files are an intentional consequence of the uncertainty that the differential privacy mechanism in the DAS introduces, which is necessary to protect confidentiality. However, as in the feedback on the previous demonstration data product, several data users continued to express concerns about the number of inconsistencies, particularly in age data and in the housing unit file.

The final settings reduce the overall number of inconsistencies through the improvements outlined above, which include housing unit table improvements. Greater accuracy for tables in the housing unit file not only improves the accuracy of those tables; it also reduces the number of person and unit file inconsistencies.
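Several of the improvements above rely on increasing the privacy-loss budget allocated to a query group. The sketch below illustrates why a larger allocation means less noise, using the textbook Gaussian mechanism under zero-concentrated differential privacy (zCDP), where a query with L2 sensitivity Δ and budget ρ receives noise with standard deviation Δ / sqrt(2ρ). It is an illustration only, not the DAS production code: the query group names and ρ values are hypothetical, and the continuous Gaussian formula is an assumed stand-in for whatever mechanism the production system applies.

```python
import math


def gaussian_noise_std(rho: float, sensitivity: float = 1.0) -> float:
    """Standard deviation of Gaussian noise that satisfies rho-zCDP
    for a query with the given L2 sensitivity (textbook formula)."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    return sensitivity / math.sqrt(2.0 * rho)


# Hypothetical per-query-group budgets; these are NOT the DSEP-approved values.
budget_before = {"age_by_relationship": 0.05, "householder_race": 0.05}
budget_after = {"age_by_relationship": 0.10, "householder_race": 0.05}

for group in budget_before:
    std_before = gaussian_noise_std(budget_before[group])
    std_after = gaussian_noise_std(budget_after[group])
    print(f"{group}: noise std {std_before:.3f} -> {std_after:.3f}")
```

Under this formula, doubling a group's budget shrinks its noise standard deviation by a factor of sqrt(2), while groups whose budget is unchanged keep the same noise level. The extra budget also adds to the total privacy loss, which is why such increases are described above as small and selective.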

The chosen settings and parameters cannot satisfy every request for additional data accuracy and granularity, as meeting those additional requests could undermine the ability to implement a functional disclosure avoidance system. Each of the improvements above was selected after a careful analysis of the gains in accuracy the various options would achieve for the targeted queries, as well as any unintended negative impact they would have on other queries.

As with the demonstration data products, DSEP also reviewed the effectiveness of the confidentiality protections after these modifications. By both analytical and empirical measures, the effective confidentiality protections were comparable to those provided in the August demonstration data product.

Next Steps

The DAS team will now use the approved settings and parameters to begin standard data review, quality assurance, and tabulation procedures for the 2020 Census data. The final product slated for release in May 2023 will consist of more than 7.5 billion published statistics across more than 7 million individual geographies. About 86 percent of the statistics will be for the approximately 6.4 million census blocks.
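The scale described above can be checked with rough arithmetic. The sketch below uses only the rounded figures quoted in the paragraph ("more than 7.5 billion", "about 86 percent", "approximately 6.4 million"), so the results are order-of-magnitude estimates rather than official counts; the function and variable names are illustrative.

```python
def block_level_estimates(total_statistics: float,
                          block_share: float,
                          census_blocks: float) -> tuple[float, float]:
    """Return (statistics published for blocks, average statistics per block)."""
    block_statistics = total_statistics * block_share
    per_block = block_statistics / census_blocks
    return block_statistics, per_block


# Rounded figures taken from the text above.
block_statistics, per_block = block_level_estimates(
    total_statistics=7.5e9,  # "more than 7.5 billion published statistics"
    block_share=0.86,        # "about 86 percent"
    census_blocks=6.4e6,     # "approximately 6.4 million census blocks"
)

print(f"Block-level statistics: about {block_statistics / 1e9:.2f} billion")
print(f"Average per block: about {per_block:,.0f}")
```

With these inputs, the block-level share comes to roughly 6.45 billion statistics, or on the order of a thousand published values per census block.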

As with the release of the Redistricting Data (P.L. 94-171) Summary File, we will release final metrics and demonstration data showing the impact of the production settings on 2010 Census data. We’ll release the metrics prior to the May release of 2020 Census data, and the demonstration data after. The metrics and production-settings demonstration data will allow data users to see improvements made after the release of the second demonstration product and results from the final settings of the DAS applied to the 2010 Census data. 

The Census Bureau will also release the DAS production code base. This is an advantage of an algorithm-based system. Unlike the confidential swapping methods used in previous censuses, the 2020 DAS algorithm allows this level of transparency without risking the exposure of protected data. We will provide more information about that release at a later date.

As always, please contact us via 2020DAS@census.gov with any questions. 

[1] Learn more about the “geographic spine” for census data in the handbook “Disclosure Avoidance for the 2020 Census: An Introduction.”

Page Last Revised - September 22, 2023