JUNE 9, 2021 — The U.S. Census Bureau’s Data Stewardship Executive Policy Committee (DSEP) announced it has selected the settings and parameters for the Disclosure Avoidance System (DAS) for the 2020 Census redistricting data (P.L. 94-171). The DAS uses a mathematical algorithm to ensure that the privacy of individuals is sufficiently protected while maintaining high levels of accuracy in the statistics we produce.
The Census Bureau released the first “beta” version of the DAS in October 2019, followed by further demonstration data products in May, September, and November 2020, and in April 2021. Throughout this process, independent experts, stakeholders, and data users provided extensive feedback that helped shape each successive test product and informed today’s decisions.
After reviewing feedback from the data user community regarding the April 2021 demonstration data, the committee approved a revised algorithm that makes notable improvements in the accuracy of the population counts for places, Minor Civil Divisions, American Indian and Alaska Native tribal areas, and for race and ethnicity statistics, and ensures the accuracy of data necessary for redistricting and Voting Rights Act enforcement.
The approved DAS production settings reflect a total privacy-loss budget for the redistricting data product (represented by “ε,” the Greek letter “epsilon”) of ε=19.61, which includes ε=17.14 for the persons file and ε=2.47 for the housing unit data. This budget is larger than the one reflected in the April 2021 demonstration data, so the production data will have less noise infused than that release. The increase was primarily allocated to the total population and race by ethnicity queries at the block group level and above.
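The three published figures are consistent with basic sequential composition, under which the budgets of separate releases simply add (17.14 + 2.47 = 19.61). The short sketch below checks that arithmetic and, reading ε in the standard ε-differential-privacy sense, computes how much the move from the ε=12.2 April 2021 budget loosens the worst-case bound e^ε. The constant and function names are illustrative assumptions, and the Bureau’s own privacy accounting may differ in detail.

```python
import math

# Budgets reported in this announcement.
PERSONS_EPSILON = 17.14    # persons file
HOUSING_EPSILON = 2.47     # housing unit data
APRIL_2021_EPSILON = 12.2  # April 2021 demonstration data

def total_budget(*epsilons: float) -> float:
    """Basic sequential composition (assumed here): budgets of separate releases add."""
    return sum(epsilons)

def worst_case_ratio(epsilon: float) -> float:
    """Under epsilon-DP, changing one person's record can change the probability
    of any output by at most a factor of exp(epsilon)."""
    return math.exp(epsilon)

total = total_budget(PERSONS_EPSILON, HOUSING_EPSILON)
loosening = worst_case_ratio(total) / worst_case_ratio(APRIL_2021_EPSILON)
print(f"total epsilon: {total:.2f}")
print(f"worst-case bound vs April 2021: about {loosening:,.0f} times larger")
```

Because ε sits in an exponent, a difference of 7.41 in the budget translates into a worst-case bound that is roughly e^7.41 times larger, which is why modest-looking changes in ε carry real weight.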
Our Disclosure Avoidance team will use these parameters to prepare the TopDown Algorithm for final system integration testing, in anticipation of the DAS application phase of our data processing and the related quality assurance checks, which will begin later this month. The data will be run and quality checked multiple times before release, further steps in a process that will culminate in the states receiving their final redistricting numbers by August 16.
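The TopDown Algorithm works down the geographic hierarchy, adding noise to measurements and then post-processing them into consistent, non-negative integer counts. The toy sketch below illustrates only the reconciliation idea for one parent geography and its children, under stated assumptions: the parent total is treated as an exact invariant, each child receives two-sided geometric noise, and a simple proportional rescaling plus largest-remainder rounding stands in for the production system’s optimization-based post-processing. All helper names are hypothetical; this is not the Census Bureau’s code.

```python
import math
import random

def two_sided_geometric(epsilon: float, rng: random.Random) -> int:
    """Integer noise with P(Z = z) proportional to exp(-epsilon * |z|)."""
    def geometric() -> int:
        # Failures before the first success; P(G >= k) = exp(-epsilon * k).
        return math.floor(-math.log(1.0 - rng.random()) / epsilon)
    return geometric() - geometric()

def round_to_total(values: list[float], total: int) -> list[int]:
    """Largest-remainder rounding: integers that sum exactly to `total`."""
    floors = [math.floor(v) for v in values]
    shortfall = total - sum(floors)
    # Hand the leftover units to the entries with the largest fractional parts.
    order = sorted(range(len(values)), key=lambda i: values[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors

def reconcile_children(parent_total: int, child_counts: list[int],
                       epsilon: float, rng: random.Random) -> list[int]:
    """Hypothetical one-level 'top-down' step for a single parent geography."""
    if not child_counts:
        return []
    # 1. Noisy measurement of each child, clamped because counts cannot be negative.
    noisy = [max(0, c + two_sided_geometric(epsilon, rng)) for c in child_counts]
    # 2. Rescale so the children agree with the invariant parent total.
    s = sum(noisy)
    if s == 0:
        scaled = [parent_total / len(noisy)] * len(noisy)
    else:
        scaled = [x * parent_total / s for x in noisy]
    # 3. Round to non-negative integers that still sum to the parent total.
    return round_to_total(scaled, parent_total)

rng = random.Random(94171)
county_counts = [1200, 35, 0, 480]   # made-up "true" counts
state_total = sum(county_counts)     # held exact in this sketch
print(reconcile_children(state_total, county_counts, epsilon=1.0, rng=rng))
```

Clamping and proportional rescaling were chosen for brevity; they can bias small counts, which is one reason a production system would solve a more careful optimization problem instead.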
“The decisions strike the best balance between the need to release detailed, usable statistics from the 2020 Census with our statutory responsibility to protect the privacy of individuals’ data,” said Ron Jarmin, acting director of the U.S. Census Bureau. “They were made after many years of research and candid feedback from data users and outside experts – whom we thank for their invaluable input.”
The 2020 DAS algorithm injects carefully calibrated statistical “noise” to obscure individual data responses. The 2010 and other recent censuses also injected statistical noise into the data, but in a less precise and more ad hoc manner, primarily using a data-swapping methodology. Recent research has confirmed that today’s superior computational technologies have rendered the methods used in 2010 and earlier censuses ineffective against reidentification attacks. The Census Bureau’s recent blog, Modernizing Privacy Protections for the 2020 Census: Next Steps, discusses the privacy challenges that led to the change.
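To make “carefully calibrated noise” concrete, here is a minimal sketch of noise injection for counts. It treats each count as a query with sensitivity 1 (one person changes it by at most 1) and uses the two-sided geometric, or discrete Laplace, mechanism, a textbook differential-privacy mechanism chosen here for simplicity; it is not necessarily the mechanism the production DAS uses. `protect_counts` and `expected_abs_noise` are hypothetical helpers. The closed-form expected error shows why a larger ε means less noise.

```python
import math
import random

def two_sided_geometric(epsilon: float, rng: random.Random) -> int:
    """Integer noise Z with P(Z = z) proportional to exp(-epsilon * |z|).

    Drawn as the difference of two i.i.d. geometric variables. For a count
    with sensitivity 1, adding Z gives epsilon-differential privacy.
    """
    def geometric() -> int:
        # Failures before the first success; P(G >= k) = exp(-epsilon * k).
        return math.floor(-math.log(1.0 - rng.random()) / epsilon)
    return geometric() - geometric()

def expected_abs_noise(epsilon: float) -> float:
    """Closed-form E|Z| for the noise above: 2a / (1 - a^2), with a = exp(-epsilon)."""
    a = math.exp(-epsilon)
    return 2 * a / (1 - a * a)

def protect_counts(true_counts: list[int], epsilon: float, rng: random.Random) -> list[int]:
    """Add independent noise to each count (hypothetical helper)."""
    return [c + two_sided_geometric(epsilon, rng) for c in true_counts]

rng = random.Random(2021)
block_counts = [12, 0, 57, 3]  # made-up counts for illustration
for eps in (0.5, 2.0):
    print(f"epsilon={eps}: noisy={protect_counts(block_counts, eps, rng)}, "
          f"expected |error| per count ~ {expected_abs_noise(eps):.2f}")
```

Raw noisy counts can be negative or inconsistent with one another, which is why a full system follows noise injection with a post-processing step; post-processing cannot weaken the privacy guarantee.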
The chosen global privacy-loss budget of ε=19.61 is substantially higher than the ε=12.2 budget used in the April 2021 demonstration data; because ε enters the privacy guarantee as an exponent, the worst-case bound it places on how much any one person’s data can shift the published results grows exponentially as ε increases. In making its decisions, DSEP gave significant consideration to the feedback we received from our data users who analyzed the April 2021 demonstration data. That feedback, and steps taken to address those comments, include the following:
These improvements – as well as other adjustments to the system – were then verified against a broad suite of accuracy measures to ensure that they successfully addressed the feedback we received. We were not able to act on every piece of stakeholder feedback. For example, some data users recommended nearly perfect accuracy in block-level data, which we cannot provide because it would undermine the ability to implement a functional disclosure avoidance system. We are both legally and ethically bound to protect the privacy of the data provided by and on behalf of our respondents.
In September, the Census Bureau anticipates releasing a final set of demonstration data that applies the privacy-loss budget and settings from today’s decisions to the 2010 Census P.L. 94-171 redistricting data. Demonstration data allow data users to compare a DAS-protected version of 2010 Census results with the published 2010 Census results.
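One simple way to make such comparisons is to line up each geography’s published 2010 count with its DAS-protected counterpart and summarize the differences. The sketch below assumes both tables are keyed by the same geography identifiers; the function name and sample values are hypothetical, and the Census Bureau’s own accuracy measures are broader than these three statistics.

```python
def accuracy_summary(published: dict[str, int], protected: dict[str, int]) -> dict[str, float]:
    """Compare DAS-protected counts against published counts for the same geographies."""
    if published.keys() != protected.keys():
        raise ValueError("both tables must cover the same geographies")
    if not published:
        raise ValueError("no geographies to compare")
    # Signed error: positive means the protected count is higher than published.
    errors = [protected[g] - published[g] for g in published]
    abs_errors = [abs(e) for e in errors]
    return {
        "mean_absolute_error": sum(abs_errors) / len(abs_errors),
        "max_absolute_error": float(max(abs_errors)),
        "mean_signed_error": sum(errors) / len(errors),
    }

# Made-up block-level counts for illustration only.
published_2010 = {"block_A": 10, "block_B": 20, "block_C": 30}
das_protected = {"block_A": 12, "block_B": 19, "block_C": 30}
print(accuracy_summary(published_2010, das_protected))
```

Mean absolute error captures the typical size of the differences, mean signed error reveals any systematic bias, and the maximum shows the worst single geography.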
The Census Bureau will also release the DAS production code base. This transparency is a benefit of the 2020 Census’s algorithm-based system: unlike the confidential swapping methods used in previous censuses, the 2020 DAS algorithm does not rely on keeping its method secret, so it can be published without risking the exposure of protected data.
Details of the settings and technical parameters for the 2020 DAS will be shared in the coming weeks. Background information is available at census.gov.