
Modernizing Disclosure Avoidance: What We've Learned, Where We Are Now

March 13, 2020
Written by: Dr. John M. Abowd, chief scientist and associate director for Research and Methodology, and Dr. Victoria A. Velkoff, associate director for Demographic Programs

The Census Bureau is modernizing the way we will protect respondent privacy in 2020 Census data products. This blog, part of a series planned to keep data users informed of progress and developments, focuses on how we are addressing feedback received on the 2010 Demonstration Data Products.

In our last blog, we discussed the importance of ongoing engagement with our data users as we modernize the privacy protections for the 2020 Census. This month, we want to highlight some of the feedback we have received and how we are using it to improve the Disclosure Avoidance System (DAS) that will be used for the 2020 Census.

In October 2019, we released a set of demonstration data products using data from the 2010 Census that we ran through an interim version of the DAS. The reason: to demonstrate that our system can effectively protect privacy at the scale of the 2020 Census and to ensure that the data we release are of the same high quality that our data users have come to expect. In its current iteration, the DAS does very well at ensuring the data’s fitness-for-use for some important use cases but falls short in others. So, we know much more needs to be done to identify and improve those aspects of the data that are not yet up to par. Releasing these demo products allowed us to crowdsource that process.

At a December 2019 workshop hosted by the Committee on National Statistics (CNSTAT), data users from a wide variety of fields presented their own assessments of the fitness-for-use of the demonstration data products. Some of that feedback was good news. Quentin Brummet (NORC at the University of Chicago), for example, assessed the data’s likely impact on the design and conduct of demographic surveys that use census data as a sampling frame. He found that the “[e]ffects of DP noise on survey costs in the context of sample design are projected to be minimal,” while expressing concern over the possible effects on population benchmarks, including the controls in the American Community Survey.

Much of the feedback identified areas where the DAS still needs to be improved. Joe Salvo (New York City Department of City Planning) illustrated the data’s limitations for emergency planning operations. Randall Akee (University of California, Los Angeles) highlighted a number of problematic impacts of the demonstration products on data about populations living on American Indian reservations. David Van Riper (Minnesota Population Center) showed significant discrepancies in demographic data for cities and other entities that are not included in the DAS geographic hierarchy, or “spine.” Nicholas Nagle (University of Tennessee) discussed the many direct uses of population counts in statutory funding formulas and the importance of ensuring fitness-for-use in such applications. Many other data users identified similar limitations and shortcomings. To put it succinctly, the resounding message was that this interim version of the DAS is generating significant error in the data that we need to resolve prior to the production of the 2020 Census Data Products.

Errors observed in the demonstration data products stem from two sources. One is the differential privacy mechanism itself (this is the noise necessary to protect privacy). The other, and more substantial, is the post-processing performed on the “noisy” privacy-protected data to put them into the format (non-negative, integer values) necessary for the Census Bureau’s tabulation process.

The privacy-preserving error of the differential privacy mechanism is inherently unbiased and easy to control through policy decisions governing the selection and allocation of the privacy-loss budget. However, the post-processing error, which workshop participants clearly demonstrated, introduces unacceptable and problematic data biases and distortions, and it requires structural and statistical changes to the DAS to mitigate.
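To make this distinction concrete, here is a minimal sketch of the two error sources, using a Laplace mechanism purely for illustration (the actual DAS uses its own noise mechanism, privacy-loss budget allocation, and post-processing; the counts, epsilon value, and clamp-and-round step below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical small-area population counts; very small counts are
# common at the census-block level.
true_counts = np.array([0.0, 1.0, 2.0, 5.0, 50.0])

# Step 1: privacy-protecting noise (illustrated with a Laplace mechanism).
epsilon = 1.0      # illustrative privacy-loss budget
sensitivity = 1.0  # adding/removing one person changes a count by 1
draws = true_counts + rng.laplace(
    scale=sensitivity / epsilon, size=(100_000, true_counts.size)
)

# The noise itself is unbiased: averaged over many draws it cancels out,
# so the mean of the noisy counts is close to the true counts.
print(np.round(draws.mean(axis=0), 2))

# Step 2: post-processing into non-negative integers for tabulation
# (sketched here as round-then-clamp). Clamping at zero can only push
# small counts upward, so it introduces a systematic positive bias
# for small populations -- noise is symmetric, but the correction is not.
post_processed = np.clip(np.round(draws), 0, None)
print(np.round(post_processed.mean(axis=0), 2))
```

Running this, the mean of the noisy counts tracks the true counts, while the mean of the post-processed counts is inflated for the smallest populations and essentially unchanged for the largest, which is the pattern of bias workshop participants observed in small areas.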

Over the last three months, the science and engineering teams supporting the DAS have been hard at work on this issue. They have identified several promising solutions, including changes to the geographic hierarchy used within the DAS; alternative estimation techniques to correct for the known biases of Non-Negative Least Squares optimization; and multiphase estimation of key statistics during post-processing. Determining which (or which combination) of these solutions works best will require empirical analysis with objective measurement and evaluation using an array of fitness-for-use measures that reflect the priority-use cases for decennial census data.

The Census Bureau is compiling the fitness-for-use measures that we will use to evaluate these improvements based on feedback received at the CNSTAT meeting, from our July 2018 Federal Register Notice, and from our continuous dialogue with the data user community. To ensure we are using the right measures to evaluate the data’s fitness-for-use, we will be consulting with our advisory committees, participants from the CNSTAT workshop, and our other data user partners. Once we finalize these metrics, we will regularly report on our progress in this blog.

The 2010 Demonstration Data Products were an interim product in the design and completion of the DAS. We greatly appreciate our partners from across the country who used those products to help us identify where the DAS needs to be improved. Over the coming months, we look forward to continuing that engagement with our data users as we work to improve the DAS.

The Census Bureau has a tradition and public expectation of producing high quality statistics about the United States. We are dedicated to designing a disclosure avoidance system that allows us to meet the quality standards data users expect from Census Bureau data products, while still abiding by the pledge of confidentiality that underlies the 2020 Census.
