Today, the Census Bureau is releasing demonstration data products designed to help our data user community better understand the disclosure avoidance system for the 2020 Census and its impact on data quality and protection.
As discussed in previous blogs, our decision to deploy a modernized disclosure avoidance system for the 2020 Census was driven by research showing that methods we used to protect the 2010 Census and earlier statistics can no longer adequately defend against today’s privacy threats.
The new system uses differential privacy as its core methodology. Developed by cryptographers and computer scientists, it is a “formally private” solution that is based on mathematical algorithms to provably safeguard respondent confidentiality.
With the new method, uncertainty (noise) is added to the statistics in the tables themselves. The traditional methods added uncertainty before tabulation, and only to the records and variables deemed at higher risk. The new method allows us to precisely control the amount of uncertainty that we add according to privacy requirements. And, by documenting the properties of this uncertainty, we can help data users determine whether published estimates are sufficiently accurate for their specific applications. In this manner, data users can assess the data’s “fitness for use.”
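For readers who want a concrete picture of what adding noise to the statistics in a table can look like, here is a minimal sketch in Python. It uses the Laplace mechanism, a textbook differentially private technique, and is not the Census Bureau's production algorithm. The function names, the `epsilon` parameter (standing in for a privacy-loss budget), and the example counts are all hypothetical and chosen only for illustration. It assumes each person appears in exactly one cell of the table, so adding or removing one person changes the table by at most 1.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_table(counts, epsilon, rng=None):
    """Return a copy of `counts` with Laplace noise added to every cell.

    Assumption: each person contributes to exactly one cell, so the
    sensitivity of the whole table is 1 and the noise scale is 1 / epsilon.
    A smaller epsilon means more privacy protection and more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    return {cell: n + laplace_noise(scale, rng) for cell, n in counts.items()}


def expected_abs_error(epsilon):
    """Expected absolute error per cell: the mean of |Laplace(0, b)| is b."""
    return 1.0 / epsilon


if __name__ == "__main__":
    # Hypothetical block-level counts, invented for this illustration.
    table = {"under 18": 12, "18 to 64": 40, "65 and over": 7}
    for eps in (0.1, 1.0, 10.0):
        print(eps, expected_abs_error(eps), noisy_table(table, eps))
```

Because the noise distribution is fully specified, a data user can work out in advance how much error to expect at a given setting. In this sketch the expected absolute error per cell is simply 1 / epsilon, which is the kind of documented property that supports a fitness-for-use judgment. Note that the noisy values can be negative or fractional; any post-processing to make them look like ordinary counts is outside the scope of this sketch.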
The alternative to a disclosure avoidance system based on differential privacy is to add substantially more noise using our older methods and to suppress much of the data for small populations and small geographic areas. This is not a viable solution. So much noise would be required that our published data would be unfit for most uses. There is currently no other statistical technique that can reliably assure the confidentiality of the underlying data while simultaneously assuring the highest quality statistical product for our data users.
Even with the new safeguards, protecting privacy requires us to restructure many of the statistical tables that we publish. We have shared the proposed changes to census data products at previous briefings; the release of today’s demonstration files gives you a chance to determine how well the new disclosure avoidance system meets your needs.
The demonstration files apply the 2020 Census disclosure avoidance system to the 2010 Census confidential data, that is, the unprotected data from the 2010 Census that are not publicly available. Note that the published 2010 Census data were protected using standard statistical disclosure avoidance procedures such as swapping, coarsening and suppression. The demonstration files are protected using a privacy-loss budget chosen for illustrative purposes only.
We are releasing these demonstration files to encourage independent analyses from the data user community. We encourage data users and data scientists to examine the products and provide feedback as we continue to develop and fine-tune the disclosure avoidance system.
Because these data are widely used in ways that go beyond the Census Bureau’s needs, we want to ensure they’re fit for as many data users’ needs as possible. We’re also inviting the privacy community, both policy and technical experts, to join in this discussion.
Please visit our web pages dedicated to this topic and to the demonstration data products to access the demonstration files, learn more about our plans for 2020 data products, and learn how to provide feedback.
Lastly, the Committee on National Statistics (CNSTAT) is hosting a special workshop on the demonstration data products on Dec. 11-12, 2019, where presenters will share their analyses of the products and the potential impact of the disclosure avoidance system (DAS) on their data uses. Visit CNSTAT closer to the date for more information and viewing options.