In my last blog, I described the Census Bureau’s work to modernize how we protect respondent data in the statistics we publish. Today, I discuss how these confidentiality protection methods also help ensure that the statistics we publish are accurate and fit for their intended uses.
The Census Bureau has a dual mandate to produce quality statistical information while protecting the confidentiality of respondent data. Most of the statistics we release include margins of error to help users determine whether published estimates are suitable, or accurate enough, for a given application, something we call “fitness-for-use.” From a scientific perspective, differential privacy is the first disclosure avoidance tool that allows us to protect data confidentiality while also ensuring fitness-for-use in our published data products.
To help prevent anyone from tracing statistics back to a specific respondent, we alter the underlying statistical tabulations before publication. This process is called “noise injection.” It has been a key feature of our confidentiality protection systems for decades. However, this process is a delicate balancing act. Enough noise must be added to protect confidentiality, but too much noise could damage the statistic’s fitness-for-use.
With our previous disclosure avoidance systems, we cannot share details of the noise we add to the data to protect confidentiality. As a result, users cannot tell how much any estimate has been altered from its measured value by historical disclosure avoidance procedures.
The new formally private disclosure avoidance procedures I discussed in my last blog on August 17 will be much more transparent about their impact on data quality and fitness-for-use. All of the statistical properties of the noise we use to protect confidentiality in our new systems will be public information. Any user will be able to assess fitness-for-use directly. To be as clear as possible, noise injection is not new. It has been used in every decennial census since 1980 and in many other Census Bureau data products. What is new with the adoption of modernized disclosure avoidance methods is the open acknowledgement of how the addition of noise affects the published data.
In this era of Big Data, simply adding more noise using our older methods is not a workable solution. So much noise would be required that our published data would be unfit for most uses. With an enhanced confidentiality system based on differential privacy, we can precisely control and tailor the amount of noise that we add to the data. This amount will be based on a specific balance of privacy control and accuracy that we set in advance. By documenting the properties of this noise, we can be transparent about fitness-for-use while never sacrificing respondent confidentiality. Documenting the effects of injecting noise is comparable to the way we provide margins of error for our current statistical products.
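For readers who want to see the idea in concrete terms, here is a minimal sketch of how a privacy setting chosen in advance can determine both the noise and a published margin of error. It uses the Laplace mechanism, a standard textbook form of differential privacy, purely as an illustration; this post does not say which mechanism our production systems use. The function names, the privacy parameter `epsilon`, the sensitivity of 1, and the 90 percent confidence level are all assumptions made for the example.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return a count with Laplace noise added (illustrative only).

    `epsilon` is the hypothetical privacy setting: smaller values mean
    more noise and stronger protection, larger values mean less noise.
    """
    scale = sensitivity / epsilon
    return true_count + laplace_noise(scale, rng)


def margin_of_error(epsilon, confidence=0.90, sensitivity=1.0):
    """Half-width that contains the noise with probability `confidence`.

    For Laplace noise, P(|noise| > t) = exp(-t / scale), so
    t = -scale * ln(1 - confidence). This depends only on public
    settings, never on any respondent's data.
    """
    scale = sensitivity / epsilon
    return -scale * math.log(1.0 - confidence)


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed so the demo is repeatable
    for eps in (0.25, 1.0, 4.0):
        published = noisy_count(1000, eps, rng)
        moe = margin_of_error(eps)
        print(f"epsilon={eps}: published={published:.1f}, 90% margin of error=+/-{moe:.2f}")
```

Because the margin of error in this sketch is computed only from the published settings, anyone can judge fitness-for-use without learning anything about the noise actually drawn for a particular table.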
We know that the nation needs timely, accurate information to make informed decisions. Our goal is to ensure that the public trusts us with their data and values the statistics that we produce. Adopting our advanced confidentiality protection system helps us to meet that goal while marking a significant milestone in the evolution of data protections — for both the Census Bureau and the nation.
Related note: Data users have until September 17, 2018, to submit comments to the Federal Register notice “Soliciting Feedback From Users on 2020 Census Data Products.” Your responses will help us create and prioritize products for the 2020 Census as we work to implement improved confidentiality protection measures.