Synthesizing Housing Units for the American Community Survey

Written by:
Working Paper Number: Disclosure Avoidance #2018-03

Abstract

The Census Bureau is charged with collecting and disseminating data while protecting the privacy of respondents. It must guard against several types of unauthorized disclosure, a task that has become more difficult in recent years. One promising line of research is the creation of synthetic data: data generated from a model to mimic the original data while protecting against unauthorized disclosure. We created synthetic data for the housing variables in the American Community Survey (ACS) using standard regression methods and Classification and Regression Trees (CART). Our metrics showed that the accuracy of the synthetic data was fairly high for some variables but lower for others. We have not proved that our methods satisfy any formal privacy criterion, although future research aims to develop methods that do.

Page Last Revised - October 28, 2021