Synthetic Microdata for Establishment Surveys Under Informative Sampling

July 26, 2019

Written by:

Hang J. Kim, Joerg Drechsler, and Katherine J. Thompson

RRS2019-07

Abstract

Many agencies are currently investigating whether releasing synthetic microdata could be a viable dissemination strategy for highly sensitive data, such as business data, for which disclosure avoidance regulations otherwise prohibit the release of public use microdata. However, existing methods assume that the original data either cover the entire population or comprise a simple random sample from this population, which limits the application of these methods in the context of survey data with unequal survey weights. This paper discusses synthetic data generation under informative sampling. To utilize the design information in the survey weights, we rely on the pseudo likelihood approach when building a hierarchical Bayesian model to estimate the distribution of the finite population. Then, synthetic populations are randomly drawn from the estimated finite population density. We present the full conditional distributions of the Markov chain Monte Carlo algorithm for the posterior inference with the pseudo likelihood function. Using simulation studies, we show that the suggested synthetic data approach offers high utility for design-based and model-based analyses while offering a high level of disclosure protection. We apply the proposed method to a subset of the 2012 U.S. Economic Census and evaluate the results with utility metrics and disclosure avoidance metrics under data attacker scenarios commonly used for business data.
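
As a rough illustration of why the survey weights matter, the sketch below fits a deliberately simple model with a pseudo likelihood, in which each sampled unit's log-density is multiplied by its survey weight. It is not the hierarchical Bayesian model or the Markov chain Monte Carlo algorithm from the paper: the normal model on log values, the probability-proportional-to-size Poisson sampling design, and all names (`pseudo_log_likelihood`, `pseudo_mle`, `simulate`) are assumptions made only for this example.

```python
import math
import random


def pseudo_log_likelihood(log_y, weights, mu, sigma):
    """Survey-weighted log-likelihood: sum_i w_i * log f(log_y_i | mu, sigma)."""
    total = 0.0
    for x, w in zip(log_y, weights):
        log_density = (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
                       - (x - mu) ** 2 / (2.0 * sigma ** 2))
        total += w * log_density
    return total


def pseudo_mle(log_y, weights):
    """Closed-form maximizer of the pseudo log-likelihood for a normal model."""
    w_sum = sum(weights)
    mu = sum(w * x for x, w in zip(log_y, weights)) / w_sum
    var = sum(w * (x - mu) ** 2 for x, w in zip(log_y, weights)) / w_sum
    return mu, math.sqrt(var)


def simulate(seed=1, pop_size=20000, expected_n=2000):
    """Hypothetical experiment: informative sampling from a skewed population."""
    rng = random.Random(seed)
    # Assumed finite population of skewed, business-like values.
    population = [rng.lognormvariate(0.0, 1.0) for _ in range(pop_size)]
    total = sum(population)

    # Informative design: inclusion probability grows with the value itself.
    sample, weights = [], []
    for y in population:
        pi = min(1.0, expected_n * y / total)
        if rng.random() < pi:
            sample.append(math.log(y))
            weights.append(1.0 / pi)  # survey weight = inverse inclusion probability

    mu_w, sigma_w = pseudo_mle(sample, weights)              # uses the weights
    mu_u, sigma_u = pseudo_mle(sample, [1.0] * len(sample))  # ignores the design

    # Draw one synthetic population from the weighted fit.
    synthetic = [math.exp(rng.gauss(mu_w, sigma_w)) for _ in range(pop_size)]

    return {
        "population_mu": sum(math.log(y) for y in population) / pop_size,
        "weighted_mu": mu_w,
        "unweighted_mu": mu_u,
        "synthetic": synthetic,
    }


if __name__ == "__main__":
    result = simulate()
    for key in ("population_mu", "weighted_mu", "unweighted_mu"):
        print(key, round(result[key], 3))
```

In this toy setting, large units are oversampled, so the unweighted fit is pulled toward large values, while the weighted fit tracks the population mean of the log values more closely. Synthetic records are then drawn from the weighted fit to form a full synthetic population rather than a synthetic sample.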

Page Last Revised - October 28, 2021