Customized noise distribution; EM algorithm; Partially synthetic data; Synthetic data; Tuning parameter.
When statistical agencies release microdata to the public, a major concern is controlling disclosure risk while preserving the utility of the released data. Statistical disclosure control methods such as data swapping, multiple imputation, top coding, and perturbation with random noise are often applied before the data are released. This article provides a comprehensive comparison of two of these methods, multiple imputation and noise multiplication, for drawing inference about some useful parameters under the exponential, normal, and lognormal models. The comparison is made under two scenarios: (1) the entire data set is replaced by multiply imputed or noise multiplied data, and (2) only the top part of the data is replaced in this way. The latter scenario arises, for example, when top coding is used for disclosure control, especially with income data. Methodology is developed for the analysis of noise multiplied data under both scenarios. For the case where only the large values in the data set are noise multiplied, data analysis methods are developed and compared under two types of data release: (i) each released value includes an indicator of whether or not it has been noise perturbed, and (ii) no such indicator is provided. The comparison study shows that data analyses under multiple imputation and noise multiplication can provide similar results in terms of accuracy of statistical inferences, and that noise multiplication can be made either more or less accurate than multiple imputation by adjusting the variance of the noise-generating distribution. Extensive simulation results provide guidance on how the noise variance affects the accuracy of inference in several parametric settings. A comparison using data from the 2000 U.S. Current Population Survey highlights the similarities of the methods. Detailed tables summarizing simulation results and some technical derivations are available online as supplementary material.
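As a rough illustration of the two noise multiplication scenarios described above, the following minimal Python sketch perturbs either every value or only the values above a cutoff, and records an indicator for each released value. The function name `noise_multiply`, the parameters `threshold` and `half_width`, the sample incomes, and the choice of a Uniform(1 - h, 1 + h) noise distribution are all assumptions made for this example; they are not taken from the report, which should be consulted for the noise distributions actually studied.

```python
import random


def noise_multiply(values, threshold=None, half_width=0.1, seed=None):
    """Return (released, flags) for a list of positive values.

    Hypothetical helper: each value above `threshold` (or every value when
    `threshold` is None) is multiplied by independent noise drawn from
    Uniform(1 - half_width, 1 + half_width), which has mean 1. A larger
    `half_width` means a larger noise variance.
    """
    rng = random.Random(seed)
    released, flags = [], []
    for y in values:
        perturb = threshold is None or y > threshold
        noise = rng.uniform(1.0 - half_width, 1.0 + half_width) if perturb else 1.0
        released.append(y * noise)
        flags.append(perturb)
    return released, flags


# Made-up incomes, for illustration only.
incomes = [12_000.0, 35_000.0, 58_000.0, 140_000.0, 410_000.0]

# Scenario (1): the entire data set is noise multiplied.
full, _ = noise_multiply(incomes, seed=1)

# Scenario (2): only values above the cutoff are noise multiplied.
top, flags = noise_multiply(incomes, threshold=100_000.0, seed=1)
```

Under release type (i) the agency would publish `top` together with `flags`; under release type (ii) it would publish `top` alone, so the analyst cannot tell which values were perturbed.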
Martin Klein, Thomas Mathew, and Bimal Sinha. (2013). A Comparison of Statistical Disclosure Control Methods: Multiple Imputation Versus Noise Multiplication. Center for Statistical Research & Methodology Research Report Series (Statistics #2013-02). U.S. Census Bureau. Available online at <http://www.census.gov/srd/papers/pdf/rrs2013-02.pdf>.