U.S. Department of Commerce

Research Reports


A Comparison of Statistical Disclosure Control Methods: Multiple Imputation Versus Noise Multiplication

Martin Klein, Thomas Mathew, and Bimal Sinha

KEY WORDS:

Customized noise distribution; EM algorithm; Partially synthetic data; Synthetic data; Tuning parameter.

ABSTRACT

When statistical agencies release microdata to the public, a major concern is controlling disclosure risk while preserving the utility of the released data. Statistical disclosure control methods such as data swapping, multiple imputation, top coding, and perturbation with random noise are often applied before the data are released. This article provides a comprehensive comparison of two of these methods, multiple imputation and noise multiplication, for drawing inference about some useful parameters under the exponential, normal, and log-normal models. The comparison is provided under two scenarios: (1) the entire data set is replaced by multiply imputed or noise multiplied data, and (2) only the top part of the data is similarly replaced. The latter scenario arises, for example, when top coding is used for disclosure control, especially with income data. Methodology is developed for the analysis of noise multiplied data under both scenarios. For the situation where only the large values in the data set are noise multiplied, data analysis methods are developed and compared under two types of data release: (i) each released value includes an indicator of whether or not it has been noise perturbed, and (ii) no such indicator is provided. The comparison study shows that data analyses under the multiple imputation and noise multiplication methods can provide similar results in terms of accuracy of statistical inferences, and that noise multiplication can be made either more or less accurate than multiple imputation by appropriately adjusting the variance of the noise generating distribution. Extensive simulation results provide guidance as to how the noise variance affects accuracy of inference in several parametric settings. A comparison using data from the 2000 U.S. Current Population Survey highlights the similarities of the methods. Detailed tables summarizing simulation results and some technical derivations are available online as supplementary material.
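As a rough illustration of the two scenarios described in the abstract, the following Python sketch multiplies values by random noise, either for the whole data set or only for values above a threshold. It is not the authors' method: the function name `noise_multiply`, the uniform noise distribution on (1 - eps, 1 + eps), the threshold, and the sample incomes are all assumptions made for this example.

```python
import random


def noise_multiply(values, threshold=None, eps=0.1, seed=0):
    """Return (released_value, was_perturbed) pairs.

    Hypothetical helper for illustration only. Noise is drawn from a
    uniform distribution on [1 - eps, 1 + eps], which has mean 1; the
    report itself may use other noise distributions.
    """
    rng = random.Random(seed)
    released = []
    for y in values:
        # Scenario (1): no threshold, so every value is perturbed.
        # Scenario (2): only values above the threshold are perturbed.
        perturb = threshold is None or y > threshold
        r = rng.uniform(1.0 - eps, 1.0 + eps) if perturb else 1.0
        released.append((y * r, perturb))
    return released


# Made-up incomes, not taken from any survey.
incomes = [20_000.0, 45_000.0, 80_000.0, 150_000.0, 400_000.0]

full = noise_multiply(incomes)                           # scenario (1)
top_only = noise_multiply(incomes, threshold=100_000.0)  # scenario (2)
```

Keeping the boolean flag with each value corresponds to release type (i); dropping it before release would correspond to type (ii). A larger `eps` increases the noise variance, which is the tuning knob the abstract refers to.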

CITATION:

Martin Klein, Thomas Mathew, and Bimal Sinha. (2013). A Comparison of Statistical Disclosure Control Methods: Multiple Imputation Versus Noise Multiplication. Center for Statistical Research & Methodology Research Report Series (Statistics #2013-02). U.S. Census Bureau. Available online at <http://www.census.gov/srd/papers/pdf/rrs2013-02.pdf>.

Source: U.S. Census Bureau, Center for Statistical Research & Methodology, Research and Methodology Directorate

Published online: January 23, 2013
Last revised: January 23, 2013

 


