Measuring Identification Risk in Microdata Release and Its Control by Post-randomization

May 02, 2016

Written by:

Tapan K. Nayak, Cheng Zhang, and Jiashen You

CDAR2016-02

Abstract

Statistical agencies often release a masked or perturbed version of survey data to protect respondents' confidentiality. Ideally, a perturbation procedure should protect confidentiality without much loss of data quality, so that released data may practically be treated as original data for making inferences. One major objective is to control the risk of correctly identifying any respondent's records in released data by matching the values of some identifying or key variables. For categorical key variables, we propose a novel approach to measuring identification risk and setting strict disclosure control goals. The general idea is to ensure that the probability of correctly identifying any respondent or surveyed unit is at most ξ, which is pre-specified. Then, we develop an unbiased post-randomization procedure that achieves this goal for ξ > 1/3. The procedure allows substantial control over possible changes to the original data, and the variance it induces is of a lower order of magnitude than sampling variance. We apply the procedure to a real data set, where it performs consistently with the theoretical results and, quite importantly, shows very little data quality loss.
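
As a rough illustration of what post-randomization of a categorical key variable can look like, the sketch below uses a generic invariant scheme: each value is kept with probability theta and otherwise redrawn from the empirical category distribution. This is not the procedure developed in the paper, and it does not by itself guarantee the bound ξ on identification probability. The function names, the parameter theta, and the sample data are all assumptions made for illustration.

```python
import random
from collections import Counter


def empirical_distribution(values):
    """Return the relative frequency of each category in `values`."""
    counts = Counter(values)
    n = len(values)
    return {category: count / n for category, count in counts.items()}


def transition_probability(pi, theta, a, b):
    """P(released value is b | original value is a) under the sketch.

    With probability theta the original value is kept; otherwise a
    replacement is drawn from the empirical distribution pi.
    """
    keep = theta if a == b else 0.0
    return keep + (1.0 - theta) * pi[b]


def post_randomize(values, theta, seed=0):
    """Return a perturbed copy of `values` (hypothetical helper).

    theta (assumed parameter, 0 <= theta <= 1) controls how often a
    record is left unchanged. theta = 1 returns the data as is.
    """
    rng = random.Random(seed)
    pi = empirical_distribution(values)
    categories = list(pi)
    weights = [pi[c] for c in categories]
    released = []
    for value in values:
        if rng.random() < theta:
            released.append(value)
        else:
            released.append(rng.choices(categories, weights=weights, k=1)[0])
    return released


if __name__ == "__main__":
    # Made-up key variable values, for illustration only.
    original = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
    released = post_randomize(original, theta=0.8, seed=1)
    print(Counter(original))
    print(Counter(released))
```

In this sketch the transition probabilities satisfy sum over a of pi[a] * P(b | a) = pi[b], so the expected released frequency of each category equals its original frequency. That is the sense in which such a scheme is unbiased for category counts; controlling identification risk requires the additional design worked out in the paper.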

Page Last Revised - October 8, 2021