Post-randomization for Identification Risk Limited Microdata Release from General Surveys

October 05, 2018

Written by:

Cheng Zhang and Tapan K. Nayak

RRS2018-11

Abstract

Before releasing survey data, statistical agencies usually perturb the original data to keep each survey unit's information confidential. One significant concern is identity disclosure, which occurs when an intruder correctly identifies the records of a survey unit by matching the values of some key (or pseudo-identifying) variables. Nayak, Zhang and You (2018) developed a post-randomization method for strict control of identification risk in releasing survey microdata. Under simple random sampling, the procedure also preserves the observed frequencies well, and hence statistical estimates. We show that in general surveys, the procedure may induce considerable bias in commonly used survey-weighted estimators. We propose a modified procedure that better preserves weighted estimates. The modified procedure is illustrated and empirically assessed with an application to a publicly available U.S. Census Bureau data set.

Page Last Revised - October 28, 2021
