Post-randomization for Identification Risk Limited Microdata Release from General Surveys

RRS2018-11

Abstract

Before releasing survey data, statistical agencies usually perturb the original data to keep each survey unit's information confidential. One significant concern is identity disclosure, which occurs when an intruder correctly identifies the records of a survey unit by matching the values of some key (or pseudo-identifying) variables. Nayak, Zhang and You (2018) developed a post-randomization method for strict control of identification risk in releasing survey microdata. Under simple random sampling, the procedure also preserves the observed frequencies well, and hence statistical estimates. We show that in general surveys the procedure may induce considerable bias in commonly used survey-weighted estimators. We propose a modified procedure that better preserves weighted estimates. The modified procedure is illustrated and empirically assessed with an application to a publicly available U.S. Census Bureau data set.

Page Last Revised - October 28, 2021