Although re-identification of sensitive data has been studied extensively, the emergence of online social networks and the popularity of digital communications have increased the ability to use public data for re-identification. This work begins by presenting two different case studies of sensitive data re-identification. We conclude that targeted re-identification using traditional variables is not only possible but fairly straightforward, given the large amount of public data available. However, our first case study also indicates that large-scale re-identification is less likely. We then consider methods that agencies such as the Census Bureau can use to identify the variables that make individuals vulnerable, without testing all combinations of variables. We show the effectiveness of different strategies on a Census Bureau data set and on a synthetic data set.
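To show why testing all combinations of variables quickly becomes expensive, here is a minimal brute-force sketch. It is a generic illustration, not the authors' method. It assumes records are Python dicts and counts, for each combination of up to `max_size` variables, how many records are unique on that combination. The function name `unique_counts` and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def unique_counts(records, variables, max_size=2):
    """Hypothetical helper: count records that are unique on each variable combination.

    This is the exhaustive search: every combination of 1..max_size variables
    is examined, so the work grows combinatorially with the number of variables.
    """
    results = {}
    for size in range(1, max_size + 1):
        for combo in combinations(variables, size):
            # Build the value tuple each record has on this combination.
            keys = [tuple(r[v] for v in combo) for r in records]
            counts = Counter(keys)
            # A record is potentially vulnerable here if no other record shares its values.
            results[combo] = sum(1 for k in keys if counts[k] == 1)
    return results


# Hypothetical toy records, for illustration only.
records = [
    {"age": 34, "sex": "F", "zip": "20233"},
    {"age": 34, "sex": "M", "zip": "20233"},
    {"age": 51, "sex": "F", "zip": "20233"},
    {"age": 34, "sex": "F", "zip": "20001"},
]

results = unique_counts(records, ["age", "sex", "zip"])
for combo, n_unique in results.items():
    print(combo, n_unique)
```

With n variables, the number of combinations checked is the sum of C(n, k) for k = 1..max_size, which is the cost the strategies described above aim to avoid.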
Aditi Ramachandran, Lisa Singh, Edward H. Porter, Frank Nagle. (2012). Exploring Re-identification Risks in Public Domains. Center for Statistical Research & Methodology, Research and Methodology Directorate Research Report Series (Statistics #2012-13). U.S. Census Bureau. Available online at <http://www.census.gov/srd/papers/pdf/rrs2012-13.pdf>.