Non-Random Assignment of Individual Identifiers and Selection into Linked Data: Implications for Research

Written by:
Working Paper Number: CES-26-06

Abstract

The U.S. Census Bureau’s Person Identification Validation System facilitates anonymous linkages between survey and administrative records by assigning Protected Identification Keys (PIKs) to person records. While PIK assignment is generally accurate, some person records are not successfully assigned a PIK, which can lead to sample selection bias in analyses of linked data. Using the American Community Survey (ACS) and the Current Population Survey Annual Social and Economic Supplement (CPS ASEC) from 2005 to 2022, we corroborate and extend existing findings on the drivers of PIK assignment, showing that PIK assignment rates vary widely across socio-demographic subgroups. Using earnings as a test case, we then show that restricting a survey sample of wage earners to person records with PIKs, or to those successfully linked to W-2 wage records, tends to overstate average self-reported wage earnings, which is indicative of linkage-induced selection bias. In a validation exercise, we demonstrate that reweighting methods, such as inverse probability weighting or entropy balancing, can mitigate this bias.
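As a rough illustration of the reweighting idea, the sketch below applies inverse probability weighting with the simplest possible propensity model. The probability of PIK assignment is estimated as the survey-weighted PIK rate within each subgroup cell, and each linked record's weight is divided by that rate. The cells, earnings, and weights are hypothetical toy data, not ACS or CPS ASEC figures. The paper's actual propensity specification is not described here, and in practice a richer model, such as a logistic regression on many covariates, would likely be used. Entropy balancing is not shown.

```python
from collections import defaultdict

# Hypothetical toy records: (subgroup cell, has PIK?, self-reported earnings, survey weight).
# Cell "A" has high earnings and a high PIK rate; cell "B" has low earnings and a low PIK rate.
records = [
    ("A", True, 60000.0, 1.0),
    ("A", True, 60000.0, 1.0),
    ("A", True, 60000.0, 1.0),
    ("A", False, 60000.0, 1.0),
    ("B", True, 30000.0, 1.0),
    ("B", False, 30000.0, 1.0),
    ("B", False, 30000.0, 1.0),
    ("B", False, 30000.0, 1.0),
]


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def cell_pik_rates(rows):
    """Estimate P(PIK | cell) as the survey-weighted share of records with a PIK in each cell."""
    pik_weight = defaultdict(float)
    total_weight = defaultdict(float)
    for cell, has_pik, _earnings, weight in rows:
        total_weight[cell] += weight
        if has_pik:
            pik_weight[cell] += weight
    return {cell: pik_weight[cell] / total_weight[cell] for cell in total_weight}


def ipw_linked_mean(rows):
    """Mean earnings among PIKed records, each survey weight divided by its cell's PIK rate."""
    rates = cell_pik_rates(rows)
    values, weights = [], []
    for cell, has_pik, earnings, weight in rows:
        if has_pik:  # only linked records enter the analysis sample
            values.append(earnings)
            weights.append(weight / rates[cell])
    return weighted_mean(values, weights)


full = weighted_mean([r[2] for r in records], [r[3] for r in records])
naive = weighted_mean([r[2] for r in records if r[1]], [r[3] for r in records if r[1]])
ipw = ipw_linked_mean(records)

print(full)           # 45000.0  (all respondents)
print(naive)          # 52500.0  (PIKed records only: overstated)
print(round(ipw, 2))  # 45000.0  (PIKed records, reweighted)
```

Because the toy data are constructed so that earnings do not differ by PIK status within a cell, the reweighted linked-sample mean recovers the full-sample mean. With real data this holds only to the extent that the weighting covariates capture the factors driving selection into the linked sample.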

Page Last Revised - January 29, 2026