When files are acquired and transmitted to Census, they are initially accessible only to a small staff responsible for inventorying the contents of each file, conducting basic quality control checks, and removing sensitive Personally Identifiable Information (PII). This staff works in a secured physical environment and on a highly restricted computing cluster behind the Census firewall.
The processing and de-identification staff confirms that the received files are exactly as described in the legal agreement. Census is never permitted to receive more than is specified in the applicable agreement. The staff also confirms that the variables and documentation have the basic integrity needed for us to use them.
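As a rough illustration of the check described above, the sketch below compares the columns of a received file against the variables listed in an agreement. The variable names and the function names are hypothetical, chosen only for the example; this is not a description of the actual Census tooling.

```python
# Hypothetical set of variables named in a data-sharing agreement.
AGREED_VARIABLES = {"name", "address", "date_of_birth", "ssn", "program_code"}


def check_against_agreement(received_columns, agreed_variables):
    """Return (extra, missing) column sets relative to the agreement."""
    received = set(received_columns)
    extra = received - agreed_variables    # delivered but not covered by the agreement
    missing = agreed_variables - received  # covered by the agreement but not delivered
    return extra, missing


def file_matches_agreement(received_columns, agreed_variables):
    """True only when the file contains exactly the agreed variables."""
    extra, missing = check_against_agreement(received_columns, agreed_variables)
    return not extra and not missing
```

A file with any column in `extra` would contain more than the agreement allows, so in this sketch it would be flagged before any further processing.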
Next, a data linkage team replaces sensitive PII with a unique key that can be used to link the records to other databases held at Census. The probabilistic linkage process relies on variables such as name, address, date of birth, and Social Security Number. These PII variables are used to link the incoming file to a “reference file” composed of censuses, surveys, and other federal records. The reference file contains PII from these other files and a Protected Identification Key (PIK), which uniquely identifies each record. When a linkage can be made between the incoming file and the reference file, the PIK is appended to the incoming file.
More information on this process is available in this paper.
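The linkage step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Census method: the field weights, the threshold, the string-similarity measure, and the function names are all invented for the example, and a production probabilistic matcher would estimate its parameters from data.

```python
from difflib import SequenceMatcher

# Illustrative weights and threshold only; these are not Census parameters.
FIELD_WEIGHTS = {"name": 2.0, "address": 1.5, "date_of_birth": 2.5, "ssn": 4.0}
MATCH_THRESHOLD = 7.0


def similarity(a, b):
    """String similarity in [0, 1]; a missing value contributes nothing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def score(incoming, reference):
    """Weighted sum of per-field similarities between two records."""
    return sum(
        weight * similarity(incoming.get(field), reference.get(field))
        for field, weight in FIELD_WEIGHTS.items()
    )


def link_record(incoming, reference_file):
    """Drop the PII fields and append a PIK (None when no link is made)."""
    best = max(reference_file, key=lambda ref: score(incoming, ref), default=None)
    linked = best is not None and score(incoming, best) >= MATCH_THRESHOLD
    deidentified = {k: v for k, v in incoming.items() if k not in FIELD_WEIGHTS}
    deidentified["pik"] = best["pik"] if linked else None
    return deidentified
```

In this sketch, a record whose best candidate falls below the threshold keeps its non-PII variables but receives no PIK, which mirrors the statement that the PIK is appended only when a linkage can be made.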