Research Reports

Abstract of RRS2007/22

Initial Results from a Nationwide BigMatch Matching of 2000 Census Data

Michael Ikeda and Edward Porter

KEY WORDS: Census Unduplication, Across Response Matching, Record Linkage


A nationwide unduplication operation is being considered for the 2010 Census. One potential problem is that such an operation could find large numbers of false positives, especially when matching above the county level. To help evaluate the extent of this problem, the Census Bureau's BigMatch program was used to match person records across all Census addresses, using data from the 2000 Census.

This report gives an overview of the matching methodology and presents the results of an exploratory analysis of the matching output. As expected, most of the problem with apparent false matches seems to be concentrated in the most common surnames and the most common Hispanic surnames, especially for matches outside the state. In contrast, the frequency of a given name does not appear to have a strong effect on false matches.


Source: U.S. Census Bureau, Statistical Research Division

Created: December 29, 2007
Last revised: December 29, 2007

Source: U.S. Census Bureau | Statistical Research Division | (301) 763-3215 | Last Revised: October 08, 2010