Additional Results from a Nationwide Matching of 2000 Census Data
Michael Ikeda and Edward Porter
KEY WORDS: Census Unduplication, Within Response Modeling, Record Linkage
A nationwide unduplication procedure is being considered for the 2010 Census. One
potential problem is the possibility of finding large numbers of false positives, especially when
matching above the county level. To help evaluate the extent of this problem, the matching and
modeling procedures are being run on the data from the 2000 Census.
This report provides an overview of the results from Within Response Modeling, which evaluates
households with multiple links, and of an analysis of the resulting Residual Person links. As
expected, name frequency does not seem to have much effect on links accepted in Within
Response Modeling. In contrast, most of the apparent false matches among the Residual
Person links seem to be concentrated in the most common surnames and the most common
Hispanic surnames, especially for matches outside the state. For given names, there
does not appear to be a strong effect of name frequency on false matches.
Source: U.S. Census Bureau, Statistical Research Division
Created: March 5, 2008
Last revised: March 5, 2008