Automatically Estimating Record Linkage False Match Rates
William E. Winkler
KEY WORDS: EM algorithm, unsupervised and semi-supervised learning
This paper provides a mechanism for automatically estimating record linkage false match rates in situations where the set of true matches is reasonably well separated from other pairs. The method is an alternative to that of Belin and Rubin (JASA 1995) and is applicable in more situations. We provide examples demonstrating why the general problem of error rate estimation (both false match and false nonmatch rates) is likely impossible without training data and exceptionally difficult even in the extremely rare situations in which training data are available.
Source: U.S. Census Bureau, Statistical Research Division
Created: June 13, 2007
Last revised: June 13, 2007