Fellegi, I. P., and A. B. Sunter (1969), "A Theory for Record Linkage," Journal of the American
Statistical Association, 64, pp. 1183-1210.
Formal mathematical model based on a generalization of hypothesis testing. Introduced many
ideas and serves as the main mathematical reference.
Belin, T. R., and Rubin, D. B. (1995), "A Method for Calibrating False-Match Rates in Record
Linkage," Journal of the American Statistical Association, 90, 694-707.
Gives a method for automatically estimating matching error rates. Software available.
Copas, J. R., and F. J. Hilton (1990), "Record Linkage: Statistical Models for Matching Computer
Records," Journal of the Royal Statistical Society, A, 153, pp. 287-320.
Well-written invited paper with many impressive ideas that have not yet been implemented.
Jaro, M. A. (1989), "Advances in Record-Linkage Methodology as Applied to Matching the 1985
Census of Tampa, Florida," Journal of the American Statistical Association, 84, pp. 414-420.
Describes precursor to current Census system. Jaro introduced the most powerful methods of
string comparison and showed how they relate to the likelihoods of the Fellegi-Sunter model.
Also introduced ideas from operations research for optimizing sets of assignments. High accuracy.
Kilss, B. and Alvey, W. (eds.) (1985), "Record Linkage Techniques -- 1985,"
Statistics of Income Division, Internal Revenue Service Publication 1299 (2-86).
Classic overall reference. Now out of print. Available as a 30 megabyte pdf file at
http://www.bts.gov/fcsm/methodology
Newcombe, H. B. (1988), Handbook of Record Linkage: Methods for Health and Statistical
Studies, Administration, and Business, Oxford: Oxford University Press.
Classic book reference. Covers some of the theory and much of the heuristics needed for
good record linkage practice. Now out of print.
Rogot, E., P. Sorlie, and N. Johnson (1986), "Probabilistic methods of matching Census samples
to the National Death Index," Journal of Chronic Diseases, 39, pp. 719-734.
Nice application of ideas of Newcombe and of Fellegi and Sunter.
Winkler, W. E. (1994), "Advanced Methods of Record Linkage," American Statistical
Association, Proceedings of the Section on Survey Research Methods, 467-472.
Describes new theory and algorithms in computer science, operations research, and statistics
that were developed at the Census Bureau and used in current Census system. Extends original
Jaro string comparator and gives likelihood-based methods for connecting the comparators to
the main decision rule of Fellegi and Sunter. Introduces a new assignment algorithm for
forcing 1-1 matching that is as fast as the benchmark Burkard-Derigs algorithm and uses 1/500
as much storage; is also much faster and uses less storage than the MCF algorithm of
Klingman. Gives general theory extending EM ideas of Meng and Rubin (Biometrika 1994)
and shows how it is applied in estimating record linkage parameters. Gives method for
estimating record linkage error rates that holds in more situations than the Belin-Rubin method
and, unlike that method, does not require a training set, but requires an ad hoc
intervention that tends to limit its application to record linkage experts.
Winkler, W. E. (1995), "Matching and Record Linkage," in B. G. Cox et al. (eds.), Business Survey
Methods, New York: J. Wiley, 355-384.
Survey article that gives much background about record linkage. Describes available software,
list acquisition and preparation, and a large number of methods for evaluating the quality of
lists and the quality of matching results.