The State of Record Linkage and Current Research Problems

William E. Winkler

KEY WORDS: computer matching, modeling, iterative fitting, string comparison, optimization


This paper provides an overview of methods and systems developed for record linkage. Modern record linkage begins with the pioneering work of Newcombe and is based especially on the formal mathematical model of Fellegi and Sunter. In their seminal work, Fellegi and Sunter introduced many powerful ideas for estimating record linkage parameters, along with other ideas that still influence record linkage today. Record linkage research is characterized by its synergism of statistics, computer science, and operations research. Many difficult algorithms have been developed and implemented in software systems. Record linkage practice is still very limited. Some limits are due to existing software. Other limits are due to the difficulty of automatically estimating matching parameters and error rates, with current research highlighted by the work of Larsen and Rubin. Still other limits are due to the inability to do the auxiliary programming needed to clean up and adjust for data-specific anomalies. The evaluation of matching results also necessitates auxiliary analyses. The paper closes with a description of selected research problems.

