Research Reports

Abstract of RRS2005/02

Approximate String Comparator Search Strategies for Very Large Administrative Lists

William E. Winkler

KEY WORDS: search mechanisms, approximate string comparison, computer matching


Rather than collect data from a variety of surveys, it is often more efficient to merge information from administrative lists. Person files might be matched using name and date-of-birth as the primary identifying information. Entities with a commonly occurring name pose obvious difficulties: a name such as John Smith may occur 30,000+ times (about 1.5 times for each date-of-birth). If each field contains 5% typographical errors, then fast character-by-character searches can miss 20% of true matches among records without commonly occurring names, where name plus date-of-birth might be unique. This paper describes some existing solutions and current research directions.
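
The contrast between exact and approximate comparison can be illustrated with a minimal Python sketch. It is not the comparator or search strategy from the report: `difflib.SequenceMatcher` from the standard library is swapped in as a generic similarity measure, and the names `exact_miss_rate`, `similarity`, `is_match`, the 0.85 threshold, and the sample records are all hypothetical. The miss-rate function further assumes two fields, two files, and independent errors, none of which the abstract states.

```python
from difflib import SequenceMatcher


def exact_miss_rate(error_rate: float, fields: int, files: int = 2) -> float:
    """Share of true matches missed by exact comparison.

    Assumption (not from the report): every field in every file is
    corrupted independently with probability `error_rate`.
    """
    return 1.0 - (1.0 - error_rate) ** (fields * files)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib is a stand-in comparator."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(rec_a, rec_b, threshold: float = 0.85) -> bool:
    """Hypothetical rule: every field pair must reach `threshold`."""
    return all(similarity(x, y) >= threshold for x, y in zip(rec_a, rec_b))


# Hypothetical records: (name, date-of-birth)
a = ("Katherine Winslow", "1957-03-21")
b = ("Kathrine Winslow", "1957-03-21")  # one dropped letter in the name
c = ("Mary Jones", "1980-11-02")        # a different person

print(a == b)          # exact, character-by-character comparison
print(is_match(a, b))  # approximate comparison tolerates the typo
print(is_match(a, c))  # unrelated records still fail
print(round(exact_miss_rate(0.05, fields=2), 3))
```

Under these assumptions the exact comparison rejects the pair with the typo while the approximate rule accepts it, and the estimated miss rate for exact matching comes out near the 20% figure quoted above. A production comparator would need field-specific handling; for example, a generic string similarity treats dates that differ by one digit as close.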


Source: U.S. Census Bureau, Statistical Research Division

Created: March 21, 2005
Last revised: March 21, 2005

Source: U.S. Census Bureau | Statistical Research Division | (301) 763-3215 | Last Revised: October 08, 2010