Approximate String Comparator Search Strategies for Very Large Administrative Lists

Written by:
RRS2005-02

Abstract

Rather than collecting data from a variety of surveys, it is often more efficient to merge information from administrative lists. Person files might be matched using name and date-of-birth as the primary identifying information. Commonly occurring names pose obvious difficulties: a name such as John Smith may occur 30,000+ times (about 1.5 times for each date-of-birth). If each field has a 5% typographical error rate, then fast character-by-character searches can miss 20% of true matches among records with less common names, where name plus date-of-birth might be unique. This paper describes some existing solutions and current research directions.
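
The abstract does not say how the 20% figure is derived. One reading that gives a number of roughly that size is sketched below. It assumes an exact match needs every compared field to be error-free in both files, and that errors are independent. The function name, the choice of two fields (name and date-of-birth), and the independence assumption are illustrative and are not taken from the paper.

```python
def exact_match_miss_rate(error_rate, n_fields, n_files=2):
    """Chance that an exact comparison fails for a true match.

    Assumption (not from the paper): each field in each file is
    mistyped independently with probability `error_rate`, and an exact
    match succeeds only if every compared field is clean in every file.
    """
    p_all_clean = (1.0 - error_rate) ** (n_fields * n_files)
    return 1.0 - p_all_clean


# Two fields (name, date-of-birth), two files, 5% error per field.
rate = exact_match_miss_rate(0.05, n_fields=2)
print(f"{rate:.3f}")  # prints 0.185
```

Under these assumptions the miss rate is about 18.5%, which is of the same order as the 20% quoted in the abstract. Different assumptions, such as splitting name into first and last name, would give a different figure.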


Page Last Revised - October 28, 2021