Data Quality: Automated Edit/Imputation and Record Linkage
William E. Winkler
Statistical agencies collect data from surveys and create data warehouses by combining data from a variety of sources. To be suitable for analytic purposes, the files must be relatively free of error. Record linkage (Fellegi and Sunter, JASA 1969) is used to identify duplicates within a file or across a set of files. Statistical data editing and imputation (Fellegi and Holt, JASA 1976) are used to locate erroneous values of variables and to fill in missing data. Although these powerful methods were introduced in the statistical literature, they have been implemented primarily through computer science and operations research (Winkler, Information Systems 2004a). This paper provides an overview of recent developments.
Source: U.S. Census Bureau, Statistical Research Division
Created: July 12, 2006
Last revised: July 12, 2006