
Research Reports


Data Quality: Automated Edit/Imputation and Record Linkage

William E. Winkler



Statistical agencies collect data from surveys and create data warehouses by combining data from a variety of sources. To be suitable for analytic purposes, the files must be relatively free of error. Record linkage (Fellegi and Sunter, JASA 1969) is used to identify duplicates within a file or across a set of files. Statistical data editing and imputation (Fellegi and Holt, JASA 1976) are used to locate erroneous values of variables and to fill in missing data. Although these powerful methods were introduced in the statistical literature, the primary means of implementing them have been via computer science and operations research (Winkler, Information Systems 2004a). This paper provides an overview of recent developments.
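The Fellegi-Sunter model cited above scores candidate record pairs by summing log-likelihood-ratio weights over compared fields and classifying against two thresholds. The following is a minimal sketch of that decision rule; the field names, m/u probabilities, and thresholds are illustrative assumptions, not values from the paper.

```python
# Sketch of Fellegi-Sunter record linkage scoring.
# m-probability: P(field agrees | records are a true match)
# u-probability: P(field agrees | records are a non-match)
# All values below are hypothetical, chosen only for illustration.
import math

M_PROBS = {"surname": 0.95, "first_name": 0.90, "zip": 0.85}
U_PROBS = {"surname": 0.01, "first_name": 0.05, "zip": 0.10}

def match_weight(rec_a, rec_b):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field in M_PROBS:
        m, u = M_PROBS[field], U_PROBS[field]
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total

def classify(weight, upper=6.0, lower=0.0):
    """Fellegi-Sunter decision rule: two thresholds, three outcomes."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link (clerical review)"

a = {"surname": "Smith", "first_name": "John", "zip": "20746"}
b = {"surname": "Smith", "first_name": "Jon",  "zip": "20746"}
print(classify(match_weight(a, b)))
```

In practice the m- and u-probabilities are estimated (e.g., via EM) rather than fixed by hand, and string comparators handle near-agreement such as "John" versus "Jon"; this sketch treats each field comparison as exact agree/disagree.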


Source: U.S. Census Bureau, Statistical Research Division

Created: July 12, 2006
Last revised: July 12, 2006
