Motivation: Record linkage is intrinsic to efficient, modern survey operations. It is used for unduplicating and updating name and address lists, and for applications such as matching and inserting addresses for geocoding, coverage measurement, the Primary Selection Algorithm during decennial processing, Business Register unduplication and updating, re-identification experiments verifying the confidentiality of public-use microdata files, and new applications with groups of administrative lists. Significant theoretical and algorithmic progress (Winkler 2006a, 2006b, 2008, 2009a, 2013b, 2014a, 2014b; Yancey 2005, 2006, 2007, 2011, 2013) demonstrates the potential for this research. For cleaning up administrative records files that need to be linked, theoretical and extreme computational results (Winkler 2010, 2011b, 2018) yield methods for editing, handling missing data, and even producing synthetic data with valid analytic properties and reduced or eliminated re-identification risk. Easy means of constructing synthetic data make it straightforward to pass files among groups.
Accomplishments (October 2017 - September 2018):
Short-Term Activities (FY 2019):
Longer-Term Activities (beyond FY 2019):
Alvarez, M., Jonas, J., Winkler, W. E., and Wright, R. “Interstate Voter Registration Database Matching: The Oregon-Washington 2008 Pilot Project,” Electronic Voting Technology.
Herzog, T. N., Scheuren, F., and Winkler, W. E. (2007). Data Quality and Record Linkage Techniques, New York, NY: Springer.
Herzog, T. N., Scheuren, F., and Winkler, W. E. (2010). “Record Linkage,” in (Y. H. Said, D. W. Scott, and E. Wegman, eds.) Wiley Interdisciplinary Reviews: Computational Statistics.
Winkler, W. E. (2006a). “Overview of Record Linkage and Current Research Directions,” Research Report Series (Statistics #2006-02), Statistical Research Division, U.S. Census Bureau, Washington, DC.
Winkler, W. E. (2006b). “Automatically Estimating Record Linkage False-Match Rates without Training Data,” Proceedings of the Section on Survey Research Methods, American Statistical Association, Alexandria, VA, CD-ROM.
Winkler, W. E. (2008). “Data Quality in Data Warehouses,” in (J. Wang, Ed.) Encyclopedia of Data Warehousing and Data Mining (2nd Edition).
Winkler, W. E. (2009a). “Record Linkage,” in (D. Pfeffermann and C. R. Rao, eds.) Sample Surveys: Theory, Methods and Inference, New York: North-Holland, 351-380.
Winkler, W. E. (2009b). “Should Social Security numbers be replaced by modern, more secure identifiers?”, Proceedings of the National Academy of Sciences.
Winkler, W. E. (2010). “General Discrete-data Modeling Methods for Creating Synthetic Data with Reduced Re-identification Risk that Preserve Analytic Properties,” https://www.census.gov/srd/papers/pdf/rrs2010-02.pdf.
Winkler, W. E. (2011). “Machine Learning and Record Linkage” in Proceedings of the 2011 International Statistical Institute.
Winkler, W. E. (2013a). “Record Linkage,” in Encyclopedia of Environmetrics. J. Wiley.
Winkler, W. E. (2013b). “Cleanup and Analysis of Sets of National Files,” Federal Committee on Statistical Methodology, Proceedings of the Bi-Annual Research Conference, http://www.copafs.org/UserFiles/file/fcsm/J1_Winkler_2013FCSM.pdf, https://fcsm.sites.usa.gov/files/2014/05/J1_Winkler_2013FCSM.pdf
Winkler, W. E. (2014a). “Matching and Record Linkage,” Wiley Interdisciplinary Reviews: Computational Statistics, http://wires.wiley.com/WileyCDA/WiresArticle/wisId-WICS1317.html, DOI: 10.1002/wics.1317, available from author by request for academic purposes.
Winkler, W. E. (2014b). “Very Fast Methods of Cleanup and Statistical Analysis of National Files,” Proceedings of the Section on Survey Research Methods, American Statistical Association, CD-ROM.
Winkler, W. E. (2015). “Probabilistic Linkage,” in (H. Goldstein, K. Harron, C. Dibben, eds.) Methodological Developments in Data Linkage, J. Wiley: New York.
Winkler, W. E. (2018 to appear). “Cleaning and Using Administrative Lists: Enhanced Practices and Computational Algorithms for Record Linkage and Modeling/Editing/Imputation,” in (A. Y. Chun and M. D. Larsen, eds.) Administrative Records for Survey Methodology, J. Wiley: New York, NY.
Winkler, W. E., Yancey, W. E., and Porter, E. H. (2010). “Fast Record Linkage of Very Large Files in Support of Decennial and Administrative Records Projects,” Proceedings of the Section on Survey Research Methods, American Statistical Association, Alexandria, VA.
Yancey, W. E. (2005). “Evaluating String Comparator Performance for Record Linkage,” Research Report Series (Statistics #2005-05), Statistical Research Division, U.S. Census Bureau, Washington, DC.
Yancey, W. E. (2007). “BigMatch: A Program for Extracting Probable Matches from a Large File,” Research Report Series (Computing #2007-01), Statistical Research Division, U.S. Census Bureau, Washington, DC.
Contact: William E. Winkler, Edward H. Porter, Emanuel Ben-David
Funding Sources for FY 2018: