U.S. Census Bureau decennial census, survey, and estimates programs work with subsets of the Master Address File known as extracts. These extracts are produced using a set of rules called filters. Filters aim to maximize the number of valid Master Address File units on the resulting extracts while minimizing the number of invalid units. The extracts provide the basis for the address frames used in census operations or the sample universes for current demographic household surveys. One such survey is the American Community Survey. The American Community Survey filter rules lean toward overcoverage (inclusion of invalid units) because undercoverage (exclusion of valid units) is more difficult to correct during fieldwork. The 2010 Census Evaluation of Data-Based Extraction Processes for the Address Frame, also referred to as the Data Mining Evaluation, analyzes the American Community Survey filter rules using data mining techniques and presents possible improvements to them. It addresses the research question:
How can the quality of the address frame be improved with a more scientific extract process?
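The trade-off that the filter rules manage can be made concrete by counting, for a given extract, how many valid units it includes, how many invalid units it wrongly includes (overcoverage), and how many valid units it wrongly excludes (undercoverage). The sketch below is illustrative only: the `AddressUnit` record and its `is_valid` and `passes_filter` fields are hypothetical stand-ins, assuming that each unit's true status (for example, from field verification) and the filter's decision are already known. It does not reflect the Census Bureau's actual data structures or filter logic.

```python
from dataclasses import dataclass


@dataclass
class AddressUnit:
    """Hypothetical Master Address File unit record (illustration only)."""
    unit_id: str
    is_valid: bool        # assumed ground truth, e.g. from field verification
    passes_filter: bool   # assumed outcome of applying the filter rules


def coverage_counts(units):
    """Tally how a filter's extract compares with the assumed ground truth."""
    counts = {"valid_included": 0, "overcoverage": 0, "undercoverage": 0}
    for unit in units:
        if unit.passes_filter and unit.is_valid:
            counts["valid_included"] += 1   # desired: valid unit on the extract
        elif unit.passes_filter and not unit.is_valid:
            counts["overcoverage"] += 1     # invalid unit included
        elif not unit.passes_filter and unit.is_valid:
            counts["undercoverage"] += 1    # valid unit excluded
    return counts


if __name__ == "__main__":
    sample = [
        AddressUnit("A", is_valid=True, passes_filter=True),
        AddressUnit("B", is_valid=False, passes_filter=True),
        AddressUnit("C", is_valid=True, passes_filter=False),
        AddressUnit("D", is_valid=False, passes_filter=False),
    ]
    print(coverage_counts(sample))
```

Under this framing, a filter that tends toward overcoverage accepts a higher `overcoverage` count in exchange for a lower `undercoverage` count, reflecting the view that invalid units are easier to remove in the field than missing valid units are to recover.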