Use of Auxiliary Data to Predict Housing Unit Status and Improve the Quality of ACS Data Impacted by the Pandemic

The COVID-19 pandemic forced the American Community Survey (ACS) to make drastic changes to its nonresponse follow-up methodology, and those changes affected the estimates of vacant and occupied housing units. Normally, when field representatives visit sampled addresses, they can determine whether a unit is vacant even if they cannot get an interview. When we adjust the weights for the occupied cases to account for unit nonresponse, we assume that we have identified the vacant units and treat the remaining nonresponding housing units as occupied. However, during 2020, personal visits were limited, and Census Bureau field representatives were unable to verify the housing unit status of a large number of sample addresses. Because the vacant units among these cases could not be identified, they were treated as occupied nonrespondents, which led to an underestimation of the number of vacant units and an overestimation of the number of occupied units.
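The effect of this assumption can be illustrated with a small arithmetic sketch. The counts below are hypothetical and chosen only for illustration. The function `estimate_counts` and its simple weight-transfer factor are assumptions, not the actual ACS weighting procedure.

```python
def estimate_counts(occupied_respondents, identified_vacant, unresolved, base_weight=1.0):
    """Simplified nonresponse adjustment (illustrative, not the ACS procedure).

    Every unresolved nonresponding unit is assumed to be occupied, so its
    weight is transferred to the occupied respondents.
    """
    factor = (occupied_respondents + unresolved) / occupied_respondents
    occupied_estimate = occupied_respondents * base_weight * factor
    vacant_estimate = identified_vacant * base_weight
    return occupied_estimate, vacant_estimate


# Hypothetical sample of 1,000 addresses: 600 occupied respondents,
# 100 truly vacant units, and 300 occupied nonrespondents.

# Standard operations: field visits identify all 100 vacant units.
standard = estimate_counts(600, identified_vacant=100, unresolved=300)

# Limited visits: only 40 vacant units are identified; the other 60 are
# unresolved and therefore treated as occupied nonrespondents.
limited = estimate_counts(600, identified_vacant=40, unresolved=360)

print("standard:", [round(x) for x in standard])
print("limited: ", [round(x) for x in limited])
```

In this sketch the total stays at 1,000 units in both scenarios, but under limited visits 60 vacant units are counted as occupied, so the vacant estimate falls from 100 to 40 and the occupied estimate rises from 900 to 960.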

In April, May, and June 2020, field representatives were prohibited from visiting sampled addresses; instead, all interviews were to be conducted by telephone. We were able to resume personal visits in July and August for some areas, but telephone interviews were still mandated over much of the country. Starting in September, we were finally able to conduct most nonresponse interviews in person, although in November and December we could not conduct personal interviews in a small set of areas most affected by the pandemic. The Census Bureau provided field representatives with phone numbers, but in many cases we either did not have a good phone number for the current residents of the address or the residents did not answer our calls.

To improve our estimates, we used auxiliary data in a statistical model to estimate predicted probabilities of vacancy for the universe of addresses that we could mail to and that were part of our nonresponse follow-up workload. We fit a binomial logit model on the housing unit-level auxiliary data to predict the vacancy probability. The use of a statistical model naturally allowed the incorporation of information from multiple sources. The model combined United States Postal Service (USPS) mailing data associated with the ACS mailings with information on persons associated with addresses on Internal Revenue Service (IRS) tax returns and in the Medicare enrollment database. Specifically, vacancy status was modeled as a function of independent variables from administrative records, field collection paradata, and survey information. These covariates include the undelivered-as-addressed (UAA) data from the USPS for each of the ACS mailings, persons from the administrative record sources, characteristics associated with the block group as determined by the ACS, and other address-level information. The estimated coefficients from the model were then applied to the universe of cases for which vacancy status was unknown. Edit and allocation procedures were applied to these predicted vacant housing units to impute housing characteristic data, such as home value.
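The general shape of this approach, fitting a binomial logit on cases with known status and applying the coefficients to cases with unknown status, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Census Bureau's model: the data are synthetic, the two covariates (`uaa` and `has_admin`) are hypothetical stand-ins for the much richer covariate set described above, and the fit uses plain gradient ascent so that the example needs only the standard library.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_vacancy(beta, x):
    """Predicted vacancy probability for one address with covariates x."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logit(X, y, lr=1.0, epochs=600):
    """Fit a binomial logit by gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # intercept followed by one coefficient per covariate
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, yi in zip(X, y):
            err = yi - predict_vacancy(beta, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Synthetic "known status" cases. Covariates (both hypothetical):
#   uaa       = 1 if an ACS mailing was returned undelivered-as-addressed
#   has_admin = 1 if administrative records link a person to the address
random.seed(2020)
X_known, y_known = [], []
for _ in range(800):
    uaa = 1 if random.random() < 0.2 else 0
    has_admin = 1 if random.random() < 0.7 else 0
    true_p = sigmoid(-1.0 + 2.5 * uaa - 2.0 * has_admin)  # made-up truth
    X_known.append([uaa, has_admin])
    y_known.append(1 if random.random() < true_p else 0)  # 1 = vacant

beta = fit_logit(X_known, y_known)

# Apply the estimated coefficients to cases whose status is unknown.
unknown_cases = {"UAA, no admin person": [1, 0], "delivered, admin person": [0, 1]}
predicted = {label: predict_vacancy(beta, x) for label, x in unknown_cases.items()}
for label, p in predicted.items():
    print(f"{label}: predicted vacancy probability = {p:.2f}")
```

In the synthetic data, an address with returned mail and no linked administrative record receives a higher predicted vacancy probability than an address with delivered mail and a linked person. How predicted probabilities were turned into vacant classifications is not specified here, so the sketch stops at the probabilities.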

Overall, the number of housing units that were converted to vacant was relatively small: approximately 10,000 nationally, out of approximately two million housing units that were eligible in the 2020 sample. To augment this approach, we also added a weighting adjustment for the 2020 estimates so that the vacancy rate was equal to the rate observed during those months in which we were able to implement our standard data collection methodology. This adjustment effectively guarded against an over- or underestimate of the vacancy rate as a result of the vacancy prediction methodology.
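One way such a rate-matching adjustment could work is a simple ratio adjustment of the weights, sketched below. The exact form of the ACS adjustment is not described here, so the function `adjust_to_target_vacancy_rate`, the record layout, the weights, and the 10 percent target rate are all hypothetical.

```python
def adjust_to_target_vacancy_rate(units, target_rate):
    """Rescale weights so the weighted vacancy rate equals target_rate.

    Vacant and occupied units get separate ratio factors; the total
    weighted number of housing units is left unchanged.
    """
    total = sum(u["weight"] for u in units)
    vacant_total = sum(u["weight"] for u in units if u["vacant"])
    occupied_total = total - vacant_total
    vacant_factor = target_rate * total / vacant_total
    occupied_factor = (1.0 - target_rate) * total / occupied_total
    return [
        {**u, "weight": u["weight"] * (vacant_factor if u["vacant"] else occupied_factor)}
        for u in units
    ]


def vacancy_rate(units):
    """Weighted share of housing units that are vacant."""
    total = sum(u["weight"] for u in units)
    return sum(u["weight"] for u in units if u["vacant"]) / total


# Hypothetical weighted records: 80 of 1,000 weighted units are vacant (8%).
units = [
    {"vacant": True, "weight": 50.0},
    {"vacant": True, "weight": 30.0},
    {"vacant": False, "weight": 400.0},
    {"vacant": False, "weight": 320.0},
    {"vacant": False, "weight": 200.0},
]

# Hypothetical benchmark rate from months with standard data collection.
adjusted = adjust_to_target_vacancy_rate(units, target_rate=0.10)

print(f"before: {vacancy_rate(units):.3f}")
print(f"after:  {vacancy_rate(adjusted):.3f}")
```

Because the factor depends only on the benchmark rate and the weighted totals, the adjusted vacancy rate matches the benchmark whether the prediction step converted too many or too few units to vacant.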

