Model-based or Hot deck? Imputing Item Non-Response Data in the Survey of Income and Program Participation (SIPP)

Written by:
Working Paper Number: SEHSD-WP2026-14 / SIPP-WP-326

For many years, hot deck imputation has been the default method for handling item non-response in many variables in the Survey of Income and Program Participation (SIPP). The availability of newer imputation methods, together with ongoing efforts to modernize the SIPP, presents an opportunity to adopt them. Using four sets of variables from the 2022 SIPP, we test how hot deck performs relative to model-based imputation methods: linear regression, logistic regression, Multiple Imputation by Chained Equations (MICE), and eXtreme Gradient Boosting (XGBoost). We find that MICE and XGBoost are more accurate and efficient than hot deck. We discuss the implications of these findings and how practitioners can apply them.

Page Last Revised - June 22, 2026