The main goal of the Small Area Income and Poverty Estimates project is to create single-year estimates of median household income and number in poverty for states, counties, and school districts that are more precise than those available from surveys alone. For each of five key income and poverty statistics at the state level and each of four key income and poverty statistics at the county level, we have used a combination of multiple regression estimation techniques and shrinkage techniques to create these estimates. At the state level, we model poverty ratios. To obtain estimates of numbers of poor persons, we multiply these ratios by demographic estimates of their denominators. At the county level, we model number in poverty directly. We do not model poverty rates for counties because we do not know how to gauge the quality of the population estimates for counties. The strategy of separating the state and county models was adopted because models constructed for states were found to be superior in terms of goodness-of-fit, and because their results could provide "controls" for the county estimates.
Our modeling relies on administrative data derived from tax returns, counts of participants in the Supplemental Nutrition Assistance Program, data from the Bureau of Economic Analysis, decennial census estimates, postcensal population estimates, and the American Community Survey (ACS). Using these administrative and survey data, we build dependent and independent variables for our models and test our models. For school districts we use a synthetic approach that utilizes ACS 5-Year poverty estimates, confidential IRS personal income tax data, and the most recent model-based estimates for counties. We also use the most up-to-date boundaries from the Census Bureau’s School District Review Program (SDRP).
Estimates from the ACS 1-Year provide the measures of income and poverty that serve as the dependent variables in the state and county regression models. The ACS was first incorporated into the SAIPE model for the 2005 estimates. Prior to this, the SAIPE program used data from the Current Population Survey (CPS) Annual Social and Economic Supplement (ASEC) as its survey component. This change was made primarily for two reasons. First, in 2006 the Census Bureau changed the basis of its official direct state poverty estimates from CPS ASEC data to ACS data. Since SAIPE focuses on estimates at state and lower levels of geography, changing to ACS as the basis for SAIPE is consistent with this change made for the official direct survey estimates. Second, the ACS sample (over 3,000,000 addresses nationally) is much larger than the CPS ASEC sample (about 100,000 addresses nationally), which conveys significant advantages for small area estimation. In general, the larger ACS sample sizes lead to substantially lower variances of the direct survey estimates and to mostly lower variances for the resulting model-based estimates.
While the ACS sample sizes for many counties are large enough to permit the derivation of direct county estimates for the key statistics, they are not sufficient for all statistics in some counties. Direct estimates from the ACS 1-Year are mainly available for counties with population size greater than 65,000, which make up approximately 26% of counties and cover roughly 85% of the U.S. population. By “borrowing strength” from administrative data, the SAIPE program increases estimate precision and decreases the year-to-year volatility of ACS estimates, allowing SAIPE to release income and poverty estimates for all counties annually.
For the state and county models, both published and unpublished ACS 1-Year estimates are used as the dependent variable in a regression on the administrative variables. From the estimated equation and known values of the administrative variables, a regression "prediction" is obtained for each area. This regression-based prediction is combined with the direct sample estimate, with each component receiving a weight. The sum of the two weights for each area is one. The weight for the model prediction component is the ratio of the sampling variance of the direct estimate to the total variance (sampling plus "lack of fit") of the direct estimate. With this technique, the more uncertain the direct sample estimate, the larger the contribution from the regression model. These weights are commonly referred to as "shrinkage weights" and the final estimates as "shrinkage estimates." The final step in state- and county-level estimation is to use a simple ratio technique to control the sum of the numbers in poverty at the state level to the ACS national estimate, and at the county level to the resulting controlled state estimate. Estimated median household incomes for counties and states are not similarly controlled to state or national medians. Because school district estimates are derived from a synthetic within-county shares approach, applying the estimated shares to the previously controlled county estimates yields consistent totals throughout the database.
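The weighting and ratio-control steps described above can be illustrated with a minimal sketch. It is not the SAIPE production code: the function names, the county labels, and all numbers are hypothetical, and the scale on which the actual models apply the weights is not specified here. The sketch assumes the sampling variance and the "lack of fit" variance are already known for each area.

```python
def shrinkage_estimate(direct, prediction, sampling_var, lack_of_fit_var):
    """Combine a direct survey estimate with a regression prediction.

    Returns the shrinkage estimate and the weight given to the prediction.
    """
    # Weight on the model prediction: sampling variance over total variance.
    w_model = sampling_var / (sampling_var + lack_of_fit_var)
    # The two weights sum to one.
    w_direct = 1.0 - w_model
    return w_direct * direct + w_model * prediction, w_model


def ratio_control(estimates, control_total):
    """Scale all estimates by one factor so that they sum to control_total."""
    factor = control_total / sum(estimates.values())
    return {area: value * factor for area, value in estimates.items()}


# Hypothetical numbers in poverty for three counties in one state.
counties = {
    "A": dict(direct=1200.0, prediction=1000.0,
              sampling_var=90000.0, lack_of_fit_var=10000.0),
    "B": dict(direct=5000.0, prediction=5200.0,
              sampling_var=10000.0, lack_of_fit_var=10000.0),
    "C": dict(direct=300.0, prediction=400.0,
              sampling_var=40000.0, lack_of_fit_var=10000.0),
}

# Step 1: shrinkage estimate for each county.
shrunk = {name: shrinkage_estimate(**c)[0] for name, c in counties.items()}

# Step 2: control the county sum to an (assumed) controlled state estimate.
controlled = ratio_control(shrunk, control_total=6600.0)

for name in counties:
    print(name, round(shrunk[name], 1), round(controlled[name], 1))
```

County A has the noisiest direct estimate, so its prediction receives a weight of 0.9 and its shrinkage estimate (1,020) sits close to the regression prediction. County B's two variances are equal, so its estimate (5,100) is the simple average of the two components. The ratio step then multiplies every county by the same factor so that the three counties sum to the state control.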