Small Area Estimation

Motivation:

Small area estimation is important because data users continually demand published statistics with finer geographic and demographic detail, including estimates for various subpopulations. Traditional demographic and economic sample surveys designed for national estimates do not provide large enough samples to produce reliable direct estimates for smaller areas such as counties and even most states. Valid statistical models, combined with suitable auxiliary data, can provide small area estimates with greater precision; however, an incorrect model or a failure to account for informative sampling can introduce bias.

Research Problems:

· Development of models that combine data across multiple sample surveys or combine survey and observational data (non-probability samples) to improve survey estimates.

· Development of model diagnostic and model comparison tools for small area models.

· Development of small area share models for subarea estimates (e.g., school districts or tracts).

· Development of temporal small area estimation techniques.

· Development of spatial small area estimation techniques.

· Development of more robust estimates of mean squared error of prediction by incorporating Bayesian and bootstrap methods.

· Development of area-level models to jointly estimate the survey mean and variance.

· Development of models combining both small geographic areas crossed with small demographic subgroups.

Current Subprojects:

· Bootstrap Mean Squared Error Estimation for Small Area Means under Non-normal Random Effects (Datta, Irimata, Maples)

· Bayesian Hierarchical Spatial Models for Small Area Estimation (Datta, Janicki, Maples)

· Construction of Joint Credible Set of Ranks of Small Area Means (Datta, Maples)

· Developing Correlated Small Area Share Models to Create Estimates of School District Child Poverty and Population (Maples)

· Developing Geographically Weighted Methods to Assess the Assumption of Constant Parameter Values Across All Domains (Maples, Dompreh)

· Development of Tract by Demographic Population Estimates for Non-census Years Using Census, ACS and Demographic Frame Data (Maples, Mule (R&M), Basel (SEHSD), Holan (R&M))

· Development of Small Area Models for Establishment Surveys for Employment and Receipts (Aleshin-Guendel, Maples, Datta, Janicki, Kaputa (ESMD), Maison (EMD))

· Variance Estimation and Modeling for Privacy-Protected Redistricting Data (Irimata)

Potential Applications:

· Model diagnostic and comparison tools can be applied in any small area application, from SAIPE to SAHIE, to small area models applied to SIPP, AHS, etc.

· Temporal extensions of small area models will be potentially useful for population estimates in sub-county areas in non-census years.

· Small area share models may replace the current school district estimation procedures for SAIPE.

· Spatial small area models can improve estimates and provide limited disclosure avoidance for some of the ACS special tabulations.

· Small area models to estimate employment and receipts using data from the new AIES (Annual Integrated Economic Survey) at the state by NAICS-3 level.

· Joint area-level models can be used to produce estimates of the population counts, as well as the variance in the TopDown Algorithm (TDA) due to differentially private noise addition and post-processing in the PL94-171 redistricting data.

Accomplishments (October 2020-September 2025):

· Developed a small area share model to estimate the number of school aged children in poverty and total school aged children for school districts given the official county level poverty estimates.

· Generalized the small area share model to allow systematic differences in the precision term across areas given area-specific covariates.

· Derived several different mean squared error estimators for the Fay-Herriot model, both analytical and bootstrap-based, and demonstrated the benefits of these estimators through a large simulation study.

· Studied the impact of differential privacy noise infusion on voting district plans and evaluated measures of variability.

· Developed a small area share model to distribute county population counts down to tracts and give an estimate of uncertainty based on both the uncertainty of the county total and the uncertainty of the estimated county-to-tract share.

· Developed a geographically weighted version of the Fay-Herriot model to assess variability of parameters across space.

· Studied the variability inherent to the TDA at the block group level for different race and ethnicity groups for the PL94-171 redistricting data.
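
As a rough illustration of how uncertainty in a county total and in an estimated county-to-tract share can combine, the following is a minimal sketch and not the model developed in the work listed above. It assumes, purely for illustration, that the county total and the share are independent, in which case the variance of their product has a closed form. The function name and all input values are hypothetical.

```python
import math


def tract_estimate(county_total, county_var, share, share_var):
    """Point estimate and variance of a tract count = county total * share.

    Illustrative assumption: the county total and the share are
    independent. For independent X and Y:
        Var(XY) = E[X]^2 Var(Y) + E[Y]^2 Var(X) + Var(X) Var(Y)
    """
    estimate = county_total * share
    variance = (
        county_total**2 * share_var   # contribution of share uncertainty
        + share**2 * county_var       # contribution of county-total uncertainty
        + county_var * share_var      # cross term
    )
    return estimate, variance


# Hypothetical inputs: a county of 50,000 people (standard error 500)
# and a tract share of 0.08 (standard error 0.01).
est, var = tract_estimate(50_000, 500.0**2, 0.08, 0.01**2)
se = math.sqrt(var)
print(f"tract estimate = {est:.0f}, standard error = {se:.1f}")
```

With these made-up inputs, nearly all of the variance comes from the share term, which suggests why the uncertainty of the estimated share cannot be ignored when county totals are distributed to small subareas.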

Short-Term Activities (FY 2025 – FY 2027):

· Extend the Small Area Shares model to allow for dependence between sets of shares, e.g., allow the school district to county shares of school age children in poverty and of those not in poverty to be dependent.

· Evaluate different mean squared error estimates under the Fay-Herriot model when the error distribution is not always correctly specified.

· Develop multivariate spatial models which use differentially private measurements and auxiliary survey data for the purpose of predicting the number of persons in counties and AIAN areas for detailed race groups.

· Develop models for tract by demographic group population estimates for non-census years.

· Apply the geographically weighted Fay-Herriot models to the production SAIPE county model to test the assumption that the parameters are constants across all areas.

· Extend the small area models for the AIES to estimate NAICS-4 by state domains.

· Investigate and incorporate additional covariate data sources to improve population count estimates of the TDA privacy-protected PL94-171 redistricting data using a joint area-level model.

· Improve draws of the variance in the joint area-level model by replacing the independent Metropolis-Hastings step with a different method, such as Vertical Weighted Strips.
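
Several of the activities above build on the Fay-Herriot area-level model. For readers unfamiliar with it, the following is a minimal sketch of the basic model on simulated data, not the production SAIPE model or any method specific to the projects above. It assumes known sampling variances D_i and uses the Prasad-Rao moment estimator of the model variance A as one simple choice; the function name, the simulated covariate, and all parameter values are hypothetical.

```python
import numpy as np


def fay_herriot_eblup(y, X, D):
    """EBLUP of small area means under a basic Fay-Herriot model.

    Model: y_i = x_i' beta + u_i + e_i, with u_i ~ (0, A) and
    e_i ~ (0, D_i), where the sampling variances D_i are treated as known.
    """
    m, p = X.shape

    # Prasad-Rao moment estimator of A from OLS residuals, truncated at 0.
    XtX_inv = np.linalg.inv(X.T @ X)
    ols_resid = y - X @ (XtX_inv @ X.T @ y)
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    A_hat = max(
        0.0,
        (np.sum(ols_resid**2) - np.sum(D * (1.0 - leverage))) / (m - p),
    )

    # Weighted least squares for beta with weights 1 / (A_hat + D_i).
    w = 1.0 / (A_hat + D)
    beta_hat = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)

    # Shrink each direct estimate toward its regression prediction;
    # areas with larger sampling variance are shrunk more.
    gamma = A_hat / (A_hat + D)
    theta_hat = gamma * y + (1.0 - gamma) * (X @ beta_hat)
    return theta_hat, A_hat, beta_hat


# Simulated example (all values hypothetical): 500 areas, one covariate.
rng = np.random.default_rng(0)
m = 500
X = np.column_stack([np.ones(m), rng.normal(size=m)])
D = rng.uniform(0.5, 2.0, size=m)                              # known sampling variances
theta = X @ np.array([1.0, 2.0]) + rng.normal(scale=1.0, size=m)  # true means, A = 1
y = theta + rng.normal(scale=np.sqrt(D))                       # direct survey estimates

theta_hat, A_hat, beta_hat = fay_herriot_eblup(y, X, D)
mse_direct = np.mean((y - theta) ** 2)
mse_eblup = np.mean((theta_hat - theta) ** 2)
print(f"A_hat = {A_hat:.2f}")
print(f"empirical MSE, direct: {mse_direct:.2f}  EBLUP: {mse_eblup:.2f}")
```

In this simulation the model is correctly specified, so the model-based estimates should be noticeably more accurate than the direct ones. The gain can shrink or reverse when the model is wrong, which is the situation the mean squared error evaluations above are meant to study.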

Longer-Term Activities (beyond FY 2027):

· Generalize the geographically weighted Fay-Herriot model to count and rate models (non-Normal) and allow only a subset of parameters to vary while the rest remain fixed.

· Develop models that jointly model survey-weighted proportions and effective sample sizes.

· Evaluation of new models (county and school district) to update official SAIPE methodology.

· Develop models with spatial components for producing tract by demographic group population estimates.

· Extend the joint area-level model for survey mean and variance to account for correlation between different race groups.

Selected Publications (Journal Articles, Peer Review):

Datta, G.S. and Li, J. (2024). “A Quasi-Bayesian Approach to Small Area Estimation Using Spatial Models,” Calcutta Statistical Association Bulletin, 76(1), 118-136.

Datta, G.S., Lee, J., and Li, J. (2023). “Pseudo-Bayesian Small Area Estimation,” Journal of Survey Statistics and Methodology, 12(2), 343-368.

Franco, C. and Bell, W.R. (2022). “Using American Community Survey Data to Improve Estimates from Smaller U.S. Surveys through Bivariate Small Area Estimation Models,” Journal of Survey Statistics and Methodology, 10(1), 225-247.

Parker, P.A., Janicki, R., and Holan, S. (In Press). “Bayesian Methods Applied to Small Area Estimation for Establishment Statistics,” in Bavdaž, M., Bender, S., Jones, J., MacFeely, S., Sakshaug, J.W., Thompson, K.J., and van Delden, A. (Eds.), Advances in Business Statistics, Methods and Data Collection, Wiley.

Parker, P., Holan, S., and Janicki, R. (2022). “Computationally Efficient Bayesian Unit-level Models for Non-Gaussian Data Under Informative Sampling with Application to Estimation of Health Insurance Coverage,” The Annals of Applied Statistics, Vol 16, No. 2, 887-904.

Ghosh, T., Ghosh, M., Maples, J., and Tang, X. (2022). “Multivariate Global-Local Priors for Small Area Estimation,” STATS, v5, 673-688. https://www.mdpi.com/2571-905X/5/3/40/htm.

Janicki, R., Raim, A.M., Holan, S.H., and Maples, J. (2022). “Bayesian Nonparametric Multivariate Spatial Mixture Mixed Effects Models with Application to American Community Survey Special Tabulations,” The Annals of Applied Statistics, Volume 16, Issue 1, 144-168.

Erciulescu, A., Franco, C., and Lahiri, P. (2021). “Use of Administrative Records in Small Area Estimation,” in Chun, A. Y. and Larsen, M. (Eds.), Administrative Records for Survey Methodology, New York, NY: Wiley Publishers.

Liu, B., Dompreh, I., and Hartman, A.M. (2021). “Small Area Estimation of Smoke-Free Workplace Policies and Home Rules in U.S. Counties,” Journal of Nicotine and Tobacco Research.

Parker, P. A., Holan, S. H., and Janicki, R. (2020). “Bayesian Unit-Level Modeling of Count Data under Informative Sampling Designs,” Stat, 9.

Bell, W. R., Chung, H. C., Datta, G. S., and Franco, C. (2019). “Measurement Error in Small Area Estimation: Functional vs. Structural vs. Naïve Models,” Survey Methodology, 45, 61-80.

Chakraborty, A., Datta, G.S., and Mandal, A. (2019). “Robust Hierarchical Bayes Small Area Estimation for Nested Error Regression Model,” International Statistical Review, 87, S1, S158–S176, doi:10.1111/insr.12283.

Chung, H., Datta, G., and Maples, J. (2019). “Estimation of Median Incomes of the American States: Bayesian Estimation of Means of Subpopulations,” Opportunities and Challenges in Development, Simanti Bandyopadhyay and Mousumi Datta (ed.), New York: Springer, 505-518.

Franco, C., Little, R.J.A., Louis, T.A., and Slud, E.V. (2019). “Comparative Study of Confidence Intervals for Proportions in Complex Surveys,” Journal of Survey Statistics and Methodology, 7, 3, 334-364.

Datta, G.S., Rao, J.N.K., Torabi, M., and Liu, B. (2018). “Small Area Estimation with Multiple Covariates Measured with Errors: A Nested Error Linear Regression Approach of Combining Two Surveys,” Journal of Multivariate Analysis, 167, 49-59.

Arima, S., Bell, W.R., Datta, G.S., Franco, C., and Liseo, B. (2017). “Multivariate Fay-Herriot Bayesian Estimation of Small Area Means Under Functional Measurement Error,” Journal of the Royal Statistical Society--Series A, 180(4), 1191-1209.

Janicki, R. and Vesper, A. (2017). “Benchmarking Techniques for Reconciling Small Area Models at Distinct Geographic Levels,” Statistical Methods & Applications, 26, 557-581, doi:10.1007/s10260-017-0379-x.

Maples, J. (2017). “Improving Small Area Estimates of Disability: Combining the American Community Survey with the Survey of Income and Program Participation,” Journal of the Royal Statistical Society-Series A, 180(4), 1211-1227.

Chakraborty, A., Datta, G.S., and Mandal, A. (2016). “A Two-component Normal Mixture Alternative to the Fay-Herriot Model,” Joint issue of Statistics in Transition new series and Survey Methodology, Part II, 17, 67-90.

Datta, G.S. and Mandal, A. (2015). “Small Area Estimation with Uncertain Random Effects,” Journal of the American Statistical Association: Theory and Methods, 110, 1735-1744.

Franco, C. and Bell, W.R. (2015). “Borrowing Information over Time in Binomial/logit Normal Models for Small Area Estimation,” Joint Issue of Statistics in Transition and Survey Methodology, 16, 4, 563-584.

Bell, W.R., Datta, G.S., and Ghosh, M. (2013). “Benchmarking Small area Estimators,” Biometrika, 100, 189-202, doi:10.1093/biomet/ass063.

Datta, G., Ghosh, M., Steorts, R., and Maples, J. (2011). “Bayesian Benchmarking with Applications to Small Area Estimation,” TEST, Volume 20, Number 3, 574-88.

Slud, E. and Maiti, T. (2011). “Small-Area Estimation Based on Survey Data from Left-Censored Fay-Herriot Model,” Journal of Statistical Planning & Inference, 3520-3535.

Malec, D. and Maples, J. (2008). “Small Area Random Effects Models for Capture/Recapture Methods with Applications to Estimating Coverage Error in the U.S. Decennial Census,” Statistics in Medicine, 27, 4038-4056.

Malec, D. and Müller, P. (2008). “A Bayesian Semi-Parametric Model for Small Area Estimation,” in Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh (eds. S. Ghoshal and B. Clarke), Institute of Mathematical Statistics, 223-236.

Slud, E. and Maiti, T. (2006). “Mean-Squared Error Estimation in Transformed Fay-Herriot Models,” Journal of the Royal Statistical Society-Series B, 239-257.

Malec, D. (2005). “Small Area Estimation from the American Community Survey Using a Hierarchical Logistic Model of Persons and Housing Units,” Journal of Official Statistics, 21 (3), 411-432.

Selected Publications (CSRM Research Reports, CSRM Studies, Proceedings Papers, and Other):

Janicki, R. (2016). “Estimation of the Difference of Small Area Parameters from Different Time Periods,” Research Report Series (Statistics #2016-01), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, D.C.

Maples, J. (2019). “Small Area Estimates of the Child Population and Poverty in School Districts Using Dirichlet-Multinomial Models,” 2019 Proceedings of the American Statistical Association, Section on Survey Research Methods, American Statistical Association, Alexandria, VA, 3150-3152.

Franco, C. and Bell, W.R. (2013). “Applying Bivariate/Logit Normal Models to Small Area Estimation,” in JSM Proceedings, Survey Research Methods Section. Alexandria, VA: American Statistical Association. 690-702.

Janicki, R. (2011). “Selection of Prior Distributions for Multivariate Small Area Models with Application to Small Area Health Insurance Estimates,” JSM Proceedings, Government Statistics Section. American Statistical Association, Alexandria, VA.

Maples, J. (2011). “Using Small-Area Models to Improve the Design-Based Estimates of Variance for County Level Poverty Rate Estimates in the American Community Survey,” Research Report Series (Statistics #2011-02), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, D.C.

Joyce, P. and Malec, D. (2009). “Population Estimation Using Tract Level Geography and Spatial Information,” Research Report Series (Statistics #2009-3), Statistical Research Division, U.S. Census Bureau, Washington, D.C.

Huang, E., Malec, D., Maples, J., and Weidman, L. (2007). “American Community Survey (ACS) Variance Reduction of Small Areas via Coverage Adjustment Using an Administrative Records Match,” Proceedings of the 2006 Joint Statistical Meetings, American Statistical Association, Alexandria, VA, 3150-3152.

Maples, J. and Bell, W. (2007). “Small Area Estimation of School District Child Population and Poverty: Studying Use of IRS Income Tax Data,” Research Report Series (Statistics #2007-11), Statistical Research Division, U.S. Census Bureau, Washington, D.C.

Contact:

Jerry Maples, Gauri Datta, Kyle Irimata, Bill Bell (ADRM)

Funding Sources for FY 2025-2030:

0331 – Working Capital Fund / General Research Project

Various Decennial, Demographic, and Economic Projects

Page Last Revised - July 16, 2025