Data collected from large-scale surveys can often be used to produce high-quality estimates for large domains. However, data users are often interested in more granular domains or regions than the data can reasonably support: small samples can lead both to imprecise estimates and to unintended disclosure of respondent data. Indirect methods of inference, which utilize statistical models, latent Gaussian processes, and auxiliary data sources, have proven effective for improving the quality of published data products. In addition, these large data sets often exhibit a high degree of clustering and spatial correlation, which can be exploited to improve precision. Statistical modeling can be used to incorporate spatial, multivariate, and temporal dependencies and to integrate various data sources, both to improve quality and to produce new estimates in regions and sub-domains with sparse or no data.
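As a rough illustration of how an indirect, model-based estimator can improve on a direct survey estimate, the sketch below implements a basic area-level shrinkage predictor of the Fay-Herriot type. It is a minimal sketch under stated assumptions, not the methodology of any project listed here: the sampling variances `D` and the model variance `sigma2_v` are assumed known, the function name `fay_herriot_blup` is hypothetical, and the numbers are made up for illustration.

```python
import numpy as np


def fay_herriot_blup(y, D, X, sigma2_v):
    """Shrinkage predictor for an area-level model with known variances.

    Assumed model (illustrative): y_i = x_i' beta + v_i + e_i,
    with v_i ~ N(0, sigma2_v) and e_i ~ N(0, D_i), all independent.

    y        : direct survey estimates, shape (m,)
    D        : sampling variances of the direct estimates, shape (m,)
    X        : auxiliary covariates, shape (m, p)
    sigma2_v : variance of the area-level random effect (assumed known)
    """
    y = np.asarray(y, dtype=float)
    D = np.asarray(D, dtype=float)
    X = np.asarray(X, dtype=float)

    # Generalized least squares estimate of the regression coefficients.
    w = 1.0 / (sigma2_v + D)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    # Regression-synthetic estimate built from the auxiliary data only.
    synthetic = X @ beta

    # Shrinkage factor: near 1 when the direct estimate is precise,
    # near 0 when its sampling variance is large.
    gamma = sigma2_v / (sigma2_v + D)

    prediction = gamma * y + (1.0 - gamma) * synthetic
    return prediction, synthetic, gamma


# Made-up example: four areas, the third with a very noisy direct estimate.
y_demo = np.array([10.0, 12.0, 30.0, 18.0])
D_demo = np.array([0.5, 4.0, 25.0, 1.0])
X_demo = np.column_stack([np.ones(4), np.array([1.0, 1.2, 1.9, 1.7])])

pred_demo, syn_demo, gamma_demo = fay_herriot_blup(y_demo, D_demo, X_demo, sigma2_v=2.0)
print(np.round(gamma_demo, 3))
```

Each prediction is a weighted average of the direct estimate and the regression-synthetic estimate, so areas with small samples (large `D`) borrow more strength from the auxiliary data. In practice the model variance would itself be estimated, and the spatial, multivariate, and unit-level extensions described in this project go well beyond this simple case.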
· Statistical methodology for integration of data from various sources.
· Development of unit-level models.
· Incorporation of survey weights in statistical models.
· Development of change-of-support methodology.
· Development of computationally efficient methods for fitting models to non-Gaussian data.
· Incorporation of spatially-correlated random effects in small area models.
· Model-based methods for prediction at low geographic levels.
· Mean-squared error, uncertainty, and interval estimation.
· Synthesis of privacy protection and model-based inference.
· Nonparametric covariance estimation.
· Inference for irregularly spaced observations from locally-stationary random fields.
· Developing Bayesian pseudolikelihood models for unit-level data obtained from a complex sample survey which incorporate spatio-temporal dependencies. (Janicki, Holan)
· Development of change-of-support methodology for inference on regions with no direct measurement, based on observations on a distinct geographic region or grid. (Janicki, Holan, Lahiri)
· Incorporation of spatially-correlated random effects in small area models. (Aleshin-Guendel, Datta, Janicki, Maples)
· Integration of deep learning, machine learning, and model selection, with spatial modeling. (Janicki, Holan)
· Production of “gridded” data products which correspond to a regular lattice which remains constant over time.
· Improved precision and interpretability of privacy-protected decennial census tables.
· Estimation of health insurance coverage by different demographic classifications at different geographic levels.
· Creation of new custom tabulations of ACS data products.
· Improvement of the precision of noisy measurements of census counts or other variables subject to disclosure avoidance techniques.
· Methodology for producing public use synthetic micro data.
· Developed and implemented small area estimation methodology to produce state level estimates for all Annual Integrated Economic Survey core items by three-digit North American Industry Classification System groups.
· Developed statistical models for noisy measurements of decennial census tabulations subject to constraints and implemented this methodology to produce five count tables and three ratio tables, along with associated measures of uncertainty, as official data products.
· Applied machine learning methods to a downscaling problem where the target of statistical inference is prediction of the total of a response variable of interest over a user-specified spatial region using a large number of potentially useful covariates.
· Developed methods to utilize multiple data sources such as sample survey data, tax records, and other administrative data sources, as well as variable selection techniques to select a subset of available predictors, to estimate the number of domestic migrants and the rate of domestic migration, as well as to provide uncertainty measures for the estimated counts and rates.
· Developed a multivariate spatial mixture model for American Community Survey special tabulations which can be used to produce model-based predictions when the survey-specific sample size is insufficient, either due to privacy concerns or data quality concerns.
· Developed spatial models for differentially private measurements of decennial census counts and ratios for improving precision and aggregating to marginal table cells.
· Developed a spatial change-of-support model for predicting counts in regions where no direct response variable is available.
· Produce model-based estimates of 2020 decennial census counts using spatial models fit to differentially private measurements for count and ratio tables at sub-state geographies.
· Exploration of novel uses of auxiliary data and data integration for improved prediction and development of new data products.
· Research the extent to which utilization of spatial information and multivariate dependencies can reduce the impact of differential privacy on the precision of data products.
· Development of software for efficiently fitting a variety of spatial, spatio-temporal, longitudinal, mixture, and other hierarchical Bayesian models.
· Investigate new and efficient computational methods for fitting high-dimensional models.
· Development of model-based methods for inference on very small domains, such as block groups, when the data are very sparse and are not of sufficient quality for publication.
· Development of efficient methods for producing special tabulations of survey data which meet the U.S. Census Bureau’s data quality standards.
· Development of methodology for producing estimates at non-standard geographies such as American Indian and Alaska Native areas and school districts.
· Methodology for producing synthetic microdata which can be made publicly available for data users.
Aleshin-Guendel, S. and Steorts, R. (2024). “Monitoring Convergence Diagnostics for Entity Resolution,” Annual Review of Statistics and Its Application, Vol 11, 419-435.
Wang, Q., Parker, P.A., and Lund, R. (2025). “Spatial Deep Convolution Neural Networks.” Spatial Statistics, Vol 66.
Aleshin-Guendel, S. and Wakefield, J. (2024). “Adaptive Gaussian Markov Random Fields for Child Mortality Estimation,” Biostatistics, Vol. 26, No. 1.
Aleshin-Guendel, S., Sadinle, M., and Wakefield, J. (2024). “The Central Role of the Identifying Assumption in Population Size Estimation,” Biometrics (with Discussion), Vol. 80, No. 1.
Parker, P.A. (2024). “Nonlinear Fay-Herriot Models for Small Area Estimation Using Random Weight Neural Networks.” Journal of Official Statistics, Vol 40, No. 2, 317-332.
Parker, P., Holan, S.H., and Janicki, R. (2024). “Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure,” Journal of Survey Statistics and Methodology, Vol. 12, 1061-1080.
Janicki, R., Holan, S.H., Irimata, K. M., Livsey, J., and Raim, A. (2023). “Spatial Change of Support Models for Differentially Private Decennial Census Counts of Persons by Detailed Race and Ethnicity,” Journal of Statistical Theory and Practice, Vol. 17.
Parker, P., Holan, S.H., and Janicki, R. (2023). “Comparison of Unit Level Small Area Estimation Modeling Approaches for Survey Data Under Informative Sampling,” Journal of Survey Statistics and Methodology, Vol 11, No. 4, 858-872.
Parker, P., Holan, S.H., and Janicki, R. (2023). “A Comprehensive Overview of Unit Level Modeling of Survey Data for Small Area Estimation Under Informative Sampling,” Journal of Survey Statistics and Methodology, Vol 11, No. 4, 829-857.
Parker, P., Holan, S.H., and Janicki, R. (2022). “Computationally Efficient Bayesian Unit-Level Models for Multivariate Non-Gaussian Data Under Informative Sampling,” Annals of Applied Statistics, 16, 887-904.
Janicki, R., Raim, A., Holan, S.H., and Maples, J. (2022). “Bayesian Nonparametric Multivariate Spatial Mixture Mixed Effects Models with Application to American Community Survey Special Tabulations,” Annals of Applied Statistics, 16, 144-168.
Parker, P., Holan, S.H., and Janicki, R. (2020). “Conjugate Bayesian Unit-level Modeling of Count Data Under Informative Sampling Designs,” Stat, 9, e267.
Janicki, R., Holan, S.H., Irimata, K. M., Livsey, J. A. and Raim, A. M. (2024). “Bayesian Methods to Improve the Accuracy of Differentially Private Measurements of Constrained Parameters,” arXiv:2406.18455.
Irimata, K., Holan, S.H., Janicki, R., Livsey, J.A., and Raim, A.M. (2022). “Evaluation of Bayesian Hierarchical Models of Differentially Private Data Based on an Approximate Data Model,” Research Report Series (Statistics #2022-05), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, D.C.
Ryan Janicki, Soumen Lahiri, Scott Holan (ADRM), Serge Aleshin-Guendel
0331 – Working Capital Fund / General Research Project
Various Decennial, Demographic, and Economic Projects