
Center for Statistical Research and Methodology (CSRM)

Experimentation and Statistical Modeling

Motivation: Experiments at the Census Bureau are used to answer many research questions, especially those related to testing, evaluating, and advancing survey sampling methods. A properly designed experiment provides a valid, cost-effective framework that ensures the right type of data is collected and that sample sizes and statistical power are sufficient to address the questions of interest. Valid statistical models are vital both to analyzing results from designed experiments and to characterizing relationships between variables in the vast data sources available to the Census Bureau. Statistical modeling is also essential for soundly integrating previously collected data (e.g., from censuses, sample surveys, and administrative records) so as to maximize the information those sources can provide. In particular, linear mixed effects models are ubiquitous at the Census Bureau through their use in small area estimation. Models can also identify errors in data, e.g., by computing valid tolerance bounds and flagging data outside the bounds for further review.
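As a rough illustration of the sample-size and power considerations above, the sketch below computes the per-arm sample size needed to compare two response rates in a two-arm embedded experiment, using the textbook normal-approximation formula with unpooled variances. The design-effect multiplier `deff` (a Kish-style inflation meant only to approximate the effect of the survey's clustering and weighting) and the example rates are hypothetical assumptions, not Census Bureau values.

```python
import math
from statistics import NormalDist


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80,
              deff: float = 1.0) -> int:
    """Per-arm sample size to detect p1 != p2 with a two-sided z-test.

    Normal approximation with unpooled variances:
        n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2,
    multiplied by a hypothetical design effect `deff` to roughly account for
    the complex sample design in which the experiment is embedded.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(deff * n)


# Hypothetical example: detect a rise in a response rate from 50% to 55%.
print(n_per_arm(0.50, 0.55))            # 1562 (simple random sampling)
print(n_per_arm(0.50, 0.55, deff=1.5))  # 2343 (assumed design effect of 1.5)
```

The design effect is applied as a single multiplier for simplicity; a real embedded experiment would derive it from the actual sampling design.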

Research Problem:

  • Investigate methodology for experimental designs embedded in sample surveys, including large-scale field experiments embedded in ongoing surveys; design-based and model-based analysis and variance estimation that incorporate both the sampling design and the experimental design; factorial designs embedded in sample surveys and the estimation of interactions; and testing nonresponse using embedded experiments. Use simulation studies.
  • Research methods that provide principled measures of statistical variability for products such as the Population Division's Population Estimates.
  • Assess feasibility of established design methods (e.g., factorial designs) in Census Bureau experimental tests.
  • Identify and develop statistical models (e.g., loglinear models, mixture models, and mixed-effects models) to characterize relationships between variables measured in censuses, sample surveys, and administrative records.
  • Assess the applicability of post hoc methods (e.g., multiple comparisons and tolerance intervals) to future designed experiments and to reviews of previous data analyses.
  • Develop predictors of random effects in mixed models based on fiducial quantities that can be used for point estimation. Fiducial inference has seen a revival in recent years as an alternative to maximum likelihood, restricted maximum likelihood, and Bayesian methods. Point estimation with fiducial quantities has not yet been addressed; an advantage of this approach is that explicit point estimators of unknown parameters would not be required.
  • Construct rectangular nonparametric tolerance regions for multivariate data. Tolerance regions for multivariate data are usually elliptical, and such regions cannot provide information on individual components of the measurement vector; rectangular tolerance regions can provide that information.
  • Investigate statistical methods for remote sensing data, such as multispectral and LIDAR images, and potential applications at the Census Bureau.
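For the research problem on factorial designs embedded in sample surveys, a minimal sketch of design-weighted point estimation for a 2 × 2 embedded experiment follows. It assumes each respondent record carries its two randomized factor levels and a survey weight (a hypothetical record layout), estimates each cell mean with a weighted (Hájek) ratio, and forms the main effects and the interaction as contrasts of the cell means. Variance estimation, which the research problem notes must reflect both the sampling design and the experimental design, is deliberately omitted, and the example records are invented.

```python
from collections import defaultdict


def weighted_cell_means(units):
    """Weighted (Hajek) mean of y in each cell of a 2 x 2 factorial.

    `units` is an iterable of (a, b, weight, y) tuples with factor levels a, b in
    {0, 1} and survey weight `weight` (hypothetical record layout).
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for a, b, w, y in units:
        num[(a, b)] += w * y
        den[(a, b)] += w
    return {cell: num[cell] / den[cell] for cell in den}


def factorial_effects(means):
    """Main effects of A and B and the AB interaction from the four cell means.

    Convention used here: each effect is a contrast with +/-1/2 coefficients, so
    the interaction is half the "difference of differences". Raises KeyError if
    any of the four cells is empty.
    """
    m00, m10, m01, m11 = means[(0, 0)], means[(1, 0)], means[(0, 1)], means[(1, 1)]
    effect_a = (m10 + m11) / 2 - (m00 + m01) / 2
    effect_b = (m01 + m11) / 2 - (m00 + m10) / 2
    interaction = (m11 - m01 - m10 + m00) / 2
    return effect_a, effect_b, interaction


# Hypothetical records: (factor A level, factor B level, survey weight, responded 0/1).
units = [
    (0, 0, 120.0, 1), (0, 0, 80.0, 0),
    (1, 0, 100.0, 1), (1, 0, 100.0, 1),
    (0, 1, 90.0, 0), (0, 1, 110.0, 1),
    (1, 1, 150.0, 1), (1, 1, 50.0, 0),
]
print(factorial_effects(weighted_cell_means(units)))
```

Other texts define the interaction without the factor of 1/2; only the scaling changes.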

Potential Applications:

  • Modeling approaches with administrative records can help enhance the information obtained from various sample surveys.
  • Experimental design can help guide and validate testing procedures proposed for the 2020 Census. Embedded experiments can be used to evaluate the effectiveness of alternative contact strategies.
  • The collection of experimental design procedures currently used with the American Community Survey (ACS) can be expanded.
  • Fiducial predictors of random effects can be applied to mixed effects models such as those used in small area estimation.
  • Rectangular tolerance regions can be applied to multivariate economic data to aid the editing process by identifying observations that are outlying in one or more attributes and should therefore undergo further review.
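The editing application above can be illustrated with a deliberately simple, conservative construction: a distribution-free interval built from each attribute's sample minimum and maximum, combined into a rectangle through a Bonferroni (union-bound) adjustment. This is not the rectangular tolerance-region methodology the research problem aims to develop; it only shows how a rectangle, unlike an ellipse, reports which attributes of a record are outlying. The sketch assumes i.i.d. continuous reference data and uses the classical order-statistics result that [min, max] of n draws contains at least a fraction p of the population with confidence 1 - n p^(n-1) + (n-1) p^n. All function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def minmax_confidence(n: int, content: float) -> float:
    """Confidence that [min, max] of n i.i.d. continuous draws contains at least
    `content` of the population: 1 - n p^(n-1) + (n-1) p^n."""
    p = content
    return 1.0 - n * p ** (n - 1) + (n - 1) * p ** n


def min_sample_size(content: float, confidence: float) -> int:
    """Smallest n for which [min, max] is a (content, confidence) tolerance interval."""
    n = 2
    while minmax_confidence(n, content) < confidence:
        n += 1
    return n


def rectangular_region(reference: Sequence[Sequence[float]],
                       content: float = 0.90,
                       confidence: float = 0.95) -> Tuple[List[float], List[float]]:
    """Per-attribute [min, max] bounds forming a conservative rectangular region.

    Each of the d attributes gets content 1 - (1 - content)/d and confidence
    1 - (1 - confidence)/d; by the union bound the rectangle then covers at least
    `content` of the population with confidence at least `confidence`.
    """
    d = len(reference[0])
    needed = min_sample_size(1 - (1 - content) / d, 1 - (1 - confidence) / d)
    if len(reference) < needed:
        raise ValueError(f"need at least {needed} reference records, got {len(reference)}")
    lows = [min(row[k] for row in reference) for k in range(d)]
    highs = [max(row[k] for row in reference) for k in range(d)]
    return lows, highs


def flag_outlying(records: Sequence[Sequence[float]],
                  lows: List[float], highs: List[float]) -> Dict[int, List[int]]:
    """Map each flagged record index to the attribute indices outside the rectangle."""
    flagged: Dict[int, List[int]] = {}
    for i, row in enumerate(records):
        outside = [k for k, x in enumerate(row) if not lows[k] <= x <= highs[k]]
        if outside:
            flagged[i] = outside
    return flagged
```

For a single attribute at 90% content and 95% confidence, `min_sample_size(0.90, 0.95)` gives 46 reference records; splitting the error budget across more attributes raises that requirement, which is the price of this simple Bonferroni construction.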

Accomplishments (October 2017 - September 2018):

  • Studied finite-sample bias in the fiducial distribution of some functions of multinomial probabilities. Released a research report with initial data analysis. Started work on a paper about the phenomenon.
  • Started work on a companion paper for the spatio-temporal change of support modeling package (stcos), featuring a small case study using ACS data. Carried out simulation studies to assess package correctness.
  • Identified numerical issues with computation of the COM-Poisson normalizing constant and began to address them in the COMPoissonReg package.
  • Developed a flexible zero-inflated regression model that contains ZIP, ZINB, and ZIB regression as special cases.
  • Developed a flexible one-step autoregressive model for discrete data that exhibit dispersion.
  • Proposed models for internet-mode self-response in the ACS to understand the characteristics of individuals who do not respond.
  • Developed a flexible model, based on COM-Poisson counts, for multinomial data exhibiting either over- or underdispersion.
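Two of the accomplishments above concern the COM-Poisson normalizing constant Z(λ, ν) = Σ_{j≥0} λ^j / (j!)^ν and a zero-inflated model built on the COM-Poisson distribution. The Python sketch below is not the COMPoissonReg implementation (which is an R package); it shows one standard remedy for overflow in Z: accumulate the series on the log scale with a running log-sum-exp, and truncate once the remaining terms are negligible. It then writes a zero-inflated COM-Poisson probability mass function as a mixture of a point mass at zero and a COM-Poisson count; setting ν = 1 recovers the zero-inflated Poisson. The tolerance, term cap, and function names are assumptions of this sketch.

```python
import math


def log_cmp_normalizer(lam: float, nu: float, tol: float = 1e-15,
                       max_terms: int = 100_000) -> float:
    """log Z(lam, nu), where Z = sum_{j>=0} lam^j / (j!)^nu (assumes lam > 0, nu > 0).

    Terms are added on the log scale so that large lam or small nu does not
    overflow double precision.
    """
    if lam <= 0 or nu <= 0:
        raise ValueError("this sketch assumes lam > 0 and nu > 0")
    log_lam = math.log(lam)
    log_z = 0.0  # the j = 0 term equals 1
    for j in range(1, max_terms):
        log_term = j * log_lam - nu * math.lgamma(j + 1)
        hi, lo = max(log_z, log_term), min(log_z, log_term)
        log_z = hi + math.log1p(math.exp(lo - hi))  # log(exp(log_z) + exp(log_term))
        # The ratio of term j+1 to term j is lam / (j+1)^nu, which decreases in j.
        # Once it is below 1 the terms keep shrinking, so stop when they are negligible.
        if log_lam < nu * math.log(j + 1) and log_term - log_z < math.log(tol):
            break
    return log_z


def cmp_log_pmf(y: int, lam: float, nu: float) -> float:
    """log P(Y = y) for Y ~ COM-Poisson(lam, nu)."""
    return y * math.log(lam) - nu * math.lgamma(y + 1) - log_cmp_normalizer(lam, nu)


def zicmp_pmf(y: int, lam: float, nu: float, p: float) -> float:
    """Zero-inflated COM-Poisson pmf: an extra zero with probability p, else a CMP count."""
    base = (1 - p) * math.exp(cmp_log_pmf(y, lam, nu))
    return base + (p if y == 0 else 0.0)
```

With λ = 1000 and ν = 1, for instance, `log_cmp_normalizer` returns approximately 1000 (since Z = e^λ when ν = 1), even though e^1000 itself overflows a double.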

Short-Term Activities (FY 2019):

  • Complete the report on the design and analysis of embedded experiments.
  • Complete companion paper for spatio-temporal change of support modeling package.
  • Complete paper on reduction of finite sample bias in fiducial inference.
  • Complete application of multivariate space-time models to ACS Special Tabulations and consider possible methodological improvements.
  • Complete paper on multinomial model based on COM-Poisson.

Longer-Term Activities (beyond FY 2019):

  • Develop generalized/flexible spatial and time series models motivated by the Conway-Maxwell-Poisson distribution.
  • Significant progress has been made recently on randomization-based causal inference for complex experiments; see Ding (Statistical Science, 2017), Dasgupta, Pillai and Rubin (Journal of the Royal Statistical Society, Series B, 2015), Ding and Dasgupta (Journal of the American Statistical Association, 2016), and Mukerjee, Dasgupta and Rubin (Journal of the American Statistical Association, 2017). It is proposed to adapt these methodologies to the analysis of complex embedded experiments by taking into account the features of embedded experiments (for example, random interviewer effects and different sampling designs).
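Randomization-based inference uses the random assignment of units to treatments, rather than a model for the outcomes, as the basis for inference. As a minimal illustration of that idea only (the cited work goes well beyond it, to factorial and other complex designs), the sketch below runs an exact Fisher randomization test of the sharp null hypothesis of no treatment effect for any unit in a small, completely randomized two-arm experiment. It ignores the embedded-experiment features mentioned above, such as interviewer effects and the sampling design, and enumerating every assignment is feasible only for small experiments; larger ones would typically use a random sample of assignments instead. The data are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence, Tuple


def randomization_test(treated: Sequence[float],
                       control: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Fisher randomization test for a completely randomized design.

    Under the sharp null of no effect for any unit, every unit's outcome is fixed,
    so the difference in means is recomputed for every possible assignment of
    len(treated) units to treatment; the p-value is the share of assignments at
    least as extreme as the observed difference.
    """
    pooled = list(treated) + list(control)
    n, n_treated = len(pooled), len(treated)
    observed = mean(treated) - mean(control)
    extreme = total = 0
    for chosen in combinations(range(n), n_treated):
        chosen_set = set(chosen)
        t = [pooled[i] for i in chosen]
        c = [pooled[i] for i in range(n) if i not in chosen_set]
        total += 1
        if abs(mean(t) - mean(c)) >= abs(observed) - 1e-12:  # tolerance for float ties
            extreme += 1
    return observed, extreme / total


# Hypothetical outcomes from a six-unit experiment (three units per arm).
obs, p_value = randomization_test([5, 6, 7], [1, 2, 3])
print(obs, p_value)  # observed difference 4, two-sided p-value 0.1 (2 of 20 assignments)
```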

Selected Publications:

Gamage, G., Mathew, T., and Weerahandi, S. (2013). “Generalized Prediction Intervals for BLUPs in Mixed Models,” Journal of Multivariate Analysis, 120, 226-233.

Heim, K. and Raim, A.M. (2016). “Predicting coverage error on the Master Address File using spatial modeling methods at the block level,” in JSM Proceedings, Survey Research Methods Section, Alexandria, VA: American Statistical Association.

Klein, M., Mathew, T., and Sinha, B. K. (2014). “Likelihood Based Inference Under Noise Multiplication,” Thailand Statistician, 12(1), 1-23. URL: http://www.tci-thaijo.org/index.php/thaistat/article/view/34199/28686

Mathew, T. and Young, D. S. (2013). “Fiducial-Based Tolerance Intervals for Some Discrete Distributions,” Computational Statistics and Data Analysis, 61, 38-49.

Mathew, T., Menon, S., Perevozskaya, I. and Weerahandi, S. (2016). “Improved Prediction Intervals in Heteroscedastic Mixed-Effects Models,” Statistics & Probability Letters, 114, 48-53.

Morris, D.S., Sellers, K.F., and Menger, A. (2017). “Fitting a Flexible Model for Longitudinal Count Data Using the NLMIXED Procedure,” SAS Global Forum Proceedings, Paper 202-2017, Cary, NC: SAS Institute.

Raim, A.M. and Gargano, M.N. (2015). “Selection of predictors to model coverage errors in the Master Address File,” Research Report Series: Statistics #2015-04, Center for Statistical Research and Methodology, U.S. Census Bureau.

Raim, A.M. (2016). “Informing maintenance to the U.S. Census Bureau's Master Address File with statistical decision theory,” in JSM Proceedings, Government Statistics Section, Alexandria, VA: American Statistical Association.

Raim, A.M., Holan, S.H., Bradley, J.R., and Wikle, C.K. (2017). “A Model Selection Study for Spatio-Temporal Change of Support,” in Proceedings, Government Statistics Section of the American Statistical Association, Alexandria, VA: American Statistical Association.

Sellers, K., Lotze, T., and Raim, A. (2017). COMPoissonReg: Conway-Maxwell-Poisson Regression, R package versions 0.4.0 and 0.4.1, https://cran.r-project.org/web/packages/COMPoissonReg/index.html

Sellers, K.F. and Morris, D. (In Press). “Under-dispersion Models: Models That Are ‘Under The Radar’,” Communications in Statistics – Theory and Methods.

Sellers, K.F., Morris, D.S., and Balakrishnan, N. (2016). “Bivariate Conway-Maxwell-Poisson Distribution: Formulation, Properties, and Inference,” Journal of Multivariate Analysis, 150, 152-168.

Sellers, K.F., Morris, D.S., Shmueli, G., and Zhu, L. (2017). “Reply: Models for Count Data (a response to a letter to the editor),” The American Statistician.

Sellers, K.F. and Raim, A.M. (2016). “A flexible zero-inflated model to address data dispersion,” Computational Statistics and Data Analysis, 99, 68-80.

Sellers, K., Morris, D., Balakrishnan, N., and Davenport, D. (2017). multicmp: Flexible Modeling of Multivariate Count Data via the Multivariate Conway-Maxwell-Poisson Distribution, R package, https://cran.r-project.org/web/packages/multicmp/index.html

Young, D.S. (2013). “Regression Tolerance Intervals,” Communications in Statistics – Simulation and Computation, 42(9), 2040-2055.

Young, D.S. (2014). “A procedure for approximate negative binomial tolerance intervals,” Journal of Statistical Computation and Simulation, 84(2), 438-450. URL: http://dx.doi.org/10.1080/00949655.2012.715649

Young, D. and Mathew, T. (2015). “Ratio Edits Based on Statistical Tolerance Intervals,” Journal of Official Statistics, 31, 77-100.

Young, D.S., Raim, A.M., and Johnson, N.R. (2017). “Zero-inflated modelling for characterizing coverage errors of extracts from the U.S. Census Bureau's Master Address File,” Journal of the Royal Statistical Society: Series A, 180(1), 73-97.

Zhu, L., Sellers, K.F., Morris, D.S., and Shmueli, G. (2017). “Bridging the Gap: A Generalized Stochastic Process for Count Data,” The American Statistician, 71(1), 71-80.

Zhu, L., Sellers, K., Morris, D., Shmueli, G., and Davenport, D. (2017). cmpprocess: Flexible Modeling of Count Processes, R package, https://cran.r-project.org/web/packages/cmpprocess/index.html

Contact: Andrew Raim, Thomas Mathew, Kimberly Sellers, Dan Weinberg, Robert Ashmead, Scott Holan (R&M)

Funding Sources for FY 2018:

  • 0331 - Working Capital Fund / General Research Project
    Various Decennial and Demographic Projects


Source: U.S. Census Bureau | Research and Methodology Directorate | Center for Statistical Research & Methodology | (301) 763-9862 (or lauren.emanuel@census.gov) |   Last Revised: October 02, 2018