Small Area Health Insurance Estimates (SAHIE)


SAHIE 2005 - 2007 Demographic and Income Model Methodology: Summary for Counties and for States

This page summarizes the demographic and income model methodology used for the SAHIE 2005 to 2007 estimates. Additional methodological detail is available at the individual links below. Technical papers that describe previous versions of the model are available on the Publications page.


The SAHIE program produces model-based estimates of health insurance coverage for demographic groups within counties and states. We publish state estimates by sex (female, male, both), race/ethnicity (all races, non-Hispanic White, non-Hispanic Black, Hispanic), age (0-18, under 65, 18-64, 40-64, 50-64), and income. We publish county estimates by sex (female, male, both), age (0-18, under 65, 18-64, 40-64), and income. Income groups are defined by the income-to-poverty ratio (IPR) – the ratio of family income to the federal poverty level. The income groups estimated are: all incomes, IPR of 0-200%, and IPR of 0-250%.

Model Summary

For estimation, SAHIE uses statistical models that combine survey data from the Annual Social and Economic Supplement to the Current Population Survey (CPS ASEC) with administrative records data and Census 2000 data. The models are “area level” models because we use survey estimates and administrative data at certain levels of aggregation, rather than individual survey and administrative records. Our modeling approach is similar to that of common models developed for small area estimation, but with some additional complexities.

The published estimates are based on aggregates of modeled demographic groups. For states, we model at a base level defined by the full cross-classification of four age groups, four race/ethnicity groups, both sexes, and three income groups. For counties, we model at a base level defined by four age groups, both sexes, and two income groups.
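As a rough illustration of the size of these base levels, the sketch below enumerates the cells of each cross-classification. The dimension names and integer indices are placeholders introduced here; the summary above gives only the number of groups in each dimension, not their labels.

```python
from itertools import product

# Number of groups per dimension, as stated in the text.
# The keys are illustrative names, not SAHIE variable names.
state_dims = {"age": 4, "race_ethnicity": 4, "sex": 2, "income": 3}
county_dims = {"age": 4, "sex": 2, "income": 2}

def base_cells(dims):
    """Enumerate every cell of the full cross-classification as index tuples."""
    return list(product(*(range(n) for n in dims.values())))

state_cells = base_cells(state_dims)
county_cells = base_cells(county_dims)

# Cells modeled per state and per county.
print(len(state_cells), len(county_cells))
```

Under these counts there are 96 base cells per state and 16 per county; the published groups are sums over subsets of these cells.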

We use estimates from the Census Bureau’s Population Estimates Program for the population in groups defined by age, race/ethnicity, and sex within states, and by age and sex within counties. We treat these populations as known. Within each of these groups, the number with health insurance coverage in any of the income categories is given by that population multiplied by two unknown proportions to be estimated: the proportion in the income category and the proportion insured within that income category. The models have two largely distinct parts, an “income part” and an “insurance part,” that correspond to these proportions. We use survey estimates of the number in the income groups and of the proportion insured within those groups. We assume these survey estimates are unbiased and normally distributed. We also assume functional forms for the variances of the survey estimates; these forms involve parameters that are estimated.
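The decomposition described above can be sketched in a few lines. The function name and the numbers are hypothetical and chosen only to show the arithmetic; they are not SAHIE estimates.

```python
def number_insured(population, p_income, p_insured_given_income):
    """Known population x proportion in the income group
    x proportion insured within that income group."""
    for p in (p_income, p_insured_given_income):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return population * p_income * p_insured_given_income

# Hypothetical demographic group: 10,000 people, 40% in the income
# category, 75% of those insured.
population = 10_000
in_income_group = population * 0.40
insured = number_insured(population, 0.40, 0.75)
uninsured = in_income_group - insured
print(insured, uninsured)
```

In this made-up case roughly 3,000 people are insured and 1,000 uninsured within the income category. In the model, the two proportions are unknown quantities to be estimated; only the population is treated as known.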

We treat supplemental variables that predict one or both of the unknown income and insurance proportions in two ways. Some of these variables are used as fixed predictors in a regression model. There is a regression component in both the income and insurance parts of the model. In each case, a transformation of the proportion is predicted by a linear combination of fixed predictors.

Some of these predictors are categorical variables that define the demographic groups we model. Others are continuous. The continuous fixed predictors include variables regarding employment from the County Business Patterns data file, the extent of homeownership, and demographic population.
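A minimal sketch of such a regression component follows. This summary does not name the transformation, so the logit used here is an assumption made for illustration, and the predictor values and coefficients are hypothetical.

```python
import math

def logit(p):
    """Assumed transformation of a proportion (not confirmed by the source)."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of the logit; maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_proportion(predictors, coefficients, intercept):
    """Predict the transformed proportion as a linear combination of
    fixed predictors, then map it back to the proportion scale."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return inv_logit(linear)

# Hypothetical cell: one categorical indicator (1.0) and one
# continuous predictor (0.62), with made-up coefficients.
p_hat = predict_proportion([1.0, 0.62], [0.3, 1.1], intercept=-0.5)
print(round(p_hat, 3))
```

Working on a transformed scale keeps the predicted proportion strictly between 0 and 1 regardless of the predictor values.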

We also use random continuous predictors, which include data from Census 2000, the Internal Revenue Service, the Supplemental Nutrition Assistance Program, and Medicaid/Children’s Health Insurance Program. These are not fixed predictors in the model. Instead, we treat them as random, in a way similar to survey estimates. We do not treat them as unbiased estimators of the total number or the number insured in an income group. Instead, we assume that their expectations are linear functions of those quantities. We typically assume they are normally distributed with variances that depend on unknown parameters.

We formulate the model in a Bayesian framework and report the posterior means as the point estimates. We use the posterior means and variances together with a normal approximation to calculate symmetric 90-percent confidence intervals, and report their half-widths as the margins of error.
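The interval calculation described above can be sketched as follows, given a posterior mean and variance. The function names and the example values are hypothetical; only the normal approximation and the 90-percent level come from the text.

```python
from statistics import NormalDist

def margin_of_error(posterior_variance, level=0.90):
    """Half-width of a symmetric interval under a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.645 for 90 percent
    return z * posterior_variance ** 0.5

def interval(posterior_mean, posterior_variance, level=0.90):
    """Symmetric interval: posterior mean plus or minus the margin of error."""
    moe = margin_of_error(posterior_variance, level)
    return posterior_mean - moe, posterior_mean + moe

# Hypothetical posterior summary for one estimate (percent insured).
mean, variance = 82.0, 4.0
moe = margin_of_error(variance)
low, high = interval(mean, variance)
print(round(moe, 2), round(low, 2), round(high, 2))
```

Because the interval is symmetric about the posterior mean, the published margin of error is enough to recover both endpoints.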

Controlling to National Estimates

We control the estimates to be consistent with specified national totals. As a result, when the estimates are summed over the states, they match specified national CPS ASEC one-year survey estimates. We match the national estimates for both the number insured and the number uninsured for the following groups:

  • Hispanic, ages 0-64
  • Non-Hispanic, ages 0-64 (2006 and 2007 only)
  • White non-Hispanic, ages 0-64
  • Black non-Hispanic, ages 0-64
  • Ages 0-64 in IPR 0-250%
  • Ages 0-17 in IPR 0-250% 1
  • Ages 18-64.

Our margin of error estimates take into account that these controls are not without error.

We also control the estimates from the SAHIE county models to the state small area estimates of the number insured and uninsured by demographic group. As a result, there is arithmetic consistency across the geographic levels for many of the demographic groups.
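This summary does not state how the controlling is carried out. One common way to force a set of estimates to sum to a specified total is proportional (ratio) adjustment, sketched below as an assumption rather than as the SAHIE procedure. The function name, area names, and figures are hypothetical.

```python
def control_to_total(estimates, control_total):
    """Scale every estimate by one common factor so that the
    estimates sum to the specified control total."""
    current_total = sum(estimates.values())
    if current_total <= 0:
        raise ValueError("estimates must sum to a positive number")
    factor = control_total / current_total
    return {area: value * factor for area, value in estimates.items()}

# Hypothetical county estimates of the number insured in one
# demographic group, controlled to a hypothetical state estimate.
county_insured = {"County A": 1200.0, "County B": 800.0, "County C": 500.0}
state_insured = 2600.0

controlled = control_to_total(county_insured, state_insured)
print(sum(controlled.values()))
```

A single common factor preserves the relative sizes of the area estimates while making the sum agree with the higher-level total, which is the arithmetic consistency the text describes. The same pattern would apply to state estimates controlled to a national total.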

1 For estimating ages 0-18 (2006 and 2007 only), the control is ages 0-18 in IPR 0-200%.

Source: U.S. Census Bureau | Small Area Health Insurance Estimates |  Last Revised: August 29, 2012