The 2010 SAHIE estimates use the American Community Survey (ACS) as their base, as did the estimates for 2008 and 2009. For years before 2008, the SAHIE estimates used the Annual Social and Economic Supplement to the Current Population Survey (CPS ASEC). Other input data sources remain the same, as described further on this page.
The definitions of health insurance coverage differ between the two surveys. In the CPS ASEC, a person was defined as insured if covered at SOME TIME during the past calendar year. The ACS health insurance question asks "Is this person CURRENTLY covered by [specifically stated] health insurance or health coverage plans?"
Because of these definitional differences, comparisons between the 2008-10 SAHIE estimates and those for earlier years are not recommended. Guidance on comparisons within SAHIE datasets is available.
The remainder of this page summarizes the demographic and income model methodology used for the SAHIE estimates. Additional methodological detail is available at the individual links below. Technical papers that describe previous versions of the model are available on the Publications page.
The SAHIE program produces model-based estimates of health insurance coverage within counties and states. Estimates are provided by sex (female, male, both), race/ethnicity (all races, non-Hispanic White, non-Hispanic Black, Hispanic), age (0-18, under 65, 18-64, 40-64, 50-64), and income groups. For 2008 and 2009, county-level estimates for the age 50-64 group are not available.
For estimation, SAHIE uses statistical models that combine survey data from the American Community Survey (ACS) with administrative records data and Census data. The models are “area level” models because we use survey estimates and administrative data at certain levels of aggregation, rather than individual survey and administrative records. Our modeling approach is similar to that of common models developed for small area estimation, but with some additional complexities.
The published estimates are based on aggregates of modeled demographic groups. For states, we model at a base level defined by the full cross-classification of: five age groups, four race/ethnicity groups, both sexes and five income groups. For counties, we model at a base level defined by the same age, sex and income groups, but no race/ethnicity breakdown.
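As a rough illustration of the base modeling levels described above, the sketch below enumerates the cross-classified cells. The labels are placeholders, since the text gives only the number of groups in each dimension and not their definitions at the base level.

```python
from itertools import product

# Placeholder labels: only the group counts come from the text above.
AGE = [f"age_{i}" for i in range(1, 6)]        # five age groups
RACE = [f"race_{i}" for i in range(1, 5)]      # four race/ethnicity groups
SEX = ["female", "male"]                       # two sexes
INCOME = [f"income_{i}" for i in range(1, 6)]  # five income groups

# States: full cross-classification of age, race/ethnicity, sex and income.
state_cells = list(product(AGE, RACE, SEX, INCOME))

# Counties: same age, sex and income groups, with no race/ethnicity breakdown.
county_cells = list(product(AGE, SEX, INCOME))

print(len(state_cells), len(county_cells))  # prints: 200 50
```

Under these counts, each state has 200 base cells and each county has 50. Published groups are then aggregates of these cells.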
We use estimates from the Census Bureau’s Population Estimates Program for the population in groups defined for states by age by race/ethnicity by sex, and for counties by age by sex. We treat these populations as known. Within each of these groups, the number with health insurance coverage in any of the income categories is given by that population multiplied by two unknown proportions to be estimated: the proportion in the income category and the proportion insured within that income category. The models have two largely distinct parts - an “income part” and an “insurance part” - that correspond to these proportions. We use survey estimates of the proportions in the income groups and of the proportion insured within those groups. We assume these survey estimates are unbiased. We also assume functional forms for the variances of the survey estimates that involve parameters that are estimated.
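The decomposition described above (known population multiplied by two unknown proportions) can be sketched as follows. The function name and the numbers in the usage line are hypothetical and serve only to show the arithmetic. In the actual model, the two proportions are estimated rather than supplied.

```python
def insured_count(population, p_income, p_insured_given_income):
    """Number insured in one income category of one demographic group.

    population: population of the group, treated as known
    p_income: proportion of the group in the income category
    p_insured_given_income: proportion insured within that income category
    """
    for name, p in (("p_income", p_income),
                    ("p_insured_given_income", p_insured_given_income)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    return population * p_income * p_insured_given_income


# Hypothetical values: 10,000 people, a quarter of them in the income
# category, and half of those insured.
example = insured_count(10_000, 0.25, 0.5)
```

The "income part" of the model corresponds to the second argument and the "insurance part" to the third.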
We treat supplemental variables that predict one or both of the unknown income and insurance proportions in two ways: some of these variables are fixed predictors in regression components of the model; others are random, modeled in ways similar to the survey data.
There is a regression component in both the income and insurance parts of the model. In each case, a transformation of the proportion is predicted by a linear combination of fixed predictors. Some of these predictors are categorical variables that define demographic groups; others are continuous. The continuous fixed predictors include variables regarding employment from the County Business Patterns data file, educational employment, and demographic population.
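The regression component can be pictured with a small sketch. The text does not say which transformation is used, so the logit below is an assumption made for illustration, as are the function names, coefficients and predictor values.

```python
import math


def logit(p):
    """Assumed transformation: maps a proportion in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of the assumed transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_proportion(intercept, coefficients, predictors):
    """Proportion implied by a linear combination of fixed predictors.

    The linear predictor lives on the transformed scale, so it is mapped
    back to a proportion with the inverse transformation.
    """
    if len(coefficients) != len(predictors):
        raise ValueError("coefficients and predictors must have equal length")
    eta = intercept + sum(c * x for c, x in zip(coefficients, predictors))
    return inv_logit(eta)


# Hypothetical inputs: one 0/1 indicator for a demographic group and one
# continuous predictor.
p_hat = predicted_proportion(0.4, [0.3, -1.2], [1.0, 0.25])
```

Working on a transformed scale keeps the predicted proportion strictly between 0 and 1 whatever the values of the predictors.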
The supplemental variables we treat as random include data from the decennial census, aggregated federal income tax data, and participation in the Supplemental Nutrition Assistance Program and Medicaid/Children’s Health Insurance Program. For 2010, the decennial census data was replaced with lagged ACS five-year estimates. We model these in ways similar to survey estimates, in that their distributions depend on the proportions being estimated. But they are not unbiased estimators of these proportions or related totals. Instead, we assume that their expectations are parametric functions of either the total or the number insured in an income group. We typically assume they are normally distributed with variances that depend on unknown parameters.
We formulate the model in a Bayesian framework and report the posterior means as the point estimates. We use the posterior means and variances together with a normal approximation to calculate symmetric 90-percent confidence intervals, and report their half-widths as the margins of error.
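The interval calculation described above can be sketched as follows, assuming the posterior mean and variance of an estimate are already in hand. The function names and the numbers in the usage lines are hypothetical.

```python
from statistics import NormalDist


def margin_of_error(posterior_variance, level=0.90):
    """Half-width of a symmetric interval under a normal approximation."""
    if posterior_variance < 0:
        raise ValueError("variance must be non-negative")
    # Two-sided critical value, about 1.645 for a 90-percent interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * posterior_variance ** 0.5


def interval(posterior_mean, posterior_variance, level=0.90):
    """Symmetric interval centered on the posterior mean."""
    moe = margin_of_error(posterior_variance, level)
    return posterior_mean - moe, posterior_mean + moe


# Hypothetical posterior summary for one estimate.
moe = margin_of_error(4.0)
lower, upper = interval(10.0, 4.0)
```

Because the interval is symmetric, the point estimate plus or minus the reported margin of error recovers both endpoints.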
We control the estimates to be consistent with specified national totals. As a result, when the estimates are summed over the states, they match specified national ACS survey estimates. We match the national estimates for both the number insured and the number uninsured for the following groups:
Our margin-of-error estimates account for the fact that these controls are themselves subject to error.
We also control the estimates from the SAHIE county models to the state small area estimates of the number insured and uninsured by demographic group. As a result, there is arithmetic consistency across the geographic levels for many of the demographic groups.
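The text does not say how the controls are applied. One simple possibility, shown here purely as an assumption, is a proportional (ratio) adjustment that scales a set of lower-level estimates so they sum to a higher-level total. The function name and figures are hypothetical, and the sketch ignores the uncertainty in the control total.

```python
def control_to_total(estimates, control_total):
    """Scale estimates proportionally so that they sum to control_total.

    estimates: mapping from area name to estimated count
    control_total: the higher-level total the estimates should match
    """
    current_total = sum(estimates.values())
    if current_total <= 0:
        raise ValueError("estimates must sum to a positive number")
    factor = control_total / current_total
    return {area: value * factor for area, value in estimates.items()}


# Hypothetical county estimates of the number insured, controlled to a
# hypothetical state estimate of 500.
counties = {"County A": 100.0, "County B": 300.0}
controlled = control_to_total(counties, 500.0)
```

Applying the same adjustment separately to the number insured and the number uninsured within a demographic group would give the kind of arithmetic consistency across geographic levels described above.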