Beginning with the 2008 estimates, SAHIE uses the American Community Survey (ACS) as its base survey. For years prior to 2008, the SAHIE estimates used the Annual Social and Economic Supplement to the Current Population Survey (CPS ASEC). Other input data sources remain the same, as described further on this page.
The definitions of health insurance coverage differ between the two surveys. The CPS ASEC defined a person as insured if he or she was covered at SOME TIME during the past calendar year. The ACS health insurance question asks "Is this person CURRENTLY covered by [specifically stated] health insurance or health coverage plans?"
Due to these definitional differences, comparisons between the 2008-2012 SAHIE estimates and estimates for earlier years are not recommended. Guidance on comparisons within SAHIE datasets is available.
The remainder of this page summarizes the demographic and income model methodology used for the SAHIE estimates. Additional methodological detail is available at the individual links below. Technical papers that describe previous versions of the model are available on the Publications page.
The SAHIE program produces model-based estimates of health insurance coverage for demographic groups within counties and states. We publish state estimates by sex (female, male, both), race/ethnicity (all races, non-Hispanic White, non-Hispanic Black, Hispanic), age (0-18, under 65, 18-64, 40-64, 50-64), and income. We publish county estimates by the same sex, age and income groups, but not by race. Income groups are defined by the income-to-poverty ratio (IPR) – the ratio of family income to the [appropriate] federal poverty level. We produce estimates for all incomes, and for the IPR groups: 0-138%, 0-200%, 0-250%, 0-400%, and, beginning in 2012, 138-400% IPR.
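As an illustration only, the IPR definition and the published income groups can be sketched in Python. The function names, the dollar amounts, and the treatment of group boundaries (upper bounds inclusive, 138% itself excluded from the 138-400% group) are assumptions made for this sketch; they are not taken from the SAHIE documentation, and the actual poverty threshold depends on family size and composition.

```python
# Upper bounds (percent of poverty) of the published, nested IPR groups.
PUBLISHED_IPR_UPPER_BOUNDS = (138, 200, 250, 400)


def income_to_poverty_ratio(family_income, poverty_threshold):
    """Return the income-to-poverty ratio (IPR) as a percentage."""
    if poverty_threshold <= 0:
        raise ValueError("poverty_threshold must be positive")
    return 100.0 * family_income / poverty_threshold


def published_ipr_groups(ipr_percent, include_138_400=True):
    """List every published income group that contains the given IPR.

    Boundary handling is an assumption of this sketch, not a documented rule.
    """
    groups = ["all incomes"]
    for upper in PUBLISHED_IPR_UPPER_BOUNDS:
        if ipr_percent <= upper:
            groups.append(f"0-{upper}%")
    # The 138-400% group is published beginning in 2012.
    if include_138_400 and 138 < ipr_percent <= 400:
        groups.append("138-400%")
    return groups


# Hypothetical family: income of 36,000 against a threshold of 24,000.
ipr = income_to_poverty_ratio(36_000, 24_000)
print(ipr, published_ipr_groups(ipr))
```

Because the published groups are nested, a single family generally falls into more than one of them, which is why the function returns a list rather than a single label.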
For estimation, SAHIE uses statistical models that combine survey data from the American Community Survey (ACS) with administrative records data and Census 2010 data. The models are "area-level" models because we use survey estimates and administrative data at certain levels of aggregation, rather than individual survey and administrative records. Our modeling approach is similar to that of common models developed for small area estimation, but with some additional complexities.
The published estimates are based on aggregates of modeled demographic groups. For states, we model at a base level defined by the full cross-classification of: four age groups, four race/ethnicity groups, both sexes, and five income groups. For counties, we model at a base level defined by the same age, sex, and income groups.
We use estimates from the Census Bureau’s Population Estimates Program for the population in groups defined for state by age by race/ethnicity by sex, and for county by age by sex. We treat these populations as known. Within each of these groups, the number with health insurance coverage in any of the income categories is given by that population multiplied by two unknown proportions to be estimated: the proportion in the income category and the proportion insured within that income category. The models have two largely distinct parts - an "income part" and an "insurance part" - that correspond to these proportions. We use survey estimates of the number in the income groups and of the proportion insured within those groups. We assume these survey estimates are unbiased and normally distributed. We also assume functional forms for the variances of the survey estimates that involve parameters that are estimated. We treat supplemental variables that predict one or both of the unknown income and insurance proportions in one of two ways:
Some of these variables are used as fixed predictors in a regression model. There is a regression component in both the income and insurance parts of the model. In each case, a transformation of the proportion is predicted by a linear combination of fixed predictors. Some of these predictors are categorical variables that define the demographic groups we model. Others are continuous. The continuous fixed predictors include variables regarding employment from the County Business Patterns data file, educational employment, and demographic population.
We also use random continuous predictors, which include data from the 5-year ACS, the Internal Revenue Service, the Supplemental Nutrition Assistance Program, and Medicaid/Children’s Health Insurance Program. These are not fixed predictors in the model. We treat them as random, in a way similar to the survey estimates, but we do not assume they are unbiased estimators of the numbers. Instead, we assume that their expectations are linear functions of the number in an income group or the number insured within an income group. We typically assume they are normally distributed with variances that depend on unknown parameters.
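The decomposition described above, in which the number insured in an income category equals a known population multiplied by two proportions, can be illustrated with a short Python sketch. The function names and the numeric values are hypothetical; in the actual models the two proportions are unknown quantities estimated from the data, not inputs.

```python
def number_insured(population, p_income, p_insured_given_income):
    """Number insured in one income category of one demographic group.

    population             -- group population, treated as known
    p_income               -- proportion of the group in the income category
    p_insured_given_income -- proportion insured within that income category
    """
    for name, p in (("p_income", p_income),
                    ("p_insured_given_income", p_insured_given_income)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return population * p_income * p_insured_given_income


def number_uninsured(population, p_income, p_insured_given_income):
    """Complement of number_insured within the same income category."""
    in_category = population * p_income
    return in_category - number_insured(population, p_income,
                                        p_insured_given_income)


# Hypothetical group: 10,000 people, 25% in the income category,
# 80% of those insured.
print(number_insured(10_000, 0.25, 0.80))
print(number_uninsured(10_000, 0.25, 0.80))
```

The "income part" of the model corresponds to `p_income` and the "insurance part" to `p_insured_given_income`.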
We formulate the model in a Bayesian framework and report the posterior means as the point estimates. We use the posterior means and variances together with a normal approximation to calculate symmetric 90-percent confidence intervals, and report their half-widths as the margins of error.
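The interval calculation described here can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the example posterior mean and variance are hypothetical and do not come from the SAHIE production code.

```python
import math
from statistics import NormalDist


def margin_of_error(posterior_variance, level=0.90):
    """Half-width of a symmetric interval under a normal approximation."""
    if posterior_variance < 0:
        raise ValueError("posterior_variance must be non-negative")
    # Two-sided critical value, e.g. the 0.95 quantile for a 90% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * math.sqrt(posterior_variance)


def symmetric_interval(posterior_mean, posterior_variance, level=0.90):
    """Return (lower, upper) bounds centered on the posterior mean."""
    moe = margin_of_error(posterior_variance, level)
    return posterior_mean - moe, posterior_mean + moe


# Hypothetical posterior summary for one estimate.
mean, variance = 1250.0, 400.0
print(margin_of_error(variance))
print(symmetric_interval(mean, variance))
```

For a 90-percent interval the critical value is approximately 1.645, so the margin of error is roughly 1.645 times the posterior standard deviation.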
We control the estimates to be consistent with specified national totals. As a result, when the estimates are summed over the states, they match specified national ACS survey estimates. We match the national estimates for both the number insured and the number uninsured for the following groups:
Our margin-of-error estimates account for the fact that these controls are themselves subject to error.
We also control the estimates from the SAHIE county models to the state small area estimates of the number insured and uninsured by demographic group. As a result, there is arithmetic consistency across the geographic levels for many of the demographic groups.
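This page does not specify how the controls are applied. Purely as an illustration of what arithmetic consistency across levels means, the sketch below assumes a simple proportional (ratio) adjustment: every lower-level estimate is scaled by the same factor so that the estimates sum to the control total. The function name, the county labels, and the numbers are hypothetical, and the sketch ignores the error in the controls.

```python
def control_to_total(estimates, control_total):
    """Scale estimates proportionally so that they sum to control_total.

    ASSUMPTION: ratio adjustment is used here only for illustration;
    the method actually used by SAHIE is not described in this text.
    """
    current_total = sum(estimates.values())
    if current_total <= 0:
        raise ValueError("estimates must sum to a positive number")
    factor = control_total / current_total
    return {area: value * factor for area, value in estimates.items()}


# Hypothetical county estimates of the number insured in one group,
# controlled to a hypothetical state estimate for the same group.
county_insured = {"County A": 4_000.0, "County B": 2_500.0, "County C": 3_000.0}
state_insured = 10_000.0

controlled = control_to_total(county_insured, state_insured)
print(controlled)
print(sum(controlled.values()))
```

Applying the same adjustment separately to the insured and uninsured counts would make both sets of lower-level estimates sum to their respective controls.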