Demographic and Income Model Methodology (2005-2007)

For years 2005 to 2007, SAHIE utilized the Annual Social and Economic Supplement to the Current Population Survey (CPS ASEC). In the CPS ASEC, "insured" was defined as being covered SOME TIME during the past calendar year. This definition is different from the American Community Survey (ACS) health insurance question which is utilized for SAHIE starting with the 2008 estimates and onward. The ACS asks, "Is this person CURRENTLY covered by [specifically stated] health insurance or health coverage plans?"

Due to these definitional differences, comparisons between 2008-2015 SAHIE estimates and earlier years are not recommended. Guidance on comparisons within SAHIE datasets is available.

For 2005-2007, SAHIE publishes STATE and COUNTY estimates of population with and without health insurance coverage, along with measures of uncertainty, for the full cross-classification of:

4 age categories: 0-64, 18-64, 40-64, and 50-64
3 sex categories: both sexes, male, and female
3 income categories: all incomes, as well as income-to-poverty ratio (IPR) categories 0-200% and 0-250% of the poverty threshold
4 races/ethnicities (for states only): all races/ethnicities, White not Hispanic, Black not Hispanic, and Hispanic (any race).
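
As a rough illustration of what "full cross-classification" implies, the sketch below enumerates every combination of the categories listed above. The variable names are hypothetical and chosen only for this example, and the additional age 0-18 estimates are not included.

```python
from itertools import product

# Category labels copied from the list above; the variable names are
# hypothetical and used only for this illustration.
AGES = ["0-64", "18-64", "40-64", "50-64"]
SEXES = ["both sexes", "male", "female"]
INCOMES = ["all incomes", "IPR 0-200%", "IPR 0-250%"]
RACES = [
    "all races/ethnicities",
    "White not Hispanic",
    "Black not Hispanic",
    "Hispanic (any race)",
]

# Counties: age x sex x income (race/ethnicity is published for states only).
county_cells = list(product(AGES, SEXES, INCOMES))

# States: age x sex x income x race/ethnicity.
state_cells = list(product(AGES, SEXES, INCOMES, RACES))

print(len(county_cells))  # 36 combinations per county
print(len(state_cells))   # 144 combinations per state
```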

In addition, estimates for age category 0-18 in the 0-200% IPR income category are published.

Each year's estimates are benchmarked to the national CPS ASEC corresponding to the income year of the estimates. For example, the 2007 SAHIE estimates are adjusted so that, before rounding, the county estimates sum to their respective state totals and, for key demographic groups, the state estimates sum to the numbers insured and uninsured in the national 2008 CPS ASEC (which contains questions about income during calendar year 2007).

The remainder of this page provides a summary of the demographic and income model methodology used for the SAHIE 2005 to 2007 estimates. Additional methodological detail is available at the individual links below. Technical papers that describe previous versions of the model are available on the Publications page.

SAHIE 2007 County and State Demographic and Income Model Methodology [<1.0 MB]
SAHIE 2006 County Demographic and Income Model Methodology [<1.0 MB]
SAHIE 2006 State Demographic and Income Model Methodology [<1.0 MB]
SAHIE 2005 County Demographic and Income Model Methodology [<1.0 MB]
SAHIE 2005 State Demographic and Income Model Methodology [<1.0 MB]

Overview

The SAHIE program produces model-based estimates of health insurance coverage for demographic groups within counties and states. We publish state estimates by sex (female, male, both), race/ethnicity (all races, non-Hispanic White, non-Hispanic Black, Hispanic), age (0-18, under 65, 18-64, 40-64, 50-64), and income. We publish county estimates by sex (female, male, both), age (0-18, under 65, 18-64, 40-64), and income. Income groups are defined by the income-to-poverty ratio (IPR) – the ratio of family income to the federal poverty level. The income groups estimated are: all incomes, IPR of 0-200%, and IPR of 0-250%.

Model Summary

For estimation, SAHIE uses statistical models that combine survey data from the Annual Social and Economic Supplement to the Current Population Survey (CPS ASEC) with administrative records data and Census 2000 data. The models are “area level” models because we use survey estimates and administrative data at certain levels of aggregation, rather than individual survey and administrative records. Our modeling approach is similar to that of common models developed for small area estimation, but with some additional complexities.

The published estimates are based on aggregates of modeled demographic groups. For states, we model at a base level defined by the full cross-classification of four age groups, four race/ethnicity groups, both sexes, and three income groups. For counties, we model at a base level defined by four age groups, both sexes, and two income groups.

We use estimates from the Census Bureau’s Population Estimates Program for the population in groups defined for state by age by race/ethnicity by sex, and for county by age by sex. We treat these populations as known. Within each of these groups, the number with health insurance coverage in any of the income categories is given by that population multiplied by two unknown proportions to be estimated: the proportion in the income category and the proportion insured within that income category. The models have two largely distinct parts - an “income part” and an “insurance part” - that correspond to these proportions. We use survey estimates of the number in the income groups and of the proportion insured within those groups. We assume these survey estimates are unbiased and normally distributed. We also assume functional forms for the variances of the survey estimates that involve parameters that are estimated.
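
The decomposition described above can be sketched in a few lines. This is a minimal illustration with made-up numbers, not SAHIE code; the function and variable names are hypothetical.

```python
def number_insured(population, p_income, p_insured_given_income):
    """Number insured in an income category for one demographic group.

    population: population of the group, treated as known
    p_income: proportion of the group in the income category
    p_insured_given_income: proportion insured within that income category
    """
    return population * p_income * p_insured_given_income


# Made-up values for illustration only.
population = 10_000   # known population of the demographic group
p_income = 0.30       # "income part": share of the group with IPR 0-200%
p_insured = 0.80      # "insurance part": share insured within that category

insured = number_insured(population, p_income, p_insured)
uninsured = number_insured(population, p_income, 1 - p_insured)

print(round(insured), round(uninsured))  # about 2400 insured and 600 uninsured
```

In the actual models the two proportions are unknown and estimated; here they are simply plugged in to show how they combine with the known population.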

We treat supplemental variables that predict one or both of the unknown income and insurance proportions in two ways. Some of these variables are used as fixed predictors in a regression model. There is a regression component in both the income and insurance parts of the model. In each case, a transformation of the proportion is predicted by a linear combination of fixed predictors.
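
To make the regression component concrete, here is a minimal sketch of predicting a transformed proportion from a linear combination of fixed predictors. The text above does not say which transformation is used; the logit used here is an assumption for illustration, and the coefficient and predictor values are invented.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the real line (assumed transformation)."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Map a real number back to a proportion in (0, 1)."""
    return 1 / (1 + math.exp(-x))


def predicted_proportion(coefficients, predictors):
    """Back-transform the linear combination of fixed predictors."""
    linear_part = sum(b * x for b, x in zip(coefficients, predictors))
    return inv_logit(linear_part)


# Invented example: an intercept, one categorical indicator, one continuous predictor.
coefficients = [0.5, -0.3, 1.2]
predictors = [1.0, 1.0, 0.4]  # 1.0 for the intercept, then the two predictor values

p = predicted_proportion(coefficients, predictors)
print(round(p, 3))
```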

Some of these predictors are categorical variables that define the demographic groups we model. Others are continuous. The continuous fixed predictors include variables regarding employment from the County Business Patterns data file, the extent of homeownership, and demographic population.

We also use random continuous predictors, which include data from Census 2000, the Internal Revenue Service, the Supplemental Nutrition Assistance Program, and Medicaid/Children’s Health Insurance Program. These are not fixed predictors in the model. Instead, we treat them as random, in a way similar to the survey estimates. Unlike the survey estimates, they are not treated as unbiased estimators of the quantities being estimated. Instead, we assume that their expectations are linear functions of the total or number insured in an income group. We typically assume they are normally distributed with variances that depend on unknown parameters.
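
The treatment of random predictors can be sketched as a small simulation: a records-based count is drawn from a normal distribution whose mean is a linear function of the true number. The intercept, slope, and standard deviation below are invented, and all names are hypothetical; in the actual models these parameters are unknown and estimated.

```python
import random


def expected_record_count(intercept, slope, true_number):
    """Assumed linear expectation of a records-based count."""
    return intercept + slope * true_number


def simulate_record_count(intercept, slope, true_number, sd, rng):
    """Draw one normally distributed count around the linear expectation."""
    mean = expected_record_count(intercept, slope, true_number)
    return rng.gauss(mean, sd)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_number = 2000.0    # made-up true number in an income group

mean = expected_record_count(50.0, 0.9, true_number)
draw = simulate_record_count(50.0, 0.9, true_number, 40.0, rng)

# The expectation (about 1850) differs from the true number (2000), which is
# why such a count is not treated as an unbiased estimator of it.
print(round(mean), round(draw))
```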

We formulate the model in a Bayesian framework and report the posterior means as the point estimates. We use the posterior means and variances together with a normal approximation to calculate symmetric 90-percent confidence intervals, and report their half-widths as the margins of error.
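
The interval calculation described above can be sketched as follows, assuming a posterior mean and variance are already in hand. The numbers are invented and the function names are hypothetical; only the normal approximation and the 90-percent level come from the text.

```python
from statistics import NormalDist


def margin_of_error(posterior_variance, level=0.90):
    """Half-width of a symmetric interval under a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # roughly 1.645 for 90 percent
    return z * posterior_variance ** 0.5


def interval(posterior_mean, posterior_variance, level=0.90):
    """Symmetric interval: point estimate plus or minus the margin of error."""
    moe = margin_of_error(posterior_variance, level)
    return posterior_mean - moe, posterior_mean + moe


# Invented posterior summary for one demographic group.
posterior_mean = 2400.0
posterior_variance = 150.0 ** 2

moe = margin_of_error(posterior_variance)
low, high = interval(posterior_mean, posterior_variance)
print(round(moe, 1), round(low, 1), round(high, 1))
```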

Controlling to National Estimates

We control the estimates to be consistent with specified national totals. As a result, when the estimates are summed over the states, they match specified national CPS ASEC one-year survey estimates. We match the national estimates for both the number insured and the number uninsured for the following groups:

Hispanic, ages 0-64
Non-Hispanic, ages 0-64 (2006 and 2007 only)
White non-Hispanic, ages 0-64
Black non-Hispanic, ages 0-64
Ages 0-64 in IPR 0-250%
Ages 0-17 in IPR 0-250% ¹
Ages 18-64.

Our margin of error estimates take into account that these controls are not without error.

We also control the estimates from the SAHIE county models to the state small area estimates of the number insured and uninsured by demographic group. As a result, there is arithmetic consistency across the geographic levels for many of the demographic groups.
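
The page does not spell out how the controls are applied. As one simple possibility, the sketch below uses a proportional (ratio) adjustment so that lower-level estimates sum to a higher-level total. This is an assumption for illustration; it ignores the overlapping control groups listed above and the error in the controls themselves. All names and numbers are hypothetical.

```python
def control_to_total(estimates, control_total):
    """Scale estimates proportionally so that they sum to control_total."""
    factor = control_total / sum(estimates.values())
    return {area: value * factor for area, value in estimates.items()}


# Invented state-level estimates of the number insured and a national control.
state_insured = {"State A": 900.0, "State B": 1100.0}
national_insured = 2100.0
controlled_states = control_to_total(state_insured, national_insured)

# Invented county estimates within State A, controlled to the adjusted state value.
county_insured = {"County A1": 300.0, "County A2": 500.0}
controlled_counties = control_to_total(county_insured, controlled_states["State A"])

print(round(sum(controlled_states.values()), 6))    # matches the national control
print(round(sum(controlled_counties.values()), 6))  # matches the State A value
```

After both steps, the county values add up to the state value and the state values add up to the national control, which is the arithmetic consistency described above.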

¹ For estimating ages 0-18 (2006 and 2007 only), the control is ages 0-18 in IPR 0-200%.

Related Information

Methodology

Demographic and Income Model Methodology (2001)

Page Last Revised - October 8, 2021