US Census Bureau

Small Area Health Insurance Estimates

SAHIE 2005 County Demographic and Income Model Methodology

Overview

The documentation presented here will be expanded upon in a forthcoming technical paper. Technical papers that describe previous versions of the model are available on the Publications page.

We estimate the number of people with health insurance coverage by county within demographic and income groups, and estimate the number without insurance as the difference between estimates of the number of people in a group and the number with insurance in that group. The number insured in a group is the product of the number in the group and the proportion in that group who are insured. Correspondingly, our model has two main parts: one for estimating the number of people in county demographic and income groups, and one for estimating the proportion with health insurance in these groups. Each part is a hierarchical two-level regression model. We use Bayesian methods to estimate the parameters in the model. Our estimates take into account that the estimates from the Annual Social and Economic Supplement (ASEC) of the Current Population Survey (CPS) for different counties have different reliabilities due to varying sample sizes in each county. Our estimates for counties with large sample sizes tend to be close to the CPS ASEC estimates. The demographic and income groups of the CPS ASEC estimates that are modeled are described in the Model Details section.

The dependent variables in the regression models are:

  • 3-year average estimates from the CPS ASEC.
    • proportions of people in demographic and income groups.
    • proportions insured in these groups.
  • Estimates from Census 2000 – Sample Data (i.e., the long form).
  • Numbers of IRS tax exemptions.
  • Numbers of food stamp participants.
  • Numbers of Medicaid and State Children's Health Insurance Program (SCHIP) participants.

The CPS ASEC estimates of the proportion of people in a county demographic and income group, and of the proportion insured in that group, are assumed to be unbiased. The other dependent variables are related to these proportions but are not assumed to be unbiased estimates of them. However, they are believed to be predictive of them. For more information about the dependent variables, see information about data inputs on the web page.

The universe for these health insurance estimates is the CPS ASEC poverty universe. Therefore, we use demographic population estimates (from the U.S. Census Bureau’s Population Estimates Program) that are adjusted to the CPS ASEC poverty universe to ensure that the small area health insurance estimates conform to this universe. For more information about the demographic population estimates, see information about data inputs.

We provide on the website margins of error for the estimates that represent the uncertainties associated with both sampling and modeling. These margins of error can be used to construct 90-percent confidence intervals, which are approximate Bayesian credible regions calculated using posterior standard deviations and a normal approximation.

Model Details

We estimate the number of people with health insurance coverage by county, age, sex, and income groups. The income groups are defined by the ratio of family income to the Federal Poverty Level (FPL), which is referred to as the income-to-poverty ratio (IPR). The number of people in a group with health insurance can be factored as the product of the number of people in the group and the proportion of these people who have health insurance. Correspondingly, there are two submodels: one for estimating the number of people in the groups and another for estimating the proportion of people in these groups who have health insurance. The number of people without health insurance is estimated as the difference between the estimates of the number of people in the group and the number with health insurance.
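The factoring described above can be illustrated with a minimal Python sketch. The function name and the figures are hypothetical and are not taken from the SAHIE estimation program.

```python
def insured_and_uninsured(group_population, proportion_insured):
    """Split a group's population into insured and uninsured counts."""
    if not 0.0 <= proportion_insured <= 1.0:
        raise ValueError("proportion_insured must be between 0 and 1")
    # Number insured = number in the group times the proportion insured.
    insured = group_population * proportion_insured
    # Number uninsured = number in the group minus the number insured.
    uninsured = group_population - insured
    return insured, uninsured


# Hypothetical group: 12,000 people, 85 percent of whom are insured.
insured, uninsured = insured_and_uninsured(12_000, 0.85)
print(round(insured), round(uninsured))
```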

We provide estimates for age groups 0 to 64, 18 to 64, and 40 to 64, and for 0 to 18. The demographic and IPR groups of the CPS ASEC estimates that are modeled for the first three age groups are:

  • Age: 0-17, 18-39, 40-64, 65+ (the 65+ group is used only in the model for the number of people by demographic and IPR groups).
  • Sex: Female, Male.
  • IPR: family IPR ≤ 200%, IPR > 200% (also separately for IPR ≤ 250%, IPR > 250%).

The development of these estimates is funded in part by the Centers for Disease Control and Prevention's (CDC's) National Breast and Cervical Cancer Early Detection Program (NBCCEDP). For some states the eligibility requirement is adult women with IPR ≤ 200% and for other states the requirement is adult women with IPR ≤ 250%. To provide estimates for IPR ≤ 200% and IPR ≤ 250%, the model was run once with IPR ≤ 200% and IPR > 200% and a second time with IPR ≤ 250% and IPR > 250%.

For the estimates of the 0 to 18 age group, the demographic and IPR groups are:

  • Age: 0-18, 19-39, 40-64, 65+ (the 65+ group is used only in the model for the number of people by demographic and IPR groups).
  • Sex: Female, Male.
  • IPR: family IPR ≤ 200%, IPR > 200%.

A separate run was done to provide the estimates for this age group.

To simplify the discussion, the following methodology will be described only in terms of the IPR ≤ 200% and IPR > 200% classification and the age groups 0-17, 18-39, 40-64, 65+.

Model for the Number of People for Counties by Demographic and IPR Groups

The model is a multivariate, two-level hierarchical model. The data sources are:

  • Three-year average estimates of the numbers of people from the CPS ASEC by county and demographic groups, and three-year average estimates of the number of people with IPR ≤ 200% in each of these groups. From these estimates, we calculate CPS ASEC estimates of the proportions of people in county by demographic groups with IPR ≤ 200%.
  • The estimates from Census 2000 – Sample Data for the same groups as the CPS ASEC.
  • The number of tax exemptions for six groups in each county defined by age groups 0-17, 18-64, and 65+ crossed with IPR ≤ 200% and IPR > 200%. (For simplicity, we say the age groups are 0-17, 18-64, and 65+; actually, they are defined less exactly on tax forms. For more information about the tax data, see information about data inputs.)
  • The number of food stamp participants by county.

First Level

At the first level, we model the CPS ASEC proportions conditional on the proportion in the county demographic group with IPR ≤ 200%. We model the Census 2000 – Sample Data, tax exemption data, and food stamp participants data conditional on the number of people in a county demographic group and IPR category. The number of people is the proportion multiplied by the demographic population estimate for the demographic group. These proportions are latent variables – unknown quantities – that are to be estimated by the model. Each data source is modeled as a regression where the independent variables are these latent variables. In all of the models, the errors are modeled as normally distributed and are assumed to be independent across age, sex, and IPR groups, as well as between the data sources.

  • The CPS ASEC estimated proportion in a group is modeled such that its expected value is the proportion of people in a county and demographic group with IPR ≤ 200% and its variance is the sampling variance. We assume that the sampling variance has a particular functional form, which contains parameters that need to be estimated. We model the county CPS ASEC estimates of proportions because these estimates are more reliable than the CPS ASEC estimates of numbers of people.
  • The Census 2000 – Sample Data estimate in a demographic and IPR group is modeled as proportional to the number of people in that group. There are different proportionality factors for different age and IPR groups. The combined model and census sample data variance is modeled as proportional to the expected number of people raised to a power.
  • The tax exemption data are broken down into six age and income groups for each county. Each age and income group is modeled as a linear regression where the independent variables are the number of people in the demographic and IPR groups comprising the tax exemption group. The model variance is modeled as proportional to the expected number of people in the tax exemption group raised to a power.
  • The food stamp participants in a county are modeled as proportional to the number of people in a county in families with IPR ≤ 200% raised to a power. The model variance is modeled as proportional to the expected number of food stamp participants raised to a power.
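As a rough illustration of the mean and variance forms described in the list above, the following Python sketch computes an expected food stamp count and the corresponding model variance. The function names and all parameter values are hypothetical; in the model itself the proportionality factors and powers are parameters to be estimated.

```python
def food_stamp_mean(n_low_ipr, factor, power):
    """Expected participants: proportional to the low-IPR count raised to a power."""
    return factor * n_low_ipr ** power


def power_variance(expected, scale, power):
    """Model variance: proportional to the expected value raised to a power."""
    return scale * expected ** power


# Hypothetical county with 10,000 people in families with IPR <= 200%.
# The factor, scale, and power values are made up for illustration.
mean = food_stamp_mean(10_000, factor=0.5, power=1.0)
variance = power_variance(mean, scale=2.0, power=1.0)
print(mean, variance)
```

The same variance form, with its own scale and power, applies to the census sample data and tax exemption models described above.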

Second Level

At the second level, we model the number in counties by demographic and IPR groups. We do this by modeling the proportion of those in a county by demographic group who are in each of the IPR groups. These proportions are then multiplied by the demographic population estimates to obtain the estimates for the county by demographic and IPR groups.

The proportion of people in the county by demographic group with IPR ≤ 200% is modeled as a logistic regression model with independent normal errors. We assume the errors have constant variance. The independent variables for the model are:

  • An intercept term.
  • Main effects for age and sex.
  • Main effect for each state.
  • Two-way interactions between age and sex.
  • The population of the county.
  • The proportion Hispanic in the county.
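A minimal Python sketch of this second-level structure follows, assuming the regression prediction and the normal error are added on the logit scale and then transformed back to a proportion. The function names and numeric values are hypothetical.

```python
import math


def inv_logit(x):
    """Map a value on the logit scale back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def low_ipr_estimate(linear_predictor, error, population_estimate):
    """Return the proportion with IPR <= 200% and the implied number of people."""
    proportion = inv_logit(linear_predictor + error)
    # The proportion is multiplied by the demographic population estimate.
    return proportion, proportion * population_estimate


# Hypothetical values: a regression prediction of -0.4 on the logit scale,
# a county-level error of 0.1, and a demographic population estimate of 5,000.
proportion, count = low_ipr_estimate(-0.4, 0.1, 5_000)
print(round(proportion, 3), round(count))
```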

Model for the Proportions of People with Health Insurance Coverage for Counties by Demographic and IPR Groups

The model for the proportion with health insurance coverage is a two-level hierarchical model. The data sources are:

  • The CPS ASEC estimates of the proportion of people with health insurance for county by demographic and IPR groups.
  • The number of Medicaid and State Children's Health Insurance Program (SCHIP) participants in each county, for groups defined by age and sex.

First Level

At the first level, the CPS ASEC estimates and the numbers of participants in Medicaid and SCHIP are modeled, conditional on the proportions insured. The proportions insured are latent variables – unknown quantities – that are to be estimated by the model.

The CPS ASEC estimated proportion with health insurance is modeled such that the expected value of the CPS ASEC proportion is the proportion of people with health insurance in each of the corresponding groups. The errors are assumed to be normally distributed and independent. The variances are the sampling variances. We assume that the sampling variance has a particular functional form, which contains parameters that need to be estimated.

The Medicaid and SCHIP participation data are broken down into six age by sex groups for each county. The number of participants in each age by sex group is modeled as proportional to the number of people with health insurance coverage and IPR ≤ 200% within the age by sex group. The proportionality constants are products of fixed and random effects. The errors are assumed to be independent of each other, and independent of the CPS ASEC estimates. The model variance is modeled as proportional to the expected number of Medicaid and SCHIP participants in the county age by sex group raised to a power.

Second Level

The proportion of people with health insurance in county by demographic and IPR groups is modeled as a logistic regression model with independent normal errors. We assume the errors have constant variance. The independent variables for the model are:

  • An intercept term.
  • Main effects for age and IPR.
  • Main effect for each state.
  • Two-way interactions between age and IPR; between sex and IPR; and between age and sex.
  • Three-way interactions between age, sex, and IPR.
  • Two-way interactions between continuous predictors (listed below) and age; two-way interactions between continuous predictors and sex; and two-way interactions between continuous predictors and IPR.

All of the continuous predictors are at the county level. They are:

  • Population.
  • Mean of the log IPR, as estimated from tax returns.
  • Variance of the log IPR, as estimated from tax returns.
  • Proportion Hispanic.
  • Proportion non-citizens.
  • Proportion American Indian or Alaska Native.
  • Proportion with less than a high school education.
  • Proportion of owner-occupied households.
  • Proportion of households in rural areas.
  • Proportion of employees in retail firms.
  • Proportion of employees in non-retail firms with fewer than 20 employees.
  • Proportion of employees in non-retail firms with 100 or more employees.
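To make the list of regressors concrete, the sketch below assembles the nonzero terms for a single county by age, sex, and IPR cell. It follows the terms as listed above and uses plain indicator coding without reference levels, which is an assumption made for readability. The function name, the term labels, the state code, and the predictor values are all hypothetical.

```python
def design_row(age, sex, ipr, state, continuous):
    """Build the nonzero regressors for one county by age, sex, and IPR cell."""
    row = {
        "intercept": 1.0,
        f"age[{age}]": 1.0,                          # main effect for age
        f"ipr[{ipr}]": 1.0,                          # main effect for IPR
        f"state[{state}]": 1.0,                      # main effect for state
        f"age[{age}]:ipr[{ipr}]": 1.0,               # two-way interactions
        f"sex[{sex}]:ipr[{ipr}]": 1.0,
        f"age[{age}]:sex[{sex}]": 1.0,
        f"age[{age}]:sex[{sex}]:ipr[{ipr}]": 1.0,    # three-way interaction
    }
    # Each county-level continuous predictor interacts with age, sex, and IPR.
    for name, value in continuous.items():
        row[f"{name}:age[{age}]"] = value
        row[f"{name}:sex[{sex}]"] = value
        row[f"{name}:ipr[{ipr}]"] = value
    return row


# Hypothetical cell with two of the twelve continuous predictors.
row = design_row("18-39", "Female", "<=200%", "MD",
                 {"prop_hispanic": 0.12, "prop_rural": 0.40})
print(len(row))
```

With all twelve continuous predictors the row would carry 8 + 3 × 12 = 44 nonzero terms under this coding.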

Prior Distributions

All high-level parameters, such as regression coefficients and variance parameters, are given prior distributions. The prior distributions for the regression coefficients are noninformative flat priors. The prior distributions for the other parameters are normal or gamma priors that generally carry little information.
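The three kinds of prior mentioned above can be written as log densities, as in the following Python sketch. The function names are hypothetical, and the hyperparameters in the example are made up, since the values used in the SAHIE program are not given here.

```python
import math


def flat_log_prior(beta):
    """Flat prior: the log density is the same constant for every value."""
    return 0.0


def normal_log_prior(x, mean, sd):
    """Log density of a normal prior with the given mean and standard deviation."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def gamma_log_prior(x, shape, scale):
    """Log density of a gamma prior (shape/scale form); -inf outside x > 0."""
    if x <= 0.0:
        return -math.inf
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


# A widely dispersed normal prior (hypothetical sd of 100) is nearly flat
# over a plausible range of parameter values.
print(normal_log_prior(0.0, 0.0, 100.0) - normal_log_prior(5.0, 0.0, 100.0))
```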

Model Limitations

A model is an approximate, not exact, description of the distribution of the data. The models have been evaluated against the data and no major discrepancies have been found between the predictions from the model and the data. Research continues to improve the models so that they more accurately describe the distributions of the data. For example, modeling choices, including assumptions of independence, the choices of variance forms, and estimation of the sampling variances, have not been completely validated. Because the models are determined using the same data used to produce the estimates, and because the model used is one of many possible models for the data, we may underestimate variances of the estimates.

Estimation of the Number of People with Health Insurance Coverage

We use a Bayesian approach for estimating the parameters in the model and the number of people with health insurance. The estimated number of people with health insurance is the posterior mean conditional on the CPS ASEC data, the Census 2000 – Sample Data, the tax exemption data, the food stamp data, and the Medicaid and SCHIP participation data. The final estimate for a county demographic and IPR group is a complex mixture of the CPS ASEC estimate for that group and the other data. Estimates with large sample sizes, and thus low variances, tend to be close to the CPS ASEC estimates.
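The tendency for estimates with low sampling variance to stay close to the CPS ASEC estimates can be seen in a much simpler setting than the SAHIE model. The Python sketch below uses a univariate normal model with known variances, which is an assumption made only for illustration; the function name and numbers are hypothetical.

```python
def shrinkage_estimate(direct, sampling_var, model_prediction, model_var):
    """Posterior mean in a normal-normal model with known variances."""
    # Weight on the direct survey estimate grows as its sampling variance falls.
    weight = model_var / (model_var + sampling_var)
    return weight * direct + (1.0 - weight) * model_prediction


# Same direct estimate (0.80) and model prediction (0.90), different reliabilities.
large_sample = shrinkage_estimate(0.80, 0.0001, 0.90, 0.01)  # low sampling variance
small_sample = shrinkage_estimate(0.80, 0.04, 0.90, 0.01)    # high sampling variance
print(round(large_sample, 3), round(small_sample, 3))
```

In this toy setting the large-sample estimate stays near the direct estimate, while the small-sample estimate moves most of the way toward the model prediction.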

The method used to estimate the parameters in the model and to estimate the number of people with and without health insurance is called Markov Chain Monte Carlo (MCMC). This method involves drawing samples from the posterior distribution of the parameters in the model and the posterior distributions of the number of people with and without health insurance. Estimates for the number of people are the averages from the samples, called the posterior means. In order to control these estimates so that they agree with state small area estimates, this procedure was altered as described in the section Controlling to State Small Area Estimates.
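The idea of averaging posterior draws can be shown with a toy random-walk Metropolis sampler. This is not the SAHIE sampler: the one-parameter model (a normal likelihood with a flat prior), the function names, and the tuning values are assumptions chosen to keep the example short.

```python
import math
import random


def log_posterior(mu, y, sampling_sd):
    """Log posterior up to a constant: flat prior on mu, normal likelihood."""
    return -0.5 * ((y - mu) / sampling_sd) ** 2


def metropolis(y, sampling_sd, n_draws=20_000, burn_in=2_000, step=0.03, seed=0):
    """Draw from the posterior of mu with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    mu = 0.5
    current = log_posterior(mu, y, sampling_sd)
    draws = []
    for i in range(n_draws + burn_in):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, y, sampling_sd)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if i >= burn_in:
            draws.append(mu)
    return draws


# Hypothetical observed proportion of 0.85 with sampling standard deviation 0.02.
draws = metropolis(0.85, 0.02)
posterior_mean = sum(draws) / len(draws)
print(round(posterior_mean, 3))
```

The posterior mean is simply the average of the retained draws, which is how the estimates described above are obtained from the samples.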

Controlling to State Small Area Estimates

We control the estimates from the SAHIE county models to the state small area estimates of the number of people with health insurance, and to the state small area estimates of the number of people without health insurance, within cross-classifications of age, sex, and IPR groups. This ensures that county estimates of the number of people with and without health insurance for this cross-classification sum to the corresponding state estimates.

We control to state small area estimates for two reasons. One is to guarantee consistency with the state small area estimates. The second reason is to control for possible weaknesses or failures of the model. We account for the variances of the controls by treating the controls as random quantities in the estimation program.
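The effect of controlling can be illustrated with a simple ratio adjustment, sketched below in Python. This is a simplification: it treats the state control as a fixed number, whereas the estimation program treats the controls as random quantities. The function name and figures are hypothetical.

```python
def control_to_state(county_estimates, state_total):
    """Scale county estimates for one cell so that they sum to the state total."""
    total = sum(county_estimates)
    if total <= 0:
        raise ValueError("county estimates must sum to a positive number")
    factor = state_total / total
    return [estimate * factor for estimate in county_estimates]


# Hypothetical counties in one state for one age, sex, and IPR cell.
controlled = control_to_state([100.0, 300.0, 600.0], 1100.0)
print([round(c, 1) for c in controlled])
```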

Measure of Errors and Confidence Intervals

One goal of the small area work is to provide measures of uncertainty surrounding the estimates. We provide on the website margins of error for the estimates that represent the uncertainties associated with both sampling and modeling. These margins of error can be used to construct 90-percent confidence intervals, which are approximate Bayesian credible regions calculated using posterior standard deviations and a normal approximation.
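As a sketch of the normal approximation, the Python code below assumes that a 90-percent margin of error equals the 95th-percentile standard normal quantile (about 1.645) times the posterior standard deviation. The function names and the example figures are hypothetical.

```python
from statistics import NormalDist


def margin_of_error(posterior_sd, level=0.90):
    """Margin of error under a normal approximation to the posterior."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.645 for level=0.90
    return z * posterior_sd


def interval(estimate, moe):
    """Interval formed as the estimate plus or minus the margin of error."""
    return estimate - moe, estimate + moe


# Hypothetical estimate of 1,000 people with a posterior standard deviation of 100.
moe = margin_of_error(100.0)
lower, upper = interval(1_000.0, moe)
print(round(moe, 1), round(lower, 1), round(upper, 1))
```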


Source: U.S. Census Bureau | Small Area Health Insurance Estimates |  Last Revised: September 03, 2009