SAHIE Demographic and Income Model Methodology 2001: Model for the Number of People for States by Demographic Group and IPR Categories
The model is a multivariate, two-level hierarchical model. The data sources are:
the estimated numbers of people from the Current Population Survey Annual Social and Economic Supplement (CPS ASEC) by state, demographic group, and income-to-poverty ratio (IPR) category;
the estimates from the Census 2000 long form for the same categories as the CPS ASEC;
the numbers of tax exemptions in each state for four groups, defined by crossing the age groups 0-17 and 18+ with the IPR ≤ 200% and IPR > 200% categories; and
the number of participants in the Supplemental Nutrition Assistance Program (SNAP, formerly the Food Stamp Program) by state.
At the first level, we model the data sources conditional on the numbers of people in each state by demographic group and IPR category. These numbers are latent variables (unknown quantities) that the model estimates. Each data source is modeled as a linear regression whose independent variables are these latent variables. In all of the models, the errors are normally distributed and are assumed independent across age, race, sex, and IPR categories, as well as across data sources.
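The errors are assumed normal and independent across categories and data sources. So for a given set of latent counts, the first-level log-likelihood is a plain sum of univariate normal log-densities. The Python sketch below shows only that bookkeeping. The function name `gaussian_loglik` and the `(observed, expected, variance)` input layout are hypothetical. In the model itself, the expected values and variances would come from the source-specific regressions described next.

```python
import math


def gaussian_loglik(observations):
    """Sum of independent normal log-densities (hypothetical helper).

    observations: iterable of (observed, expected, variance) triples, one per
    data-source cell. Errors are assumed independent across cells and data
    sources, so the joint log-likelihood is the sum of per-cell terms.
    """
    total = 0.0
    for observed, expected, variance in observations:
        if variance <= 0:
            raise ValueError("variance must be positive")
        resid = observed - expected
        # log N(observed | expected, variance)
        total += -0.5 * (math.log(2.0 * math.pi * variance) + resid * resid / variance)
    return total


# Illustrative values only: one CPS ASEC cell and one Census 2000 cell.
cells = [(1050.0, 1000.0, 2500.0), (980.0, 1000.0, 400.0)]
print(gaussian_loglik(cells))
```

Because the sum is additive, each data source can contribute its own terms without any cross-source covariance bookkeeping.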
The CPS ASEC estimate in a category is modeled so that its expected value is the number of people in that category and its variance is the modeled sampling variance.
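The CPS ASEC equation can be written out as below. Here N_sdi denotes the latent number of people in state s, demographic group d, and IPR category i. This notation is chosen for illustration and is not taken from the source.

```latex
\hat{Y}^{\mathrm{CPS}}_{sdi} = N_{sdi} + e_{sdi},
\qquad e_{sdi} \sim \mathcal{N}\!\left(0,\ v_{sdi}\right)
```

Here v_sdi is the modeled sampling variance. This sketch treats it as a fixed input to the first level.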
The Census 2000 estimate in a category is modeled as proportional to the number of people in that category, with different proportionality factors for different race/ethnicity groups. The combined model and census sampling variance is modeled as proportional to the number of people in that category.
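The Census 2000 equation can be sketched with illustrative notation that is not from the source. N_sdi is the latent count for state s, demographic group d, and IPR category i. r(d) is the race/ethnicity group of demographic group d, and beta_r is the proportionality factor for that group. The single variance scale sigma_C^2 is an assumption of this sketch. The text only states that the combined variance is proportional to the number of people.

```latex
\hat{Y}^{\mathrm{C2000}}_{sdi} = \beta_{r(d)}\, N_{sdi} + u_{sdi},
\qquad u_{sdi} \sim \mathcal{N}\!\left(0,\ \sigma^{2}_{C}\, N_{sdi}\right)
```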
The tax exemption data are broken down into four age-by-income categories for each state. The count in each category is modeled as a linear regression whose independent variables are the numbers of people in the demographic and IPR categories that make up that tax exemption category. The model variance is modeled as proportional to the number of people in the state and age group raised to the power 1.7; the exponent 1.7 was chosen because it fit well in tests of the model.
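The Python sketch below computes the mean and variance of one tax-exemption cell. Several pieces are hypothetical: the function name, the dictionary layout, the use of one coefficient per contributing demographic/IPR cell, the scale parameter `phi`, and the category labels. Only the structure comes from the description above: a linear combination of latent counts, a variance driven by the state-by-age-group total, and the exponent 1.7.

```python
def tax_exemption_moments(counts, coefs, subset, age_group_total, phi, power=1.7):
    # Hypothetical helper illustrating the tax-exemption equation.
    # counts: dict (demographic_group, ipr_category) -> latent number of people
    # coefs: dict with the same keys -> regression coefficient (assumed form)
    # subset: keys of the cells that make up this tax-exemption category
    # age_group_total: people in the state and age group, all IPR categories
    # phi: variance scale (assumed); power: exponent, 1.7 in the text
    mean = sum(coefs[k] * counts[k] for k in subset)
    variance = phi * age_group_total ** power
    return mean, variance


# Illustrative numbers only: three cells for one age group in one state,
# two of which ("ipr1", "ipr2") are assumed to fall in the IPR <= 200% category.
counts = {
    ("0-17, group A", "ipr1"): 300.0,
    ("0-17, group A", "ipr2"): 200.0,
    ("0-17, group A", "ipr3"): 500.0,
}
coefs = {key: 0.9 for key in counts}
low_income_cells = [("0-17, group A", "ipr1"), ("0-17, group A", "ipr2")]
mean, variance = tax_exemption_moments(
    counts, coefs, low_income_cells, sum(counts.values()), phi=0.01
)
print(mean, variance)
```

The variance uses the total for the whole state and age group, not just the cells inside the tax category, which follows the text.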
The number of SNAP participants in a state is modeled as proportional to the number of people in that state who are in families with IPR ≤ 200%. The model variance is modeled as proportional to the number of SNAP participants.
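The SNAP equation can be sketched with illustrative notation. S_s is the SNAP participant count in state s, and N_sdi is the latent count by demographic group d and IPR category i. The set L contains the IPR categories at or below 200%, and gamma and sigma_S^2 are hypothetical parameter names.

```latex
S_{s} = \gamma \sum_{d} \sum_{i \in \mathcal{L}} N_{sdi} + w_{s},
\qquad w_{s} \sim \mathcal{N}\!\left(0,\ \sigma^{2}_{S}\, S_{s}\right)
```

The variance is scaled by the observed participant count S_s, as the text states, rather than by the latent count.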
At the second level, we model the numbers in states by demographic and IPR groups. We do this by modeling the proportions of those in a state by demographic group who are in each of the IPR categories. These proportions are then multiplied by known population totals to obtain the estimates for the state by demographic and IPR groups.
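Turning the modeled proportions into counts is a simple multiplication. The sketch below assumes, for illustration only, that proportions are stored per (state, demographic group) as a list over IPR categories. The function name and data layout are hypothetical.

```python
def counts_from_proportions(proportions, population_totals, tol=1e-9):
    # Hypothetical helper: multiply modeled IPR proportions by known totals.
    # proportions: dict (state, demographic_group) -> proportions by IPR category
    # population_totals: dict (state, demographic_group) -> known population total
    counts = {}
    for key, props in proportions.items():
        if abs(sum(props) - 1.0) > tol:
            raise ValueError(f"proportions for {key} do not sum to 1")
        total = population_totals[key]
        counts[key] = [p * total for p in props]
    return counts


props = {("State A", "group 1"): [0.25, 0.25, 0.5]}
totals = {("State A", "group 1"): 2000.0}
print(counts_from_proportions(props, totals))
# {('State A', 'group 1'): [500.0, 500.0, 1000.0]}
```

The proportions within a cell sum to one, so the resulting counts add back to the known population total for that state and demographic group.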
The proportions of people in the state by demographic group who are in IPR categories are modeled as a multiple-category logistic regression with independent normal errors. The independent variables for the model are:
sex, age, and race/ethnicity each crossed with the IPR categories;
three-way effects of sex, age, and IPR; sex, race/ethnicity and IPR; and age, race/ethnicity and IPR; and
interactions of the state tax non-filing rates with the IPR categories.
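A multiple-category logistic regression with normal errors can be sketched as below. This is an illustration under stated assumptions, not the production model:

- The first IPR category is the reference.
- The normal errors enter additively on the log-odds scale.
- The covariate vector `x` stands in for the sex, age, race/ethnicity, and tax non-filing terms listed above.

Giving each non-reference IPR category its own coefficient vector plays the role of crossing those terms with the IPR categories. The function name and all numbers are hypothetical.

```python
import math
import random


def ipr_proportions(x, betas, sigma, rng):
    # Hypothetical sketch: draw IPR-category proportions for one
    # state-by-demographic cell from a multinomial logit with normal errors.
    # x: covariate vector for the cell (assumed layout)
    # betas: one coefficient vector per non-reference IPR category
    # sigma: standard deviation of the independent normal errors (assumed
    #        to be added to each non-reference log-odds)
    etas = [0.0]  # reference IPR category has log-odds fixed at 0
    for beta in betas:
        linear = sum(b * xj for b, xj in zip(beta, x))
        etas.append(linear + rng.gauss(0.0, sigma))
    m = max(etas)  # subtract the max before exponentiating for stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(2001)
x = [1.0, 0.0, 1.0, 0.12]  # hypothetical: intercept, two dummies, non-filing rate
betas = [[-0.4, 0.1, 0.3, 1.5], [0.2, -0.2, 0.1, -0.8]]
props = ipr_proportions(x, betas, sigma=0.3, rng=rng)
print(props)
```

The softmax guarantees that the proportions are positive and sum to one. Multiplying them by the known population totals then yields the state-by-demographic-by-IPR counts.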