- We estimate a regression model which predicts the number of poor persons using three-year averages of county-level observations from the March Current Population Survey (CPS) as the dependent variable and administrative record and census data as the predictors. Though only the 1,527 counties with CPS sample cases are used to estimate the equation, we make regression "predictions" for all 3,143 counties.
- The model is multiplicative, that is, the number of poor is modeled as the product of a series of predictors which are numbers (not rates) and unknown errors. To estimate the coefficients in the model, we take logarithms of the dependent and all independent variables. While for facility of exposition we may omit reference to logs, all variables in the county regression models for numbers of poor are logarithmic.
- The estimates for different counties differ in reliability because the size of the CPS sample varies from county to county. Our estimates take this factor into account.
- In order to make use of the information contained in the direct estimates for the 1,527 counties in the CPS, we combine the regression predictions with these direct estimates using Empirical Bayes (or "shrinkage") techniques. The Empirical Bayes techniques weight the contribution of the two components (regression and direct estimates) based on their relative precision.
- We force the estimates for the counties of a given state to sum to the independently derived state estimate.
- We provide a confidence interval, which represents uncertainty from both sampling and modeling, for each estimate.

*Using counties in the CPS sample.* Our use of the CPS implicitly
assumes that the counties in the survey sample are representative of those
not selected, but this need not be the case. The CPS sample is designed
to represent the population, and only incidentally represents counties.
The characteristics of some counties guarantee that they are included,
e.g., most counties in large metropolitan areas and counties with large
populations. More generally, while all counties have a nonzero probability
of being included in the sample, some have higher probabilities than others.
Further, the probability of selection of a county may be related to its
income and poverty level. On the other hand, a comparison of regression equations
based on 1990 census data for counties in the CPS sample with equations based on all
counties shows remarkably similar results, providing some assurance that the
CPS counties are largely representative of all counties.

The survey weights used in estimation at the national level are not appropriate for county-level estimates. The CPS sample design selects some primary sampling units (usually a county or group of counties) to represent a set of counties in the same stratum. The sum of the weights for sample households from such a county estimates the total population of the entire set of counties it represents. Because we want each county in the CPS sample to stand for itself, we have adjusted the weights to make each county self-representing.

*Estimation of the model equation.* CPS sampling variances are
not constant over all counties. We avoid giving observations with larger
variances (a great deal of uncertainty) the same influence on the regression
as observations with smaller variances (less uncertainty) by, in effect,
weighting each observation by the inverse of its uncertainty. Representing
this uncertainty requires recognizing that it arises from two sources:

- uncertainty about the true value which the sample data estimate for each county (sampling error), and
- uncertainty about where the true county values lie with respect to the regression surface (lack of fit).

To estimate the lack of fit component, we estimate our model using the 1990 census data and assume that the lack of fit component of residual variance is the same when the same model is fit to the CPS and to the census. Since we do have separate estimates of sampling variance for each observation in the 1990 Census, we use them to estimate the unknown lack of fit component with a maximum likelihood procedure. (See "Appendix C: Accuracy of the Data" in 1990 Census, STF3 documentation.)

Next we fit a regression equation to the CPS data. We assume the sampling variance of the log of the number of poor is inversely proportional to the sample size (in households) and the lack of fit variance is the same as that estimated in the Census regression. We estimate the CPS regression parameters and the two components of CPS variance with a maximum likelihood procedure.
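
The inverse-variance weighting described above can be sketched as follows. This is a minimal illustration and not the Census Bureau's procedure: it treats the lack of fit variance and the sampling variance constant as known inputs, whereas the text estimates them by maximum likelihood. The function name `fit_weighted_log_regression` and all argument names are hypothetical.

```python
import numpy as np

def fit_weighted_log_regression(log_y, log_x, n_households,
                                sampling_const, lack_of_fit_var):
    """Weighted least squares on the log scale.

    Assumes Var(residual_i) = sampling_const / n_households_i + lack_of_fit_var,
    with both variance parameters taken as given (an assumption of this sketch).
    """
    log_y = np.asarray(log_y, dtype=float)
    log_x = np.asarray(log_x, dtype=float)
    n = np.asarray(n_households, dtype=float)

    # Total variance per county: sampling error plus lack of fit.
    total_var = sampling_const / n + lack_of_fit_var

    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(len(log_y)), log_x])

    # Scaling each row by 1/sd turns weighted least squares into ordinary
    # least squares on the scaled data.
    w = 1.0 / np.sqrt(total_var)
    beta, *_ = np.linalg.lstsq(X * w[:, None], log_y * w, rcond=None)
    return beta, total_var
```

Counties with small CPS samples have a large `total_var` and therefore pull less on the fitted coefficients, which is the behavior the text describes.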

*Combining model and direct survey estimates.* The final estimates
are weighted averages of the model predictions and the direct CPS estimates,
where they exist. The two weights for each county add to 1.0 and the weight
on the model prediction is computed as the sampling variance divided by
the total variance (sampling plus lack of fit) of the direct estimate.
Using this technique, the larger the sampling variance of the survey estimate,
the smaller its contribution and the larger the contribution from the prediction
model. These weights are commonly referred to as "shrinkage weights,"
and the final estimates as "shrinkage" or "Empirical Bayes"
estimates. For counties which are not in the CPS sample, the weight on the
model's predictions is one and the weight on the direct survey estimate
is zero.
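
The weighting rule above can be sketched in a few lines. This is an illustrative example under stated assumptions, not the production code: all inputs are on the log scale, counties outside the CPS sample are marked with `NaN` in the direct estimates (a convention of this sketch), and the function name `shrinkage_estimate` is hypothetical.

```python
import numpy as np

def shrinkage_estimate(model_pred, direct_est, sampling_var, lack_of_fit_var):
    """Combine log-scale model predictions with log-scale direct estimates."""
    model_pred = np.asarray(model_pred, dtype=float)
    direct_est = np.asarray(direct_est, dtype=float)
    sampling_var = np.asarray(sampling_var, dtype=float)

    # Weight on the model prediction: sampling variance / total variance.
    w_model = sampling_var / (sampling_var + lack_of_fit_var)

    # Counties with no direct estimate rely entirely on the model.
    in_sample = ~np.isnan(direct_est)
    w_model = np.where(in_sample, w_model, 1.0)
    direct_filled = np.where(in_sample, direct_est, 0.0)

    # The two weights sum to 1.0 for every county.
    return w_model * model_pred + (1.0 - w_model) * direct_filled
```

With a lack of fit variance of 0.1, a county whose sampling variance is 0.3 gets a weight of 0.75 on the model prediction and 0.25 on its direct estimate.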

*Controlling to State Estimates.* Completion of the shrinkage
estimates does not produce the final county estimates of the number of
poor. The last step in the process is to transform the county estimates
from the log scale to estimates of numbers and control them to the state
estimates that have been derived independently. In our current approach,
a simple ratio adjustment is made to the county level estimates to assure
that they add to the state totals. Model-based estimates at the state level
are controlled to the national level direct estimates provided by the March
1994 CPS. The estimated standard errors of the county estimates are adjusted
to reflect this additional level of control.
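
A minimal sketch of the ratio adjustment follows. It assumes that the back-transformation from the log scale is plain exponentiation, which the text does not spell out, and it leaves aside the adjustment of standard errors. The function name `control_to_state_totals` and the data layout are hypothetical.

```python
import math
from collections import defaultdict

def control_to_state_totals(log_estimates, county_state, state_totals):
    """Exponentiate log-scale county estimates and scale them to state totals.

    log_estimates: log-scale shrinkage estimates, one per county
    county_state:  state identifier for each county, in the same order
    state_totals:  mapping from state identifier to the independent state estimate
    """
    # Back-transform to numbers of poor (simple exponentiation is assumed here).
    counts = [math.exp(v) for v in log_estimates]

    # Sum the uncontrolled county estimates within each state.
    sums = defaultdict(float)
    for count, state in zip(counts, county_state):
        sums[state] += count

    # Multiply each county by (state estimate / sum of its state's counties).
    return [count * state_totals[state] / sums[state]
            for count, state in zip(counts, county_state)]
```

After the adjustment, the county estimates within each state add up to that state's independent estimate, while the relative sizes of the counties within the state are unchanged.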

*Standard Errors and Confidence Intervals.* One of the goals
of our small area estimation work is to provide estimates of the uncertainty
surrounding the estimates of the numbers of poor. The census and model-based
estimates shown in the tables are accompanied by their 90 percent confidence
intervals. These intervals were constructed from estimated standard errors.
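
As a rough illustration of how an interval follows from a standard error, the sketch below builds a symmetric interval from a normal approximation. Whether the published intervals were symmetric, or were built on the log scale, is not stated here, so the symmetric form and the function name `confidence_interval` are assumptions of this example.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, level=0.90):
    """Symmetric normal-approximation interval: estimate +/- z * std_error."""
    # For a 90 percent interval, z is the 95th percentile of the standard
    # normal distribution, about 1.645.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error
```

For a hypothetical estimate of 1,000 poor people with a standard error of 100, this gives an interval of roughly 836 to 1,164.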

For the model-based estimates, the standard error depends mainly on the uncertainty about the model and the CPS sampling variance. The variance of the shrinkage weights could also be a significant component of uncertainty about our estimates (if it were sizeable and we ignored it, we would underestimate the standard errors), but our research indicates that its contribution is negligible. For the census, the standard errors were derived from a set of generalized variance functions that reflect the nature of the census sample design for the long form questionnaire. (For further information, see Quantifying Uncertainty in State and County Estimates.)

The model is multiplicative, that is, the number of poor is modeled
as the product of a series of predictors which are numbers (not rates),
and unknown errors. To estimate the coefficients in the model, we take
logarithms of the dependent and all independent variables. Our choice
of a multiplicative model is motivated in part by the fact that the distribution
of county numbers of poor has a huge range -- the 1990 Census distribution
ranges from zero in some counties to more than a million in the largest
county, with a mean of 10,000 -- and it is highly skewed. Taking the logarithm
of all variables makes their distributions more centered and symmetrical.
It has the effect of diminishing the otherwise inordinate influence of
large counties on the coefficient estimates. Another advantage of a multiplicative
model is that it makes it plausible to maintain that the (unobserved) errors
for every county, no matter how large or small, are drawn from the same
distribution. *All* variables in the county regression models for
numbers of poor are logarithmic.
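
In symbols, the multiplicative form and its log-linear equivalent can be written as below. The notation is assumed for illustration and does not come from the original text: $y_i$ is the number of poor in county $i$, $x_{ik}$ is the $k$-th predictor, $u_i$ is the lack of fit error, and $e_i$ is the sampling error.

$$
y_i = e^{\beta_0} \prod_{k=1}^{K} x_{ik}^{\beta_k} \, e^{u_i + e_i}
\quad\Longleftrightarrow\quad
\log y_i = \beta_0 + \sum_{k=1}^{K} \beta_k \log x_{ik} + u_i + e_i
$$

Taking logarithms turns the product into a sum, so the coefficients $\beta_k$ can be estimated with linear regression methods.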

The predictor variables in the income year 1993 regression model for the total number of poor people by county are:

- The log of the number of 1993 tax return exemptions (all ages) on returns whose adjusted gross income falls below the official poverty threshold for a family of the size implied by the number of exemptions on the form.
- The log of number of 1993 food stamp recipients.
- The log of the estimated total 1993 resident population.
- The log of total number of 1993 tax return exemptions.
- The log of the 1990 Census estimate of the total number of poor.

(For further information on these variables see Information about Data Inputs.)

The dependent variable is the log of the total number of poor in each county as measured by the three-year average of values from the March 1993-1995 CPS's. The regression predictions, in the log scale, are combined with the logs of the direct CPS sample estimates and then transformed to estimates of the numbers of poor, which are finally controlled to the independent estimates of state totals.

**The Model for the Number of Related Children Ages 5 to 17
in Families in Poverty**

The estimation model for related children ages 5 to 17 in poverty has the same structure as the model for all people in poverty. There are five predictor variables:

- The log of the number of 1993 child exemptions indicated on tax returns whose adjusted gross income falls below the official poverty threshold for a family of the size implied by the number of exemptions on the form.
- The log of the 1993 number of food stamp recipients.
- The log of the 1993 estimated resident population under age 18.
- The log of the total number of child exemptions indicated on 1993 tax returns.
- The log of the 1990 Census estimate of the number of poor related children age 5 to 17.

(For further information on these variables see Information about Data Inputs.)

The dependent variable is the log of the number of poor related children age 5 to 17 in each county as measured by the three-year weighted average of the March 1993-1995 CPS's. The regression predictions, in the log scale, are combined with the logs of the direct CPS sample estimates and then transformed to estimates of the numbers of poor, which are finally controlled to the independent estimates of state totals.

**The Model for the Number of Poor People Under Age 18**

The estimation model for people under age 18 in poverty is quite similar. There are five predictor variables:

- The log of the number of 1993 child exemptions indicated on tax returns whose adjusted gross income falls below the official poverty threshold for a family of the size implied by the number of exemptions on the form.
- The log of the 1993 number of food stamp recipients.
- The log of the 1993 estimated resident population under age 18.
- The log of the total number of child exemptions indicated on 1993 tax returns.
- The log of the 1990 Census estimate of the number of poor persons under age 18.

(For further information on these variables see Information about Data Inputs.)

The dependent variable is the log of the number of poor persons under age 18 in each county as measured by the three-year weighted average of the March 1993-1995 CPS's. The regression predictions, in the log scale, are combined with the logs of the direct CPS sample estimates and then transformed to estimates of the numbers of poor, which are finally controlled to the independent estimates of state totals.

**The Model for Median Household Income**

The predictor variables in the regression model to generate the estimates for median 1993 household income by county are:

- The median adjusted gross income from 1993 tax returns.
- The ratio of the number of dependent ("zero exemption") tax returns -- returns representing persons claimed as dependents on other returns -- to the total number of returns.
- The log of the proportion of the 1993 Bureau of Economic Analysis (BEA) estimate of total personal income derived from government transfers.
- The 1990 Census estimate of median household income.
- The ratio of the 1993 BEA estimate of per capita total personal income to the 1989 estimate.
- The product of the 1990 Census median household income and the ratio of 1993 to 1989 BEA per capita total personal income.
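
The list above mixes levels, ratios, one logged proportion, and one interaction term. The sketch below assembles the six predictors for a single county to make that explicit. The function name, argument names, and dictionary keys are all hypothetical, and the calculation is only an illustration of the definitions in the list.

```python
import math

def median_income_predictors(median_agi_1993, dependent_returns, total_returns,
                             transfer_income, total_personal_income,
                             census_median_income,
                             bea_per_capita_1993, bea_per_capita_1989):
    """Build the six median household income predictors for one county."""
    # Growth in BEA per capita total personal income, 1989 to 1993.
    growth = bea_per_capita_1993 / bea_per_capita_1989
    return {
        "median_agi": median_agi_1993,
        "dependent_return_ratio": dependent_returns / total_returns,
        # The only logged predictor: share of personal income from transfers.
        "log_transfer_share": math.log(transfer_income / total_personal_income),
        "census_median_income": census_median_income,
        "bea_income_growth": growth,
        # Interaction: census median income scaled by income growth.
        "census_median_times_growth": census_median_income * growth,
    }
```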

(For further information on these variables see Information about Data Inputs.)

The dependent variable is the county median household income as measured by the three-year average of the March 1993-1995 CPS's (income for years 1992-1994). Adjustments were made to the March 1993 and 1995 CPS to express incomes in 1993 dollars before the median incomes were computed.

Source: U.S. Census Bureau | Small Area Income and Poverty Estimates | Last Revised: April 29, 2013