The 2001 and 2002 state and county estimates of poverty and income were released December 6, 2004. For an overview of the changes in methodology between this release and the previous release, see Estimation Procedure Changes.

Several features of the 2001 and 2002 county estimates should be noted.

- County estimates for income years 2001 and 2002 were released together and were produced using identical methodology.
- We estimate a regression model which predicts the number of people in poverty using a 3-year average of county-level observations from the Annual Social and Economic Supplement (ASEC) of the Current Population Survey (CPS) as the dependent variable, and administrative records and census data as the predictors. Although we use only the approximately 1,200 counties with CPS ASEC sample cases to estimate the equation, we make regression "predictions" for all 3,141 counties.
- The model is multiplicative; that is, we model the number of people in poverty as the product of a series of predictors which are numbers (not rates) and have unknown errors. When estimating the coefficients in the model, we take logarithms of the dependent and all independent variables. While we may omit reference to logs in the description, all variables in the county regression models for numbers of people in poverty are logarithmic.
- The CPS ASEC estimates differ in reliability from county to county because the sample size varies across counties. Our estimates take this factor into account.
- To use the information contained in the direct estimates for the approximately 1,200 counties in the CPS ASEC, we combine the regression predictions with these direct estimates using Empirical Bayes (or "shrinkage") techniques. The Empirical Bayes techniques weight the contribution of the two components (regression and direct estimates) based on their relative precision.
- We control the estimates for the counties of a given state to sum to the independently derived state estimate (the state estimates, in turn, have been controlled to sum to the official national estimate).
- We provide a confidence interval, which represents uncertainty from both sampling and modeling, for each estimate.

**Using counties in the CPS ASEC sample**

Our use of the CPS ASEC implicitly assumes
that the counties in the survey sample are representative of those not selected,
but this need not be the case. The CPS ASEC sample is designed to represent each
state's population and only incidentally represents counties. The characteristics
of some counties guarantee that they are included, e.g., most counties in large
metropolitan areas and counties with large populations. More generally, while
all counties have a nonzero probability of being included in the sample, some
have higher probabilities than others. Further, the probability of selecting
a county may be related to its income and poverty level. On the other hand, a
comparison of regression equations based on census data for counties in the CPS
ASEC sample with equations based on all counties indicates remarkably similar
results, providing some assurance that the CPS ASEC counties are largely representative
of all counties.

The survey weights used in estimation at the national level are not appropriate for county-level estimates. The CPS ASEC sample design selects some primary sampling units (usually a county or group of counties) to represent a set of counties in the same stratum. The sum of the weights for sample households from such a county estimates the total population of the entire set of counties it represents. Because we want each county in the CPS ASEC sample to stand for itself, we have adjusted the weights to make each county self-representing.
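The text does not spell out how the weights are adjusted. As an illustration only, the following sketch assumes one simple possibility: rescaling the weights of the sample cases in a county so that they sum to that county's own population rather than to the population of the whole stratum. The function name, the input values, and the rescaling rule itself are assumptions of this sketch, not the documented procedure.

```python
def self_representing_weights(weights, county_population):
    """Rescale one county's survey weights so they sum to that county's
    own population (assumed rule; the source does not give the formula)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    factor = county_population / total
    return [w * factor for w in weights]


# Hypothetical weights that originally summed to a multi-county stratum total.
original = [1500.0, 2500.0, 1000.0]
adjusted = self_representing_weights(original, county_population=2000.0)
print(sum(original), sum(adjusted))
```

After the rescaling, a weighted count for the county estimates a total for that county alone.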

**Estimation of the model equation**

CPS ASEC sampling variances are not constant
over all counties. We avoid giving observations with larger variances (a great
deal of uncertainty) the same influence on the regression as observations with
smaller variances (less uncertainty) by, in effect, weighting each observation
by the inverse of its uncertainty. Representing this uncertainty requires recognizing
that it arises from two sources:

- uncertainty about where the estimates lie relative to the true values for each county (sampling error), and
- uncertainty about where the true county values lie with respect to the regression surface (lack of fit).
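The idea of weighting each observation by the inverse of its uncertainty can be illustrated with a weighted least squares fit. The sketch below uses a single predictor for brevity and treats the total variance (sampling plus lack of fit) of each observation as known; the function name and all numbers are hypothetical, and the production model has several predictors and estimates the variances rather than taking them as given.

```python
def weighted_least_squares(x, y, total_variance):
    """Fit y = intercept + slope * x, weighting each observation
    by the inverse of its total variance."""
    w = [1.0 / v for v in total_variance]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical log-scale data for four counties; the last county has a
# large sampling variance and therefore little influence on the fit.
log_predictor = [6.0, 7.0, 8.0, 9.0]
log_direct = [5.1, 6.0, 7.1, 9.5]
sampling_var = [0.04, 0.04, 0.04, 4.0]
lack_of_fit_var = 0.01
total_var = [s + lack_of_fit_var for s in sampling_var]

intercept, slope = weighted_least_squares(log_predictor, log_direct, total_var)
print(intercept, slope)
```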

To estimate the lack-of-fit component, we estimate our model using the Census
2000 data and assume that the lack-of-fit component of residual variance is
the same when the same model is fit to the CPS ASEC and to the census. Since we
have separate estimates of sampling variance for each observation in Census
2000, we use them to estimate the unknown lack-of-fit component with a maximum
likelihood procedure (for information on variances for Census 2000, see
"Chapter 8: Accuracy of the Data" in Census 2000, STF3 documentation).

Next we fit a regression equation to the CPS ASEC data. We assume the sampling variance of the log of the number of people in poverty is inversely proportional to the square root of the sample size (in households) and the lack-of-fit variance is the same as that estimated in the census regression. We estimate the CPS ASEC regression parameters and the two components of the CPS ASEC variance with a maximum likelihood procedure.
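The variance assumption in this step can be written down compactly. The sketch below is a simplified illustration: it holds a set of regression residuals fixed, takes the lack-of-fit variance as given, and searches a coarse grid for the scale of the sampling variance that maximizes a Gaussian likelihood. The names `scale` and `lack_of_fit_var`, the residuals, and the grid are all hypothetical, and the actual procedure estimates the regression parameters jointly with the variance components.

```python
import math


def sampling_variance(scale, n_households):
    """Sampling variance assumed inversely proportional to the
    square root of the sample size (in households)."""
    return scale / math.sqrt(n_households)


def neg_log_likelihood(residuals, n_households, scale, lack_of_fit_var):
    """Gaussian negative log-likelihood of log-scale residuals whose
    variance is the sum of sampling and lack-of-fit components."""
    nll = 0.0
    for r, n in zip(residuals, n_households):
        v = sampling_variance(scale, n) + lack_of_fit_var
        nll += 0.5 * (math.log(2.0 * math.pi * v) + r * r / v)
    return nll


# Hypothetical residuals and household sample sizes for four counties.
residuals = [0.30, -0.10, 0.05, -0.20]
n_households = [16, 100, 400, 64]
lack_of_fit_var = 0.01  # taken as given, e.g. from a census-based fit

candidates = [0.1 * k for k in range(1, 21)]
best_scale = min(
    candidates,
    key=lambda s: neg_log_likelihood(residuals, n_households, s, lack_of_fit_var),
)
print(best_scale)
```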

**Combining model and direct survey estimates**

Final estimates are weighted
averages of the model predictions and the direct CPS ASEC estimates, where they
exist. The two weights for each county add to 1.0, and we compute the weight
on the model prediction as the sampling variance divided by the total variance
(sampling plus lack-of-fit) of the direct estimate. With this technique, the
larger the sampling variance of the direct estimate, the smaller its contribution
and the larger the contribution from the prediction model. These weights are
commonly referred to as "shrinkage weights," and the final estimates as "shrinkage"
or "Empirical Bayes" estimates. For counties not in the CPS ASEC sample, the weight
on the model's predictions is 1.0 and the weight on the direct survey estimate
is zero.
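The weighting rule described above can be sketched in a few lines. The function name and the numeric inputs are hypothetical; all quantities are assumed to be in the log scale, and a missing direct estimate is represented by `None`.

```python
def shrinkage_estimate(model_pred, direct_est, sampling_var, lack_of_fit_var):
    """Weighted average of the model prediction and the direct estimate.

    The weight on the model prediction is the sampling variance divided
    by the total variance (sampling plus lack-of-fit)."""
    if direct_est is None:
        # County not in the sample: the model prediction gets weight 1.0.
        return model_pred
    w_model = sampling_var / (sampling_var + lack_of_fit_var)
    return w_model * model_pred + (1.0 - w_model) * direct_est


# A noisy direct estimate is pulled strongly toward the model prediction.
noisy = shrinkage_estimate(8.0, 9.0, sampling_var=0.09, lack_of_fit_var=0.01)
# A precise direct estimate keeps most of its own value.
precise = shrinkage_estimate(8.0, 9.0, sampling_var=0.01, lack_of_fit_var=0.09)
# A county without sample cases falls back on the model prediction.
unsampled = shrinkage_estimate(8.0, None, sampling_var=0.0, lack_of_fit_var=0.01)
print(noisy, precise, unsampled)
```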

**Controlling to state estimates**

The last steps in the production process
are transforming the county estimates from the log scale to estimates of numbers
and controlling them to the independently derived state estimates. We make a
simple ratio adjustment to the county-level estimates to ensure that they sum
to the state totals. We control model-based estimates at the state level to
the national level direct estimates derived from the CPS ASEC. We adjust the estimated
standard errors of the county estimates to reflect this additional level of
control.
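A minimal sketch of the back-transformation and ratio adjustment follows. The county labels, values, and function name are hypothetical, and the sketch leaves out the accompanying adjustment to the standard errors.

```python
import math


def control_to_state_total(log_estimates, state_total):
    """Transform log-scale county estimates to numbers, then scale them
    by a single ratio so that they sum to the state total."""
    numbers = {county: math.exp(v) for county, v in log_estimates.items()}
    ratio = state_total / sum(numbers.values())
    return {county: v * ratio for county, v in numbers.items()}


# Hypothetical log-scale estimates for the three counties of one state.
log_estimates = {
    "County A": math.log(1200.0),
    "County B": math.log(800.0),
    "County C": math.log(500.0),
}
controlled = control_to_state_total(log_estimates, state_total=2750.0)
print(controlled)
```

Because every county in the state is multiplied by the same ratio, the relative sizes of the county estimates are unchanged.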

The estimates for the number of school-aged children in poverty are handled slightly differently. The Department of Education, a major sponsor of the SAIPE program, requires that the estimated numbers of school-aged children in poverty be integers. We use an algorithm to round the counties' estimates in a way that forces the county estimates of school-aged children in poverty to sum to the estimate for their state. Note that this algorithm is first applied to the states' estimates, so they are integers and add to the integer-valued national estimate.
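The rounding algorithm is not described here. One common way to round a set of values to integers while preserving an integer total is the largest-remainder method, sketched below purely as an illustration; it is an assumption of this sketch and may differ from the algorithm actually used. The function name and values are hypothetical.

```python
import math


def round_preserving_total(values, integer_total):
    """Largest-remainder rounding: round every value down, then add 1 to
    the values with the largest fractional parts until the total is met."""
    floors = [math.floor(v) for v in values]
    shortfall = integer_total - sum(floors)
    if not 0 <= shortfall <= len(values):
        raise ValueError("integer_total is inconsistent with the values")
    # Indices ordered from largest to smallest fractional part.
    order = sorted(
        range(len(values)),
        key=lambda i: values[i] - floors[i],
        reverse=True,
    )
    rounded = list(floors)
    for i in order[:shortfall]:
        rounded[i] += 1
    return rounded


# Hypothetical controlled county estimates whose state total is 40.
county_values = [10.6, 20.3, 5.7, 3.4]
rounded = round_preserving_total(county_values, integer_total=40)
print(rounded, sum(rounded))
```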

We do not control estimates of county median household income to the state medians. This would require that the estimation model produce the entire household income distribution, rather than just the median as it does now.

**Standard errors and confidence intervals**

One goal of our small area estimation work is to provide estimates of the uncertainty surrounding the estimates of the numbers of people in poverty. The census and model-based estimates shown in the tables are accompanied by their 90-percent confidence intervals. These
intervals were constructed from estimated standard errors.
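As a rough illustration of how a 90-percent interval can be built from an estimated standard error, the sketch below assumes a normal approximation with critical value 1.645. The second function shows the case where the interval is formed in the log scale and then transformed back, which yields an interval that is not symmetric around the point estimate. Function names and inputs are hypothetical.

```python
import math

Z_90 = 1.645  # standard normal critical value for a two-sided 90-percent interval


def confidence_interval_90(estimate, standard_error):
    """Symmetric interval: estimate plus or minus 1.645 standard errors."""
    margin = Z_90 * standard_error
    return estimate - margin, estimate + margin


def log_scale_interval_90(log_estimate, log_standard_error):
    """Interval built in the log scale, then transformed back to numbers."""
    lower, upper = confidence_interval_90(log_estimate, log_standard_error)
    return math.exp(lower), math.exp(upper)


# Hypothetical estimate of 40,000 with a log-scale standard error of 0.05.
point = 40000.0
lower, upper = log_scale_interval_90(math.log(point), 0.05)
print(lower, point, upper)
```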

For the model-based estimates, the standard error depends mainly on the uncertainty about the model and the CPS ASEC sampling variance. While the variance of the shrinkage weights could also be a significant component of uncertainty about our estimates (if sizeable and ignored, we would be underestimating the standard errors), our research indicates that its contribution is negligible.

For the decennial census, we derive the standard errors from a set of generalized variance functions that reflects the nature of the decennial census sample design for the long form questionnaire (for further information, see Quantifying Uncertainty in State and County Estimates).

For income year 2001, the predictor variables described below use 2001 tax and food stamp data, and 2002 population estimates; the dependent variable is based on a 3-year average from 2001-2003. For income year 2002, the models use 2002 tax and food stamp data, and 2003 population estimates; the dependent variable is based on a 3-year average from 2002-2004.

**The Model for Total Number of People in Poverty**

The model is multiplicative; that is, we model the number of people in poverty
as the product of a series of predictors that are numbers (not rates), and we
model the unknown errors. To estimate the coefficients in the model, we take
logarithms of the dependent and all independent variables. Our choice of a multiplicative
model is motivated, in part, by the fact that the distribution of the number
in poverty has a huge range -- from zero in some counties to more than a million
in the largest county (with a mean of 10,000), based on the Census 2000 -- and
the distribution is highly skewed. Taking the logarithm of all variables makes
their distributions more centered and symmetrical and has the effect of diminishing
the otherwise inordinate influence of large counties on the coefficient estimates.
Another advantage of a multiplicative model is that it makes it plausible to
maintain that the (unobserved) errors for every county, no matter how large
or small, are drawn from the same distribution.

There are five predictor variables:

- the log of the number of tax return exemptions (all ages) on returns whose adjusted gross income falls below the official poverty threshold for a family of the size implied by the number of exemptions on the form;
- the log of the number of food stamp recipients in July;
- the log of the estimated total resident population as of July 1;
- the log of the total number of tax return exemptions; and
- the log of the Census 2000 estimate of the total number of people in poverty.

For further information on these variables see Information about Data Inputs.

The dependent variable is the log of the total number of people in poverty in each county as measured by the 3-year average of values from the CPS ASEC. We combine the regression predictions, in the log scale, with the logs of the direct CPS ASEC sample estimates, and then transform the results into estimates of the numbers of people in poverty. Finally, we control the estimates to the independent estimates of state totals.
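The multiplicative form and its logarithmic counterpart can be written out as follows. The notation is an assumption introduced for illustration and does not appear in the original description: $Y_i$ is the true number of people in poverty in county $i$, $X_{1i}, \dots, X_{5i}$ are the five predictors listed above, $u_i$ is the lack-of-fit error, $y_i$ is the log of the direct CPS ASEC estimate, and $e_i$ is its sampling error.

```latex
\begin{align*}
Y_i &= e^{\beta_0} \left( \prod_{k=1}^{5} X_{ki}^{\beta_k} \right) e^{u_i}, \\
\log Y_i &= \beta_0 + \sum_{k=1}^{5} \beta_k \log X_{ki} + u_i, \\
y_i &= \log Y_i + e_i .
\end{align*}
```

Taking logarithms of the first line gives the second, which is linear in the coefficients and can be estimated by regression.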

**The Model for the Number of Related Children Ages 5 to 17 in Families in Poverty**

The estimation model for related children ages 5 to 17 in poverty parallels
that for all people in poverty in structure. There are five predictor variables:

- the log of the number of child exemptions claimed on tax returns whose adjusted gross income falls below the official poverty threshold for a family of the size implied by the number of exemptions on the form;
- the log of the number of food stamp recipients in July;
- the log of the estimated resident population under age 18 as of July 1;
- the log of the total number of child exemptions indicated on tax returns; and
- the log of the Census 2000 estimate of the number of related children in poverty ages 5 to 17.

For further information on these variables see Information about Data Inputs.

The dependent variable is the log of the number of related children in poverty ages 5 to 17 in each county as measured by the 3-year weighted average of the CPS ASEC. We combine the regression predictions, in the log scale, with the logs of the direct CPS ASEC sample estimates, and then transform the results into estimates of the numbers in poverty. Finally, we control the estimates to the independent estimates of state totals.

**The Model for the Number of People Under Age 18 in Poverty**

The estimation model for people under age 18 in poverty is quite similar. There are five predictor variables:

- the log of the number of child exemptions indicated on tax returns whose adjusted gross income falls below the official poverty threshold for a family of the size implied by the number of exemptions on the form;
- the log of the number of food stamp recipients in July;
- the log of the estimated resident population under age 18 as of July 1;
- the log of the total number of child exemptions indicated on tax returns; and
- the log of the Census 2000 estimate of the number of people under age 18 in poverty.

For further information on these variables see Information about Data Inputs.

The dependent variable is the log of the number of people in poverty under age 18 in each county as measured by the 3-year weighted average of the CPS ASEC. We combine the regression predictions, in the log scale, with the logs of the direct CPS ASEC sample estimates, and then transform the results into estimates of the numbers in poverty. Finally, we control the estimates to the independent estimates of state totals.

**The Model for Median Household Income**

Like the models for the number of people in poverty, the model for median household
income is multiplicative. A consequence of the multiplicative form and the model
performing well relative to the direct CPS ASEC estimates of median household income
is that the standard errors of the estimates are proportional to the point estimates.
In other words, the unobserved errors associated with high-income counties are larger
than the unobserved errors in counties with high proportions of people in poverty. To
estimate the model, we take logarithms of the dependent and all independent
variables; i.e., the model is linear in logarithms. However, we report median
household income in the linear scale and, as a result, the confidence intervals
are asymmetric. The predictor variables in the regression model used to generate
the estimate for county median household income, except where otherwise noted, reference the same year as the estimate. The predictor variables are:

- the log of the Census 2000 estimate of county median household income;
- the log of the median adjusted gross income from tax returns;
- the log of the proportion of the Bureau of Economic Analysis (BEA) estimate of total personal income derived from government transfers;
- the log of the growth of BEA total personal income from 1999 through the target year; and
- the log of the "nonfiler" rate.

We define the nonfiler rate as the difference between the estimated total population and the total exemptions claimed on IRS tax returns, divided by the estimated total population. For further information on these variables see Information about Data Inputs.
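The nonfiler rate defined above amounts to a one-line calculation. The function name and the example counts are hypothetical; the sketch does not address how a county would be treated if the rate were zero or negative, in which case its logarithm is undefined.

```python
def nonfiler_rate(total_population, total_exemptions):
    """Share of the estimated population not covered by exemptions
    claimed on IRS tax returns."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return (total_population - total_exemptions) / total_population


# Hypothetical county: 50,000 residents and 44,000 exemptions.
rate = nonfiler_rate(50000, 44000)
print(rate)
```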

The dependent variable is the log of county median household income interpolated
with 3 years of CPS ASEC surveys. We adjust the CPS ASEC surveys to express incomes in target-year dollars before computing median household income, using the official Consumer Price Index for All Urban Consumers (CPI-U).
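Expressing incomes in target-year dollars is a ratio adjustment by the price index. The sketch below shows the arithmetic only; the index values are made-up placeholders rather than actual CPI-U figures, and the function name is hypothetical.

```python
def to_target_year_dollars(income, cpi_income_year, cpi_target_year):
    """Convert an income to target-year dollars using a price index."""
    return income * cpi_target_year / cpi_income_year


# Placeholder index values (not actual CPI-U figures).
cpi_by_year = {"year 1": 100.0, "target year": 102.0}

adjusted_income = to_target_year_dollars(
    40000.0,
    cpi_income_year=cpi_by_year["year 1"],
    cpi_target_year=cpi_by_year["target year"],
)
print(adjusted_income)
```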

Source: U.S. Census Bureau | Small Area Income and Poverty Estimates | Last Revised: April 29, 2013