ESTIMATES AND PROJECTIONS AREA METHODOLOGY
STATE POPULATION ESTIMATES BY AGE, SEX, RACE, AND HISPANIC ORIGIN FOR
JULY 1, 2002
BACKGROUND
The U.S. Census Bureau produces estimates of the resident population by age, sex,
race and Hispanic origin for each state in the United States on an annual basis.
The following documentation outlines the methodology that was used in the
production of the July 1, 2002 resident population estimates by age, sex, race,
and Hispanic origin for the 50 states in the United States and the District of
Columbia.
OVERVIEW
The Census Bureau develops state population estimates with a demographic procedure
called a cohort-component method. This method follows each birth cohort across
time according to its exposure to mortality, fertility, and migration. In order
to apply the model, Census Bureau demographers estimate each component of
population change separately. For the population residing in households, the
components of population change are births, deaths, and net migration, including
net international migration. For the non-household population, change is
represented by the net change in the population living in group-quarters
facilities. A more detailed discussion of the methodology is provided below.
METHOD
The cohort-component method is based on the traditional demographic accounting
system. Starting with a base population, deaths are subtracted from the
population and births are added to the population, forming new cohorts. Estimates
of net international migration and net internal migration are added to or
subtracted from the population. The components of change are measured separately
by age, sex, race, and Hispanic origin for each state and added to the base
population as follows:
P1 = P0 + B - D + NDM + NIM
Where:
P1 = population at the end of the period
P0 = population at the beginning of the period
B = births during the period
D = deaths during the period
NDM = net internal migration during the period
NIM = net international migration during the period
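As a minimal sketch of the accounting identity above, the following Python
function applies the equation to a single cell (one state and one combination
of age, sex, race, and Hispanic origin). The function name and the sample
figures are hypothetical and serve only to illustrate the arithmetic.

```python
def end_of_period_population(p0, births, deaths, ndm, nim):
    """Apply P1 = P0 + B - D + NDM + NIM to one demographic cell."""
    return p0 + births - deaths + ndm + nim


# Hypothetical figures for a single cell; these are not actual estimates.
p1 = end_of_period_population(p0=10_000, births=120, deaths=80, ndm=-35, nim=15)
print(p1)
```

A negative NDM or NIM value represents net out-migration for the cell.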
In the process of developing the July 1, 2002 estimates, revised estimates
of the July 1, 2001, and July 1, 2000 state population with demographic
detail were produced. The revised estimates for 2001 and 2000 incorporate actual
data for the demographic components that were not previously available and
include updates or corrections to the data previously used. In cases where we
do not have data for all states for the current estimate year (2002), we estimate
the components of population change based on one or more simplifying assumptions.
One of the guiding principles in the Census Bureau’s subnational methodology
is that all of our population estimates are consistent. This means that the sum
of the state estimates must be equal to the independently produced national
population estimates. This consistency is required for all demographic
characteristics produced. While this consistency is essential in the production
and interpretation of the population estimates, it adds a layer
of complexity to their development.
The methodology used to produce the July 1, 2002 estimates is described next.
STEP 1: SPECIFICATION OF THE BASE POPULATIONS
The enumerated resident population in Census 2000 is the base for the post-2000
population estimates. The enumerated population was modified in two ways for
purposes of developing these estimates. First, the race data were modified to
eliminate the "Some other race" category in order to be more consistent
with race categories that appear on the administrative records used to produce
the population estimates. Second, the April 1, 2000 population estimates
base reflects modifications to the Census 2000 population as documented in the
Count Question Resolution program.
The race modification conforms to the Office of Management and Budget’s
(OMB) 1997 revised standards for collecting and presenting data on race and
ethnicity. The revised OMB standards identified five minimum race categories:
White; Black or African American; American Indian and Alaska Native; Asian; and,
Native Hawaiian and Other Pacific Islander. Additionally, the OMB recommended
that respondents be given the option of marking or selecting one or more races
to indicate their racial identity. Finally, for respondents unable to identify
with any of the five race categories, the OMB approved including a sixth category
- "Some other race" - on the Census 2000 questionnaire.
No modification was necessary for responses indicating only an OMB race alone or
in combination with another OMB race. However, at the national level, about 18.5
million people checked "Some other race" alone or in combination with
another race. These people were primarily of Hispanic origin and many wrote in
their Hispanic origin or Hispanic origin type (such as Mexican or Puerto Rican)
as their race. For purposes of estimates production, responses of "Some
other race" alone were modified by blanking the "Some other race"
response and imputing an OMB race alone or in combination with another race
response. The responses were imputed from a donor, who matched on response to the
question on Hispanic origin. Responses of both "Some other race" and
an OMB race were modified by blanking the "Some other race" response
and keeping the OMB race response.
The resulting race categories (White; Black; American Indian and Alaska Native;
Asian; and, Native Hawaiian and Other Pacific Islander) conform with OMB’s
1997 revised standards for the collection of data on race and ethnicity and are
more consistent with the race categories in other administrative sources, such
as vital statistics.
Because the group quarters (GQ) population experiences somewhat different
demographic processes, the first step in the estimates process is to subtract
the GQ population from the Census 2000 resident population to develop a base
population that consists of two pieces (the household population and the
population residing in group quarters). Both pieces of the base population
contain full demographic detail (age, sex, race, and Hispanic origin) for each
state and the District of Columbia.
1A. Base Household Population
1A.i. The Census 2000 household population (obtained by subtracting the GQ
population from the resident population) is the starting point for the
July 1, 2002 state population estimates. The inclusion of demographic
detail in the development of the state population estimates adds an additional
layer of complexity to the estimation method. Although the Census 2000
population data were available for the full set of race categories described
above, the administrative data sources used to estimate the components of
change (births, deaths, and migration) were not available for all 31 races.
Because the administrative data were available only in the 4 race categories
consistent with the 1990 census (White; Black; American Indian, Eskimo, Aleut;
Asian and Pacific Islander), the Census 2000 base household population was
converted from the 31 race categories to the four race groups consistent with
the 1990 Census. Then the July 1, 2002 population estimates were
produced for the four race categories consistent with the 1990 Census.
Finally, the July 1, 2002 estimates were converted to 31 races to be
consistent with Census 2000 (see Step 6).
The conversion of Census 2000 categories to 1990 Census categories was based
on a "straight proportional allocation." This uses the simplifying
assumption that multiple race responses in Census 2000 would be evenly
distributed between the comparable single race responses allowed in the 1990
Census. For example, the Census 2000 population in the three race categories
of "White Alone," "Black Alone," and "White and
Black" were converted into two of the 1990 Census race categories,
"White" and "Black". The entire White Alone population
enumerated in Census 2000 was assigned to the White category and the entire Black
Alone population was assigned to the Black category. Based on the assumption
of straight proportional allocation, half of the Census 2000 "White and
Black" population was assigned to the White race category and half
was assigned to the Black race category. These assignments
were made at the state level, by age group, sex, and Hispanic origin.
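The White and Black example above can be sketched in Python as follows. This
is an illustration under stated assumptions, not the production procedure:
the function name, the tuple-based data layout, and the counts are
hypothetical, and the mapping of the five OMB races onto the four 1990 Census
groups is not shown.

```python
from collections import defaultdict


def allocate_to_single_races(counts):
    """Split each multiple-race count evenly among its component races.

    `counts` maps a tuple of race names to the population of one
    state / age group / sex / Hispanic origin cell.
    """
    allocated = defaultdict(float)
    for races, count in counts.items():
        share = count / len(races)  # equal share for each component race
        for race in races:
            allocated[race] += share
    return dict(allocated)


# Hypothetical counts for one cell.
cell = {("White",): 900, ("Black",): 300, ("White", "Black"): 40}
result = allocate_to_single_races(cell)
print(result)
```

The allocation preserves the cell total, since every multiple-race count is
divided in full among its component races.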
The assumption of proportional allocation is the best available assumption
at this time, though future estimates may not require this conversion or may
be based on different distributions to the single races. See Step 6 for the
conversion of the July 1, 2002 estimates for four races back to the 31
race categories.
1A.ii. Because the Census 2000 reference date is April 1, 2000 and the
estimate periods are July 1 to June 30, it was first necessary to
develop a July 1, 2000 base population. This base population was
calculated using the July 1, 2000 national estimates with full demographic
detail (by age, sex, race, and Hispanic origin) and July 1, 2000 state
estimates by age and sex. A ratio method was used to calculate July 1,
2000 state population estimates by age, sex, race, and Hispanic origin. This
method applied the age, sex, race, and Hispanic origin distribution for states
from Census 2000 to the July 1, 2000 state population estimates by age
and sex to develop initial July 1, 2000 estimates with demographic detail.
The national estimates by age, sex, race, and Hispanic origin were applied
as controls to the initial estimates to generate the July 1, 2000 set
of base population estimates that sum to equal the July 1, 2000 national
population estimates by age, sex, race, and Hispanic origin and the state
populations by age and sex.
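The text does not name the algorithm used to apply the controls. One common
way to make a table agree with two sets of totals is iterative proportional
fitting (raking), and the sketch below uses it purely as an assumption. It
treats one age-sex group, with states as rows and race/Hispanic origin
categories as columns. All names and figures are hypothetical.

```python
def initial_estimates(census_counts, state_totals):
    """Apply each state's Census 2000 distribution to its July 1 total."""
    table = []
    for row, total in zip(census_counts, state_totals):
        census_total = sum(row)
        table.append([total * value / census_total for value in row])
    return table


def rake(table, row_controls, col_controls, iterations=50):
    """Alternately scale columns and rows toward two sets of controls."""
    table = [row[:] for row in table]
    for _ in range(iterations):
        col_sums = [sum(col) for col in zip(*table)]
        col_factors = [c / s for c, s in zip(col_controls, col_sums)]
        table = [[v * f for v, f in zip(row, col_factors)] for row in table]
        row_factors = [c / sum(row) for c, row in zip(row_controls, table)]
        table = [[v * f for v in row] for row, f in zip(table, row_factors)]
    return table


# Hypothetical data for one age-sex group: two states, two categories.
census_counts = [[800, 200], [300, 700]]
state_controls = [1010, 1020]      # state estimates for this age-sex group
national_controls = [1120, 910]    # national estimates by characteristic

base = rake(initial_estimates(census_counts, state_controls),
            state_controls, national_controls)
```

The two sets of controls must sum to the same total for the procedure to
converge, which mirrors the consistency requirement described in the METHOD
section.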
The July 1, 2000 estimates then serve as the base population for the
July 1, 2001 and July 1, 2002 estimates produced using the
cohort-component method. The July 1, 2000 estimates could not be
calculated using the cohort-component method because the administrative
records used in the cohort-component method are available for calendar years
and not the three-month time period from April 1, 2000 to July 1,
2000.
1B. Base Group-quarters Population
Examples of types of group-quarters (GQ) populations are: military personnel
living in barracks, college students living in dormitories, and persons
residing in institutions. Inmates of correctional facilities, persons in
health care facilities, persons in Job Corps Centers, and persons residing
in nursing homes are also included in this category.
The Census 2000 group quarters population (obtained by subtracting the
household population from the resident population) is the starting point for
the July 1, 2002 state population estimates. First, the Census 2000 GQ
data for the 31 race groups were converted to the four race groups consistent
with the 1990 Census as described above.
STEP 2: SPECIFICATION OF BIRTHS AND DEATHS (VITAL STATISTICS) COMPONENTS
2A. The birth and death components are calculated from three sources of
data. Files containing all registered births and deaths that occurred to
U.S. residents during the estimate period are obtained from the National
Center for Health Statistics (NCHS). The birth files contain the total
numbers of births in a calendar year by state and county of the mother’s
residence, sex, race, and Hispanic origin. The NCHS death files contain the
total numbers of deaths by sex, race, Hispanic origin, age at death, and
state and county of residence at death. The Federal-State Cooperative Program
for Population Estimates (FSCPE) also reports annual numbers of registered
births and deaths by sex, race, Hispanic origin, age at death, and state and
county of residence at death or county of the mother’s residence at birth. A
reconciliation process occurs between the NCHS and FSCPE vital statistics.
In general, we believe that the distribution of demographic characteristics
in the NCHS file is more accurate due to its national coverage,
while the geographic distribution of data from FSCPE files is more accurate
due to more specific local knowledge.
It is assumed that the vital statistics files represent complete counts of
births and deaths for the resident population. No adjustments are made for
undercoverage or differential coverage by states, age, race, or Hispanic
origin.
2B. After the NCHS and FSCPE figures are reconciled, they are controlled to
the national estimates of the numbers of births and deaths by sex, race,
Hispanic origin, and age at death developed as part of the national population
estimates for the same time period.
2C. Finally, the births are added to the base population for each year
(July 1, 2000 and July 1, 2001) and the deaths are subtracted from
the base population.
STEP 3: SPECIFICATION OF NET INTERNATIONAL MIGRATION
We estimate the net international migration to/from the United States as several
sub-components: net foreign-born international migration, net movement to/from
Puerto Rico, net federal and civilian citizen movement, and native emigration.
In this last vintage, we did not have current state-level data in the detail we
needed to estimate net international migration directly at the state level by
characteristics. Instead, for each of the sub-components of net international
migration, we took the national net international migration data by
characteristics and distributed them to the states consistent with
the state total population estimates.
STEP 4: SPECIFICATION OF NET INTERNAL MIGRATION
Step 4A. Match of Tax Returns to create counts of exemptions (filers and
dependents) who migrate by demographic characteristics
4A.i. For the July 1, 2002 estimates the component of internal migration
was developed using data from two administrative record sources: annual
extracts of tax returns provided by the Internal Revenue Service (IRS)
linked by Social Security Number across successive years; and the Census
Numident file, derived from the Social Security Administration 100 percent
file (SSA). In order to ensure confidentiality and privacy, these data sets
are matched by SSN/PIK (Protected Identification Key) and are referred to
jointly as IRS-SSA data. The IRS 1040 tax return records were matched to
the SSA data to identify the age, sex, race, and Hispanic origin of the tax
filers. A number of assumptions were made to assign demographic characteristics
to spouses and dependents. Exemptions claimed for children were assigned to
the under 20 age group and exemptions claimed for parents were assigned to
the age category 65 and over. Sex was assigned randomly for exemptions.
Spouses were assigned the same age and the opposite sex as filers. All
spouses and exemptions were assigned the same race and Hispanic origin as
filers.
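The assignment rules in 4A.i can be sketched as a small Python function. The
record layout, field names, and function name below are hypothetical. The
only rules encoded are those stated above: children go to the under 20 group,
parents to 65 and over, sex is random for exemptions, spouses take the
filer's age and the opposite sex, and race and Hispanic origin are copied
from the filer.

```python
import random

OPPOSITE_SEX = {"M": "F", "F": "M"}


def assign_characteristics(filer, n_children, n_parents, has_spouse, rng):
    """Return records for a filer's spouse and claimed exemptions."""
    shared = {"race": filer["race"], "hispanic": filer["hispanic"]}
    people = []
    if has_spouse:
        people.append({"role": "spouse", "age": filer["age"],
                       "sex": OPPOSITE_SEX[filer["sex"]], **shared})
    for _ in range(n_children):
        people.append({"role": "child", "age": "under 20",
                       "sex": rng.choice("MF"), **shared})
    for _ in range(n_parents):
        people.append({"role": "parent", "age": "65 and over",
                       "sex": rng.choice("MF"), **shared})
    return people


# Hypothetical filer record; the seed only makes the example repeatable.
filer = {"age": 41, "sex": "M", "race": "White", "hispanic": False}
people = assign_characteristics(filer, n_children=2, n_parents=1,
                                has_spouse=True, rng=random.Random(0))
```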
4A.ii. After the demographic characteristics are added to the IRS tax return
records, two years of records are matched by SSN/PIK to determine migration
status. Filers (and their dependents) with a change in the state of residence
between the two periods were identified as "Inter-State" migrants.
Otherwise, if there was no change in the state of residence, the filers (and
dependents) were identified as non-migrants.
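A minimal sketch of the two-year match follows. It assumes each year's
extract can be reduced to a mapping from a protected key to a state of
residence. The keys, state codes, and function name are hypothetical, and the
treatment of records that appear in only one year (skipped here) is an
assumption because the text does not describe it.

```python
def classify_migration_status(year1, year2):
    """Compare state of residence for keys present in both years."""
    status = {}
    for key, origin in year1.items():
        destination = year2.get(key)
        if destination is None:
            continue  # unmatched records cannot be classified
        if destination != origin:
            status[key] = "inter-state migrant"
        else:
            status[key] = "non-migrant"
    return status


# Hypothetical matched extracts keyed by a protected identifier.
year1 = {"k1": "OH", "k2": "TX", "k3": "CA"}
year2 = {"k1": "OH", "k2": "NM", "k4": "FL"}
status = classify_migration_status(year1, year2)
print(status)
```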
Step 4B. Calculate State Out-Migration Rates and Number of State Out-Migrants
Migration rates are computed using the number of exemptions with addresses in
different states in the second period as the numerator and the total number
of exemptions in the state in the first period as the denominator. The rates
are applied to the July 1, 2000 and July 1, 2001 base populations
by age, sex, race, and Hispanic origin to generate the number of state
out-migrants by age, sex, race, and Hispanic origin.
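For one state and one demographic cell, the rate calculation described above
reduces to the following sketch. The function names and the figures are
hypothetical.

```python
def out_migration_rate(movers, exemptions_first_period):
    """Exemptions found in a different state in the second period, divided
    by all exemptions in the origin state in the first period."""
    return movers / exemptions_first_period


def out_migrants(base_population, rate):
    """Apply the rate to the base population of the same cell."""
    return base_population * rate


# Hypothetical cell: 150 of 5,000 exemptions changed state.
rate = out_migration_rate(movers=150, exemptions_first_period=5_000)
leavers = out_migrants(base_population=6_200, rate=rate)
```

The rate comes from the tax-return universe, while the count of out-migrants
is scaled to the base population, so the two denominators differ by design.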
Because of the potentially large number of origin-characteristic combinations,
a few simplifying assumptions were required in the production of the
July 1, 2002 estimates. It was necessary in some cases to combine
individual origin-characteristic categories (which will be referred to as
cells) to improve the robustness of the data. If a given cell had fewer than
30 exemptions, then it was combined with adjacent age cells within the same
origin-ethnicity-race-sex group until the combined category contained at least
30 exemptions. If it was not possible to create a combined category containing
at least 30 exemptions within an origin-ethnicity-race-sex group, then cells
were combined for both sexes. When individual ages were combined to compute
a migration probability, each of the ages was assigned the probability for
the aggregated age group.
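The text does not specify the order in which adjacent age cells were
combined. The sketch below assumes a simple pass from youngest to oldest that
keeps adding ages until the threshold of 30 exemptions is reached and folds
any sparse tail into the preceding group. The function name and data are
hypothetical.

```python
MIN_EXEMPTIONS = 30  # threshold stated in the text


def collapsed_rates(cells, minimum=MIN_EXEMPTIONS):
    """Return one out-migration rate per age cell after combining sparse cells.

    `cells` is a list of (movers, exemptions) pairs ordered by age for one
    origin-ethnicity-race-sex group. Returns None when the whole group has
    too few exemptions, signalling that the sexes should be combined.
    """
    if sum(exemptions for _, exemptions in cells) < minimum:
        return None
    groups = []            # each entry: [age indices, movers, exemptions]
    pending = [[], 0, 0]
    for index, (movers, exemptions) in enumerate(cells):
        pending[0].append(index)
        pending[1] += movers
        pending[2] += exemptions
        if pending[2] >= minimum:
            groups.append(pending)
            pending = [[], 0, 0]
    if pending[0]:         # fold a sparse tail into the previous group
        groups[-1][0].extend(pending[0])
        groups[-1][1] += pending[1]
        groups[-1][2] += pending[2]
    rates = [0.0] * len(cells)
    for indices, movers, exemptions in groups:
        for index in indices:
            rates[index] = movers / exemptions  # shared by every age in group
    return rates


# Hypothetical cells: the last three ages are combined (45 exemptions).
rates = collapsed_rates([(2, 40), (1, 12), (3, 25), (0, 8)])
```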
Step 4C. Calculate In-Migration Proportion and In-Migrant Population
From the matched records the destinations of migrants by demographic
characteristics can be determined. The numbers of out-migrants calculated
from Step 4B were distributed as in-migrants to states by applying proportions
of total in-migrant exemptions who moved to each state. These proportions
were calculated by age, sex, race, and Hispanic origin. The numbers of
out-migrants by state and characteristics were subtracted from the base
population for each time period and the numbers of in-migrants were added to
the base population.
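A sketch of Step 4C for a single demographic cell follows. It assumes the
out-migrants from all states are pooled and then shared out according to each
state's proportion of in-migrant exemptions, which is one reading of the
description above. State codes, counts, and the function name are
hypothetical.

```python
def distribute_in_migrants(out_by_state, in_exemptions_by_state):
    """Allocate the pooled out-migrants to destination states."""
    total_out = sum(out_by_state.values())
    total_in_exemptions = sum(in_exemptions_by_state.values())
    return {state: total_out * count / total_in_exemptions
            for state, count in in_exemptions_by_state.items()}


# Hypothetical figures for one age-sex-race-Hispanic origin cell.
out_by_state = {"OH": 120.0, "TX": 200.0, "CA": 280.0}
in_exemptions = {"OH": 50, "TX": 150, "CA": 100}

in_by_state = distribute_in_migrants(out_by_state, in_exemptions)
net_by_state = {s: in_by_state[s] - out_by_state[s] for s in out_by_state}
```

Because every out-migrant is reassigned to some state, net internal migration
sums to zero across states within the cell.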
STEP 5: PROCESSING OF GROUP QUARTERS POPULATION
GQ population change was estimated separately from the demographic accounting
procedure described above. This was done primarily because of the uniqueness of
this subpopulation and the special difficulties of estimating the GQ population.
The July 1, 2002 GQ estimates were calculated starting with the Census 2000 GQ
population by age, sex, race, Hispanic origin, and seven GQ types for each state.
States provide updated information on the total GQ population by GQ type to the
Census Bureau each year (see note 1). The Census 2000 age, sex, race, and Hispanic
origin distributions of the GQ population by state and type were applied to the
July 1, 2002 GQ populations by type reported by states to produce GQ
population estimates for states with demographic detail.
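For one state and one GQ type, applying the Census 2000 distribution to a
reported total can be sketched as below. The cell labels, counts, and
function name are hypothetical, and the sketch assumes the Census 2000 count
for the state and type is not zero.

```python
def gq_with_detail(census_gq_cells, reported_total):
    """Spread a reported GQ total across cells using census proportions."""
    census_total = sum(census_gq_cells.values())
    return {cell: reported_total * count / census_total
            for cell, count in census_gq_cells.items()}


# Hypothetical Census 2000 counts for one state and one GQ type.
census_cells = {("18-24", "M"): 600, ("18-24", "F"): 300, ("25-44", "M"): 100}
estimate = gq_with_detail(census_cells, reported_total=1_100)
```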
STEP 6: GENERATE RESIDENT POPULATION ESTIMATES BY DEMOGRAPHIC CHARACTERISTIC
6A. Prior to combining the July 1, 2000 (revised), July 1, 2001
(revised) and July 1, 2002 group quarters and household population
estimates, each set of estimates was converted from the four race groups
consistent with the 1990 Census to the 31 race groups consistent with Census
2000 by applying conversion factors. Continuing the example from Step 1A.i.,
the estimated "White" population was apportioned to the "White
Alone" and "White and Black" populations. The estimated
July 1, 2002 White population was multiplied by the ratio of the White
Alone population from Census 2000 to the sum of the White Alone and half of
the "White and Black" population from Census 2000 to produce the
July 1, 2002 estimate for the White Alone population. The estimated
July 1, 2002 White population was multiplied by the ratio of half of the
"White and Black" population from Census 2000 to the sum of the
White Alone and half of the "White and Black" population from
Census 2000 to produce part of the July 1, 2002 estimates for the White
and Black population. The remaining part of the July 1, 2002 "White
and Black" population was obtained by applying comparable ratios to the
July 1, 2002 estimate of the Black population.
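The White portion of this conversion can be sketched as follows. The sketch
assumes that the numerator for the multiple-race share is half of the Census
2000 "White and Black" count, which is the choice that makes the two pieces
sum to the estimated White total. The function name and figures are
hypothetical.

```python
def split_white_estimate(white_estimate, census_white_alone,
                         census_white_and_black):
    """Apportion a four-race White estimate back to Census 2000 categories."""
    half_multi = 0.5 * census_white_and_black
    denominator = census_white_alone + half_multi
    white_alone = white_estimate * census_white_alone / denominator
    white_part_of_multi = white_estimate * half_multi / denominator
    return white_alone, white_part_of_multi


# Hypothetical figures for one state / age / sex / Hispanic origin cell.
white_alone, white_part = split_white_estimate(
    white_estimate=950.0, census_white_alone=900, census_white_and_black=40)
```

The same function applied to the Black estimate would yield the remaining
part of the "White and Black" population.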
6B. The group quarters and household estimates of the population were summed
by state and by age group and sex within states. The preliminary estimates
were compared with independently calculated state total population estimates,
state estimates by age and sex, and national estimates by age, sex, race and
Hispanic origin. The final resident population estimates were adjusted to
equal the independent totals by multiplying each estimate by the ratio of the
independent totals to the sum of the relevant estimates. These adjusted
estimates were rounded to whole numbers for each combination of demographic
characteristics within states and compared with the independent totals.
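The ratio adjustment and the comparison after rounding can be sketched for a
single independent total as below. The text does not say how rounding
differences were resolved, so the sketch only reports the residual. All names
and figures are hypothetical.

```python
def control_to_total(estimates, independent_total):
    """Scale estimates so that they sum to an independent total."""
    factor = independent_total / sum(estimates)
    return [value * factor for value in estimates]


# Hypothetical preliminary estimates that should sum to 2,510.
preliminary = [1203.4, 842.1, 455.9]
independent_total = 2510

adjusted = control_to_total(preliminary, independent_total)
rounded = [round(value) for value in adjusted]
residual = independent_total - sum(rounded)  # difference left by rounding
```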
1 These data are collected annually by state agencies through the
Federal-State Cooperative Program for Population Estimates (FSCPE).