Estimates And Projections Area Methodology
State Population Estimates By Age, Sex, Race, And Hispanic Origin For July 1, 2003
The U.S. Census Bureau produces estimates of the resident population by
age, sex, race and Hispanic origin for each state in the United States
and the District of Columbia on an annual basis. The following
documentation outlines the methods that were used in the production of
the July 1, 2003 estimates.
OVERVIEW
For the July 1, 2003 state estimates of the resident population by age,
sex, race, and Hispanic origin, the Census Bureau used a proportional
distribution method. This method was applied in the following manner.
First, we started with previously developed resident state population
estimates by age and sex and resident national population estimates by
age, sex, race, and Hispanic origin. Second, we estimated the race and
Hispanic origin distributions for the state-age-sex estimates using
information about the post-censal change in the corresponding populations.
Third, we applied these distributions to the original state age-sex and
national characteristics estimates. A detailed discussion of this
method is provided below.
Estimating the Race and Hispanic Origin Distributions
The majority of the work that went into producing the state resident
population estimates by age, sex, race, and Hispanic origin consisted of
estimating the race and Hispanic origin distributions for each state-age-sex
estimate. This was done by producing a preliminary set of age, sex, race,
and Hispanic origin estimates for each state and then calculating the
race-Hispanic origin proportions from these.
The preliminary set of resident population estimates by age, sex, race,
and Hispanic origin was produced by first splitting the Census population
into two mutually exclusive universes: the household population and the
group quarters (GQ) population. For the household population, a
cohort-component technique was then used to estimate change in this
population. For the GQ population, change was estimated through a
data-collection effort conducted in conjunction with members of the
Federal-State Cooperative Program for Population Estimates (FSCPE). The
resulting household and GQ estimates were then added together to produce
the new set of preliminary resident population estimates.
The Preliminary Household Population Estimates
The cohort-component technique used to estimate the household population
follows each birth cohort across time according to its exposure to
mortality, fertility, and migration. This technique was applied using
the following equation.
P1 = P0 + B - D + NDM + NIM + NMM
Where:
P1 = population at the end of the period
P0 = population at the beginning of the period
B = births during the period
D = deaths during the period
NDM = net domestic migration during the period
NIM = net international migration during the period
NMM = net military movement during the period
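As a minimal illustration of the balancing equation, the Python sketch below applies it to a single cohort for one period. The `Components` class and `cohort_component` function are hypothetical names, and the numbers are illustrative only, not Census Bureau data. In a full cohort-component run the equation is applied cell by cell, with each cohort aging one year per annual period and births entering only the youngest cohort.

```python
from dataclasses import dataclass


@dataclass
class Components:
    """Components of change for one cohort over one estimate period."""
    births: float = 0.0             # B (nonzero only for the newborn cohort)
    deaths: float = 0.0             # D
    net_domestic: float = 0.0       # NDM
    net_international: float = 0.0  # NIM
    net_military: float = 0.0       # NMM


def cohort_component(p0: float, c: Components) -> float:
    """Return P1 = P0 + B - D + NDM + NIM + NMM."""
    return (p0 + c.births - c.deaths + c.net_domestic
            + c.net_international + c.net_military)


# Illustrative numbers only.
p1 = cohort_component(1000, Components(deaths=12, net_domestic=-5,
                                       net_international=8, net_military=2))
print(p1)  # 993.0
```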
Applying this equation was straightforward; preparing the input data
was not. Great care was taken to estimate each component of the
equation as accurately as possible. The details of this work are
outlined below.
One complicating factor in estimating the household population was
that the administrative data used in these estimates continue to reach
the Census Bureau with race categories different from those used in
Census 2000. In Census 2000, information was gathered using six race
groups (White; Black; American Indian and Alaska Native; Asian; Native
Hawaiian and Other Pacific Islander; and Some Other Race), and
individuals were allowed to report more than one race. By contrast, most
administrative data available for the estimates still arrive in the
four race categories consistent with the 1990 census (White; Black;
American Indian, Eskimo, or Aleut; and Asian and Pacific Islander). For
this reason, the household estimates were processed in the four race
categories consistent with the 1990 census, and the results were then
converted to race categories consistent with Census 2000. Details of all
the conversions needed to carry out this procedure are presented below.
Specification of the Base Population
The enumerated population from Census 2000 was the base for the
July 1, 2003 estimates. This population was modified in four
ways to prepare it for inclusion in the cohort-component technique.
- The original race data from the Census were modified to eliminate
  the "Some Other Race" category.[1]
- The April 1, 2000 population estimates base reflects modifications
  to the Census 2000 population as documented in the Count Question
  Resolution program and errata notes.[2]
- The Census 2000 base household population was converted from
  the 31 race categories to the four race groups consistent with
  the 1990 census by "straight proportional allocation."
- Finally, the results of the preceding step were used to estimate
  the population on July 1, 2000.[3] This was done as follows:
  - A set of race-Hispanic origin proportions was calculated
    by summing the data by state, age, and sex and then dividing
    each state-age-sex-race-origin cell by the corresponding sum.
  - These proportions were then applied to the previously
    produced state age-sex estimates for July 1, 2000.
  - Finally, the results were controlled to the July 1, 2000
    national estimates by age, sex, race, and Hispanic origin.
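The proportional step in the last item can be sketched as follows, assuming census counts are held in a Python dictionary keyed by (state, age, sex, race, origin). The function names and toy counts are hypothetical, and the final control to the national estimates is not shown.

```python
from collections import defaultdict


def race_origin_shares(census):
    """census maps (state, age, sex, race, origin) -> count.
    Each cell is divided by its state-age-sex total."""
    totals = defaultdict(float)
    for (state, age, sex, _race, _origin), n in census.items():
        totals[(state, age, sex)] += n
    return {key: n / totals[key[:3]]
            for key, n in census.items() if totals[key[:3]] > 0}


def apply_shares(shares, age_sex_estimates):
    """age_sex_estimates maps (state, age, sex) -> July 1, 2000 estimate."""
    return {key: share * age_sex_estimates[key[:3]]
            for key, share in shares.items()}


# Toy data for one hypothetical state "S1", age 0, males.
census = {("S1", 0, "M", "White", "non-Hispanic"): 60,
          ("S1", 0, "M", "Black", "non-Hispanic"): 40}
july_2000 = apply_shares(race_origin_shares(census), {("S1", 0, "M"): 110})
```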
Specification of Births
The birth component was calculated with data from two sources. The
Federal-State Cooperative Program for Population Estimates (FSCPE)
provided data on all registered births that occurred in the members’
respective states for calendar years 2000-2002. The FSCPE births were
adjusted and distributed by sex, race (consistent with the 1990 census),
and Hispanic origin using data from the National Center for Health
Statistics (NCHS). Finally, births for the last six months of the
estimate period (January 1 - June 30, 2003) were assumed to equal the
number of births from July 1 - December 31, 2002. These final birth
counts were considered complete for the resident population. No
adjustments were made for undercoverage or differential coverage by
state, sex, race, or Hispanic origin.
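The document does not spell out how the FSCPE births were adjusted. Assuming the NCHS data supply only the sex-race-Hispanic origin distribution within a state, the distribution step could be sketched as a proportional split of each state total. `distribute_total` and the toy counts are hypothetical.

```python
def distribute_total(total, reference_counts):
    """Spread a state total across cells in proportion to reference counts."""
    ref_sum = sum(reference_counts.values())
    if ref_sum <= 0:
        raise ValueError("reference distribution must have a positive total")
    return {cell: total * n / ref_sum for cell, n in reference_counts.items()}


# Hypothetical NCHS-style counts for one state, by (sex, race, origin).
nchs = {("M", "White", "non-Hispanic"): 510,
        ("F", "White", "non-Hispanic"): 490}
births = distribute_total(1000, nchs)  # FSCPE total of 1,000 (toy value)
```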
Specification of Deaths
The death component was also calculated with data from two sources.
The Federal-State Cooperative Program for Population Estimates (FSCPE)
provided data on all registered deaths that occurred in the members’
respective states for calendar years 2000-2002. The FSCPE deaths were
adjusted and distributed by age, sex, race (consistent with the 1990
census), and Hispanic origin using data from the National Center for
Health Statistics (NCHS). Finally, deaths for the last six months of the
estimate period (January 1 - June 30, 2003) were assumed to equal the
number of deaths from July 1 - December 31, 2002. These final death
counts were considered complete for the resident population. No
adjustments were made for undercoverage or differential coverage by
state, age, sex, race, or Hispanic origin.
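The half-year assumption used for both births and deaths can be sketched as below. The sketch assumes the registered counts are available by half year, keyed by (year, half); that layout, the function name, and the toy counts are hypothetical.

```python
def estimate_year_total(half_year_counts, start_year):
    """Events from July 1 of start_year through June 30 of start_year + 1.

    half_year_counts maps (year, half) -> count, where half 1 is
    January-June and half 2 is July-December. If the January-June count for
    start_year + 1 is missing, it is set equal to the preceding
    July-December count.
    """
    july_dec = half_year_counts[(start_year, 2)]
    jan_jun = half_year_counts.get((start_year + 1, 1), july_dec)
    return july_dec + jan_jun


# Toy counts for one state; January-June 2003 is not available.
deaths = {(2001, 2): 490, (2002, 1): 480, (2002, 2): 500}
print(estimate_year_total(deaths, 2001))  # 970
print(estimate_year_total(deaths, 2002))  # 1000
```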
Specification of Net International Migration
The net international migration component consisted of three migration
flows: (1) net migration of the foreign-born, (2) emigration
of natives, and (3) net movement from Puerto Rico to the United
States.
To measure the net migration of the foreign born, we used the American
Community Survey (ACS) because it provides annually updated data. We
first determined the national level of migration for the foreign born
by calculating the difference between successive annual survey
estimates of the foreign-born population from 2000 to 2001 and from 2001
to 2002. We then accounted for deaths to the entire foreign-born
population during the periods of interest to arrive at the final
national estimate of net migration of the foreign born. Next, to
ascribe county of destination, age, sex, race, and Hispanic origin to
these estimates, we applied the distribution of the noncitizen foreign
born in Census 2000 who entered in 1995 or later to the national-level
estimate. Finally, we assumed the net migration of the foreign born
between 2002 and 2003 to be the same as net migration between 2001 and
2002.
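The national arithmetic for the foreign born can be sketched as follows, assuming deaths are added back to the change in the survey-based stock (net migration = change in stock + deaths), a standard residual approach that the document describes only as "accounting for deaths." The function name and toy figures are hypothetical.

```python
def foreign_born_net_migration(stocks, deaths):
    """stocks maps survey year -> national foreign-born population estimate.
    deaths maps the start year of each interval -> deaths to the foreign born.
    Net migration over an interval = change in stock + deaths.
    Results are keyed by the start year of each interval."""
    years = sorted(stocks)
    net = {}
    for y0, y1 in zip(years, years[1:]):
        net[y0] = (stocks[y1] - stocks[y0]) + deaths[y0]
    # Carry the last observed interval (2001-2002) forward to 2002-2003.
    net[years[-1]] = net[years[-2]]
    return net


# Toy numbers, not survey results.
net = foreign_born_net_migration({2000: 1000, 2001: 1100, 2002: 1180},
                                 {2000: 20, 2001: 22})
# net == {2000: 120, 2001: 102, 2002: 102}
```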
The national net movement from Puerto Rico to the United States by
age and sex was measured using levels observed during the 1990s.[4]
To assign characteristics to these flows, we applied the
age-sex-race-Hispanic origin-destination county distributions from
Census 2000 for those who reported Puerto Rico as their place of birth
and who had entered the United States in 1995 or later.
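These flows are allocated with county-level Census 2000 distributions, while the estimates themselves are for states. A minimal sketch of allocating a national level to county cells and summing to states follows; the data layout, county codes, cell labels, and function name are hypothetical.

```python
from collections import defaultdict


def allocate_and_roll_up(national_level, census_dist, county_to_state):
    """census_dist maps (county, cell) -> Census 2000 count of people born in
    Puerto Rico who entered in 1995 or later; cell is an age-sex-race-origin
    group. Returns (state, cell) -> allocated migrants."""
    total = sum(census_dist.values())
    by_state = defaultdict(float)
    for (county, cell), n in census_dist.items():
        # Each county cell gets its share of the national level,
        # then is added to its state.
        by_state[(county_to_state[county], cell)] += national_level * n / total
    return dict(by_state)


dist = {("c1", "m"): 30, ("c2", "m"): 10, ("c3", "f"): 60}
result = allocate_and_roll_up(500, dist, {"c1": "NY", "c2": "NY", "c3": "FL"})
# result == {("NY", "m"): 200.0, ("FL", "f"): 300.0}
```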
The emigration of natives was estimated in a similar way to the net
movement from Puerto Rico. Again, the national levels of movement were
measured using levels observed during the 1990s.[5] Then, to assign
characteristics to these flows, we applied the age-sex-race-Hispanic
origin-destination state distributions from Census 2000 of all natives
residing in the United States. In other words, natives who emigrated
were assumed to have the same age-sex-race-Hispanic origin-destination
state distribution as natives residing in the 50 states and the
District of Columbia in Census 2000.
Once the net migration of the foreign born, net movement from Puerto
Rico, and the emigration of natives were estimated, all three parts
were combined to estimate a final net international migration component.
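Combining the three flows amounts to a signed sum in each cell, with the emigration of natives entering as an outflow. The sketch assumes each flow is held as a dictionary keyed by cell; the function name and toy values are hypothetical.

```python
def net_international_migration(foreign_born_net, puerto_rico_net,
                                native_emigration):
    """Each argument maps a cell (e.g., state-age-sex-race-origin) -> count.
    Emigration of natives is an outflow, so it is subtracted."""
    cells = set(foreign_born_net) | set(puerto_rico_net) | set(native_emigration)
    return {c: foreign_born_net.get(c, 0) + puerto_rico_net.get(c, 0)
               - native_emigration.get(c, 0)
            for c in cells}


result = net_international_migration({"x": 100}, {"x": 20, "y": 5}, {"x": 10})
# result == {"x": 110, "y": 5}
```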
Specification of Net Internal Migration
For the July 1, 2003 estimates, internal migration was estimated
using data from two administrative record sources: annual
individual-level extracts of tax returns provided by the Internal
Revenue Service (IRS), and the Census Numident file derived from the
Social Security Administration (SSA) 100-percent file.
The IRS 1040 tax return records were matched to the SSA data to
identify the age, sex, race, and Hispanic origin of the tax filers.
Next, demographic characteristics were assigned to spouses and
dependents of each filer using several simplifying assumptions. First,
spouses were assigned the same age and the opposite sex as filers.
Second, exemptions claimed for dependent children were assigned to
the under-20 age group and exemptions claimed for dependent parents
were assigned to the 65 and over age group. Third, the sex of
dependent children and parents was assigned randomly. Fourth, the
spouse and other dependents were assigned the same race and Hispanic
origin as the filer.
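The four simplifying assumptions can be sketched as a function that expands one matched filer record into one record per exemption. The `Filer` fields (including `has_spouse`) and the age-group labels are hypothetical, and a seeded `random.Random` stands in for whatever random assignment was actually used.

```python
import random
from dataclasses import dataclass


@dataclass
class Filer:
    """Hypothetical record: a 1040 filer matched to SSA characteristics."""
    age: int
    sex: str            # "M" or "F"
    race: str
    hispanic: bool
    has_spouse: bool    # spouse exemption claimed
    child_deps: int     # exemptions for dependent children
    parent_deps: int    # exemptions for dependent parents


def expand_exemptions(f: Filer, rng: random.Random):
    """Return one (age_group, sex, race, hispanic) tuple per exemption."""
    people = [(str(f.age), f.sex, f.race, f.hispanic)]
    if f.has_spouse:
        # Spouse: same age, opposite sex, same race and origin as the filer.
        spouse_sex = "F" if f.sex == "M" else "M"
        people.append((str(f.age), spouse_sex, f.race, f.hispanic))
    for _ in range(f.child_deps):
        # Dependent children: under 20, random sex, filer's race and origin.
        people.append(("under 20", rng.choice("MF"), f.race, f.hispanic))
    for _ in range(f.parent_deps):
        # Dependent parents: 65 and over, random sex, filer's race and origin.
        people.append(("65+", rng.choice("MF"), f.race, f.hispanic))
    return people


people = expand_exemptions(Filer(40, "M", "White", False, True, 2, 1),
                           random.Random(0))
```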
After the demographic characteristics were assigned to the IRS tax
return records, two years of records were matched and a migration
status was assigned. Filers and their dependents with a change in
the state of residence between the two periods were identified as
"inter-state migrants." If there was no change in the state
of residence, the filers and dependents were identified as non-migrants.
It was then possible to calculate both the out-migration rates and
in-migration proportions for each state by age, sex, race, and
Hispanic origin. First, the out-migration rate for each age, sex,
race, and Hispanic origin group within a state was computed by
dividing the number of "inter-state migrants" moving out of the state
during the period by the number of filers and dependents (i.e.,
exemptions) in the state at the beginning of the period.[6]
Then, to calculate the in-migration proportions for the age, sex, race,
and Hispanic origin groups of each state, the out-migration rates were
multiplied by a proxy estimate of the population for that year taken
from the previous vintage of estimates (vintage 2002) to produce
estimated numbers of migrants. From these, the inter-state in-migration
proportion for each state was calculated by dividing the number of
in-migrants by age, sex, race, and Hispanic origin for that state by
the national sum for that characteristic group.
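Ignoring the vintage 2002 population weighting, the rate and proportion calculations can be sketched directly from matched exemption records. The sketch assumes each exemption appears once as (cell, state in year 1, state in year 2); the function name and record layout are hypothetical.

```python
from collections import Counter


def migration_rates(matched):
    """matched: iterable of (cell, state_year1, state_year2), one entry per
    exemption found on both years' returns; cell is an age-sex-race-origin
    group. Returns (out_rates, in_props), both keyed by (state, cell)."""
    base, out, into = Counter(), Counter(), Counter()
    for cell, s1, s2 in matched:
        base[(s1, cell)] += 1          # exemptions at start of period
        if s1 != s2:                   # "inter-state migrant"
            out[(s1, cell)] += 1
            into[(s2, cell)] += 1
    out_rates = {k: out[k] / n for k, n in base.items()}
    national_in = Counter()
    for (_state, cell), n in into.items():
        national_in[cell] += n
    in_props = {(state, cell): n / national_in[cell]
                for (state, cell), n in into.items()}
    return out_rates, in_props


matched = [("c", "A", "A"), ("c", "A", "B"), ("c", "B", "B"),
           ("c", "B", "A"), ("c", "A", "A"), ("c", "A", "A")]
out_rates, in_props = migration_rates(matched)
```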
In the production of the out-migration rates and in-migration
proportions, when the number of exemptions for any age-sex-race-Hispanic
origin category (referred to here as a cell) was low, the exemptions
were combined with those of other cells to improve the robustness of
the resulting out-migration rates or in-migration proportions. If a
given cell had fewer than 30 exemptions, it was combined with the
exemptions of adjacent age cells within the same sex-race-Hispanic
origin group until the combined category contained at least 30
exemptions. If a combined category with at least 30 exemptions could
not be created this way, cells were combined for both sexes. After
this was done and the out-migration rate or in-migration proportion
was calculated, each individual age was assigned the rate or
proportion of the aggregated age group.
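One possible reading of the collapsing rule, for a single state-sex-race-Hispanic origin group, is sketched below. Pooling from the youngest age upward, folding a short final pool into the preceding one, and returning `None` to signal the both-sexes fallback are assumptions; the document does not specify these details. The function name is hypothetical.

```python
def collapse_ages(exemptions, events, min_count=30):
    """exemptions, events: lists indexed by single year of age for one
    state-sex-race-origin group (events = out-migrants, for example).
    Adjacent ages are pooled, youngest first, until each pool has at least
    min_count exemptions; a short final pool joins the one before it.
    Every age receives its pool's rate. Returns None if no pool can reach
    min_count (the caller would then combine both sexes)."""
    pools, current = [], []
    for age in range(len(exemptions)):
        current.append(age)
        if sum(exemptions[a] for a in current) >= min_count:
            pools.append(current)
            current = []
    if current:
        if not pools:
            return None
        pools[-1].extend(current)
    rates = [0.0] * len(exemptions)
    for pool in pools:
        rate = sum(events[a] for a in pool) / sum(exemptions[a] for a in pool)
        for a in pool:
            rates[a] = rate
    return rates


rates = collapse_ages([10, 10, 15, 40, 5], [1, 1, 1, 4, 1])
```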
Two other aspects of the estimation of the out-migration rates and
in-migration proportions should be noted. First, the individual ages
in the 0-19 and 65+ age groups were assigned the same out-migration
rate or in-migration proportion as the aggregated age group. Second,
the age distributions of the out-migration rates and in-migration
proportions by state, sex, race, and Hispanic origin were smoothed
using a moving average.
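The document does not state the window used for the moving average. The sketch below assumes a centered three-year window that shrinks at the ends of the age range; both choices, and the function name, are assumptions.

```python
def moving_average(values, window=3):
    """Centered moving average over single years of age; near the ends of
    the age range, only the available neighbors are averaged."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


print(moving_average([0, 3, 6, 9]))  # [1.5, 3.0, 6.0, 7.5]
```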
The final step in producing the number of in- and out-migrants for
each state occurred during the actual estimation process. The
out-migration rates were applied to the estimate of the population at
the beginning of the period to generate the number of state
out-migrants by age, sex, race, and Hispanic origin. These migrants
were then converted into in-migrants for each state by age, sex, race,
and Hispanic origin by multiplying the in-migration proportions for
each state by the corresponding national sums.
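For one age-sex-race-Hispanic origin cell, the two operations can be sketched as follows; the function name and toy values are hypothetical. Because each cell's in-migration proportions sum to one across states, in-migrants equal out-migrants nationally, so net domestic migration sums to zero.

```python
def domestic_migration(pop_start, out_rates, in_props):
    """For one age-sex-race-origin cell, all dicts keyed by state.

    Out-migrants = out-migration rate * population at start of period.
    In-migrants = in-migration proportion * national sum of out-migrants.
    Returns net domestic migration (in minus out) by state.
    """
    out_migrants = {s: pop_start[s] * out_rates[s] for s in pop_start}
    national = sum(out_migrants.values())
    in_migrants = {s: national * in_props.get(s, 0.0) for s in pop_start}
    return {s: in_migrants[s] - out_migrants[s] for s in pop_start}


net = domestic_migration({"A": 1000, "B": 500}, {"A": 0.02, "B": 0.04},
                         {"A": 0.25, "B": 0.75})
```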
Specification of Net Military Movement
The net movement of the military, both foreign and domestic, into
each state for each year was estimated using data received directly
from the armed forces and the Department of Defense. These sources
provided yearly estimates of the station strength in each state from
2000 to 2003. The net movement was then calculated as the difference
in station strength from one year to the next. Finally, the age, sex,
race, and Hispanic origin distribution was assigned using those who
reported being employed by the military in Census 2000.
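The net-movement calculation is a year-over-year difference, sketched below for one state with toy station-strength figures. The function name is hypothetical, and the subsequent distribution by Census 2000 military characteristics is not shown.

```python
def net_military_movement(station_strength):
    """station_strength: year -> station strength in one state.
    Returns (year, next_year) -> change in station strength."""
    years = sorted(station_strength)
    return {(y0, y1): station_strength[y1] - station_strength[y0]
            for y0, y1 in zip(years, years[1:])}


moves = net_military_movement({2000: 100, 2001: 90, 2002: 120})
# moves == {(2000, 2001): -10, (2001, 2002): 30}
```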
The Preliminary GQ Population Estimates
Group quarters (GQ) population change is estimated separately from the
household population because of the unique character of this
subpopulation and the availability of direct data that reflect change
in this population. The technique for estimating the GQ population for
the vintage 2003 estimates started with the Census 2000 enumerated GQ
population. As with the household population, the race breakdowns of
the GQ data were converted from categories consistent with Census 2000
to categories consistent with the 1990 census through proportional
allocation (see above).
Next, the state representatives who participate in the Federal-State
Cooperative Program for Population Estimates (FSCPE) used the sources
available to them to develop an independent list of GQ facilities in
their state, along with the populations typically associated with those
facilities at the time of Census 2000 and annually from 2000 to 2003.
The Census Bureau then calculated the implied change in the GQ
population from the numbers provided by the FSCPE members. This change
was applied to the Census GQ base to produce the estimate of the total
GQ population in the state. Finally, these state totals were
distributed by age, sex, race, and Hispanic origin using the
distribution of the GQ population within each GQ type from the base GQ
population.
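A minimal sketch of the GQ step follows. It assumes the implied change is the additive difference between the FSCPE counts for the estimate year and for the Census 2000 reference, computed by GQ type; the data layout, GQ type label, and function name are hypothetical.

```python
def gq_by_characteristics(census_gq, fscpe_2000, fscpe_year):
    """census_gq: gq_type -> {cell: Census 2000 count}.
    fscpe_2000, fscpe_year: gq_type -> population on the FSCPE facility
    lists at Census 2000 and in the estimate year.
    Returns (gq_type, cell) -> estimated GQ population."""
    result = {}
    for gq_type, cells in census_gq.items():
        base_total = sum(cells.values())
        # Implied change from the FSCPE lists, applied to the Census base.
        change = fscpe_year.get(gq_type, 0) - fscpe_2000.get(gq_type, 0)
        new_total = base_total + change
        # Distribute using the base distribution within this GQ type.
        for cell, n in cells.items():
            result[(gq_type, cell)] = (new_total * n / base_total
                                       if base_total else 0.0)
    return result


gq = gq_by_characteristics({"college": {"M": 60, "F": 40}},
                           {"college": 95}, {"college": 115})
# gq == {("college", "M"): 72.0, ("college", "F"): 48.0}
```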
The Final Population Estimates by Demographic Characteristic
The final steps in the production of the vintage 2003 state characteristics
estimates consisted of 1) adding together the household and GQ
population estimates for each year, 2) calculating the necessary
proportions from the preliminary estimates and applying them to both the
previously created state population estimates by age and sex and the
national estimates by age, sex, race, and Hispanic origin, and
3) converting the data from the 4 race categories consistent with
the 1990 census to the categories consistent with Census 2000.
In combining the household and GQ population estimates, the only caveat
is that the GQ population was assumed not to have changed between
April 1, 2000 and July 1, 2000.
The next step in the production of the state estimates by age, sex, race,
and Hispanic origin was to control the preliminary resident state estimates
by age, sex, race, and Hispanic origin to both the previously created
resident state population estimates by age and sex and the national
resident estimates by age, sex, race, and Hispanic origin. This was
done through a two-step iterative process.
In the first step, the preliminary state characteristic estimates were
summed to the national level by age, sex, race, and Hispanic origin. Next,
proportions were calculated by dividing each state-age-sex-race-Hispanic
origin cell by the corresponding sum. Finally, these proportions were
applied to the original national characteristics estimates in order to
calculate an intermediate set of state estimates by age, sex, race, and
Hispanic origin.
In the second step, a similar procedure was used to calculate new
proportions from the transformed data so that the state age-sex
estimates could be distributed by race and Hispanic origin. First, the
intermediate estimates from the first step were summed to the state
level by age and sex. Next, proportions were calculated by dividing
each state-age-sex-race-Hispanic origin cell by the corresponding sum.
Finally, these proportions were applied to the original state age-sex
estimates in order to calculate a revised intermediate set of state
estimates by age, sex, race, and Hispanic origin.
When attempting to make estimates consistent with two different sets of
data using the procedure described above, performing the second step of
the iteration tends to distort the fit achieved in the first step. Likewise,
repeating the first step again tends to distort the fit achieved in the
second step. However, these distortions can be minimized by repeating
the two-step procedure multiple times. For this reason, the iterative
procedure was repeated five times. After this, the resulting set of
estimates was rounded to integers.
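The two-step procedure is a form of iterative proportional fitting, also called raking. A minimal sketch for a single age-sex cell follows, with five iterations and rounding at the end as described. The function name and data layout are hypothetical, and the sketch assumes no race-Hispanic origin group or state sums to zero.

```python
from collections import defaultdict


def rake(prelim, national, state_totals, iterations=5):
    """Fit one age-sex cell to two sets of controls.

    prelim: (state, group) -> preliminary estimate, where group is a
    race-Hispanic origin group.
    national: group -> national characteristics estimate for this cell.
    state_totals: state -> state age-sex estimate for this cell.
    """
    est = dict(prelim)
    for _ in range(iterations):
        # Step 1: sum to the national level, take proportions, and apply
        # them to the national characteristics estimates.
        by_group = defaultdict(float)
        for (state, group), value in est.items():
            by_group[group] += value
        est = {(s, g): v / by_group[g] * national[g] for (s, g), v in est.items()}
        # Step 2: sum to the state level, take proportions, and apply
        # them to the state age-sex estimates.
        by_state = defaultdict(float)
        for (state, group), value in est.items():
            by_state[state] += value
        est = {(s, g): v / by_state[s] * state_totals[s] for (s, g), v in est.items()}
    # Round to integers after the final iteration.
    return {key: round(value) for key, value in est.items()}


prelim = {("A", "x"): 50, ("A", "y"): 50, ("B", "x"): 30, ("B", "y"): 70}
final = rake(prelim, national={"x": 90, "y": 110},
             state_totals={"A": 100, "B": 100})
```

Because rounding happens only once, after the last iteration, the rounded cells can miss either set of controls by a small amount; the sketch does not attempt any controlled rounding.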
The final step in the estimates process was to convert the estimates
from the four race categories consistent with the 1990 census, in which
the estimates were processed, to the 31 race categories consistent with
Census 2000. To do this, the procedure used to go from 31 to 4 races was
essentially reversed. First, the proportion of each 4-race category
associated with each of the 31 race categories was calculated separately
for the household and GQ universes using Census 2000 data by state, age,
sex, and Hispanic origin. Next, the GQ population for each estimate year
was subtracted from the corresponding resident population to arrive at
the household population for that estimate year. Then, the household
and GQ proportions were applied to the respective 4-race estimates for
each year to arrive at the 31-category race household and GQ
estimates.[7] Finally, the 31-category household and GQ estimates by
age, sex, and Hispanic origin were added together to arrive at the final
resident state population estimates by age, sex, race, and Hispanic
origin.
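For a single state-age-sex-Hispanic origin cell, the conversion can be sketched as below. Proportions are passed in as dictionaries from each 4-race category to shares of the 31 categories; the names and toy values are hypothetical, and the by-type processing of GQ data is omitted.

```python
from collections import defaultdict


def to_31_races(resident4, gq4, hh_props, gq_props):
    """Convert one state-age-sex-origin cell from 4 races to 31 races.

    resident4, gq4: race4 -> resident and GQ estimates.
    hh_props, gq_props: race4 -> {race31: share}, from Census 2000 household
    and GQ data; the shares for each race4 are assumed to sum to 1.
    """
    result = defaultdict(float)
    for race4, resident in resident4.items():
        gq = gq4.get(race4, 0)
        household = resident - gq  # household = resident - GQ
        for race31, share in hh_props[race4].items():
            result[race31] += household * share
        for race31, share in gq_props.get(race4, {}).items():
            result[race31] += gq * share
    return dict(result)


r = to_31_races({"White": 100}, {"White": 10},
                {"White": {"White alone": 0.9, "White; Black": 0.1}},
                {"White": {"White alone": 1.0}})
```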
[1] This modification has been adopted for all Census Bureau
estimates produced and is explained in the document entitled "Modified
Race Data Summary File Technical Documentation and ASCII Layout,"
which can be found on the Census Bureau website at
http://www.census.gov/popest/archives/files/MRSF-01-US1.html.
[2] Details about the Count Question Resolution Program can be
found on the Census Bureau website at
http://www.census.gov/dmd/www/CQR.htm.
Errata notes can be found on the Census Bureau website at
http://www.census.gov/prod/cen2000/notes/errata.pdf.
[3] This step was needed because the estimates procedure produces
annual estimates and the target reference date for each estimate
year is July 1.
[4] A description of the methodology used to produce these
estimates can be found on the Census Bureau website at
http://www.census.gov/population/www/documentation/twps0064.html.
[5] A description of the methodology used to produce these
estimates can be found on the Census Bureau website at
http://www.census.gov/population/www/documentation/twps0063.html.
[6] Technically, the quotient produced by this procedure is the
probability of moving out of the state. However, we make the simplifying
assumption that the out-migration rate is the same as the probability of
moving out of the state.
[7] The household and GQ data are converted from 4 to 31 races
separately because the GQ data are processed by GQ type so that they
may be used in estimates not described in this document.