Estimates And Projections Area Methodology
County Population Estimates By Age, Sex, Race, And Hispanic Origin For
July 1, 2003
The U.S. Census Bureau produces estimates of the resident population by
age, sex, race and Hispanic origin for each county in the United States
and the District of Columbia on an annual basis. The following
documentation outlines the methods that were used in the production of
the July 1, 2003 estimates.
OVERVIEW
For the July 1, 2003 county estimates of the resident population by age,
sex, race, and Hispanic origin, the Census Bureau used a proportional
distribution method. This method was applied in the following manner.
First, we started with previously developed resident county population
estimates by age (0-64 and 65+) and resident state population estimates
by age, sex, race, and Hispanic origin. Second, we estimated the age,
sex, race, and Hispanic origin distributions for the county estimates
using information about the post-censal change in the corresponding
populations. Third, we applied these distributions to the original county
estimates by age and state characteristics estimates. A detailed
discussion of this method is provided below.
Estimating the Age, Sex, Race, and Hispanic Origin Distributions
The majority of the work that went into producing the county resident
population estimates by age, sex, race, and Hispanic origin consisted of
estimating the race and Hispanic origin distributions for each county
age estimate. This was done by producing a preliminary set of age, sex,
race, and Hispanic origin estimates for each county and then calculating
the age-sex-race-Hispanic origin proportions from these.
The preliminary set of resident population estimates by age, sex, race,
and Hispanic origin was produced by first splitting the Census population
into two mutually exclusive universes: the household population and the
group quarters (GQ) population. For the household population, a
cohort-component technique was then used to estimate change in this population.
For the GQ population, GQ change was estimated through a data-collection
effort conducted in conjunction with members of the Federal-State
Cooperative Program for Population Estimates (FSCPE). The resulting
household and GQ estimates were then added together to produce the new
set of preliminary resident population estimates.
The Preliminary Household Population Estimates
The cohort-component technique used to estimate the household population
follows each birth cohort across time according to its exposure to
mortality, fertility, and migration. This technique was applied using
the following equation.
P1 = P0 + B - D + NDM + NIM + NMM
Where:
P1 = population at the end of the period
P0 = population at the beginning of the period
B = births during the period
D = deaths during the period
NDM = net domestic migration during the period
NIM = net international migration during the period
NMM = net military movement during the period
The actual application of this algorithm was quite simple. Not so simple
was the preparation of the input data. Great care was taken to estimate
each component of this equation as accurately as possible. The details
of this work are outlined below.
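As a minimal sketch of the equation above, the following Python function applies it to a single population cell. The function name and the input figures are hypothetical and serve only to illustrate the arithmetic; they are not taken from Census Bureau data or software.

```python
def end_of_period_population(p0, births, deaths, ndm, nim, nmm):
    """Apply P1 = P0 + B - D + NDM + NIM + NMM to one population cell."""
    return p0 + births - deaths + ndm + nim + nmm


# Hypothetical inputs for one county (illustrative numbers only).
p1 = end_of_period_population(
    p0=10_000,   # population at the beginning of the period
    births=150,  # births during the period
    deaths=90,   # deaths during the period
    ndm=-40,     # net domestic migration
    nim=25,      # net international migration
    nmm=5,       # net military movement
)
print(p1)
```

Each net migration term may be negative, as the net domestic migration figure is here.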
One complicating factor in the estimation of the household population
was that the administrative data used in these estimates continue to
come to the Census Bureau with different race categories than those used in
Census 2000. In Census 2000, information was gathered using 6 race
groups (White; Black; American Indian and Alaska Native; Asian; Native
Hawaiian and Pacific Islander; and Some Other Race). In addition,
individuals were allowed to report multiple races. Conversely, most
administrative data available for the estimates still come to us in the
4 race categories consistent with the 1990 census (White; Black; American
Indian, Eskimo, or Aleut; and Asian and Pacific Islander). For this
reason, the household estimates were processed with the 4 race categories
consistent with the 1990 census and the results were converted to race
categories consistent with Census 2000. Details of all the conversions
needed in order to carry out this procedure are presented below.
Specification of the Base Population
The enumerated population from Census 2000 was the base for the
July 1, 2003 estimates. This population was modified in four
ways to prepare it for inclusion in the cohort-component technique.
- The original race data from the Census were modified to eliminate
the "some other race" category.1
- The April 1, 2000 population estimates base reflects modifications
to the Census 2000 population as documented in the Count Question
Resolution program and errata notes.2
- The Census 2000 base household population was converted from
the 31 race categories to the four race groups consistent with
the 1990 Census by "straight proportional allocation."
- Finally, the results of the race conversion in the previous item
were used to estimate the population on July 1, 2000.3 This was
done as follows:
  - A set of age-sex-race-Hispanic origin proportions was
  calculated by summing the data by county and age (0 to 64
  and 65+) and then dividing each age-sex-race-Hispanic origin
  cell by the corresponding sum.
  - These proportions were then applied to the previously
  produced county age estimates for July 1, 2000.
  - Finally, the results were controlled to the July 1,
  2000 state estimates by age, sex, race, and Hispanic origin.
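The first two sub-steps of the last modification can be sketched in Python as follows. This is an illustration under stated assumptions, not the Census Bureau's implementation: the function name, the cell labels "A" and "B" (standing in for detailed age-sex-race-Hispanic origin cells), and all counts are hypothetical. The final control to state estimates is not shown.

```python
from collections import defaultdict


def distribute_by_proportion(cells, age_group_totals):
    """Distribute new age-group totals across cells in proportion to old counts.

    cells: {(age_group, detail): count} for one county.
    age_group_totals: {age_group: new total for that county}.
    """
    sums = defaultdict(float)
    for (age_group, _detail), count in cells.items():
        sums[age_group] += count
    # Proportion of each cell within its age group, times the new total.
    return {
        key: age_group_totals[key[0]] * count / sums[key[0]]
        for key, count in cells.items()
    }


# Hypothetical April 1, 2000 cells for one county.
april_cells = {
    ("0-64", "A"): 600, ("0-64", "B"): 400,
    ("65+", "A"): 150, ("65+", "B"): 50,
}
# Hypothetical July 1, 2000 county age estimates.
july_totals = {"0-64": 1010, "65+": 204}
july_cells = distribute_by_proportion(april_cells, july_totals)
print(july_cells)
```

The sketch assumes every age group has a nonzero base count; an empty group would need separate handling.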
Specification of Births
The birth component was calculated with data from two sources. The
Federal-State Cooperative Program for Population Estimates (FSCPE)
provided data on all registered births that occurred in each county
of the members’ respective state for calendar years 2000-2001.
In addition, they provided births for their state as a whole for
calendar year 2002, which were subsequently distributed by county
using the distribution from the FSCPE 2001 births. After this, the
FSCPE births for each year were adjusted and distributed by sex, race
(consistent with the 1990 Census), and Hispanic origin using data
from the National Center for Health Statistics (NCHS). Finally,
data for the last 6 months of the estimate period (January 1 -
June 30, 2003) were assumed to be equal to the number of births
from July 1 - December 31, 2002. These final birth counts
were considered to be complete for the resident population. No
adjustments were made for undercoverage or differential coverage by
county, sex, race, or Hispanic origin.
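The allocation of the calendar year 2002 state total to counties can be sketched as below. The function name, county names, and counts are hypothetical; the sketch covers only the county distribution step, not the NCHS adjustment or the half-year assumption.

```python
def distribute_state_total(state_total, county_counts_prior_year):
    """Allocate a state total to counties using the prior year's county shares."""
    prior_total = sum(county_counts_prior_year.values())
    return {
        county: state_total * count / prior_total
        for county, count in county_counts_prior_year.items()
    }


# Hypothetical 2001 county births and a hypothetical 2002 state total.
births_2001 = {"County A": 300, "County B": 100, "County C": 100}
births_2002 = distribute_state_total(520, births_2001)
print(births_2002)
```

The same proportional allocation would apply to any state total that has to be spread across counties using an earlier year's distribution.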
Specification of Deaths
The death component was also calculated with data from two sources.
The Federal-State Cooperative Program for Population Estimates (FSCPE)
provided data on all registered deaths that occurred in each county
of the members’ respective state for calendar years 2000-2001.
In addition, they provided deaths for their state as a whole for
calendar year 2002, which were subsequently distributed by county
using the distribution from the FSCPE 2001 deaths. After this, the
FSCPE deaths for each year were adjusted and distributed by age, sex,
race (consistent with the 1990 Census), and Hispanic origin using
data from the National Center for Health Statistics (NCHS). Finally,
data for the last 6 months of the estimate period (January 1 -
June 30, 2003) were assumed to be equal to the number of deaths
from July 1 - December 31, 2002. These final death counts
were considered to be complete for the resident population. No
adjustments were made for undercoverage or differential coverage by
county, age, sex, race, or Hispanic origin.
Specification of Net International Migration
The net international migration component consisted of three migration
flows: (1) net migration of the foreign-born, (2) emigration
of natives, and (3) net movement from Puerto Rico to the United
States.
To measure the net migration of the foreign born, we used the American
Community Survey (ACS) because it provided annually updated data. We
first determined the national level of migration for the foreign born
by calculating the net difference in the estimates of these surveys
from 2000 to 2001 and 2001 to 2002. We then accounted for deaths to
the entire foreign-born population during the periods of interest to
arrive at the final national estimate of net migration of the foreign
born. Then, to ascribe county of destination, age, sex, race, and
Hispanic origin to these estimates, we applied the distribution of
the non-citizen foreign born from Census 2000 who entered in 1995 or
later to the national-level estimate. Finally, we assumed the net
migration of the foreign born between 2002 and 2003 to be the same
as net migration between 2001 and 2002.
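One way to read the national calculation for the foreign born is as a residual: the change in the foreign-born population between two survey years, plus the deaths that occurred in between. The sketch below encodes that reading. The residual formula, the function name, and every number are assumptions made for illustration; the text above does not spell out exactly how deaths were accounted for.

```python
def net_foreign_born_migration(stock_start, stock_end, deaths):
    """Residual net migration: change in the foreign-born stock plus deaths.

    Assumes deaths are added back because they lower the end-of-period
    stock without being out-migration.
    """
    return (stock_end - stock_start) + deaths


# Hypothetical foreign-born stocks (in thousands) and deaths.
net_2000_2001 = net_foreign_born_migration(1000, 1080, 15)
net_2001_2002 = net_foreign_born_migration(1080, 1150, 16)
# The 2002-2003 period is assumed equal to 2001-2002, as stated above.
net_2002_2003 = net_2001_2002
print(net_2000_2001, net_2001_2002, net_2002_2003)
```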
The national net movement from Puerto Rico to the United States by
age and sex was measured using levels observed during the 1990s.4
To assign characteristics to these flows, we applied the age-sex-race-Hispanic
origin-destination county distributions from Census 2000 for those
who indicated that their place of birth was Puerto Rico and who had
entered the United States in 1995 or later.
The emigration of natives was produced in a similar way to the net
movement from Puerto Rico. Again, the national levels of movement
were measured using levels observed during the 1990s.5
Then, to assign characteristics to these flows, we applied the
age-sex-race-Hispanic origin-destination county distributions from
Census 2000 of all natives who currently reside in the United States.
Therefore, the characteristics of natives who emigrated were assumed
to follow the same age-sex-race-Hispanic origin-destination county
distribution as natives residing in the 50 states and the District
of Columbia in Census 2000.
Once the net migration of the foreign born, net movement from Puerto
Rico, and the emigration of natives were estimated, all three parts
were combined to estimate a final net international migration
component.
Specification of Net Domestic Migration
The input data used for the estimation of county net domestic migration
came from two administrative sources: annual individual-level extracts
of tax returns provided by the Internal Revenue Service (IRS); and
the Census Numident file derived from the Social Security Administration
100 percent file (SSA). The IRS 1040 tax return records were matched
to the SSA data to identify the age, sex, race, and Hispanic origin
of each tax filer, their spouse, and their dependents. After this,
two years of records were matched and a migration status was assigned
to each filer. Filers with a change in residence across county
boundaries between the two periods along with their spouses and
dependents were identified as "domestic migrants." If there
was no change in the county of residence, the filers and their spouses
and dependents were identified as non-migrants.
While it would have been theoretically possible to estimate county
domestic in-migration and out-migration by calculating county-to-county
flows, we decided against this based on the nature of the input data
and the calculations required to transform them into the estimates
of net migration. In short, since the IRS exemption data reflected
only a portion of the population, we needed to convert these data
into rates and then apply them to an estimate of the entire population.
At the county level, the creation of rates for each
demographic-characteristic group (age, sex, and race-origin) within
all 3,141 counties would have resulted in millions of cells of data,
most with values either equaling zero or too small to produce robust
estimates of migration. Therefore, all calculations of the number
of domestic migrants for the July 1, 2003 county characteristics
estimates occurred at the state level and then were distributed to
the counties by proportional allocation.
The process of estimating the net number of domestic migrants for
each county consisted of four steps. First, the total number of
domestic migrants crossing county borders within each state was
estimated. This was done with a process parallel to the one used
to estimate the number of inter-state migrants for each state and
occurred in the following manner.6 To begin, an inter-county
migration rate for each age-sex-race-Hispanic origin group within a
state was computed by dividing the total number of exemptions
associated with filers who moved across county borders during the
period by the number of exemptions associated with filers in the
state as a whole at the beginning of the period.7 8
Next, the rates were smoothed across ages using a moving average.
Finally, the rates were applied to an estimate of the state population
by characteristics at the beginning of the period from the state
characteristics estimates to generate the number of domestic
within-state county migrants by age, sex, race, and Hispanic origin.
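The three operations in this first step (compute rates, smooth across ages, apply to the population) can be sketched as follows for one sex-race-Hispanic origin group. The three-age centered window is an assumption, since the window length is not stated here, and all names and counts are hypothetical. The pooling of sparse cells described in footnote 7 is not modeled.

```python
def migration_rates(mover_exemptions, all_exemptions):
    """Inter-county migration rate by age: mover exemptions / all exemptions."""
    return [m / a if a else 0.0 for m, a in zip(mover_exemptions, all_exemptions)]


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the ends of the age range."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical data for four consecutive ages in one group.
movers = [10, 30, 20, 10]                # exemptions of filers who changed county
exemptions = [100, 100, 100, 100]        # all exemptions at the start of the period
population = [1000, 1200, 1100, 900]     # state population at the start of the period

rates = migration_rates(movers, exemptions)
smoothed = moving_average(rates)
migrants = [rate * pop for rate, pop in zip(smoothed, population)]
print(smoothed, migrants)
```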
The second step in the process of estimating the net number of domestic
migrants for each county was to distribute these estimates to the
counties within each state. To do this, two ratios were calculated
for each county from the original IRS data: (1) an out-migration
ratio, or the number of exemptions associated with filers who moved
out of a given county and into all other counties within the state
to the total number of exemptions associated with filers who moved
across county boundaries within the state, and (2) an in-migration
ratio, or the number of exemptions associated with filers who moved
into a given county from all other counties within the state to the
total number of exemptions associated with filers who moved across
county boundaries within the state. After this, these ratios were
applied to the total number of domestic migrants crossing county
borders within each state to arrive at the number of domestic
within-state out-migrants and in-migrants for each county.
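A sketch of the two ratios and their application is given below, assuming the IRS-based flows within one state are available as origin-destination pairs. The data structure, function name, county labels, and counts are hypothetical.

```python
def allocation_ratios(flows):
    """Return (out_ratios, in_ratios) by county for within-state flows.

    flows: {(origin_county, destination_county): exemptions}.
    Each ratio is a county's share of all exemptions that crossed
    county boundaries within the state.
    """
    total = sum(flows.values())
    out_counts, in_counts = {}, {}
    for (origin, destination), exemptions in flows.items():
        out_counts[origin] = out_counts.get(origin, 0) + exemptions
        in_counts[destination] = in_counts.get(destination, 0) + exemptions
    out_ratios = {county: n / total for county, n in out_counts.items()}
    in_ratios = {county: n / total for county, n in in_counts.items()}
    return out_ratios, in_ratios


# Hypothetical within-state flows among three counties.
flows = {("A", "B"): 60, ("B", "A"): 30, ("A", "C"): 10}
out_ratios, in_ratios = allocation_ratios(flows)

# Hypothetical state total of within-state inter-county migrants.
state_migrants = 500
out_migrants = {county: r * state_migrants for county, r in out_ratios.items()}
in_migrants = {county: r * state_migrants for county, r in in_ratios.items()}
print(out_migrants, in_migrants)
```

Because both sets of ratios share the same denominator, the allocated out-migrants and in-migrants each sum to the state total, so within-state moves net to zero across the state.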
The third step in the process of estimating the net number of domestic
migrants for each county was to distribute the total number of domestic
migrants going out of and coming into each state to each county within
the state.9 This was done in a way that paralleled the
distribution of the total number of domestic migrants crossing county
borders within each state. To begin, two ratios were calculated for
each county from the original IRS data: (1) an out-migration
ratio, or the number of exemptions associated with filers who moved
out of a given county and into all other states to the total number
of exemptions associated with filers who moved out of the state, and
(2) an in-migration ratio, or the number of exemptions associated
with filers who moved into a given county from all other states to
the total number of exemptions associated with filers who moved into
the state. After this, these ratios were applied to the total number
of domestic migrants exiting and entering the state, respectively,
to arrive at the number of inter-state out-migrants and inter-state
in-migrants for each county.
The final step in the process of estimating the net number of domestic
migrants was simply to sum the numbers of domestic within-state
out-migrants and inter-state out-migrants by county to get the total
number of out-migrants for each county, and to sum the numbers of
domestic within-state in-migrants and inter-state in-migrants by
county to get the total number of in-migrants for each county. The
net number of domestic migrants for a county is then the total number
of in-migrants minus the total number of out-migrants.
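The summation can be sketched as follows, with net domestic migration taken as in-migrants minus out-migrants. The helper name and all county figures are hypothetical.

```python
def add_by_county(first, second):
    """Sum two county-keyed dictionaries, treating missing counties as zero."""
    counties = sorted(set(first) | set(second))
    return {c: first.get(c, 0) + second.get(c, 0) for c in counties}


# Hypothetical within-state and inter-state migrant counts by county.
within_out = {"A": 350, "B": 150}
inter_out = {"A": 40, "B": 25, "C": 10}
within_in = {"A": 150, "B": 300, "C": 50}
inter_in = {"A": 60, "B": 20, "C": 5}

total_out = add_by_county(within_out, inter_out)
total_in = add_by_county(within_in, inter_in)
net_domestic = {
    c: total_in.get(c, 0) - total_out.get(c, 0)
    for c in sorted(set(total_in) | set(total_out))
}
print(net_domestic)
```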
Specification of Net Military Movement
The net movement of the military, both foreign and domestic, into
each county for each year was estimated using data received directly
from the armed forces and the Department of Defense. They provided
yearly estimates of the station strength in each state from 2000 to
2003. The net movement was then calculated as the difference in the
station strength from one year to the next. Finally, the county, age,
sex, race, and Hispanic origin distribution was assigned using those
who reported being employed by the military in Census 2000.
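The year-to-year differencing can be sketched as below for one state. The function name and station strength figures are hypothetical, and the subsequent assignment of county and demographic characteristics is not shown.

```python
def net_military_movement(station_strength_by_year):
    """Net movement for each year as the change in station strength from the prior year."""
    years = sorted(station_strength_by_year)
    return {
        later: station_strength_by_year[later] - station_strength_by_year[earlier]
        for earlier, later in zip(years, years[1:])
    }


# Hypothetical station strength for one state, 2000 to 2003.
strength = {2000: 5000, 2001: 5200, 2002: 5100, 2003: 5400}
movement = net_military_movement(strength)
print(movement)
```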
The Preliminary GQ Population Estimates
Group Quarters (GQ) population change was estimated separately from the
household population because of the unique character of this subpopulation
and the ability to acquire direct data that reflect change in this
population. The technique for estimating the GQ population for the
vintage 2003 estimates started with the Census 2000 enumerated GQ
population. As with the household population, the race breakdowns of the
GQ data were converted from categories consistent with Census 2000 to
categories consistent with 1990 through proportional allocation (see above).
Next, the state representatives who participate in the Federal-State
Cooperative Program for Population Estimates (FSCPE) developed an
independent list of GQ facilities in their state by county with the
populations typically associated with them at the time of Census 2000
and annually from 2000 to 2003 from the sources available to them in
their state. In turn, the Census Bureau calculated the implied change
in the GQ population from the numbers provided by the FSCPE members.
This change was then applied to the Census GQ base to come up with the
estimate of the total GQ population in each county. Finally, these
county totals were distributed by age, sex, race, and Hispanic origin
using the distribution of the GQ population within each GQ type from the
base GQ population.
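A minimal sketch of applying the implied change to the Census base follows. It assumes the change is additive (the FSCPE figure for the estimate year minus the FSCPE figure for Census 2000); the text does not state whether the change was applied as a difference or as a ratio. The function name and all figures are hypothetical.

```python
def gq_estimate(census_gq_base, fscpe_at_census, fscpe_current):
    """Census GQ base plus the change implied by the FSCPE facility data.

    Assumes an additive change; a ratio-based adjustment would be an
    alternative reading of the method.
    """
    implied_change = fscpe_current - fscpe_at_census
    return census_gq_base + implied_change


# Hypothetical county: Census 2000 GQ count and FSCPE-reported populations.
county_gq_2003 = gq_estimate(census_gq_base=1200, fscpe_at_census=1100, fscpe_current=1150)
print(county_gq_2003)
```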
The Final Population Estimates by Demographic Characteristic
The final steps in the production of the vintage 2003 county characteristics
estimates consisted of 1) adding together the household and GQ
population estimates for each year, 2) calculating the necessary
proportions from the preliminary estimates and applying them to both the
previously created county population estimates by age (0-64 and 65+) and
the State estimates by age, sex, race, and Hispanic origin, and
3) converting the data from the 4 race categories consistent with
the 1990 census to the categories consistent with Census 2000.
In combining the household and GQ population estimates, the only caveat
that should be noted is that it was assumed that there was no change to
the GQ population between April 1, 2000 and July 1, 2000.
The next step in the production of the county estimates by age, sex, race,
and Hispanic origin was to control the preliminary resident county
estimates by age, sex, race, and Hispanic origin to both the previously
created resident county population estimates by age (0-64 and 65+) and
the state resident estimates by age, sex, race, and Hispanic origin. This
was done through a two-step iterative process.
In the first step, the preliminary county characteristic estimates were
summed to the state level by age, sex, race, and Hispanic origin. Next,
proportions were calculated by dividing each county-age-sex-race-Hispanic
origin cell by the corresponding sum. Finally, these proportions were
applied to the original state characteristics estimates in order to
calculate an intermediate set of county estimates by age, sex, race, and
Hispanic origin.
In the second step, a similar procedure was used to calculate new
proportions from the transformed data so that the county age estimates
(0 to 64 and 65+) could be distributed by age, sex, race, and Hispanic origin.
First, the preliminary county characteristic estimates were summed by the
appropriate age groups. Next, proportions were calculated by dividing
each county-age-sex-race-Hispanic origin cell by the corresponding sum.
Finally, these proportions were applied to the original county age
estimates in order to calculate an intermediate set of county estimates
by age, sex, race, and Hispanic origin.
When attempting to make estimates consistent with two different sets of
data using the procedure described above, performing the second step of
the iteration tends to distort the fit achieved in the first step.
Likewise, repeating the first step again tends to distort the fit
achieved in the second step. However, these distortions can be minimized
by repeating the two-step procedure multiple times. For this reason, the
iterative procedure was repeated five times. After this, the resulting
set of estimates was rounded to integers.
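The two-step iterative procedure resembles what is commonly called iterative proportional fitting, or raking. The sketch below illustrates it on a small hypothetical example with two counties and cells keyed by age group and sex. The function name, data layout, and all figures are assumptions made for illustration; the control totals were chosen so that the county age totals and the state totals are mutually consistent.

```python
def control_iteratively(prelim, state_totals, county_age_totals, iterations=5):
    """Alternately control county cells to state totals and to county age totals.

    prelim: {county: {(age_group, detail): value}}
    state_totals: {(age_group, detail): state total for that cell}
    county_age_totals: {county: {age_group: county total}}
    """
    est = {county: dict(cells) for county, cells in prelim.items()}
    for _ in range(iterations):
        # Step 1: scale each cell so that the counties sum to the state total.
        for cell, target in state_totals.items():
            current = sum(est[county][cell] for county in est)
            for county in est:
                est[county][cell] *= target / current
        # Step 2: scale each county's cells to match its age-group totals.
        for county, targets in county_age_totals.items():
            for age_group, target in targets.items():
                members = [cell for cell in est[county] if cell[0] == age_group]
                current = sum(est[county][cell] for cell in members)
                for cell in members:
                    est[county][cell] *= target / current
    return est


# Hypothetical preliminary estimates and control totals.
prelim = {
    "X": {("0-64", "M"): 100, ("0-64", "F"): 110, ("65+", "M"): 20, ("65+", "F"): 30},
    "Y": {("0-64", "M"): 200, ("0-64", "F"): 190, ("65+", "M"): 40, ("65+", "F"): 60},
}
state_totals = {("0-64", "M"): 310, ("0-64", "F"): 305, ("65+", "M"): 62, ("65+", "F"): 93}
county_age_totals = {"X": {"0-64": 215, "65+": 52}, "Y": {"0-64": 400, "65+": 103}}

fitted = control_iteratively(prelim, state_totals, county_age_totals)
# Round to integers only after the final iteration.
rounded = {c: {cell: round(v) for cell, v in cells.items()} for c, cells in fitted.items()}
print(rounded)
```

Because step 2 runs last, the county age totals are matched exactly (before rounding), while the state totals are matched only approximately; the gap shrinks with each repetition.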
The final step in the estimates process was to convert the estimates from
the 4 race categories consistent with the 1990 census in which the
estimates were processed to the 31 race categories consistent with
Census 2000. To do this, the procedure used to go from 31 to 4 races
was essentially reversed. First, the proportion of each 4-race category
associated with the 31 race categories was calculated for both universes
using Census 2000 household and GQ data by county, age, sex, and Hispanic
origin. Next, the GQ population for each estimate year was subtracted
from the corresponding resident population in order to arrive at the
household population for that estimate year. Then, the proportions for
both the household and GQ data were applied to the respective 4-race
estimates for each year to arrive at the 31-category race household and
GQ estimates.10 Finally, the 31-category household and GQ
estimates by age, sex, and Hispanic origin were added together to arrive
at the final resident county population estimates by age, sex, race, and
Hispanic origin.
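The reverse conversion can be sketched as applying, to each 4-race estimate, the Census 2000 shares of the detailed categories associated with it. In the sketch below only three detailed categories are shown, and the shares, labels, and counts are hypothetical. In practice the shares would be calculated separately for the household and GQ universes by county, age, sex, and Hispanic origin.

```python
def split_by_shares(estimate_4race, shares):
    """Distribute 4-race estimates to detailed race categories.

    estimate_4race: {race4: estimate}
    shares: {race4: {detailed_category: share}}, shares for each race4 sum to 1.
    A detailed category may receive population from more than one race4 group.
    """
    detailed = {}
    for race4, total in estimate_4race.items():
        for category, share in shares[race4].items():
            detailed[category] = detailed.get(category, 0.0) + total * share
    return detailed


# Hypothetical shares for one county-age-sex-Hispanic origin cell.
shares = {
    "White": {"White alone": 0.96, "White and Black": 0.04},
    "Black": {"Black alone": 0.90, "White and Black": 0.10},
}
estimate_4race = {"White": 1000, "Black": 500}
detailed = split_by_shares(estimate_4race, shares)
print(detailed)
```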
1 This modification has been accepted for all Census Bureau
estimates products and is explained in the document entitled "Modified
Race Data Summary File Technical Documentation and ASCII Layout"
that can be found on the Census Bureau website at
http://www.census.gov/popest/archives/files/MRSF-01-US1.html.
2 Details about the Count Question Resolution Program can be
found on the Census Bureau website at
http://www.census.gov/dmd/www/CQR.htm.
Errata notes can be found on the Census Bureau website at
http://www.census.gov/prod/cen2000/notes/errata.pdf.
3 This step was needed since the estimates procedure produces
annual estimates and since the target reference date for each estimate
year is July 1.
4 A description of the methodology used to produce these
estimates can be found on the Census Bureau website at
http://www.census.gov/population/www/documentation/twps0064.html.
5 A description of the methodology used to produce these
estimates can be found on the Census Bureau website at
http://www.census.gov/population/www/documentation/twps0063.html.
6 For a description of this method, see the following URL:
http://www.census.gov/popest/topics/methodology/2003_st_char_meth.html.
7 In the process, when the number of exemptions for any
age-sex-race-Hispanic origin category (which will be referred to as a
cell) was low, the exemptions were combined with those of other cells in
order to improve the robustness of the resulting migration rates.
Subsequently, each of the individual groups that were combined to make
up the aggregation was assigned the rate or proportion of the aggregated
group.
8 Technically, the quotient produced by this procedure is the
probability of moving. However, we make the simplifying assumption that
the migration rate is the same as the probability of moving.
9 Developed during the production of state characteristics
estimates.
10 The reason for separating the process of converting the
household and GQ data from 4 to 31 races is that the GQ data are processed
by GQ type so that they may be used in estimates not described in this
document.