Estimates And Projections Area Methodology
County Population Estimates By Age, Sex, Race, And Hispanic Origin For
July 1, 2005
The U.S. Census Bureau produces
estimates of the resident population by age, sex, race, and Hispanic origin for
each county in the United States and the District of Columbia on an annual
basis. The following documentation outlines the methods that were used in the production
of the July 1, 2005 estimates.
OVERVIEW
For the July 1, 2005 county
estimates of the resident population by age, sex, race, and Hispanic origin,
the Census Bureau used a distributive cohort component method. This method was
applied in the following manner. First, we started with previously developed
resident county population estimates by age (0-64 and 65+) and resident state
population estimates by age, sex, race, and Hispanic origin. Second, we
estimated the age, sex, race, and Hispanic origin distributions for each county
by estimating post-censal change in the corresponding populations with a cohort
component model. Third, we applied these distributions to the original county
estimates by age and state characteristics estimates. A detailed discussion of
this method is provided below.
Estimating the Age, Sex, Race, and Hispanic Origin Distributions
The majority of the work that went
into producing the county resident population estimates by age, sex, race, and
Hispanic origin consisted of estimating the age, sex, race, and Hispanic origin
distributions for each county age estimate. This was done by producing a
preliminary set of age, sex, race, and Hispanic origin estimates for each
county and then calculating the age-sex-race-Hispanic origin proportions from
these.
The preliminary set of resident
population estimates by age, sex, race, and Hispanic origin was produced by
first splitting the Census population into two mutually exclusive universes:
the household population and the group quarters (GQ) population. The 2005
estimates differ from previous vintages in that this vintage treats all college
students as part of the GQ population.
In previous vintages, college students living in dormitories were considered
part of the GQ population but all other college students were considered part
of the household population. The cohort component technique was used to
estimate change in the household population. For the GQ population, change was
estimated through a data-collection effort conducted in conjunction with
members of the Federal-State Cooperative Program for Population Estimates
(FSCPE). The resulting household and GQ estimates were then added together to
produce the new set of preliminary resident population estimates.
The Preliminary Household Population Estimates
The cohort-component technique used
to estimate the household population follows each birth cohort across time
according to its exposure to mortality, fertility, and migration. This
technique was applied using the following equation.
P1 = P0 + B - D + NDM + NIM + NMM
Where:
P1 = population at the end of the
period
P0 = population at the beginning of
the period
B = births during the period
D = deaths during the period
NDM = net domestic migration during
the period
NIM = net international migration
during the period
NMM = net military movement during
the period
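As a minimal sketch, the equation above can be written as a single function
applied to one age-sex-race-Hispanic origin cell for one period. The function
name and the input values are hypothetical and are not taken from the estimates.
The sketch also ignores the aging of cohorts and the fact that births enter
only the youngest age group.

```python
def project_household_population(p0, births, deaths,
                                 net_domestic, net_international, net_military):
    """Apply P1 = P0 + B - D + NDM + NIM + NMM for one cell and one period."""
    return (p0 + births - deaths
            + net_domestic + net_international + net_military)


# Hypothetical inputs for a single county cell (illustrative, not real data).
p1 = project_household_population(
    p0=10_000, births=150, deaths=90,
    net_domestic=-40, net_international=25, net_military=5,
)
print(p1)
```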
Applying this equation was
straightforward; preparing the input data was not. Great care was taken to
estimate each component of the equation as accurately as possible. The details
of this work are outlined below.
One complicating factor in the
estimation of the household population was that the administrative data used in
these estimates continue to come to the Census Bureau with race categories that
differ from those used in Census 2000. In Census 2000, information was gathered
using 6 race groups (White; Black; American Indian and Alaska Native; Asian; Native
Hawaiian and Pacific Islander; and Some Other Race). In addition, individuals
were allowed to report multiple races. In contrast, most administrative data
available for the estimates still come to us in the 4 race categories
consistent with the 1990 census (White; Black; American Indian, Eskimo, or
Aleut; and Asian and Pacific Islander). For this reason, the household
estimates were processed with the 4 race categories consistent with the 1990
census and the results were converted to race categories consistent with Census
2000. Details of all the conversions needed in order to carry out this procedure
are presented below.
Specification of the Base Population
The enumerated population from
Census 2000 was the base for the July 1, 2005 estimates. This population was
modified in four ways to prepare it for inclusion in the cohort-component
technique.
[a] The original race data
from the Census were modified to eliminate the "some other
race" category.1
[b] The April 1, 2000
population estimates base reflects modifications to the Census 2000
population as documented in the Count Question Resolution program and errata
notes.2
[c] The Census 2000 base
household population was converted from the 31 race categories to the
four race groups consistent with the 1990 Census by "straight
proportional allocation."
[d] Finally, the results
of step [c] were used to estimate the population on July 1, 2000.3
This was done as follows:
    - A set of
    age-sex-race-Hispanic origin proportions was calculated by summing the
    data by county and age (0 to 64 and 65+) and then dividing each
    age-sex-race-Hispanic origin cell by the corresponding sum.
    - These proportions
    were then applied to the previously produced county age estimates for
    July 1, 2000.
    - Finally, the results
    were controlled to the July 1, 2000 state estimates by age, sex, race,
    and Hispanic origin.
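The first two sub-steps of the last modification (calculating proportions
within each county and broad age group, then applying them to the county age
estimates) can be sketched as follows. The function name, the cell keys, and
all numbers are hypothetical, and the final control to the state estimates is
omitted.

```python
from collections import defaultdict


def distribute_age_totals(cells, county_age_totals):
    """Rescale detailed cells so each (county, age group) matches its total.

    cells: {(county, age_group, detail): count}
    county_age_totals: {(county, age_group): previously produced estimate}
    """
    sums = defaultdict(float)
    for (county, age_group, _detail), count in cells.items():
        sums[(county, age_group)] += count

    result = {}
    for (county, age_group, detail), count in cells.items():
        denom = sums[(county, age_group)]
        share = count / denom if denom else 0.0  # proportion within the group
        result[(county, age_group, detail)] = (
            share * county_age_totals[(county, age_group)]
        )
    return result


# Hypothetical April 1, 2000 cells and July 1, 2000 county age estimates.
base_cells = {
    ("A", "0-64", "F"): 600, ("A", "0-64", "M"): 400,
    ("A", "65+", "F"): 120, ("A", "65+", "M"): 80,
}
july_totals = {("A", "0-64"): 1100, ("A", "65+"): 210}
july_cells = distribute_age_totals(base_cells, july_totals)
```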
Specification of Births
The birth component was
calculated with data from two sources. The Federal State Cooperative Program
for Population Estimates (FSCPE) provided data on all registered births that
occurred in each county of the members' respective state for calendar years
2000-2003. In addition, they provided births for their state as a whole for
calendar year 2004, which were subsequently distributed by county using the
distribution from the FSCPE 2003 births. After this, the FSCPE births for each
year were adjusted and distributed by sex, race (consistent with the 1990
Census), and Hispanic origin using data from the National Center for Health
Statistics (NCHS). Finally, data for the last 6 months of the estimate period
(January 1 - June 30, 2005) were assumed to be equal to the number of births
from July 1 - December 31, 2004. These final birth counts were considered to be
complete for the resident population. No adjustments were made for
undercoverage or differential coverage by county, sex, race, or Hispanic origin.
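The distribution of a state's 2004 total across counties using the 2003 county
distribution can be illustrated with a short sketch. The function name and the
counts are hypothetical; the same calculation would apply to the death
component.

```python
def allocate_state_total(state_total, prior_county_counts):
    """Split a state total across counties using the prior year's shares."""
    prior_sum = sum(prior_county_counts.values())
    if prior_sum == 0:
        raise ValueError("prior-year county counts sum to zero")
    return {
        county: state_total * count / prior_sum
        for county, count in prior_county_counts.items()
    }


# Hypothetical 2003 county births and a 2004 state total (not real data).
births_2003 = {"County A": 300, "County B": 100}
births_2004 = allocate_state_total(420, births_2003)
```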
Specification of Deaths
The death component was also
calculated with data from two sources. The Federal State Cooperative Program
for Population Estimates (FSCPE) provided data on all registered deaths that
occurred in each county of the members' respective state for calendar years
2000-2003. In addition, they provided deaths for their state as a whole for
calendar year 2004, which were subsequently distributed by county using the
distribution from the FSCPE 2003 deaths. After this, the FSCPE deaths for each
year were adjusted and distributed by age, sex, race (consistent with the 1990
Census), and Hispanic origin using data from the National Center for Health
Statistics (NCHS). Finally, data for the last 6 months of the estimate period
(January 1 - June 30, 2005) were assumed to be equal to the number of deaths
from July 1 - December 31, 2004. These final death counts were considered to
be complete for the resident population. No adjustments were made for
undercoverage or differential coverage by county, age, sex, race, or Hispanic
origin.
Specification of Net International Migration
The net international migration
component consisted of three migration flows: (1) net migration of the
foreign-born, (2) emigration of natives, and (3) net movement from Puerto Rico
to the United States.
To measure the net migration of the
foreign born, we used the American Community Survey (ACS) because it provided
annually updated data. We first determined the national level of migration for
the foreign born by calculating the net difference in the estimates of these
surveys from 2000 to 2001, 2001 to 2002, 2002 to 2003, and 2003 to 2004. We
then accounted for deaths to the entire foreign-born population during the
periods of interest to arrive at the final national estimate of net migration of
the foreign born. In order to account for variability due to small sample sizes
of the foreign born, we used a moving average for the period changes to get the
final net foreign-born migration estimate. Then, to ascribe county of
destination, age, sex, race, and Hispanic origin to these estimates, we applied
the distribution of the non-citizen foreign born from Census 2000 who entered
in 1995 or later to the national-level estimate. Finally, we assumed the net
migration of the foreign born between 2004 and 2005 to be the same as net
migration between 2003 and 2004.
The national net movement from
Puerto Rico to the United States by age and sex was measured using levels
observed during the 1990s.4 To assign characteristics to these
flows, we applied the age-sex-race-Hispanic origin-destination county
distributions from Census 2000 from those who indicated that their place of
birth was Puerto Rico and who had entered the United States in 1995 or later.
The emigration of natives was
produced in a similar way to the net movement from Puerto Rico. Again, the
national levels of movement were measured using levels observed during the
1990s.5 Then, to assign characteristics to these flows, we applied
the age-sex-race-Hispanic origin-destination county distributions from Census
2000 of all natives who currently reside in the United States. In other words,
natives who emigrated were assumed to have the same
age-sex-race-Hispanic origin-destination county distribution as natives
residing in the 50 states and the District of Columbia in Census 2000.
Once the net migration of the
foreign born, net movement from Puerto Rico, and the emigration of natives were
estimated, all three parts were combined to estimate a final net international
migration component.
Specification of Net Domestic Migration
The input data used for the
estimation of county net domestic migration came from two administrative
sources: annual individual-level extracts of tax returns provided by the
Internal Revenue Service (IRS); and the Census Numident file derived from the
Social Security Administration 100 percent file (SSA). The IRS 1040 tax return
records were matched to the SSA data to identify the age, sex, race, and
Hispanic origin of each tax filer, their spouse, and their dependents. After
this, two years of records were matched and a migration status was assigned to
each filer. Filers with a change in residence across county boundaries between
the two periods along with their spouses and dependents were identified as
"domestic migrants." If there was no change in the county of residence,
the filers and their spouses and dependents were identified as non-migrants.
While it would have been
theoretically possible to estimate county domestic in-migration and
out-migration by calculating county-to-county flows, we decided against this
based on the nature of the input data and the calculations required to
transform them into the estimates of net migration. In short, since the IRS
exemption data reflected only a portion of the population, we needed to convert
these data into rates and then apply them to an estimate of the entire
population. At the county level, the creation of rates for each
demographic-characteristic group (age, sex, and race-origin) within all 3,141
counties would have resulted in millions of cells of data, most with values
either equaling zero or too small to produce robust estimates of migration.
Therefore, all calculations of the number of domestic migrants for the July 1,
2005 county characteristics estimates occurred at the state level and then were
distributed to the counties by proportional allocation.
The process of estimating the net
number of domestic migrants for each county consisted of four steps. First,
the total number of domestic migrants crossing county borders within each state
was estimated. This was done with a process parallel to the one used to
estimate the number of inter-state migrants for each state and occurred in the
following manner.6 To begin, an inter-county migration rate for each
age-sex-race-Hispanic origin group within a state was computed by dividing the
total number of exemptions associated with filers who moved across county
borders during the period by the number of exemptions associated with filers in
the state as a whole at the beginning of the period.7 8 Next, the
rates were smoothed across ages using a moving average. Finally, the rates were
applied to an estimate of the state population by characteristics at the
beginning of the period from the state characteristics estimates to generate
the number of domestic within-state county migrants by age, sex, race, and
Hispanic origin.
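The first step can be sketched for a single sex-race-Hispanic origin group
within one state, with one value per age. The three-age centered window is an
assumption made only for illustration, since the window used in the estimates
is not stated here. All function names and numbers are hypothetical.

```python
def migration_rates(mover_exemptions, total_exemptions):
    """Exemptions of inter-county movers divided by all exemptions, by age."""
    return [m / t if t else 0.0
            for m, t in zip(mover_exemptions, total_exemptions)]


def smooth(rates, window=3):
    """Centered moving average across ages, truncated at either end."""
    half = window // 2
    smoothed = []
    for i in range(len(rates)):
        lo, hi = max(0, i - half), min(len(rates), i + half + 1)
        smoothed.append(sum(rates[lo:hi]) / (hi - lo))
    return smoothed


def estimate_migrants(mover_exemptions, total_exemptions, population, window=3):
    """Apply smoothed rates to the beginning-of-period state population."""
    rates = smooth(migration_rates(mover_exemptions, total_exemptions), window)
    return [r * p for r, p in zip(rates, population)]


# Hypothetical values for three consecutive ages (not real data).
migrants = estimate_migrants(
    mover_exemptions=[10, 20, 30],
    total_exemptions=[100, 100, 100],
    population=[1000, 1000, 1000],
)
```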
The second step in the process of
estimating the net number of domestic migrants for each county was to
distribute these estimates to the counties within each state. To do this, two
ratios were calculated for each county from the original IRS data: (1) an
out-migration ratio, or the number of exemptions associated with filers who
moved out of a given county and into all other counties within the state to the
total number of exemptions associated with filers who moved across county
boundaries within the state, and (2) an in-migration ratio, or the number of
exemptions associated with filers who moved into a given county from all other
counties within the state to the total number of exemptions associated with
filers who moved across county boundaries within the state. After this, these
ratios were applied to the total number of domestic migrants crossing county
borders within each state to arrive at the number of domestic within-state
out-migrants and in-migrants for each county.
The third step in the process of
estimating the net number of domestic migrants for each county was to
distribute the total number of domestic migrants going out of and coming into each
state to each county within the state.9 This was done in a way that
paralleled the distribution of the total number of domestic migrants crossing
county borders within each state. To begin, two ratios were calculated for each
county from the original IRS data: (1) an out-migration ratio, or the number of
exemptions associated with filers who moved out of a given county and into all
other states to the total number of exemptions associated with filers
who moved out of the state, and (2) an in-migration ratio, or the number of
exemptions associated with filers who moved into a given county from all other states
to the total number of exemptions associated with filers who moved into the
state. After this, these ratios were applied to the total number of domestic
migrants exiting and entering the state, respectively, to arrive at the number
of inter-state out-migrants and inter-state in-migrants for each county.
The final step in the process of
estimating the net number of domestic migrants was simply to sum the numbers of
domestic within-state out-migrants and inter-state out-migrants by county to
get the total number of out-migrants for each county, and to sum the numbers of
domestic within-state in-migrants and inter-state in-migrants by county to get
the total number of in-migrants for each county. The net number of domestic
migrants for each county is then the difference between its in-migrant and
out-migrant totals.
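Steps two through four can be sketched for a hypothetical state with two
counties. The sketch assumes that net domestic migration is total in-migrants
minus total out-migrants; the function names, exemption counts, and state
migrant totals are all illustrative.

```python
def allocate_by_ratio(state_migrants, county_exemptions):
    """Distribute a state migrant total using county shares of exemptions."""
    total = sum(county_exemptions.values())
    return {c: state_migrants * e / total for c, e in county_exemptions.items()}


def net_domestic_migration(within_in, within_out, inter_in, inter_out):
    """Total in-migrants minus total out-migrants for each county."""
    return {
        c: (within_in[c] + inter_in[c]) - (within_out[c] + inter_out[c])
        for c in within_in
    }


# Step 2: within-state migrants (hypothetical state total of 1,000).
within_out = allocate_by_ratio(1000, {"A": 60, "B": 40})
within_in = allocate_by_ratio(1000, {"A": 40, "B": 60})
# Step 3: inter-state migrants (hypothetical 500 leaving, 800 arriving).
inter_out = allocate_by_ratio(500, {"A": 30, "B": 20})
inter_in = allocate_by_ratio(800, {"A": 50, "B": 50})
# Step 4: combine the flows into a net figure for each county.
net = net_domestic_migration(within_in, within_out, inter_in, inter_out)
```

In this sketch the within-state flows cancel at the state level, so the county
net figures sum to the state's net inter-state migration.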
Specification of Net Military Movement
The net movement of the military,
both foreign and domestic, into each county for each year was estimated using
data received directly from the armed forces and the Department of Defense.
They provided yearly estimates of the station strength in each state from 2000
to 2005. The net movement was then calculated as the difference in the station
strength from one year to the next. Finally, the county, age, sex, race, and Hispanic
origin distribution was assigned using those who reported being employed by the
military in Census 2000.
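The year-to-year differencing of station strength can be sketched as follows.
The function name and the strength figures are hypothetical, and the
assignment of county and demographic characteristics is not shown.

```python
def net_military_movement(station_strength_by_year):
    """Difference in station strength between consecutive years."""
    years = sorted(station_strength_by_year)
    return {
        (y0, y1): station_strength_by_year[y1] - station_strength_by_year[y0]
        for y0, y1 in zip(years, years[1:])
    }


# Hypothetical station strength for one state (not real data).
strength = {2000: 5000, 2001: 5200, 2002: 5100}
movement = net_military_movement(strength)
```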
The Preliminary GQ Population Estimates
Group Quarters (GQ) population
change is estimated separately from the household population because of the
unique character of this subpopulation and the ability to acquire direct data
that reflects change in this population. The technique for estimating the GQ
population starts with the Census 2000 enumerated GQ population. As with the
household population, the race breakdowns of the GQ data are converted from
categories consistent with Census 2000 to categories consistent with 1990
through proportional allocation (see above).
Next, the state representatives who
participate in the Federal-State Cooperative Program for Population Estimates
(FSCPE) developed an independent list of GQ facilities in their state by county
with the populations typically associated with them at the time of Census 2000
and annually from 2000 to 2005 from the sources available to them in their
state. In turn, the Census Bureau calculated the implied change in the GQ
population from the numbers provided by the FSCPE members. This change was then
applied to the Census GQ base to come up with the estimate of the total GQ
population in each county. Finally, these county totals were distributed by
age, sex, race, and Hispanic origin using the distribution of the GQ population
within each GQ type (e.g. college dormitories) from the base GQ population.
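A rough sketch of this calculation for one county and one GQ type is given
below. It assumes the implied change is applied additively to the Census base,
which is one possible reading of the description above; the function names and
all numbers are hypothetical.

```python
def estimate_gq_total(census_gq_base, fscpe_at_census, fscpe_current):
    """Add the change implied by the FSCPE figures to the Census GQ base.

    Assumption: the change is additive (current minus Census-time figure).
    """
    return census_gq_base + (fscpe_current - fscpe_at_census)


def distribute_by_base(total, base_cells):
    """Spread a GQ total across cells using the base GQ distribution."""
    base_sum = sum(base_cells.values())
    return {k: total * v / base_sum for k, v in base_cells.items()}


# Hypothetical college dormitory population in one county (not real data).
gq_total = estimate_gq_total(census_gq_base=1200,
                             fscpe_at_census=1150, fscpe_current=1250)
gq_cells = distribute_by_base(gq_total, {"F 18-24": 720, "M 18-24": 480})
```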
An additional step has been added
for this set of estimates. The GQ
population has always been defined to include college students living in
dormitories, but in the past non-dormitory students have been treated as part
of the household population. However, our
research has indicated that non-dormitory students are more like the GQ
population than the household population.
Consequently, the 2005 estimates treated all college students as part of
the GQ population, which required that we develop an estimate of the
non-dormitory college population to add to the dormitory population estimates
developed as part of the GQ estimation process described above. Census 2000 was
used to provide, by age, sex, race, and Hispanic origin, the number of college
students in each county who were not living in dormitories, and these numbers
were converted into 1990-consistent categories.
Because we have no additional data sources for this population, we
assumed that it remained constant over time and added these numbers to
each of the annual GQ estimates from the calculations described above.
The Final Population Estimates by Demographic Characteristic
The final steps in the production
of the vintage 2005 county characteristics estimates consisted of 1) adding
together the household and GQ population estimates for each year, 2)
calculating the necessary proportions from the preliminary estimates and
applying them to both the previously created county population estimates by age
(0-64 and 65+) and the State estimates by age, sex, race, and Hispanic origin,
and 3) converting the data from the 4 race categories consistent with the 1990
census to the categories consistent with Census 2000.
In combining the household and GQ
population estimates, the only caveat that should be noted is that it was
assumed that there was no change to the GQ population between April 1, 2000 and
July 1, 2000.
The next step in the production of
the county estimates by age, sex, race, and Hispanic origin was to control the
preliminary resident county estimates by age, sex, race, and Hispanic origin to
both the previously created resident county population estimates by age (0-64
and 65+) and the state resident estimates by age, sex, race, and Hispanic
origin. This was done through a two-step iterative process.
In the first step, the preliminary
county characteristic estimates were summed to the state level by age, sex,
race, and Hispanic origin. Next, proportions were calculated by dividing each
county-age-sex-race-Hispanic origin cell by the corresponding sum. Finally,
these proportions were applied to the original state characteristics estimates
in order to calculate an intermediate set of county estimates by age, sex,
race, and Hispanic origin.
In the second step, a similar
procedure was used to calculate new proportions from the transformed data so that
the county age estimates (0 to 64 and 65+) could be distributed by age, sex,
race, and Hispanic origin. First, the preliminary county characteristic
estimates were summed by the appropriate age groups. Next, proportions were
calculated by dividing each county-age-sex-race-Hispanic origin cell by the
corresponding sum. Finally, these proportions were applied to the original
county age estimates in order to calculate an intermediate set of county
estimates by age, sex, race, and Hispanic origin.
When attempting to make estimates
consistent with two different sets of data using the procedure described above,
performing the second step of the iteration tends to distort the fit achieved
in the first step. Likewise, repeating the first step again tends to distort
the fit achieved in the second step. However, these distortions can be
minimized by repeating the two-step procedure multiple times. For this reason,
the iterative procedure was repeated five times. After this, the resulting set
of estimates was rounded to integers.
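The two-step iterative procedure can be sketched as repeated proportional
scaling against the two sets of controls. In this sketch each cell is keyed by
county and by an (age group, sex) pair standing in for the full
age-sex-race-Hispanic origin detail. The names and numbers are hypothetical,
and the final rounding to integers is omitted.

```python
from collections import defaultdict


def scale_to_controls(est, controls, key_of):
    """Rescale cells so that cells sharing key_of(cell) sum to controls[key]."""
    sums = defaultdict(float)
    for cell, value in est.items():
        sums[key_of(cell)] += value
    return {
        cell: (value / sums[key_of(cell)] * controls[key_of(cell)]
               if sums[key_of(cell)] else 0.0)
        for cell, value in est.items()
    }


def two_step_fit(prelim, state_controls, county_age_controls, iterations=5):
    est = dict(prelim)
    for _ in range(iterations):
        # Step 1: match the state totals by characteristic.
        est = scale_to_controls(est, state_controls, lambda c: c[1])
        # Step 2: match the county totals by broad age group.
        est = scale_to_controls(est, county_age_controls,
                                lambda c: (c[0], c[1][0]))
    return est


# Hypothetical preliminary estimates and controls (not real data).
prelim = {
    ("A", ("0-64", "F")): 50.0, ("A", ("0-64", "M")): 50.0,
    ("B", ("0-64", "F")): 30.0, ("B", ("0-64", "M")): 70.0,
}
state_controls = {("0-64", "F"): 90.0, ("0-64", "M"): 130.0}
county_age_controls = {("A", "0-64"): 105.0, ("B", "0-64"): 115.0}
fitted = two_step_fit(prelim, state_controls, county_age_controls)
```

Because the second step runs last, the county age totals are matched exactly,
while the state totals are matched only approximately, with a discrepancy that
shrinks as the procedure is repeated.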
The final step in the estimates
process was to convert the estimates from the 4 race categories consistent with
the 1990 census in which the estimates were processed to the 31 race categories
consistent with Census 2000. To do this, the procedure used to go from 31 to 4
races was essentially reversed. First, the proportion of each 4-race category
associated with the 31 race categories was calculated for both universes using
Census 2000 household and GQ data by county, age, sex, and Hispanic origin.
Next, the GQ population for each estimate year was subtracted from the
corresponding resident population in order to arrive at the household
population for that estimate year. Then, the proportions for both the household
and GQ data were applied to the respective 4-race estimates for each year to
arrive at the 31-category race household and GQ estimates.10
Finally, the 31-category household and GQ estimates by age, sex, and Hispanic
origin were added together to arrive at the final resident county population
estimates by age, sex, race, and Hispanic origin.
1 This modification has
been accepted for all Census Bureau estimates products and is explained in the
document entitled "Modified Race Data Summary File Technical Documentation and
ASCII Layout" that can be found on the Census Bureau website at http://eire.census.gov/popest/data/national/tables/files/mod_race.php.
2 Details about the
Count Question Resolution Program can be found on the Census Bureau website at http://www.census.gov/dmd/www/CQR.htm.
Errata notes can be found on the Census Bureau website at http://www.census.gov/prod/cen2000/notes/errata.pdf.
3 This step was needed
since the estimates procedure produces annual estimates and since the target
reference date for each estimate year is July 1.
4 A description of the
methodology used to produce these estimates can be found on the Census Bureau
website at http://www.census.gov/population/www/documentation/twps0064.html.
5 A description of the
methodology used to produce these estimates can be found on the Census Bureau
website at http://www.census.gov/population/www/documentation/twps0063.html.
6 For a description of
this method, see the following URL: http://www.census.gov/popest/topics/methodology/2004_st_char_meth.html.
7 In the process, when
the number of exemptions for any age-sex-race-Hispanic origin category (which
will be referred to as a cell) was low, the exemptions were combined with those
of other cells in order to improve the robustness of the resulting migration
rates. Subsequently, each of the individual groups that were combined to make
up the aggregation was assigned the rate or proportion of the aggregated group.
8 Technically, the
quotient produced by this procedure is the probability of moving. However, we
make the simplifying assumption that the migration rate is the same as the
probability of moving.
9 These totals were developed during the
production of the state characteristics estimates.
10 The reason for
separating the process of converting the household and GQ data from 4 to 31
races is that the GQ data are processed by GQ type so that they may be used in
estimates not described in this document.