Methodology

ESTIMATES AND PROJECTIONS AREA METHODOLOGY
COUNTY POPULATION ESTIMATES BY AGE, SEX, RACE, AND HISPANIC ORIGIN FOR JULY 1, 2002

BACKGROUND

The U.S. Census Bureau produces estimates of the resident population by age, sex, race and Hispanic origin for 3,141 counties in the United States on an annual basis. The following documentation outlines the methodology that was used in the production of the July 1, 2002 resident population estimates by age, sex, race, and Hispanic origin for all counties in the United States.

OVERVIEW

The Census Bureau develops county population estimates with a demographic procedure called the cohort-component population estimation method. This method essentially follows each birth cohort according to its exposure to mortality, fertility, and migration. The cohort-component method is based on the traditional demographic accounting system and is described in more detail below. A major assumption underlying this approach is that the components of population change can be closely approximated by administrative data in a demographic change model. In order to apply the model, Census Bureau demographers estimate each component of population change separately. For the population residing in households, the components of population change are births, deaths, and net migration, including net international migration. For the non-household population, change is represented by the net change in the population living in group-quarters facilities.

Each component in our model is represented with administrative data that are symptomatic of some aspect of population change. For example, birth certificates are symptomatic of additions to the population resulting from births, so we use these data to estimate the birth component for a state. Other components are derived from death certificates, Internal Revenue Service (IRS) data, Medicare enrollment records, Armed Forces data, and group-quarters population data. Data derived from the American Community Survey (ACS), Social Security files, Census 2000 data, and other internal Census Bureau data are used to estimate some of the demographic details (age, sex, race, and Hispanic origin) for counties.

METHOD

The cohort-component method is based on the traditional demographic accounting system. Starting with a base population, deaths are subtracted from the population and births are added to the population, forming new cohorts. Estimates of net international migration and net internal migration are added to or subtracted from the population. The components of change are measured separately by age, sex, race, and Hispanic origin for each state and added to the base population as follows:

P1 = P0 + B - D + NDM + NIM

Where:

P1 = population at the end of the period
P0 = population at the beginning of the period
B = births during the period
D = deaths during the period
NDM = net internal migration during the period
NIM = net international migration during the period
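
The accounting equation above can be sketched in a few lines of Python. This is only an illustration for a single demographic cell; the function name and the figures are hypothetical and are not taken from the estimates.

```python
def end_population(p0, births, deaths, net_internal, net_international):
    """Apply P1 = P0 + B - D + NDM + NIM for one demographic cell."""
    return p0 + births - deaths + net_internal + net_international


# Hypothetical figures for one cell, for illustration only.
p1 = end_population(p0=10_000, births=150, deaths=90,
                    net_internal=-40, net_international=25)
print(p1)  # 10045
```

Net migration terms may be negative, as in the net internal migration figure used here.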

As described above, administrative records, such as birth and death records as well as data derived from tax returns, are used to estimate the components of population change. Net migration is calculated using several components including: net internal migration, net foreign-born international migration, net movement to/from Puerto Rico, net movement of federal and civilian citizens, the change in group-quarters population, and native emigration from the United States. Because the group quarters (GQ) population experiences somewhat different demographic processes, the GQ population is removed from the resident base population and estimated separately.

In the process of developing the July 1, 2002 estimates, revised estimates of the July 1, 2001, and July 1, 2000 state population with demographic detail were produced. The revised estimates for 2001 and 2000 incorporate actual data for the demographic components that were not previously available and include updates or corrections to the data previously used. In cases where we do not have data for all counties for the current estimate year (2002), we estimate the components of population change based on one or more simplifying assumptions. When we develop the current population estimates, we use the same variant of the component model with these simplifying assumptions. In the creation of subsequent population estimates, we will replace these population estimates with "revised" population estimates based on actual data not yet received or corrected data.

One of the guiding principles in the Census Bureau’s subnational methodology is that all of our population estimates are consistent. This means that the sum of the county estimates must be equal to the independently produced state characteristics population estimates. This consistency is required for all demographic characteristics produced. While this consistency is essential in the production and interpretation of the population estimates, it does add an additional layer of complexity to their development.

The methodology used to produce the July 1, 2002 estimates is described next.

STEP 1: SPECIFICATION OF THE BASE POPULATIONS

The first step was to subtract the GQ population from the Census 2000 resident population to develop a base population that consists of two pieces (the household population and the population residing in group quarters). Both pieces of the base population contain full demographic detail (age, sex, race, and Hispanic origin) for each state and the District of Columbia.

1A. Base Household Population

1A.i. The Census 2000 household population (obtained by subtracting the GQ population from the resident population) is the starting point for the July 1, 2002 state population estimates. The inclusion of demographic detail in the development of the county population estimates adds an additional layer of complexity to the estimation method. Census 2000 was the first census to allow multiple responses to the race question, but the administrative data sources used to estimate the components of change (births, deaths, and migration) were not available for all 31 races. Therefore, the Census 2000 base household population was converted from the 31 race combinations to the four race groups consistent with the 1990 Census. Then the July 1, 2002 population estimates were produced for the four race categories consistent with the 1990 Census. Finally, the July 1, 2002 estimates were converted to 31 races to be consistent with Census 2000 (see Step 6).

The conversion of Census 2000 race categories to 1990 Census race categories was based on proportional allocation. This relies on the simplifying assumption that multiple race responses in Census 2000 would be evenly distributed among the comparable single race responses allowed in the 1990 Census. For example, the Census 2000 population in the three race categories of "White Alone," "Black Alone," and "White and Black" was converted into two of the 1990 Census race categories, "White" and "Black". The entire White Alone population enumerated in Census 2000 was assigned to the White category and the entire Black Alone population was assigned to the Black category. Based on the assumption of straight proportional allocation, half of the Census 2000 "White and Black" population was assigned to the White race category and half was assigned to the Black race category. These assignments are done at the county level, by sex and Hispanic origin.

The assumption of proportional allocation is the best available assumption at this time, though future estimates may not require this conversion or may be based on different distributions to the single races. See Step 6 for the conversion of the July 1, 2002 estimates for four races back to the 31 race categories.
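
The White and Black example above can be sketched as follows. This is a minimal illustration that splits each race combination evenly among its component races; the function name and counts are hypothetical, and the sketch does not model how Census 2000 groups such as Asian and Pacific Islander map onto a single 1990 category.

```python
def to_single_races(census2000_counts):
    """Split each race combination evenly among its component races.

    Keys are tuples of component races; values are population counts.
    """
    totals = {}
    for combo, count in census2000_counts.items():
        share = count / len(combo)  # equal share per component race
        for race in combo:
            totals[race] = totals.get(race, 0.0) + share
    return totals


# Hypothetical counts for one county, sex, and Hispanic-origin cell.
counts = {("White",): 900, ("Black",): 80, ("White", "Black"): 20}
print(to_single_races(counts))  # {'White': 910.0, 'Black': 90.0}
```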

1A.ii. Because the Census 2000 reference date is April 1, 2000 and the estimate periods are July 1 to June 30, a base population was calculated for July 1, 2000 using the July 1, 2000 national estimates with full demographic detail (by age, sex, race, and Hispanic origin) and July 1, 2000 state estimates for the age groups 0 to 64 and 65 and over. A ratio method was used to calculate July 1, 2000 state population estimates by age, sex, race, and Hispanic origin. This method applied the age, sex, race, and Hispanic origin distribution for counties from Census 2000 to the July 1, 2000 state population estimates for the two age groups to develop initial July 1, 2000 estimates with demographic detail. The national estimates by age, sex, race, and Hispanic origin were applied as controls to the initial estimates to generate state characteristics estimates that sum to the national distribution of demographic characteristics and to the total population for counties for ages 0 to 64 and 65 and over.

The July 1, 2000 estimates then serve as the base population for the July 1, 2001 and July 1, 2002 estimates, which are produced using the cohort-component method. The July 1, 2000 estimates could not be calculated using the cohort-component method because the administrative records used in the cohort-component method are available for calendar years and not the three-month time period from April 1, 2000 to July 1, 2000.
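
The ratio method described above might be sketched in two steps: apply a Census 2000 distribution to a July 1, 2000 total, then scale each characteristic cell so that it sums to a national control. The function names, cell labels, and figures below are hypothetical. The sketch makes a single pass; whether the actual procedure iterates between area totals and national controls is not stated in this document.

```python
def initial_detail(census_cells, july_total):
    """Apply the Census 2000 distribution of cells to a July 1 total."""
    census_total = sum(census_cells.values())
    return {cell: july_total * n / census_total
            for cell, n in census_cells.items()}


def control_to_national(initial_by_area, national_cells):
    """Scale each cell across areas so it sums to the national control."""
    controlled = {area: {} for area in initial_by_area}
    for cell, national in national_cells.items():
        cell_sum = sum(cells[cell] for cells in initial_by_area.values())
        for area, cells in initial_by_area.items():
            controlled[area][cell] = cells[cell] * national / cell_sum
    return controlled


# Hypothetical two-area, two-cell example.
areas = {
    "A": initial_detail({"male": 480, "female": 520}, 1010),
    "B": initial_detail({"male": 300, "female": 200}, 505),
}
controlled = control_to_national(areas, {"male": 790, "female": 725})
```

After the control step, each cell summed over the two areas equals its national figure, although the area totals may shift slightly from their starting values.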

1B. Base Group-quarters Population

The group-quarters (GQ) population component is primarily a combination of military personnel living in barracks, college students living in dormitories, and persons residing in institutions. Inmates of correctional facilities, persons in health care facilities, persons in Job Corps Centers, and persons residing in nursing homes are also included in this category.

1B.i. The Census 2000 group quarters population is the starting point for the July 1, 2002 county population estimates. First, the Census 2000 GQ data for the 31 race groups were converted to the four race groups consistent with the 1990 Census as described above.

1B.ii. The base group-quarters population for the July 1, 2002 estimates was the revised group-quarters population from July 1, 2001. The July 1, 2001 GQ estimates were calculated starting with the Census 2000 GQ population by age, sex, race, Hispanic origin, and seven group quarters types for each county. States provide updated information on the total GQ population by GQ type to the Census Bureau each year. The Census 2000 age, sex, race, and Hispanic origin distributions of the GQ population by county and type were applied to the July 1, 2001 total GQ populations reported by the Federal-State Cooperative Program for Population Estimates (FSCPE) to produce GQ population estimates for counties with demographic detail. These estimates were revised with any updated GQ data for July 1, 2001 to serve as the GQ base population for the July 1, 2002 estimates.

STEP 2: SPECIFICATION OF BIRTHS AND DEATHS (VITAL STATISTICS) COMPONENTS

2A. The birth and death components are calculated from three sources of data. Files containing all registered births and deaths that occurred to U.S. residents during the estimate period are obtained from the National Center for Health Statistics (NCHS). The birth files contain the total numbers of births in a calendar year by state and county of mother’s residence, sex, race, and Hispanic origin. The NCHS death files contain the total numbers of deaths by sex, race, Hispanic origin, age at death, and state and county of residence at death. The FSCPE also reports annual numbers of registered births and deaths by sex, race, Hispanic origin, age at death, and state and county of residence at death or county of mother’s residence at birth. A reconciliation process occurs between the NCHS and FSCPE vital statistics. In general, we believe that the demographic characteristics distribution of data from the NCHS file is more accurate due to its national coverage, while the geographic distribution of data from FSCPE files is more accurate due to the FSCPE’s more specific local knowledge.

It is assumed that the vital statistics files represent complete counts of births and deaths for the resident population. No adjustments are made for undercoverage or differential coverage by counties, age, race, or Hispanic origin.

2B. After the NCHS and FSCPE figures are reconciled, they are controlled to the national estimates of the numbers of births and deaths by sex, race, Hispanic origin, and age at death developed as part of the national population estimates for the same time period.

2C. Finally, the births are added to the base population for each year (July 1, 2000 and July 1, 2001) and the deaths are subtracted from the base population. As the vital statistics are provided for calendar years, assumptions are made to apply the births and deaths to the July 1 reference dates.
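
This document does not specify the assumptions used to move calendar-year vital statistics onto July 1 reference dates. As one illustration only, the sketch below assumes events are spread evenly through each calendar year, so that a July 1 to June 30 period takes half of each of the two calendar years it spans. The even-spread assumption, the function name, and the figures are all hypothetical.

```python
def july_to_june_events(calendar_year_events, start_year):
    """Approximate events from July 1 of start_year to June 30 of the next year.

    Assumes events are spread evenly within each calendar year
    (an illustrative assumption, not one stated in the methodology).
    """
    first_half = 0.5 * calendar_year_events[start_year]
    second_half = 0.5 * calendar_year_events[start_year + 1]
    return first_half + second_half


births = {2000: 1200, 2001: 1260}  # hypothetical county birth counts
print(july_to_june_events(births, 2000))  # 1230.0
```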

STEP 3: SPECIFICATION OF NET INTERNATIONAL MIGRATION

We estimate the net international migration to/from the United States from several sources. Our estimate includes the net foreign-born international migration, net movement to/from Puerto Rico, net federal and civilian citizen movement, and native emigration. With the exception of military station strength data, the majority of these data are developed first at the national level (e.g., international migration components, totals by age, sex, race, Hispanic origin). These national and demographic characteristics totals are distributed to states and counties using Census 2000 proportions.

National-level July 1, 2000, July 1, 2001 and July 1, 2002 estimates of the level of net foreign-born international migration for each year are distributed to counties based on the state distribution of the foreign-born population who entered the U.S. during the 5 years prior to April 1, 2000 by country of birth from Census 2000.

National-level July 1, 2000, July 1, 2001 and July 1, 2002 estimates of the total net movement of the population to or from Puerto Rico for each year are distributed to counties based on the county distribution of the Puerto Rican population from Census 2000.

The number of Armed Forces personnel stationed at military bases is supplied by each branch of the Armed Forces. National-level July 1, 2000, July 1, 2001 and July 1, 2002 estimates of the total federal and civilian citizen movement of the population for the current estimate period are derived by applying the national-level data by age, sex, race, and Hispanic origin to the station strength data to develop county estimates with demographic detail.

National-level July 1, 2000, July 1, 2001 and July 1, 2002 estimates of the total number of foreign-born emigrants from the United States for each year are distributed to counties based on the distribution of the foreign-born population from Census 2000 by country of birth.

STEP 4: SPECIFICATION OF NET INTERNAL MIGRATION

Step 4A. Match of Tax Returns to create counts of exemptions (filers and dependents) who migrate by demographic characteristics

4A.i. For the July 1, 2002 estimates the component of internal migration was developed using data from two administrative record sources: annual extracts of tax returns provided by the Internal Revenue Service (IRS) linked by Social Security Number across successive years; and the Census Numident file, derived from the Social Security Administration 100 percent file (SSA). In order to ensure confidentiality and privacy, these data sets are matched by SSN/PIK (Protected Identification Key) and are referred to jointly as IRS-SSA data. The IRS 1040 tax return records were matched to the SSA data to identify the age, sex, race, and Hispanic origin of the tax filers. A number of assumptions were made to assign demographic characteristics to spouses and dependents. Exemptions claimed for children were assigned to the under 20 age group and exemptions claimed for parents were assigned to age 65 and over. Sex was assigned randomly for exemptions. Spouses were assigned the same age and the opposite sex as filers. All spouses and exemptions were assigned the same race and Hispanic origin as filers.

4A.ii. After the demographic characteristics are added to the IRS tax return records, two years of records are matched by SSN/PIK to determine the migration status. Filers (and their dependents) with a change in the state of residence between the two periods were identified as "Inter-State" migrants. Filers (and their dependents) with a change in the county of residence (but not their state of residence) between two periods were identified as "Intra-State" migrants. Otherwise, if there was no change in the state or the county of residence, the filers (and dependents) were identified as non-migrants.
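
The classification rule in 4A.ii can be sketched as a small function over the residence recorded in two matched years. The function name and the use of (state, county) code pairs are assumptions made for illustration; the state is compared first because county codes are typically unique only within a state.

```python
def migration_status(origin, destination):
    """Classify a matched filer from (state, county) pairs in two years."""
    if origin[0] != destination[0]:
        return "Inter-State"
    if origin[1] != destination[1]:
        return "Intra-State"
    return "Non-migrant"


# Hypothetical state and county codes.
print(migration_status(("24", "031"), ("51", "059")))  # Inter-State
print(migration_status(("24", "031"), ("24", "033")))  # Intra-State
print(migration_status(("24", "031"), ("24", "031")))  # Non-migrant
```

Under the rule in the text, dependents would take the status assigned to their filer.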

4A.iii. Migration rates are computed from ratios of the number of exemptions with addresses in different counties to the total number of exemptions in the counties. The rates are applied to the July 1, 2000 and July 1, 2001 base populations by age, sex, race, and Hispanic origin to generate a pool of total migrants for the nation. Then the migrants were allocated to destination counties according to the proportions of exemptions moving to those counties.

Because of the potentially large number of origin-characteristic combinations, a few simplifying assumptions were required in the production of the July 1, 2002 estimates. Inter-state and intra-state migrants were calculated separately as documented below. Further, it was necessary in some cases to combine individual origin-characteristic categories (which will be referred to as cells) to improve the robustness of the data. If a given cell had fewer than 30 exemptions, then it was combined with adjacent age cells within the same origin-ethnicity-race-sex group until the combined category contained at least 30 exemptions. If it was not possible to create a combined category containing at least 30 exemptions within an origin-ethnicity-race-sex group, then cells were combined for both sexes. When individual ages were combined to compute a migration probability, each of the ages was assigned the probability for the aggregated age group.
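
The cell-combining rule above can be sketched for a single origin-ethnicity-race-sex group. The text does not say which adjacent ages are merged first, so this sketch assumes a simple pass from youngest to oldest, folding any undersized tail into the preceding group. The function name, the merge order, and the counts are hypothetical.

```python
def collapse_age_cells(exemptions_by_age, minimum=30):
    """Group adjacent ages until each group has at least `minimum` exemptions.

    Returns a list of age groups (lists of ages). If the whole input has
    fewer than `minimum` exemptions, one undersized group is returned;
    the text says such cases are then combined across both sexes.
    """
    groups, current, running = [], [], 0
    for age in sorted(exemptions_by_age):
        current.append(age)
        running += exemptions_by_age[age]
        if running >= minimum:
            groups.append(current)
            current, running = [], 0
    if current:  # leftover ages that did not reach the threshold
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups


cells = {20: 45, 21: 12, 22: 9, 23: 14, 24: 60, 25: 5}  # hypothetical counts
print(collapse_age_cells(cells))  # [[20], [21, 22, 23], [24, 25]]
```

Each age in a combined group would then share the migration probability computed for that group.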

Step 4B. Calculate Inter-State Out-Migration Rates and Inter-State Out-Migrant Population

The Inter-State out-migration probability was calculated from the total number of Inter-State migrant exemptions from that state divided by the total number of exemptions (migrants and non-migrants) in the same state. These rates were calculated by age, sex, race, and Hispanic origin. The estimated number of Inter-State out-migrants was calculated by multiplying the Inter-State out-migration rate for each set of demographic characteristics for each state by the applicable base population for each state.
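
A minimal sketch of Step 4B for one combination of state and demographic characteristics follows. The function names and figures are hypothetical.

```python
def out_migration_rate(migrant_exemptions, total_exemptions):
    """Inter-State migrant exemptions divided by all exemptions in the state."""
    if total_exemptions == 0:
        return 0.0
    return migrant_exemptions / total_exemptions


def out_migrants(rate, base_population):
    """Apply the rate to the applicable base population."""
    return rate * base_population


rate = out_migration_rate(125, 4000)  # 0.03125
print(out_migrants(rate, 52000))      # 1625.0
```

The rate comes from the tax-exemption universe, while the count of out-migrants comes from applying it to the base population.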

Step 4C. Calculate In-Migration Proportion and In-Migrant Population

From the matched records the destinations of migrants by demographic characteristics can be determined. The numbers of out-migrants calculated from Step 4B were distributed as in-migrants to states by applying proportions of total in-migrant exemptions who moved to each state. These proportions were calculated by age, sex, race, and Hispanic origin. The numbers of out-migrants by state and characteristics were subtracted from the base population for each time period and the numbers of in-migrants were added to the base population.
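
Step 4C can be sketched as distributing a pool of out-migrants to destinations in proportion to in-migrant exemptions, then updating the base population. The function names, state labels, and figures are hypothetical, and the sketch covers one demographic cell only.

```python
def distribute_in_migrants(out_migrant_pool, in_exemptions_by_state):
    """Allocate the pool to states in proportion to in-migrant exemptions."""
    total = sum(in_exemptions_by_state.values())
    return {state: out_migrant_pool * n / total
            for state, n in in_exemptions_by_state.items()}


def apply_migration(base, out_migrants, in_migrants):
    """Subtract out-migrants from, and add in-migrants to, the base population."""
    return base - out_migrants + in_migrants


in_migrants = distribute_in_migrants(1000, {"A": 300, "B": 500, "C": 200})
print(in_migrants)                              # {'A': 300.0, 'B': 500.0, 'C': 200.0}
print(apply_migration(20000, 450, in_migrants["A"]))  # 19850.0
```

Because every out-migrant in the pool is assigned a destination, in-migrants and out-migrants balance across all states.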

Step 4D. Calculate Intra-State Out/In-Migration Proportions and Intra-State Out/In Migrant Population

While it is theoretically possible to handle intra-state migration in the same manner as inter-state migration, the combinations of 3,141 counties and the demographic characteristics would require millions of cells, many with too few observations to produce reliable estimates. Thus, for practical purposes, intra-state migration is handled differently. Instead of calculating an out-migration probability from the exemptions and then calculating the in-migration proportion by demographic characteristics, both intra-state out-migration and intra-state in-migration are calculated as a proportion of the total intra-state migrant exemptions by demographic characteristics for the state. Thus, the first step is to calculate the county’s share of the state’s total intra-state out-migration exemptions by demographic characteristics, as well as the county’s share of the state’s total intra-state in-migration exemptions by demographic characteristics. These proportions are then applied to the county population to calculate the population of intra-state out-migrants and intra-state in-migrants.
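
The share calculation in Step 4D can be sketched as below for one demographic cell within one state. The text does not spell out exactly how the shares are applied to population, so the sketch assumes they are applied to a state-level count of intra-state migrants. That reading, the function name, and the figures are hypothetical.

```python
def intra_state_migration(out_exemptions, in_exemptions, state_intra_migrants):
    """Return {county: (out_migrants, in_migrants, net)} from exemption shares."""
    total_out = sum(out_exemptions.values())
    total_in = sum(in_exemptions.values())
    result = {}
    for county in sorted(set(out_exemptions) | set(in_exemptions)):
        out_m = state_intra_migrants * out_exemptions.get(county, 0) / total_out
        in_m = state_intra_migrants * in_exemptions.get(county, 0) / total_in
        result[county] = (out_m, in_m, in_m - out_m)
    return result


flows = intra_state_migration({"X": 60, "Y": 40}, {"X": 30, "Y": 70}, 500)
print(flows)  # {'X': (300.0, 150.0, -150.0), 'Y': (200.0, 350.0, 150.0)}
```

Under this reading, net intra-state migration sums to zero across the counties of a state.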

STEP 5: PROCESSING OF GROUP QUARTERS POPULATION

GQ population change was estimated separately from the demographic accounting procedure described above. This was done primarily because of the uniqueness of this subpopulation and the special difficulties of estimating the GQ population. The Census 2000 GQ data have full demographic detail (age, sex, race, and Hispanic origin) and information on type of GQ residence classified into seven types: (1) Correctional Facilities; (2) Juvenile Institutions; (3) Nursing Homes; (4) Other Institutional Facilities; (5) College Dorms; (6) Military Barracks; and (7) Other Non-Institutional GQ. The GQ population is updated for each estimate year using information on the total GQ population by GQ type collected by state agencies through the FSCPE. As with the July 1, 2001 base population, the Census 2000 age, sex, race, and Hispanic origin distributions of the GQ population by county and type were applied to the July 1, 2002 total GQ populations reported by counties to produce GQ population estimates for counties with demographic detail.
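
The GQ update can be sketched for one county as follows: for each GQ type, the Census 2000 demographic shares are applied to the updated total for that type. The function name, cell labels, and figures are hypothetical. A type with no Census 2000 population has no distribution to apply, and the sketch simply skips it.

```python
def gq_detail(census_gq, updated_totals):
    """Apply Census 2000 shares within each GQ type to updated type totals.

    census_gq: {gq_type: {cell: count}}; updated_totals: {gq_type: total}.
    """
    estimates = {}
    for gq_type, cells in census_gq.items():
        census_total = sum(cells.values())
        if census_total == 0:
            continue  # no Census 2000 distribution available for this type
        new_total = updated_totals.get(gq_type, census_total)
        for cell, count in cells.items():
            estimates[(gq_type, cell)] = new_total * count / census_total
    return estimates


census = {"College Dorms": {("F", "18-24"): 600, ("M", "18-24"): 400}}
print(gq_detail(census, {"College Dorms": 1100}))
# {('College Dorms', ('F', '18-24')): 660.0, ('College Dorms', ('M', '18-24')): 440.0}
```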

STEP 6: GENERATE RESIDENT POPULATION ESTIMATES BY DEMOGRAPHIC CHARACTERISTIC

6A. Prior to combining the July 1, 2000 (revised), July 1, 2001 (revised), and July 1, 2002 group quarters and household population estimates, each set of estimates was converted from the four race groups consistent with the 1990 Census to the 31 race groups consistent with Census 2000 by applying conversion factors. Continuing the example from Step 1A.i., the estimated "White" population was apportioned to the "White Alone" and "White and Black" populations. The estimated July 1, 2002 White population was multiplied by the ratio of the White Alone population from Census 2000 to the sum of the White Alone and half of the "White and Black" population from Census 2000 to produce the July 1, 2002 estimate for the White Alone population. The estimated July 1, 2002 White population was multiplied by the ratio of the "White and Black" population from Census 2000 to the sum of the White Alone and half of the "White and Black" population from Census 2000 to produce part of the July 1, 2002 estimates for the "White and Black" population. The remaining part of the July 1, 2002 "White and Black" population was obtained by applying comparable ratios to the July 1, 2002 estimated Black population.
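
The White example in 6A can be sketched as below. One assumption is made so that the two pieces add back to the estimated White population: the numerator for the "White and Black" portion is taken as half of the Census 2000 "White and Black" population, mirroring the half assigned to White in Step 1A.i. That reading, the function name, and the figures are hypothetical.

```python
def split_white_estimate(white_estimate, white_alone_2000, white_and_black_2000):
    """Apportion an estimated White population back to Census 2000 groups.

    Returns (white_alone, white_and_black_part_from_white).
    """
    half_wb = 0.5 * white_and_black_2000
    denominator = white_alone_2000 + half_wb
    white_alone = white_estimate * white_alone_2000 / denominator
    # Assumed numerator: the half of "White and Black" allocated to White.
    wb_from_white = white_estimate * half_wb / denominator
    return white_alone, wb_from_white


alone, wb_part = split_white_estimate(1820, 900, 20)
print(alone, wb_part)  # 1800.0 20.0
```

A comparable call on the estimated Black population would supply the remaining part of the "White and Black" estimate.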

6B. GQ and household estimates of the population were summed by county, age, sex, race, and Hispanic origin within counties. The preliminary estimates were compared with independently calculated county total population estimates, and national estimates by age, sex, race and Hispanic origin. The final resident population estimates were adjusted to equal the independent totals by multiplying each estimate by the ratio of the independent totals to the sum of the relevant estimates. These adjusted estimates were rounded to whole numbers for each combination of demographic characteristics within counties and compared with the independent totals.
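
The ratio adjustment and rounding in 6B can be sketched for a single control total. The document does not describe the controlled-rounding procedure itself, so the sketch substitutes largest-remainder rounding, which keeps the rounded cells summing to the control. The function name and figures are hypothetical, and the sketch assumes non-negative estimates and an integer control.

```python
def control_and_round(estimates, control_total):
    """Scale estimates to a control total, then round preserving the total."""
    raw_sum = sum(estimates.values())
    scaled = {k: v * control_total / raw_sum for k, v in estimates.items()}
    rounded = {k: int(v) for k, v in scaled.items()}  # round down first
    shortfall = control_total - sum(rounded.values())
    # Give the remaining units to the cells with the largest fractions.
    by_fraction = sorted(scaled, key=lambda k: scaled[k] - rounded[k],
                         reverse=True)
    for k in by_fraction[:shortfall]:
        rounded[k] += 1
    return rounded


result = control_and_round({"a": 10.4, "b": 20.3, "c": 30.1}, 62)
print(result)  # {'a': 10, 'b': 21, 'c': 31}
```

With controls in more than one dimension, rounding cannot always satisfy every total at once.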

In our quality control checks, we noted 146 observations where the summed county characteristics did not equal exactly the separately produced national characteristics estimates, while the sums of 31,846 other county characteristics combinations equaled the independently produced totals exactly. This was a product of controlled rounding. The largest discrepancies were: -566 and 689 for July 1, 2000 estimates; -504 and 650 for July 1, 2001 estimates; and -438 and 584 for July 1, 2002 estimates. For each year of estimates, the mean of the differences was zero. Further examination showed that these discrepancies occurred for age 84 years and the age group 85 and over, and were largest for the White Alone and the "White and Black" race groups.


1 The 31 race combinations include single responses for White, Black or African American, American Indian and Alaska Native, Asian, and Native Hawaiian and Other Pacific Islander; and all combinations of two or more of the five race groups. From this point on, the response Black or African American will be referred to as Black, and the response Native Hawaiian and Other Pacific Islander will be referred to as Pacific Islander.