Methodology

Estimates And Projections Area Methodology
State Population Estimates By Age, Sex, Race, And Hispanic Origin For July 1, 2003


The U.S. Census Bureau produces estimates of the resident population by age, sex, race and Hispanic origin for each state in the United States and the District of Columbia on an annual basis. The following documentation outlines the methods that were used in the production of the July 1, 2003 estimates.

OVERVIEW

For the July 1, 2003 state estimates of the resident population by age, sex, race, and Hispanic origin, the Census Bureau used a proportional distribution method. This method was applied in the following manner. First, we started with previously developed resident state population estimates by age and sex and resident national population estimates by age, sex, race, and Hispanic origin. Second, we estimated the race and Hispanic origin distributions for the state-age-sex estimates using information about the post-censal change in the corresponding populations. Third, we applied these distributions to the original state age-sex and national characteristics estimates. A detailed discussion of this method is provided below.

Estimating the Race and Hispanic Origin Distributions

The majority of the work that went into producing the state resident population estimates by age, sex, race, and Hispanic origin consisted of estimating the race and Hispanic origin distributions for each state-age-sex estimate. This was done by producing a preliminary set of age, sex, race, and Hispanic origin estimates for each state and then calculating the race-Hispanic origin proportions from these.

The preliminary set of resident population estimates by age, sex, race, and Hispanic origin was produced by first splitting the Census population into two mutually exclusive universes: the household population and the group quarters (GQ) population. For the household population, a cohort-component technique was then used to estimate change in this population. For the GQ population, change was estimated through a data-collection effort conducted in conjunction with members of the Federal-State Cooperative Program for Population Estimates (FSCPE). The resulting household and GQ estimates were then added together to produce the new set of preliminary resident population estimates.

The Preliminary Household Population Estimates

The cohort-component technique used to estimate the household population follows each birth cohort across time according to its exposure to mortality, fertility, and migration. This technique was applied using the following equation.

P1 = P0 + B - D + NDM + NIM + NMM

Where:

P1 = population at the end of the period
P0 = population at the beginning of the period
B = births during the period
D = deaths during the period
NDM = net domestic migration during the period
NIM = net international migration during the period
NMM = net military movement during the period

The actual application of this algorithm was quite simple. Not so simple was the preparation of the input data. Great care was taken to estimate each component of this equation as accurately as possible. The details of this work are outlined below.
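As a minimal sketch of how the equation above is applied to a single cell for one period, the following Python snippet adds up the components. The function name and all input values are hypothetical and serve only as an illustration; they are not Census Bureau figures.

```python
def cohort_component(p0, births, deaths, ndm, nim, nmm):
    """Return P1 = P0 + B - D + NDM + NIM + NMM for one cell and one period."""
    return p0 + births - deaths + ndm + nim + nmm


# Hypothetical inputs for one cell (illustrative values only).
p1 = cohort_component(p0=100_000, births=1_400, deaths=900,
                      ndm=-250, nim=600, nmm=50)
print(p1)  # 100900
```

The net components (NDM, NIM, NMM) may be negative, which is why they are added rather than subtracted.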

One complicating factor in the estimation of the household population was that the administrative data used in these estimates continue to come to the Census Bureau with race categories that differ from those used in Census 2000. In Census 2000, information was gathered using 6 race groups (White; Black; American Indian and Alaska Native; Asian; Native Hawaiian and Pacific Islander; and Some Other Race). In addition, individuals were allowed to report multiple races. In contrast, most administrative data available for the estimates still come to us in the 4 race categories consistent with the 1990 census (White; Black; American Indian, Eskimo, or Aleut; and Asian and Pacific Islander). For this reason, the household estimates were processed with the 4 race categories consistent with the 1990 census and the results were converted to race categories consistent with Census 2000. Details of all the conversions needed in order to carry out this procedure are presented below.

  1. Specification of the Base Population

    The enumerated population from Census 2000 was the base for the July 1, 2003 estimates. This population was modified in four ways to prepare it for inclusion in the cohort-component technique.

    1. The original race data from the Census were modified to eliminate the "some other race" category.1
    2. The April 1, 2000 population estimates base reflects modifications to the Census 2000 population as documented in the Count Question Resolution program and errata notes.2
    3. The Census 2000 base household population was converted from the 31 race categories to the four race groups consistent with the 1990 Census by "straight proportional allocation."
    4. Finally, the results of step 3 were used to estimate the population on July 1, 2000.3 This was done as follows:
      1. A set of race-Hispanic origin proportions was calculated by summing the data by state, age, and sex and then dividing each state-age-sex-race-origin cell by the corresponding sum.
      2. These proportions were then applied to the previously produced state age-sex estimates for July 1, 2000.
      3. Finally, the results were controlled to the July 1, 2000 national estimates by age, sex, race, and Hispanic origin.
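The first two sub-steps above (calculating race-Hispanic origin proportions within each state-age-sex group and applying them to the state age-sex estimates) can be sketched as follows. This is a minimal illustration under assumptions: the function names, the cell keys, and the counts are hypothetical, every state-age-sex group is assumed to have a nonzero sum, and the final control to the national estimates is not shown.

```python
from collections import defaultdict


def race_origin_proportions(cells):
    """Divide each state-age-sex-race-origin cell by its state-age-sex sum."""
    sums = defaultdict(float)
    for (state, age, sex, group), count in cells.items():
        sums[(state, age, sex)] += count
    return {key: count / sums[key[:3]] for key, count in cells.items()}


def apply_proportions(proportions, state_age_sex_estimates):
    """Spread each state-age-sex estimate across the race-origin groups."""
    return {key: share * state_age_sex_estimates[key[:3]]
            for key, share in proportions.items()}


# Hypothetical April 1, 2000 base cells and July 1, 2000 state age-sex estimate.
base = {("A", 30, "F", "group 1"): 60, ("A", 30, "F", "group 2"): 40}
july_totals = {("A", 30, "F"): 110}

proportions = race_origin_proportions(base)
july_cells = apply_proportions(proportions, july_totals)
```

By construction, the resulting cells sum to the state age-sex estimate they were derived from.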

  2. Specification of Births

    The birth component was calculated with data from two sources. Members of the Federal-State Cooperative Program for Population Estimates (FSCPE) provided data on all registered births that occurred in their respective states for calendar years 2000-2002. The FSCPE births were adjusted and distributed by sex, race (consistent with the 1990 Census), and Hispanic origin using data from the National Center for Health Statistics (NCHS). Finally, the number of births in the last 6 months of the estimate period (January 1 - June 30, 2003) was assumed to be equal to the number of births from July 1 - December 31, 2002. These final birth counts were considered to be complete for the resident population. No adjustments were made for undercoverage or differential coverage by state, sex, race, or Hispanic origin.

  3. Specification of Deaths

    The death component was also calculated with data from two sources. Members of the Federal-State Cooperative Program for Population Estimates (FSCPE) provided data on all registered deaths that occurred in their respective states for calendar years 2000-2002. The FSCPE deaths were adjusted and distributed by age, sex, race (consistent with the 1990 Census), and Hispanic origin using data from the National Center for Health Statistics (NCHS). Finally, the number of deaths in the last 6 months of the estimate period (January 1 - June 30, 2003) was assumed to be equal to the number of deaths from July 1 - December 31, 2002. These final death counts were considered to be complete for the resident population. No adjustments were made for undercoverage or differential coverage by state, age, sex, race, or Hispanic origin.
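The carry-forward assumption used for both births and deaths can be illustrated with a short sketch. It assumes that the final estimate year runs from July 1, 2002 through June 30, 2003, so that the unobserved second half is filled in with the observed first half. The function name and the count are hypothetical.

```python
def final_year_events(jul_dec_2002):
    """Events for July 1, 2002 - June 30, 2003 under the carry-forward assumption."""
    jan_jun_2003 = jul_dec_2002  # assumed equal to the preceding six months
    return jul_dec_2002 + jan_jun_2003


# Hypothetical count of registered births in one state, July-December 2002.
births_final_year = final_year_events(41_250)
print(births_final_year)  # 82500
```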

  4. Specification of Net International Migration

    The net international migration component consisted of three migration flows: (1) net migration of the foreign-born, (2) emigration of natives, and (3) net movement from Puerto Rico to the United States.

    To measure the net migration of the foreign born, we used the American Community Survey (ACS) because it provided annually updated data. We first determined the national level of migration for the foreign born by calculating the net difference in the estimates of these surveys from 2000 to 2001 and 2001 to 2002. We then accounted for deaths to the entire foreign-born population during the periods of interest to arrive at the final national estimate of net migration of the foreign born. Then, to ascribe county of destination, age, sex, race, and Hispanic origin to these estimates, we applied the distribution of the non-citizen foreign born from Census 2000 who entered in 1995 or later to the national-level estimate. Finally, we assumed the net migration of the foreign born between 2002 and 2003 to be the same as net migration between 2001 and 2002.
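The national-level calculation described above can be sketched as a residual: the change in the foreign-born population between two survey years, with deaths added back. Treating "accounting for deaths" as adding them back is an assumption of this sketch, and the function name and all figures are hypothetical toy values. The allocation to county, age, sex, race, and Hispanic origin is not shown.

```python
def net_foreign_born_migration(stock_start, stock_end, deaths):
    """Net migration = change in the foreign-born population + deaths in the period."""
    return (stock_end - stock_start) + deaths


# Hypothetical survey estimates of the foreign-born population (toy units).
foreign_born = {2000: 1_000, 2001: 1_100, 2002: 1_250}
# Hypothetical deaths to the foreign-born population in each period.
deaths = {(2000, 2001): 10, (2001, 2002): 12}

net = {
    (start, end): net_foreign_born_migration(foreign_born[start], foreign_born[end], d)
    for (start, end), d in deaths.items()
}
# The 2002-2003 flow is assumed equal to the 2001-2002 flow.
net[(2002, 2003)] = net[(2001, 2002)]
print(net)  # {(2000, 2001): 110, (2001, 2002): 162, (2002, 2003): 162}
```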

    The national net movement from Puerto Rico to the United States by age and sex was measured using levels observed during the 1990s.4 To assign characteristics to these flows, we applied the age-sex-race-Hispanic origin-destination county distributions from Census 2000 of those who indicated that their place of birth was Puerto Rico and who had entered the United States in 1995 or later.

    The emigration of natives was produced in a similar way to the net movement from Puerto Rico. Again, the national levels of movement were measured using levels observed during the 1990s.5 Then, to assign characteristics to these flows, we applied the age-sex-race-Hispanic origin-destination state distributions from Census 2000 of all natives who currently reside in the United States. In other words, natives who emigrated were assumed to have the same age-sex-race-Hispanic origin-destination state distribution as natives residing in the 50 states and the District of Columbia in Census 2000.

    Once the net migration of the foreign born, net movement from Puerto Rico, and the emigration of natives were estimated, all three parts were combined to estimate a final net international migration component.

  5. Specification of Net Internal Migration

    For the July 1, 2003 estimates, internal migration was estimated using data from two administrative record sources: annual individual-level extracts of tax returns provided by the Internal Revenue Service (IRS); and the Census Numident file derived from the Social Security Administration 100 percent file (SSA).

    The IRS 1040 tax return records were matched to the SSA data to identify the age, sex, race, and Hispanic origin of the tax filers. Next, demographic characteristics were assigned to spouses and dependents of each filer using several simplifying assumptions. First, spouses were assigned the same age and the opposite sex as filers. Second, exemptions claimed for dependent children were assigned to the under-20 age group and exemptions claimed for dependent parents were assigned to the 65 and over age group. Third, the sex of dependent children and parents was assigned randomly. Fourth, the spouse and other dependents were assigned the same race and Hispanic origin as the filer.

    After the demographic characteristics were assigned to the IRS tax return records, two years of records were matched and a migration status was assigned. Filers and their dependents with a change in the state of residence between the two periods were identified as "inter-state migrants." If there was no change in the state of residence, the filers and dependents were identified as non-migrants.

    It was now possible to calculate both the out-migration rates and in-migration proportions for each state by age, sex, race, and Hispanic origin. First, the out-migration rate for each age, sex, race, and Hispanic origin group within a state was computed by dividing the number of "inter-state migrants" moving out of the state during the period by the number of filers and dependents (i.e., exemptions) in the state at the beginning of the period.6 Then, to calculate the in-proportions for the age, sex, race, and Hispanic origin groups of each state, the out-migration rates were multiplied by a proxy estimate of the population for that year taken from the previous vintage of estimates (vintage 2002). From this, the inter-state in-migration proportion for each state was calculated by dividing the number of in-migrants by age, sex, race, and Hispanic origin for that state by the national sum for that characteristic group.
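A simplified sketch of these two quantities for a single age-sex-race-Hispanic origin group is given below. It computes both directly from matched origin-destination flows and omits the weighting by the vintage 2002 proxy population, so it illustrates the definitions rather than the full procedure. The function name, the state labels, and the counts are hypothetical.

```python
def out_rates_and_in_proportions(flows, exemptions):
    """Compute out-migration rates and in-migration proportions for one group.

    flows[(origin, destination)] -- matched inter-state migrants
    exemptions[state]            -- filers and dependents at the start of the period
    """
    out_migrants = {state: 0 for state in exemptions}
    in_migrants = {state: 0 for state in exemptions}
    for (origin, destination), count in flows.items():
        out_migrants[origin] += count
        in_migrants[destination] += count
    national_sum = sum(in_migrants.values())
    out_rates = {s: out_migrants[s] / exemptions[s] for s in exemptions}
    in_proportions = {s: in_migrants[s] / national_sum for s in exemptions}
    return out_rates, in_proportions


# Hypothetical flows among three states for one characteristic group.
flows = {("A", "B"): 30, ("A", "C"): 20, ("B", "A"): 10, ("C", "A"): 40}
exemptions = {"A": 1000, "B": 500, "C": 800}
out_rates, in_proportions = out_rates_and_in_proportions(flows, exemptions)
```

Because every inter-state out-migrant is also an in-migrant somewhere else, the in-migration proportions sum to 1 across states.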

    In the production of the out-migration rates and in-migration proportions, when the number of exemptions for any age-sex-race-Hispanic origin category (which will be referred to as a cell) was low, the exemptions were combined with those of other cells in order to improve the robustness of the resulting migration out-rates or in-proportions. If a given cell had fewer than 30 exemptions, it was combined with the exemptions of adjacent age cells within the same sex-race-Hispanic origin group until the combined category contained at least 30 exemptions. If it was not possible to create a combined category containing at least 30 exemptions with this procedure, then cells were combined for both sexes. After this was done and the out-migration rate or in-migration proportion was calculated, each of the individual ages was assigned the rate or proportion of the aggregated age group.
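The pooling rule can be sketched for one sex-race-Hispanic origin group as follows. The text does not say in which order adjacent ages were combined, so this sketch assumes a simple pass from youngest to oldest that keeps adding ages until the group reaches the minimum; leftover ages at the end are merged into the preceding group. The fallback of combining both sexes is not shown. The function name and the counts are hypothetical.

```python
def pooled_out_rates(exemptions, out_migrants, minimum=30):
    """Return one out-migration rate per age, pooling adjacent ages as needed.

    exemptions and out_migrants are lists indexed by single year of age.
    """
    groups, current = [], []
    for age in range(len(exemptions)):
        current.append(age)
        if sum(exemptions[a] for a in current) >= minimum:
            groups.append(current)
            current = []
    if current:  # trailing ages that never reached the minimum
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)

    rates = [0.0] * len(exemptions)
    for group in groups:
        pooled_exemptions = sum(exemptions[a] for a in group)
        pooled_out = sum(out_migrants[a] for a in group)
        rate = pooled_out / pooled_exemptions if pooled_exemptions else 0.0
        for a in group:
            rates[a] = rate  # every age in the group gets the pooled rate
    return rates


# Hypothetical counts for four consecutive ages; ages 1 and 2 are pooled.
rates = pooled_out_rates(exemptions=[50, 10, 25, 40], out_migrants=[5, 1, 6, 2])
```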

    Two other aspects of the estimation of the out-migration rates and in-migration proportions should be noted. First, the individual ages in the 0-19 and 65+ age groups were assigned the same out-migration rate or in-migration proportion as the aggregated age group. Second, the age distributions of the out-migration rates and in-migration proportions by state, sex, race, and Hispanic origin were smoothed using a moving average.

    The final step in the production of the number of in- and out-migrants for each state occurred during the actual estimation process. The out-rates were applied to the estimate of the population at the beginning of the period to generate the number of state out-migrants by age, sex, race, and Hispanic origin. Then, these migrants were converted into in-migrants for each state by age, sex, race, and Hispanic origin by multiplying the in-proportions for each state by the corresponding national sums.
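For a single characteristic group, this final step can be sketched as below: out-rates produce out-migrants, the out-migrants are summed nationally, and the in-proportions reallocate that national sum to destination states. Computing net domestic migration as in-migrants minus out-migrants is an assumption of this sketch, as are the function name and all values.

```python
def domestic_migration(population, out_rates, in_proportions):
    """Return out-migrants, in-migrants, and their difference for each state."""
    out_migrants = {s: population[s] * out_rates[s] for s in population}
    national_sum = sum(out_migrants.values())
    in_migrants = {s: in_proportions[s] * national_sum for s in population}
    net = {s: in_migrants[s] - out_migrants[s] for s in population}
    return out_migrants, in_migrants, net


# Hypothetical beginning-of-period populations, rates, and proportions.
population = {"A": 2000, "B": 1000, "C": 1000}
out_rates = {"A": 0.05, "B": 0.02, "C": 0.05}
in_proportions = {"A": 0.5, "B": 0.3, "C": 0.2}

out_migrants, in_migrants, net = domestic_migration(population, out_rates, in_proportions)
```

Because the in-proportions sum to 1, the net values sum to zero across states, which is the expected behavior for migration that is purely internal.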

  6. Specification of Net Military Movement

    The net movement of the military, both foreign and domestic, into each state for each year was estimated using data received directly from the armed forces and the Department of Defense. They provided yearly estimates of the station strength in each state from 2000 to 2003. The net movement was then calculated as the difference in the station strength from one year to the next. Finally, the age, sex, race, and Hispanic origin distribution was assigned using those who reported being employed by the military in Census 2000.
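The year-to-year differencing and the assignment of characteristics can be sketched as follows for one state. The station strengths, the cell labels, and the shares are hypothetical toy values, not data from the armed forces or from Census 2000.

```python
# Hypothetical station strength in one state, by year.
station_strength = {2000: 12_000, 2001: 12_400, 2002: 11_900, 2003: 12_100}

years = sorted(station_strength)
# Net movement for each year is the change from the previous year.
net_movement = {
    later: station_strength[later] - station_strength[earlier]
    for earlier, later in zip(years, years[1:])
}
print(net_movement)  # {2001: 400, 2002: -500, 2003: 200}

# Hypothetical shares of the military population by characteristic cell.
military_shares = {"cell 1": 0.75, "cell 2": 0.25}
net_2001_by_cell = {cell: net_movement[2001] * share
                    for cell, share in military_shares.items()}
```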

The Preliminary GQ Population Estimates

Group Quarters (GQ) population change is estimated separately from the household population because of the unique character of this subpopulation and the ability to acquire direct data that reflect change in this population. The technique for estimating the GQ population for the vintage 2003 estimates started with the Census 2000 enumerated GQ population. As with the household population, the race breakdowns of the GQ data were converted from categories consistent with Census 2000 to categories consistent with the 1990 census through proportional allocation (see above).

Next, the state representatives who participate in the Federal-State Cooperative Program for Population Estimates (FSCPE) developed an independent list of GQ facilities in their state with the populations typically associated with them at the time of Census 2000 and annually from 2000 to 2003 from the sources available to them in their state. In turn, the Census Bureau calculated the implied change in the GQ population from the numbers provided by the FSCPE members. This change was then applied to the Census GQ base to come up with the estimate of the total GQ population in the state. Finally, these state totals were distributed by age, sex, race, and Hispanic origin using the distribution of the GQ population within each GQ type from the base GQ population.
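A minimal sketch of this procedure for one GQ type in one state is shown below. It assumes that the implied change is an absolute difference that is added to the Census base; the text does not say whether the change was applied as a difference or as a ratio. The function name, the cell labels, and the counts are hypothetical.

```python
def estimate_gq(census_base_cells, fscpe_at_census, fscpe_current):
    """Update a Census GQ base with FSCPE-implied change and redistribute it.

    census_base_cells -- Census 2000 GQ population by characteristic cell
    fscpe_at_census   -- FSCPE-reported GQ population at the time of Census 2000
    fscpe_current     -- FSCPE-reported GQ population for the estimate year
    """
    base_total = sum(census_base_cells.values())
    implied_change = fscpe_current - fscpe_at_census  # assumed additive
    new_total = base_total + implied_change
    # Keep the base distribution across characteristic cells.
    return {cell: new_total * count / base_total
            for cell, count in census_base_cells.items()}


# Hypothetical base cells for one GQ type and hypothetical FSCPE figures.
base_cells = {"cell 1": 600, "cell 2": 400}
gq_2003 = estimate_gq(base_cells, fscpe_at_census=950, fscpe_current=1045)
```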

The Final Population Estimates by Demographic Characteristic

The final steps in the production of the vintage 2003 state characteristics estimates consisted of 1) adding together the household and GQ population estimates for each year, 2) calculating the necessary proportions from the preliminary estimates and applying them to both the previously created state population estimates by age and sex and the national estimates by age, sex, race, and Hispanic origin, and 3) converting the data from the 4 race categories consistent with the 1990 census to the categories consistent with Census 2000.

In combining the household and GQ population estimates, the only caveat that should be noted is that it was assumed that there was no change to the GQ population between April 1, 2000 and July 1, 2000.

The next step in the production of the state estimates by age, sex, race, and Hispanic origin was to control the preliminary resident state estimates by age, sex, race, and Hispanic origin to both the previously created resident state population estimates by age and sex and the national resident estimates by age, sex, race, and Hispanic origin. This was done through a two-step iterative process.

In the first step, the preliminary state characteristic estimates were summed to the national level by age, sex, race, and Hispanic origin. Next, proportions were calculated by dividing each state-age-sex-race-Hispanic origin cell by the corresponding sum. Finally, these proportions were applied to the original national characteristics estimates in order to calculate an intermediate set of state estimates by age, sex, race, and Hispanic origin.

In the second step, a similar procedure was used to calculate new proportions from the transformed data so that the state age-sex estimates could be distributed by race and Hispanic origin. First, the intermediate state characteristic estimates from the first step were summed to the state level by age and sex. Next, proportions were calculated by dividing each state-age-sex-race-Hispanic origin cell by the corresponding sum. Finally, these proportions were applied to the original state age-sex estimates in order to calculate a new intermediate set of state estimates by age, sex, race, and Hispanic origin.

When attempting to make estimates consistent with two different sets of data using the procedure described above, performing the second step of the iteration tends to distort the fit achieved in the first step. Likewise, repeating the first step again tends to distort the fit achieved in the second step. However, these distortions can be minimized by repeating the two-step procedure multiple times. For this reason, the iterative procedure was repeated five times. After this, the resulting set of estimates was rounded to integers.
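The two-step iteration described above is a form of iterative proportional fitting. A minimal sketch, assuming cells keyed by (state, age, sex, race-origin group) and nonzero sums throughout, is shown below. The function names and the toy data are hypothetical, and the use of Python's built-in `round` is an assumption, since the text does not specify a rounding rule.

```python
from collections import defaultdict


def scale_to(cells, targets, margin):
    """Scale cells so that they sum to `targets` along the margin given by `margin(key)`."""
    sums = defaultdict(float)
    for key, value in cells.items():
        sums[margin(key)] += value
    return {key: value * targets[margin(key)] / sums[margin(key)]
            for key, value in cells.items()}


def rake(preliminary, national, state_age_sex, rounds=5):
    """Alternate between the national control and the state age-sex control."""
    cells = dict(preliminary)
    for _ in range(rounds):
        cells = scale_to(cells, national, lambda k: k[1:])       # step one
        cells = scale_to(cells, state_age_sex, lambda k: k[:3])  # step two
    return cells


# Hypothetical preliminary cells for two states and two race-origin groups.
preliminary = {("A", 30, "F", "g1"): 60, ("A", 30, "F", "g2"): 40,
               ("B", 30, "F", "g1"): 20, ("B", 30, "F", "g2"): 80}
national = {(30, "F", "g1"): 90, (30, "F", "g2"): 110}
state_age_sex = {("A", 30, "F"): 110, ("B", 30, "F"): 90}

fitted = rake(preliminary, national, state_age_sex)
final = {key: round(value) for key, value in fitted.items()}
```

Because the loop ends on step two, the state age-sex controls are matched exactly before rounding, while the national controls are matched only approximately; the remaining discrepancy shrinks with each additional round.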

The final step in the estimates process was to convert the estimates from the 4-race categories consistent with the 1990 census in which the estimates were processed to the 31-race categories consistent with Census 2000. To do this, the procedure used to go from 31 to 4 races was essentially reversed. First, the proportion of each 4-race category associated with the 31 race categories was calculated for both universes using Census 2000 household and GQ data by state, age, sex, and Hispanic origin. Next, the GQ population for each estimate year was subtracted from the corresponding resident population in order to arrive at the household population for that estimate year. Then, the proportions for both the household and GQ data were applied to the respective 4-race estimates for each year to arrive at the 31-category race household and GQ estimates.7 Finally, the 31-category household and GQ estimates by age, sex, and Hispanic origin were added together to arrive at the final resident state population estimates by age, sex, race, and Hispanic origin.
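For a single state-age-sex-Hispanic origin cell in one universe, the conversion can be sketched as below: Census 2000 counts give the share of each 4-race category that belongs to each of the detailed categories, and those shares are applied to the 4-race estimate. The function names, the category labels, and the counts are hypothetical; in particular, the sketch assumes the Census counts have already been allocated to 4-race categories, so a multiple-race category can appear under more than one of them.

```python
from collections import defaultdict


def split_shares(census_counts):
    """Share of each detailed category within its 4-race category.

    census_counts[(race4, detailed)] -- Census 2000 count allocated to race4
    """
    totals = defaultdict(float)
    for (race4, _detailed), count in census_counts.items():
        totals[race4] += count
    return {key: count / totals[key[0]] for key, count in census_counts.items()}


def to_detailed_categories(estimates4, shares):
    """Apply the shares to 4-race estimates and sum by detailed category."""
    detailed_estimates = defaultdict(float)
    for (race4, detailed), share in shares.items():
        detailed_estimates[detailed] += estimates4[race4] * share
    return dict(detailed_estimates)


# Hypothetical allocated Census counts and 4-race estimates for one cell.
census_counts = {("W", "W alone"): 900, ("W", "W and B"): 100,
                 ("B", "B alone"): 450, ("B", "W and B"): 50}
estimates4 = {"W": 1100, "B": 550}

detailed = to_detailed_categories(estimates4, split_shares(census_counts))
```

The detailed estimates sum to the same total as the 4-race estimates, since the shares within each 4-race category sum to 1.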


1 This modification has been accepted for all estimates that the Census Bureau produces and is explained in the document entitled "Modified Race Data Summary File Technical Documentation and ASCII Layout" that can be found on the Census Bureau website at http://www.census.gov/popest/archives/files/MRSF-01-US1.html.

2 Details about the Count Question Resolution Program can be found on the Census Bureau website at http://www.census.gov/dmd/www/CQR.htm. Errata notes can be found on the Census Bureau website at http://www.census.gov/prod/cen2000/notes/errata.pdf.

3 This step was needed since the estimates procedure produces annual estimates and since the target reference date for each estimate year is July 1.

4 A description of the methodology used to produce these estimates can be found on the Census Bureau website at http://www.census.gov/population/www/documentation/twps0064.html.

5 A description of the methodology used to produce these estimates can be found on the Census Bureau website at http://www.census.gov/population/www/documentation/twps0063.html.

6 Technically, the quotient produced by this procedure is the probability of moving out of the state. However, we make the simplifying assumption that the out-migration rate is the same as the probability of moving out of the state.

7 The reason for separating the process of converting the household and GQ data from 4 to 31 races is that the GQ data are processed by GQ type so that they may be used in estimates not described in this document.