U.S. Bureau of the Census
Washington, DC 20233
Population Division Working Paper No. 3
This working paper was presented at the Annual Meeting of the American
Statistical Association, San Francisco, CA, August 1993. It also appears
as Chapter 4 in the Statistical Policy Working Paper #21, Indirect
Estimators in Federal Programs, Subcommittee on Small Area Estimation,
Federal Committee on Statistical Methodology, Statistical Policy Office,
Office of Management and Budget, July 1993.
John F. Long, U.S. Bureau of the Census
1.0 Introduction and Program History
The U.S. Bureau of the Census produces population estimates for the nation, states, counties, and places (cities, towns, and townships) as part of its program to quantify changes in population size and distribution since the last census. These estimates provide updates to the population counts by demographic and geographic characteristics from the last census. They also indicate the pace of population change since the last census and the relative influence of the components of population change. While the national estimates can be produced by a careful accounting system that adds annual births, deaths, and international migration to the previous year's population, subnational estimates require development of methods for dealing with the largely unmeasured component effects of internal migration. Many of these methods represent the type of small domain estimates that constitute the subject of this working paper.
1.1 Uses of postcensal population estimates
There are five major categories of uses for the Census Bureau's population estimates: 1) Federal and state funds allocation, 2) denominators for vital rates and per capita time series, 3) survey controls, 4) administrative planning and marketing guidance, and 5) descriptive and analytical studies (Table 1). More than 70 federal programs distribute tens of billions of dollars annually on the basis of population estimates (GAO, 1990). Even more money was distributed indirectly on the basis of indicators which used population estimates for denominators or controls (GAO, 1991). Many states also use the postcensal subnational estimates to allocate state funds to counties, townships, and incorporated places within the state.
A large number of Federal statistical series including state and county per capita income, national and state birth and death rates, and county level cancer rates by age, sex, and race use the results of the postcensal estimates. While many Federal agencies directly collect time series data on events and amounts, they require annual postcensal estimates of state and county population to produce per capita rates. These series provide an indication of national and subnational trends for fertility and mortality rates, incidence of cancer and other diseases, per capita economic changes, and other social, demographic, and administrative indicators.
Population surveys require independent controls from national population estimates by age, sex, race, and ethnicity as well as data on the geographic distribution of the population by states and selected metropolitan areas. These estimates are used to weight the sample cases such that the survey results equal the postcensal estimates used as controls. Each of the major surveys conducted by the Census Bureau controls to somewhat different levels of geographic and demographic detail (Table 2). There are a number of reasons to control surveys to independent estimates. Controls were initially instituted to reduce the variance of the survey estimates. They are also used for a number of secondary reasons: reduction in month-to-month variability of longitudinal data from consecutive surveys, partial correction for the large rates of undercoverage of surveys relative to the census, and improved consistency between different surveys and other population data series based on independent estimates.
There are numerous other administrative and analytical uses of the postcensal population estimates. They provide the only regular mechanism by which the components of population change are combined to track changes in the size and demographic and geographic distribution of the nation's population. The postcensal estimates provide essential information for administration and planning in the government and private sectors. In addition, they are used as a standard by state and local governments and the private sector in producing their own population estimates for smaller scale geography or for greater social and economic detail.
1.2 History of Census Bureau estimates program
Since the early 1900s, the Census Bureau has produced national population estimates. The methodology for these estimates developed into a component method in which the measured components of population change (births, deaths, immigration, and emigration) are added to or, in the case of deaths, subtracted from the most recent decennial census to estimate the current population.
When the Census Bureau attempted state population estimates beginning in the 1940s, it faced the difficult prospect of adding internal migration to the other components of population change. Since annual measures of internal migration by state are not available, many attempts were made to develop other ways to estimate state population change.
Through 1960, the principal method (known as Component Method II) was to estimate net migration based on annual changes in school enrollment. In the 1960s, a second method was added that estimated changes in the population level rather than measuring the components of population change. This method (the ratio-correlation method) uses regression analysis that relates changes in selected independent variables to changes in state population since the last census. These independent variables come from federal or state data sources. In the 1960s, the major proxy variables were vital events, school enrollment, tax returns, number of votes cast, motor vehicle registrations, and building permits. In the 1970s, the variables for votes cast and building permits were dropped and a variable for the size of the work force was added.
As the demand for estimates spread to the county level, the Federal-State Cooperative Program for Population Estimates was formed to involve state governments in a joint effort with the Census Bureau. This organization permitted the extension of Component Method II, the ratio-correlation method, and a housing unit method to the county level by providing data on school enrollment and various state administrative data systems at the county level. This system permitted the flexibility of using data sets selected for each individual state.
The enactment of General Revenue Sharing created a demand for population estimates for all general purpose governments (incorporated places, towns, and townships). To estimate these subcounty areas, the Census Bureau returned to a component based method (the administrative record method) in which migration was estimated using income tax data from the Internal Revenue Service (IRS). This method required matching addresses on successive years of tax returns and calculating a migration rate based on the total number of exemptions that moved into and out of each area. The key challenge in developing this methodology was to design a suitable method of coding mailing addresses to counties, incorporated places and minor civil divisions. The result was a probability coding guide based on a question on place of residence placed on the tax returns in selected years. This methodology proved so successful that it was added as an independent method in the estimation of state and county populations as well.
2.0 Program Description, Policies, and Practices
While the national population is estimated by age, sex, race, and Hispanic Origin, the subnational population estimates vary greatly in demographic and socio-economic detail. In general, the level of characteristic detail declines as the level of geography becomes finer. Each level of geography also has its own combination of methods and input data. State population is currently produced on an annual basis by age and sex. County estimates are produced annually for the total population and, on an experimental basis, by age, race, and sex. Estimates for the total population of metropolitan areas are produced annually by summing the appropriate county data and by making adjustments for New England areas which are composed of townships rather than counties. Every other year, the Census Bureau produces total population estimates for incorporated places, towns, and townships.
The methodology for postcensal estimates varies by level of
geography with the widest array of methods used in county
estimates. This methodological discussion focusses on the county
estimates with occasional extensions to include methods specific
to states or places. Postcensal population estimates update the
last census population based on changes in the population or in
components of population change. Actual information on such
components of population change as births and deaths or on changes
in symptomatic indicators related to changes in the population
since the last census provide benchmarks to anchor the estimates.
3.0 Estimator Documentation
The art of postcensal estimation of population comes in choosing appropriate benchmarks (or auxiliary data) to use in estimating the population change since the last census. One type of benchmark data, population flow data, consists of measures of the components of population change (e.g., births, deaths, internal and external migration). The other type of benchmark data, population stock data, includes indicators that are correlated with population size and uses changes in those indicators to estimate the total change in population. Methods based on each of these two classes of data are found in several variations in the Census Bureau's postcensal population estimates program.
3.1 Flow methods
Flow methods are also known as component methods. They require some estimation of each of the components of population change since the last census. In the most general form, the component method reduces to a basic accounting equation for population change.
$$P_{i,t} = P_{i,0} + (B_i - D_i + I_i - E_i) + u_i P_{i,0} \qquad (1)$$

Where,
$P_{i,t}$ = population estimate for area i at time t,
$P_{i,0}$ = population in area i at beginning of period,
$B_i$ = births in area i since beginning of period,
$D_i$ = deaths in area i since beginning of period,
$I_i$ = international immigrants to i,
$E_i$ = international emigrants from i, and
$u_i$ = estimator of rate of net internal migration to i.
Direct measurement is possible for some of the components of population change -- births, deaths, and some components of international migration. Vital statistics registration data do a good job of measuring the effects of natural increase but problems can arise with immigration data. Unmeasured migration across international borders is a major cause of estimation error that requires the use of assumptions about the quantity and characteristics of the population flows missed. The issue of small area estimation as defined by this report arises when there is no direct measure of the component of interest. In equation 1, the rate of internal net migration (ui) is not directly measured but must be estimated from an alternative data source devised for another purpose.
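As an illustration, the accounting in equation 1 can be sketched in a few lines of Python. The function name, argument names, and input values are hypothetical and chosen only to show the arithmetic; they are not Census Bureau data or production code.

```python
def component_estimate(p0, births, deaths, immigrants, emigrants, net_internal_rate):
    """Equation 1: P_t = P_0 + (B - D + I - E) + u * P_0."""
    natural_increase = births - deaths            # measured from vital statistics
    net_international = immigrants - emigrants    # measured, with known gaps
    net_internal = net_internal_rate * p0         # u_i must be estimated indirectly
    return p0 + natural_increase + net_international + net_internal


# Hypothetical area: every number below is made up for illustration.
estimate = component_estimate(
    p0=100_000, births=1_500, deaths=900,
    immigrants=300, emigrants=100, net_internal_rate=0.02,
)
print(round(estimate))
```

Only the last term depends on an estimated quantity; the remaining terms come from direct counts, which is why the accuracy of the method turns largely on the migration rate.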
In order to simplify the task of finding a good estimator of net internal migration, we confine the use of equation 1 to the household population under 65. The population 65 and over and the population in group quarters (military barracks, prisons, college dormitories, etc.) have different patterns of migration and are better handled separately later in the process by looking specifically at data systems that explicitly measure changes in the 65 and over population and in the group quarters population.
One method for estimating the net internal migration rate (ui) for the household population under 65 uses administrative data that provide addresses for individuals at two different points in time (usually a year apart). Such data provide approximate data on inmigration, outmigration, and even area-to-area flows. While there are several potential sources of these administrative data -- changes in postal addresses, drivers license records, tax returns, and health insurance information -- the problem is to find a source that provides representative coverage and consistency in reporting and tabulation. The Census Bureau uses an administrative records method that compares tax returns from the Internal Revenue Service (IRS) for changes in filing addresses between two consecutive annual tax filings (U. S. Bureau of the Census, 1988). In the estimates process, tax returns from one year are matched with those from previous years by matching Social Security numbers of the filers. For persons with a new address, the new mailing address is coded to state, place, and county. If the state, place, or county is different from the previous year, the filer and all exemptions are classified as migrants. These data are then used to construct net migration rates for each county and place as an input to the population estimation formula. An estimate of the rate of net migration is calculated by dividing the net flow of exemptions (the tax filer plus his or her dependents) moving into the area by the number of exemptions filed in the area (See equation 2).
$$u_i = \frac{\sum_j (T_{ji} - T_{ij})}{T_{i\cdot}} \qquad (2)$$

Where,
$T_{ij}$ = flow of tax exemptions from area i to j,
$T_{i\cdot}$ = total number of matched tax exemptions living in area i at the beginning of the period.
This net migration rate is then multiplied by the initial population as shown in equation 1. A critical assumption in this method is that the population not covered by the administrative data set moves similarly to the population covered or that the uncovered population is too small to affect the results markedly. Since this assumption is especially inappropriate for the population over 65 and for certain military and institutionalized populations, those populations are handled separately as explained below. Other potential problems include the difficulty of coding addresses to geography, changes in administrative coverage over time, and the elimination of administrative data sources as governmental programs change.
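The rate in equation 2 can be sketched as follows, assuming the matched returns have already been tabulated into origin-destination counts of exemptions. The dictionary layout, the function name, and the two-area example are assumptions made for illustration; the actual matching and geographic coding steps are not shown.

```python
from collections import defaultdict


def net_migration_rates(flows):
    """Compute u_i for each area from matched exemption flows.

    flows maps (origin, destination) -> number of matched exemptions.
    Entries with origin == destination are non-movers.
    """
    base = defaultdict(int)     # T_i. : exemptions in area i at start of period
    inflow = defaultdict(int)   # sum over j of T_ji
    outflow = defaultdict(int)  # sum over j of T_ij
    for (origin, dest), count in flows.items():
        base[origin] += count
        if origin != dest:
            outflow[origin] += count
            inflow[dest] += count
    return {area: (inflow[area] - outflow[area]) / base[area] for area in base}


# Hypothetical two-area example.
flows = {("A", "A"): 900, ("A", "B"): 100, ("B", "B"): 450, ("B", "A"): 50}
print(net_migration_rates(flows))
```

In this example area A loses a net 50 exemptions on a base of 1,000 and area B gains a net 50 on a base of 500, so the rates have opposite signs and different magnitudes even though the net flows balance.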
A second method of estimating the net internal migration rate (ui) uses school enrollment data. Changes in the size of the population enrolled in elementary and secondary school can be used to estimate the net migration of the general population. In one such method, component method II, changes in school enrollment are compared to expected changes due to natural increase alone in order to produce indirect estimates of the net migration rate of the school-aged population (Batutis, 1991). The migration rate for the total population is then estimated by adding the difference between the net migration rate of the total population and the net migration rate of the school-aged population in the most recent census.1 The critical assumption here is that the relationship of net school-aged migration and net total migration remains constant over time.
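The logic of this school enrollment approach can be sketched under simplifying assumptions. The text above does not specify the denominator of the school-aged rate, so dividing by base-period enrollment is an assumption here, as are the function names and input values.

```python
def school_age_migration_rate(observed_enrollment, expected_enrollment, base_enrollment):
    """Net migration rate of the school-aged population.

    expected_enrollment is the enrollment implied by natural increase alone.
    Dividing by base_enrollment is an assumption of this sketch.
    """
    return (observed_enrollment - expected_enrollment) / base_enrollment


def total_migration_rate(school_age_rate, census_total_rate, census_school_age_rate):
    """Add the census-period gap between total and school-aged migration rates."""
    return school_age_rate + (census_total_rate - census_school_age_rate)


# Hypothetical values for one area.
school_rate = school_age_migration_rate(10_300, 10_000, 10_000)
total_rate = total_migration_rate(school_rate, census_total_rate=0.05,
                                  census_school_age_rate=0.04)
print(round(school_rate, 4), round(total_rate, 4))
```

The second function makes the critical assumption explicit: the gap between the two rates is carried forward unchanged from the most recent census.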
3.2 Change in Stock Methods
A fundamentally different approach to population estimates emphasizes the total change in population size since the last census rather than demographic components of change. These change in stock methods relate changes in population size to changes in other measured variables that are assumed to be correlated with population change.
The choice of possible variables is wide: number of housing units, automobile registrations, total number of deaths (and/or births), tax returns, etc. Note that births and deaths in this method are not viewed as components but as indicators of the size of the population. Similarly, driver's licenses and tax returns are not used as indicators of migration as they were in the flow methods but as proxies for the size of the total population.
The U.S. Census Bureau county estimates program uses a special case of the change in stock method known as the ratio-correlation method (Namboodiri, 1972). In this method, we construct a linear regression equation for each state separately, using indicators appropriate for that state. The independent variables are ratios of the proportion of each indicator that is located in a given county in the state as of the date of the most recent census to the comparable proportion at the time of the prior census. The dependent variable is the ratio of the proportion of a state's population in a given county in the most recent census to the comparable proportion in the prior census. The resulting regression parameters (kappa, alpha, beta, gamma) are then used to estimate postcensal county populations in equation 3.
$$P_{i,t} = P_{s,t} \, \frac{P_{i,0}}{P_{s,0}} \left[ \kappa + \alpha \frac{A_{i,t}/A_{s,t}}{A_{i,0}/A_{s,0}} + \beta \frac{B_{i,t}/B_{s,t}}{B_{i,0}/B_{s,0}} + \gamma \frac{C_{i,t}/C_{s,t}}{C_{i,0}/C_{s,0}} \right] \qquad (3)$$

Where,
$P_{s,0}$ = state population in the last census,
$P_{s,t}$ = independent estimate of current state population,
$A_{s,0}, B_{s,0}, C_{s,0}$ = indicator variables for state total at date of last census,
$A_{i,0}, B_{i,0}, C_{i,0}$ = indicator variables for county i at date of last census,
$A_{s,t}, B_{s,t}, C_{s,t}$ = indicator variables for state total for estimate date, and
$A_{i,t}, B_{i,t}, C_{i,t}$ = indicator variables for county i for estimate date.
The key assumption in this method is that the relationship among geographic units between change in population and change in the selected indicator variables remains constant over time (Tayman and Schafer, 1985). Complications also arise if indicator variables change over time in selected areas for reasons unrelated to population -- for example, changes in the tax law, changes in general fertility rates, increases in automobile registrations per person, etc.
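Applying equation 3 once the regression parameters are in hand can be sketched as below. The coefficients and indicator values are invented for illustration, and fitting the regression on the two prior censuses is not shown; the function and argument names are likewise assumptions.

```python
def ratio_correlation_estimate(state_pop_t, county_pop_0, state_pop_0,
                               county_ind_0, state_ind_0,
                               county_ind_t, state_ind_t,
                               intercept, coefs):
    """Equation 3: county share at the census times a predicted change in share.

    The *_ind_* arguments are equal-length sequences, one entry per indicator
    (A, B, C in the text); coefs holds the matching regression coefficients.
    """
    share_0 = county_pop_0 / state_pop_0
    predicted_change = intercept
    for c0, s0, ct, st, coef in zip(county_ind_0, state_ind_0,
                                    county_ind_t, state_ind_t, coefs):
        # Ratio of the county's current share of the indicator to its census share.
        predicted_change += coef * ((ct / st) / (c0 / s0))
    return state_pop_t * share_0 * predicted_change


# Hypothetical county with three indicators and made-up coefficients.
estimate = ratio_correlation_estimate(
    state_pop_t=1_100_000, county_pop_0=50_000, state_pop_0=1_000_000,
    county_ind_0=[100, 200, 50], state_ind_0=[1000, 2000, 500],
    county_ind_t=[120, 200, 55], state_ind_t=[1000, 2000, 500],
    intercept=0.0, coefs=[0.5, 0.3, 0.2],
)
print(round(estimate))
```

If every indicator share is unchanged and the parameters sum to one, the bracketed term equals one and the county simply keeps its census share of the current state total.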
Another population stock method used to estimate the current population is the housing unit method, which works from the change in the number of housing units. In this method, tax rolls, construction permits, certificates of occupancy, or utility data could be used to calculate changes in the number of housing units in an area (Smith and Mandell, 1984). In the Census Bureau's methodology the housing stock from the last census is updated using data on housing construction, demolitions, and conversions (Eq. 4).
$$U_{i,t} = U_{i,0} + V_i - W_i \qquad (4)$$

Where,
$U_{i,0}$ = housing units in area i in the last census,
$U_{i,t}$ = estimated housing units in area i for estimate date (t),
$V_i$ = housing units constructed in area i since last census, and
$W_i$ = housing units in area i demolished since the last census.
The number of households in area i for date t is estimated by multiplying the estimated number of housing units at time t by an updated estimate of the occupancy rate for area i at time t. By assuming that the local occupancy rate changes as the national rate, we can update the area's rate by multiplying the occupancy rate for area i at the time of the census by the ratio of the national occupancy rate at time t from the Current Population Survey (CPS) to the national occupancy rate at the time of the census.
$$H_{i,t} = U_{i,t} \, \frac{H_{i,0}}{U_{i,0}} \, \frac{H_{\cdot,t}/U_{\cdot,t}}{H_{\cdot,0}/U_{\cdot,0}} \qquad (5)$$

Where,
$U_{\cdot,0}$ = national housing units in the last census,
$U_{\cdot,t}$ = national housing units for estimate date,
$H_{i,0}$ = households in area i in the last census,
$H_{i,t}$ = households in area i for estimate date,
$H_{\cdot,0}$ = national households in the last census, and
$H_{\cdot,t}$ = national households for estimate date.
Finally, the population for the area i is calculated by multiplying the area's household estimate by an updated estimate of population per household. Again we assume that the area's population per household from the last census can be updated by multiplying by the ratio of the national population per household from the CPS to the national population per household in the last census.
$$P_{i,t} = H_{i,t} \, \frac{P_{i,0}}{H_{i,0}} \, \frac{P_{\cdot,t}/H_{\cdot,t}}{P_{\cdot,0}/H_{\cdot,0}} \qquad (6)$$

Where,
$P_{i,t}$ = estimated population in area i,
$P_{\cdot,0}$ = national population in last census, and
$P_{\cdot,t}$ = national population at the date of the estimate.
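The three steps of the housing unit method (equations 4 through 6) chain together, which a short sketch may make easier to follow. All names and numbers below are hypothetical; conversions are omitted because equation 4 as written includes only construction and demolition.

```python
def housing_unit_estimate(units_0, built, demolished, households_0, pop_0,
                          nat_units_0, nat_units_t,
                          nat_households_0, nat_households_t,
                          nat_pop_0, nat_pop_t):
    """Return (housing units, households, population) for the estimate date."""
    # Equation 4: update the housing stock.
    units_t = units_0 + built - demolished

    # Equation 5: census occupancy rate, scaled by the national change in occupancy.
    occupancy_0 = households_0 / units_0
    occupancy_change = (nat_households_t / nat_units_t) / (nat_households_0 / nat_units_0)
    households_t = units_t * occupancy_0 * occupancy_change

    # Equation 6: census persons per household, scaled by the national change.
    pph_0 = pop_0 / households_0
    pph_change = (nat_pop_t / nat_households_t) / (nat_pop_0 / nat_households_0)
    pop_t = households_t * pph_0 * pph_change
    return units_t, households_t, pop_t


# Hypothetical area and national totals.
units, households, population = housing_unit_estimate(
    units_0=10_000, built=1_200, demolished=200, households_0=9_000, pop_0=22_500,
    nat_units_0=100_000_000, nat_units_t=110_000_000,
    nat_households_0=90_000_000, nat_households_t=99_000_000,
    nat_pop_0=225_000_000, nat_pop_t=237_600_000,
)
print(units, round(households), round(population))
```

In this example the national occupancy rate is unchanged while national persons per household falls from 2.5 to 2.4, so the area's population grows more slowly than its housing stock.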
All of the methods discussed so far refer to the household population under 65. The two other segments of the population, the population 65 and over and the group quarters population, are measured by their own specific change in stock methodologies. Since these two groups have unique characteristics (especially in terms of their migration patterns), we use administrative records systems that are unique to each of the two groups. The population 65 and over is estimated by using changes in the Medicare population since the last census as a direct measure of the change in the population 65 and over. No such nationwide system exists for the group quarters populations (defined for estimates purposes as the population in military barracks, college dormitories, prisons, and other institutions). Changes in these populations since the last census are obtained from an inventory of major group quarters locations that is maintained and annually updated by a special data collection process in the Population Estimates Branch of the Population Division in cooperation with state agencies affiliated with the Federal-State Cooperative Program for Population Estimates.
3.3 Combined methods
The U.S. Census Bureau's postcensal population estimates program combines methods in two ways. Within each level of geography (states, counties, and places) several of the above methods are combined (Table 4). Since certain methods represent given subpopulations better, a combination of methods may be viewed as more robust -- less likely to change due to extraneous factors that might affect one or the other of the estimates. There is a further mixing of methods since the estimates at each level of geography are controlled to the results of the estimates made at the next higher level of geography.
The methodology for making state estimates during the 1980s averaged the results of the administrative record method with those of the composite method. In the composite method, the population is divided into three age groups, each of which is estimated by a separate method. The population under 15 is estimated using changes in the levels of school enrollment (similar to Component Method II). The population ages 15-64 is estimated by a ratio-correlation method in which the independent variables are tax returns, school enrollment, and housing units. The population 65 and over is estimated using a method in which changes in the number of persons on Medicare since the last census date are added to the population aged 65 and over at the last census (U.S. Bureau of the Census, 1984). The total state population by age is then controlled to equal the estimated national population age structure.
Annual county population estimates are produced independently for each state to coincide with the state's total population estimated above. A distinct methodology for each state is developed in consultation with that state's member of the Federal-State Cooperative Program for Population Estimates. In most cases, it consists of the average of two or three of the methods described above: the administrative records method, component method II, and the ratio-correlation method. Moreover, within the ratio-correlation method, different states use different independent variables, which may include school enrollment, tax returns, Medicare enrollment, automobile registrations, births, deaths, dummy variables for county size, or other state-specific data series. Additional adjustments are made for changes in selected military and institutional populations and for changes in the population over 65. Final results are controlled to the state population estimate produced by the Census Bureau using a uniform method across all states (van der Vate, 1988).
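The two combining steps for counties, averaging across methods and then controlling to the state total, can be sketched as follows. The text does not say how the control is applied, so the proportional (pro rata) adjustment used here is an assumption, as are the function name and the example values.

```python
def average_and_control(method_estimates, state_total):
    """Average each county's estimates, then scale so counties sum to the state.

    method_estimates maps county -> list of estimates from independent methods.
    The proportional scaling is an assumption of this sketch.
    """
    averaged = {county: sum(values) / len(values)
                for county, values in method_estimates.items()}
    factor = state_total / sum(averaged.values())
    return {county: estimate * factor for county, estimate in averaged.items()}


# Hypothetical state with two counties and two methods per county.
controlled = average_and_control({"A": [100, 110], "B": [200, 190]}, state_total=330)
print({county: round(value, 1) for county, value in controlled.items()})
```

A proportional control leaves each county's share of the averaged total unchanged, so any error in the state estimate is spread across counties in proportion to their size.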
Place estimates use a strict administrative record methodology where migration is based solely on the migration rates derived from changes in addresses on tax returns. The only other adjustments for place estimates are for changes in selected military and institutional populations and a final control to county level population estimates (U. S. Bureau of the Census, 1980).
The estimation process demands continuous vigilance. Methods that
appear to work well at the beginning of a decade may be
unsatisfactory later in the decade. Only constant testing, data
evaluation, quality control, and checks for reasonableness can
ensure a sound program of population estimation.
4.0 Evaluation Practices
Whatever the method of estimation chosen, a number of considerations should be kept in mind. No matter how sophisticated the methodology, the estimate will only be accurate if the underlying assumptions hold and the input data are reliable. Many things can happen to endanger these conditions. For example, the relationships that held between variables in a previous decade might no longer hold in the current decade. The data series that one is depending upon to update the population may deteriorate or fail to measure the same underlying phenomenon as conditions change. Even if the administrative or other indicator data measure the population well, there may well be problems of geographic coding that fail to assign the population to the correct geography.
Finding an appropriate yardstick against which to measure the postcensal population estimates is difficult. During the decade, aside from special censuses for a handful of places, there are no suitable numbers to compare to the estimates -- thus we know little about the short run accuracy of population estimates. We can only measure their accuracy at the extreme end of their range (after 10 years) using the next decennial census. Even here, the changing level of coverage between censuses for any given area can lead to imprecision in our measurement of estimates accuracy. Using the results of the 1980 and 1990 censuses as enumerated, the Census Bureau evaluated the accuracy of the population estimates program. The results (summarized in Table 5) show that population estimates made for the nation, for states, and for counties were reasonably accurate, but that estimates made for small places were quite inaccurate. Estimates for places under 5,000 had a mean absolute error of more than 15 percent while places over 50,000 had a mean absolute error of less than 5 percent.
The last two columns in Table 5 present a more telling comparison. Column two compares the 1990 census and the provisional 1990 postcensal estimate while column three compares the 1990 census with the 1980 census. For most levels of geography the postcensal population estimate provides a far more accurate estimate than simply holding the population constant at the level of the last census. For example, state postcensal estimates had a mean absolute error of only 1.5 percent, while holding the last census constant would give an error of 10.0 percent. On average, the estimates methodology is also much better than using the last census for counties and incorporated places over 5,000 population. However, for many incorporated places under 5,000, holding the population constant at the 1980 level would have given more accurate results than did our postcensal estimate program.
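The comparison described above can be sketched with a mean absolute percent error computed against the new census. The three-area figures are invented for illustration and are not taken from Table 5; using the census count as the denominator is an assumption of this sketch.

```python
def mean_absolute_percent_error(estimates, census):
    """Average of |estimate - census| / census across areas, in percent."""
    errors = [abs(e - c) / c for e, c in zip(estimates, census)]
    return 100 * sum(errors) / len(errors)


# Hypothetical areas: new census, postcensal estimate, and previous census.
census_1990 = [1_000, 2_000, 4_000]
postcensal_1990 = [1_020, 1_960, 4_040]
census_1980 = [900, 2_200, 3_600]

estimate_error = mean_absolute_percent_error(postcensal_1990, census_1990)
constant_error = mean_absolute_percent_error(census_1980, census_1990)
print(round(estimate_error, 2), round(constant_error, 2))
```

Holding the previous census constant serves as a naive baseline: an estimates program adds value for a class of areas only when its error falls below that baseline.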
These inaccuracies for small places may be due to a number of sources: the problem of coding administrative records to small units of political geography, the greater importance of migration in population change for small areas, and the greater likelihood that the broad assumptions that might apply on average for larger areas would not apply to small localities with very specific characteristics. Since the Census Bureau is required by law to produce data for all incorporated places and townships, we will need to show places under 5,000 as well as the larger places for which we can produce good estimates. However, it is incumbent on us to show the uncertainty in the estimates for small areas in future publications in addition to making continual progress in refining and improving our estimates methodologies and data bases.
Many of the problems of the current population estimates system
are the results of its past success and rapid growth during the
1960s and 1970s. Each new program, each expansion of
characteristic detail, each reduction in the size of geographic
unit has been accompanied by new data sets, by new methods, and by
new production procedures. Although the Census Bureau has done a
good job of meeting users' expectations as these demands have
increased, there is room for improvement in the estimates
methodology and operations.
5.0 Current Problems and Planned Activities
We have embarked on a set of seven initiatives to revamp the population estimates program and lead it into the next century. These initiatives fall under the following headings: 1) defining the mission, 2) methodological integration, 3) input data quality, 4) geographic flexibility, 5) characteristic detail, 6) analysis of trends, and 7) production efficiency.
5.1 Defining the Mission
The products currently estimated by the Census Bureau's Population Estimates Program are the results of opportunities and legislative requirements over a period of three decades. We plan to reexamine the demands for and uses of population estimates. A thorough study of the needs for population estimates and the Census Bureau's proper mission in filling those needs is an initial priority. We are currently polling a number of our users -- Federal government agencies, the Federal-State Cooperative Program members, private data vendors, and a number of other groups -- to ascertain their needs for population estimates.
Some of the suggestions received so far involve modifying the population estimates program in order to produce more detailed characteristic information at the state and county level. We hope to produce age, sex, race, and Hispanic Origin data for counties. With more research, we may also be able to produce the county-level data on households -- number, size, and income -- that are currently demanded by many users. We are examining the feasibility of producing estimates for larger places on a yearly basis and producing estimates for other subcounty geography as well -- possibilities include census tract aggregates, subareas within large cities, and (for some purposes) ZIP codes.
5.2 Methodological Integration
The many different methods of estimating population developed over the past decades have resulted in a complex population estimates program. The need now is to integrate these disparate methods into an orderly system. Traditionally, the various estimation models used at the Bureau have been integrated by a simple averaging of the different estimates at a given level of geography and by controlling the sum of estimates at one level of geography to the averaged estimate at the next higher level.
The time has come to reexamine each set of methods for suitability as parts of an integrated, parsimonious model for producing population estimates. In order to discuss ways of integrating our current methods, it is useful to distinguish between methods that measure the changes in the population stock and those that measure the components of population change. Methods showing the change in population stock (the ratio-correlation method, the Medicare change methodology, and the change in group quarters population) use changes in proxy variables since the last census to produce estimates of the total net change since the last census. These methods permit the use of many symptomatic measures of population size that may not be amenable to a flow approach.
Component methods such as the administrative records method and component method II represent flow methods in which the components of population change (births, deaths, international migration, and internal migration) are each measured separately and added to or subtracted from the initial population. The advantage of this type of method is that it gives an estimate not only of the population but also of the components of population change. This method provides additional information about the reasons for change and the reasonableness of the estimates, and provides inputs for population projections. Component methods are often preferable for larger areas because they use relatively accurate counts of births and deaths to compute a large part of population change. Consequently, administrative records, which are often less accurate, need only be used to estimate the portion of population change due to migration. Research is underway at the Census Bureau to quantify the relative effects of errors in each component on the final population estimates. For small areas, these advantages disappear and change in stock methods such as the housing unit method may be more appropriate.
As we integrate methods, we should be careful to retain the flexibility offered by multiple independent methods of estimating population. Since methodologies for population estimates depend upon data sets collected for purposes other than population estimates, the quality and availability of a given input data set is never certain. Only with multiple methods can we be assured of the ability to produce timely and reliable population estimates. Multiple methods also provide a necessary check on the validity of the estimates; surprising changes in demographic trends can be checked using independent sources in order to see if the results are merely idiosyncrasies of a given input data source. The existence of independent methods of estimating population could prove a distinct advantage in trying to gauge the accuracy of estimates between censuses. We should examine the potential of using measures of divergence between independent estimates to determine the reliability and degree of confidence we have in the accuracy of postcensal estimates. If three independent estimates give very close values, we should have more confidence in those estimates than if the estimates vary widely.
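One possible divergence measure is sketched below. The paper does not specify a particular measure, so the choice made here (the range of the independent estimates divided by their mean) is an assumption for illustration only, and the figures are hypothetical.

```python
from statistics import mean


def relative_spread(estimates):
    """Range of the independent estimates as a fraction of their mean.

    An assumed, illustrative measure of divergence; smaller values
    indicate closer agreement among the methods.
    """
    return (max(estimates) - min(estimates)) / mean(estimates)


# Three hypothetical independent estimates that agree closely.
close_agreement = relative_spread([10050, 10000, 9950])

# Three hypothetical independent estimates that vary widely.
wide_disagreement = relative_spread([11000, 10000, 9000])
```

Under this assumed measure, the first area shows a spread of 1 percent of the mean and the second a spread of 20 percent, so the first estimate would be given more confidence. Agreement among methods that share the same input data would not carry the same weight, which is why the independence of the methods matters.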
5.3 Input Data Quality
Perhaps even more important than the type of method chosen is the choice of data set used in the estimate. Producing postcensal population estimates requires integrating traditional demographic data sets such as census results, birth and death records, and immigration statistics with nontraditional sources collected for other administrative purposes such as tax returns, school enrollment, drivers' licenses, housing construction, survey data, etc. The art of population estimation is to combine these traditional and nontraditional sources to take maximum advantage of all the data available.
The most challenging aspect of working with population estimates is the use of data sets designed and collected for administrative purposes rather than for statistical or demographic purposes. Ideally, such data sets should have universal coverage, change in direct relation with population changes, and be consistent over time in content and form. No data set actually meets these criteria. The level of population coverage is often less than 100 per cent. Programmatic changes or changes in social behavior independent of population change may affect the coverage rate. Worst of all, the administrative data set may even disappear if its programmatic need or funding disappears.
Consequently, a healthy population estimates program requires careful attention to the quality and timeliness of input data as well as to the reliability of access to the input data. This requires working with our data providers to monitor the input databases on a number of requirements including reliability, consistency, coverage, characteristic detail, and idiosyncrasies produced by programmatic and other changes. It also entails work with data producers to address questions of mutual interest such as cost, confidentiality, and legal requirements for data handling. Since administrative data sets may disappear over time, work must also continue on nurturing alternative data sets to provide similar or superior data. The need for flexibility to address changing data set availability and quality is yet another argument for using multiple independent methods and data sets to provide redundancy in the estimates program.
5.4 Geographic Flexibility
Linking data on population to geography is the key to population estimation methodology. Any system for making subnational population estimates must have a credible method for developing such geographic correspondence. Population estimates are required for legally defined geographic entities such as counties and incorporated places and the estimates methodology must take these requirements into account. In the county estimates conducted jointly with states under the Federal-State Cooperative Program, we assume that the input data used in the ratio correlation methodology systems are correctly coded by county of residence.
In the administrative records method, which matches tax returns to determine state, county, and place migration, the Census Bureau must provide the geographic coding for movers based on the mailing addresses of filers from the IRS tax forms. In order to categorize these filers by county and place of residence, the current methodology uses a probability coding guide. With the aid of data from a residence question on the 1980 tax form, mailing addresses were categorized by P.O. name, state, ZIP code, and address type (street address, P.O. box, RFD route) and assigned a probability of falling within each of 3,100 counties and 39,000 places.
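The idea of a probability coding guide can be illustrated with a small sketch. Everything in it is hypothetical: the address keys, the probabilities, and the choice to distribute filers in proportion to the probabilities (an expected-value allocation) are assumptions made for illustration and may differ from the procedure actually used.

```python
# Hypothetical coding guide. Each address key is a tuple of
# (post office name, state, ZIP code, address type), and each key maps to
# the assumed probability that a filer with that key lives in each county.
coding_guide = {
    ("SPRINGFIELD", "XX", "00001", "street"): {"County A": 0.9, "County B": 0.1},
    ("SPRINGFIELD", "XX", "00001", "po_box"): {"County A": 0.6, "County B": 0.4},
}


def allocate_filers(filer_counts, guide):
    """Distribute the filers in each address key across counties in
    proportion to the guide's probabilities (an assumed allocation rule)."""
    totals = {}
    for key, count in filer_counts.items():
        for county, probability in guide[key].items():
            totals[county] = totals.get(county, 0.0) + count * probability
    return totals


# Hypothetical counts of filers by address key.
filers = {
    ("SPRINGFIELD", "XX", "00001", "street"): 1000,
    ("SPRINGFIELD", "XX", "00001", "po_box"): 200,
}
by_county = allocate_filers(filers, coding_guide)
```

In this sketch, 1,020 of the 1,200 filers are assigned to County A and 180 to County B. The probabilities are fixed at the time the guide is built, so if the population within an address key later shifts across a county line, the allocation continues to reflect the old distribution.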
There are several problems that lead to deterioration in the coding guide over time. Some of the more obvious ones can be corrected by manual adjustments in the coding guide, e.g., creation of new ZIP codes or revised boundaries for old ZIP codes, changes in the boundaries of incorporated places, etc. A key cause of deterioration that cannot be fixed is the change over time in the distribution of the population within a given address key (post office, state, ZIP code, address type combination). To the extent that those changes in distribution cross county, town, and city boundaries, the resulting coding will be incorrect. Moreover, the probability system itself may well put individual persons in the wrong county or place. We know little about how these errors propagate through the system after several years and multiple migrations.
The Census Bureau is currently developing a new geographic coding system that permits frequent updating and, if possible, exact matching rather than probability matching of addresses to geography. The system is based on the "master address list" proposed by the Census Bureau's Geography Division as an outgrowth of the development of the TIGER digitized mapping project and the "address control file" created for use with the 1990 census. This system would provide an annually updated digitized data base that could place most addresses in the United States into the appropriate census block (and thus into any unit of geography that also has its boundaries in the TIGER system). In the estimates area, we are exploring the feasibility of developing a coding system that would code street addresses to subcounty areas using such a master address list. The existence of a continuously updated master address list could provide far greater geographic detail, ease of updating and correcting for boundary changes, and flexibility in dealing with changing geographic concepts and shifts in population distribution.
This methodology also provides the promise of a far greater benefit in the future. The ability to provide exact matching based on geography might one day permit the matching of records on the basis of address rather than an identifier such as social security number. Such an ability would provide the opportunity to bring far more information sources to bear on the estimation effort.
5.5 Characteristic Detail
Another major area for innovation is the expansion of data on population characteristics -- both demographic characteristics such as age, race, and sex and social/economic characteristics such as household structure and income. In order to get a better hold on the demographic structure of substate areas and to use as a denominator in calculating incidence rates, there is a major increase in the demand for age, race, and sex distributions at the county level between censuses. These data are not available from the IRS tax records that form the principal part of our administrative records processing. Consequently, we are developing alternative methods to provide these data for counties and large places as an integral part of the estimation process.
We have experimented with a number of possible approaches. One of these experimental programs developed a projected estimate in which county trends in migration by age, race, and sex from the previous decennial census were extrapolated into the current decade and combined with actual birth and death rates to produce a population by age, race, and sex, which was then controlled to the official estimate of total population for the county. Another experimental program extends the current administrative record method by adding information by age, race, and sex from Social Security records to a sample of IRS returns to provide internal migration data for states and large metropolitan areas. Current plans call for integrating these programs into our standard procedures by the mid-1990s.
There are also possibilities for using survey data combined with administrative data to obtain characteristic information. While matching survey and administrative data records on an individual basis may prove difficult, there have been efforts to combine data on an aggregate basis. A recent example is an analysis of internal migration that combined aggregate data from the decennial census, matched tax return migration data, and survey data from the Current Population Survey (CPS) to provide a time series of migration by characteristics for state to state flows. Research is proceeding on whether more information from surveys could be combined with the administrative record methods by either aggregate or individual statistical modeling approaches.
Another major effort is underway to produce estimates for housing units and households for survey controls to the American Housing Survey and other housing based surveys. This program uses data on additions and deletions from housing stock to update the housing inventory from the decennial census. While this method is similar to the housing unit method for population estimates described above, the resulting housing unit estimates are used directly as survey controls rather than only used to estimate population.
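The inventory update described above amounts to a running balance, sketched below under simplifying assumptions. The function name and all figures are hypothetical, and the sketch ignores issues such as reporting lags and incomplete coverage of additions and deletions.

```python
def update_housing_inventory(census_units, additions, deletions):
    """Carry the census housing inventory forward one period at a time
    by adding new units and subtracting units removed from the stock."""
    units = census_units
    series = []
    for added, removed in zip(additions, deletions):
        units = units + added - removed
        series.append(units)
    return series


# Hypothetical area with 40,000 housing units at the census date,
# followed by three years of additions and deletions.
inventory = update_housing_inventory(
    census_units=40000,
    additions=[500, 650, 700],
    deletions=[100, 120, 90],
)
```

The resulting hypothetical series is 40,400, 40,930, and 41,540 units. When used as survey controls, figures of this kind are applied directly rather than being converted to population.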
There is also the potential for integrating more administrative data into the estimates procedure. A number of federal, state, and even private data sets have been suggested. Possible data sets include state tax data, post office change of address forms, state driver's license information, food stamp enrollment information, utility hookup records, and telephone directory information. These and other data sets will be explored for their potential utility for making subnational estimates, with proper attention given to protection of privacy and proper disclosure safeguards.
5.6 Analysis of Trends
A prime advantage of the population estimates program is its information on the changes in spatial population distribution between censuses. While the Census Bureau has put great emphasis on the production of estimates for individual states, counties, and places, we have only occasionally provided summary information on the broader trends in population redistribution. An analysis of population redistribution trends between cities and suburbs, high and low density areas, areas of high and low unemployment, and other analytical categories should be an annual part of our activities. In order to do this, a simple first step is to classify counties by relevant analytical characteristics so that such summaries could be a standard part of our processing. In addition, we plan an annual analytical report on population distribution trends based on the entire range of population estimates.
Much of the intermediate data on components of population change (migration, births, deaths, numbers of housing units, etc.) used in constructing the population estimates is of analytical interest in its own right. These data should be developed as their own data products and used to provide an analytical view of the dynamics of current population change. An integrated set of historically consistent data series on births, deaths, and international and internal migration should be developed for all major geographic areas for which population estimates are produced. As a first step, we are producing a consistent time series of population counts for all counties and for cities over 25,000 from 1790 through 1990.
5.7 Production Efficiency
The uncoordinated and erratic growth pattern in the population estimates area has had a substantial effect on production efficiency. During the 1980s, delays in production and unreliable publication dates have frequently resulted from the unwieldiness of the current production process. For the 1990s, we have streamlined the production process as a result of more parsimonious methodologies and a more focused set of products. Many of our users repeatedly tell us that it is more important to have a firm production date than to be too optimistic in our timetables. Efforts toward redesigning the estimates product have as a major goal a firm production schedule with realistic deadlines. While considerable progress has been made on this commitment, we expect to strive toward continuous improvement in timeliness as well as reliability and cost reduction.
Postcensal population estimates are an integral part of the U. S. statistical system -- combining census results with tabulations on vital events, providing the population controls by which household survey results can be weighted, and producing a continuous and up-to-date time series of changing population size and distribution between censuses. These estimates are only possible with the creative use of censuses, vital events, administrative data, and other unconventional sources for estimating changes in population on a timely basis.
As we approach the twenty-first century, the population estimates program provides an ideal starting point for an integrated demographic and social accounting system. The system already unites the decennial census and population survey results through a series of longitudinal controls. These longitudinal controls are based on previous censuses and vital events, and could be modified to incorporate measurements of undercoverage if desired. In the 1990 census, the estimates system provided substantial information for coverage improvement during the operation of the census and in evaluating coverage after the results were in. The system provides the opportunity to integrate the results of administrative records collected for other purposes to augment and improve traditional demographic data. Our efforts to integrate our geographic coding with the decennial census data base (TIGER), to maintain estimates of housing units and households as well as population, and to use data on social and economic characteristics from surveys in the estimation process take us beyond a purely demographic system to an enhanced estimates program that could eventually provide continuously updated data on many of the variables now only measured by the census. Moreover, such an integrated estimates system could provide data on the components and rhythm of population, housing, geographic, social, and economic change that no individual data source can now provide.
|Table 1: USES OF CENSUS BUREAU POPULATION ESTIMATES|
|- Survey Controls
- National Social and Economic Series
- Descriptive and Analytical Studies
- Controls for Subnational Estimates
|- Direct Federal Fund Allocation Formulas
- Indirect Federal Fund Allocation
- Denominators for Federal and Other Data Series
- Federal Regulatory Actions
- Survey Controls
- Descriptive and Analytical Studies
- Controls for Substate Estimates
|- Fund Allocation by State Governments
- Denominators for Federal, State, and Other Data Series
- Regulatory Action by State Governments
- Guides for Government and Private Sector Planning
- Descriptive and Analytical Studies
- Federal Data Series
- Controls for Subcounty Estimates
|- Federal Block Grants
- Fund Allocation and Regulatory Actions by Federal and
- Descriptive and Analytical Studies
- Government and Private Sector Planning
- Private Sector Marketing Efforts
- Base Data for Private Sector Data Development
|Table 2:||MAJOR POPULATION SURVEYS CONTROLLED TO NATIONAL, STATE AND SUBSTATE POSTCENSAL POPULATION ESTIMATES|
|Current Population Survey||Monthly civilian noninstitutional population, by age, sex, and race (white, black, other) and Hispanic for the nation and total population 16+ for states, New York City and Los Angeles.|
|Survey of Income and Program Participation||Controls derived from CPS processing.|
|Consumer Expenditure Survey||Monthly total population by age, sex and race for the nation.|
|Hospital Discharge Survey||Annual civilian population by age and sex for the nation and regions.|
|National Crime Victimization Survey||Annual civilian population 12+ for the nation and 11 states.|
|National Health Interview Survey||Civilian noninstitutional population under 18, 18-44, 45-64, 65+ and total by sex quarterly for the nation and annually for states.|
|National Hunting and Fishing and Wildlife Associated Recreation Survey||Noninstitutional, nonbarracks population by age, race and sex for the nation and states for calendar years ending in 0 and 5.|
Table 3: POSTCENSAL POPULATION ESTIMATES PROGRAM FOR THE 1990s
|Nation||Yearly||Age, Sex, Race, Hispanic|
|States||Yearly||Total Population, Age, Sex|
|Counties & Metropolitan Areas||Yearly||Total Population|
|Incorporated Places and Minor Civil Divisions||Even Years||Total Population|
Table 4: Estimates Methods by Geographic Category: 1980-1989
|Component Methods||Change in Stock Methods||Mixed
|Table 5:||ESTIMATES ACCURACY BY LEVEL OF GEOGRAPHY: 1970-80 and 1980-90.|
|(Mean Absolute Percent Error)|
|Unit/Size||1980 Estimates vs.
|1990 Estimates vs.
|1980 Census vs.
5,000 to 50,000
* Place estimates for 1990 are provisional (based on extrapolations of 1988 estimates).
Batutis, Michael J. 1991. "Subnational Population Estimates Methods of the U. S. Bureau of the Census," U. S. Bureau of the Census, Population Division Working Paper.
General Accounting Office. 1990. Federal Formula Programs: Outdated Population Data Used to Allocate Most Funds. September. GAO/HRD-90-145.
General Accounting Office. 1991. Formula Programs: Adjusted Census Data Would Redistribute Small Percentage of Funds to States. November. GAO/GGD-92-12.
Mandell, M. and J. Tayman. 1982. "Measuring Temporal Stability in Regression Models of Population Estimation." Demography, 19:135-136.
Namboodiri, N. K. 1972. "On the Ratio-Correlation and Related Methods of Subnational Population Estimation." Demography, 9:443-453.
National Academy of Sciences. 1980. Estimating Population and Income of Small Areas. Washington, D.C., National Academy Press.
O'Hare, W. P. 1976. "Report on a Multiple Regression Method for Making Population Estimate." Demography, 13:369-379.
O'Hare, W. P. 1980. "A Note on the Use of Regression Methods in Population Estimates." Demography, 17:341-343.
Roe, Linda K., John F. Carlson, and David A. Swanson. "A Variation of the Housing Unit Method for Estimating the Population of Small, Rural Areas: A Case Study of the Local Expert Procedure," Survey Methodology, 19:155-163.
Smith, Stanley K. and Bart Lewis. 1980. "Some New Techniques for Applying the Housing Unit Method of Local Population Estimation," Demography, 17:323-339.
Smith, Stanley K. and Bart Lewis. 1983. "Some New Techniques for Applying the Housing Unit Method of Local Population Estimation: Further Evidence", Demography, 20:407-413.
Smith, Stanley K. and Marylou Mandell. 1984. "A Comparison of Population Estimation Methods: Housing Unit Versus Component II, Ratio Correlation, and Administrative Records," Journal of the American Statistical Association, 79:282-289.
Smith, Stanley K. 1986. "A Review and Evaluation of the Housing Unit Method of Population Estimation," Journal of the American Statistical Association, 81:287-296.
Statistics Canada. Population Estimation Methods: Canada. Ottawa: Ministry of Supply and Services.
Swanson, David A. 1980. "Improving Accuracy in Multiple Regression Estimates of Population Using Principles from Causal Modelling," Demography, 17:413-427.
Swanson, David A. 1989. "Confidence Intervals for Postcensal Population Estimates: A Case Study for Local Areas," Survey Methodology, 15:217-280.
Swanson, David A. and L. Tedrow. 1984. "Improving the Measurement of Temporal Change in Regression Models Used for County Population Estimates," Demography, 21:373-381.
Tayman, Jeff and Edward Schafer. 1985. "The Impact of Coefficient Drift and Measurement Error on the Accuracy of Ratio-Correlation Population Estimates." The Review of Regional Studies, 15:3-10.
U. S. Bureau of the Census. 1980. "Population and Per Capita Money Income Estimates for Local Areas: Detailed Methodology and Evaluation," Current Population Reports. Series P-25, No 699.
U. S. Bureau of the Census. 1983. "Evaluation of Population Estimation Procedures for States, 1980: an Interim Report." Current Population Reports. Series P-25, No. 933.
U. S. Bureau of the Census. 1984. "Estimates of the Population of States: 1970 to 1983," Current Population Reports. Series P-25, No. 957.
U. S. Bureau of the Census. 1985. "Evaluation of 1980 Subcounty Population Estimates," Current Population Reports. Series P-25, No. 963.
U. S. Bureau of the Census. 1986. "Evaluation of Population Estimation Procedures for Counties: 1980," Current Population Reports. Series P-25, No. 984.
U. S. Bureau of the Census. 1987. "State Population and Household Estimates, With Age, Sex, and Components of Change: 1981 - 1986", Current Population Reports. Series P-25, No. 1010.
U. S. Bureau of the Census. 1988. "Use of Federal Tax Returns in the Bureau of the Census' Population Estimates and Projections Program". Population Division Working Paper.
U. S. Bureau of the Census. 1988. "Methodology for Experimental County Population Estimates for the 1980s", Current Population Reports. Special Studies. Series P-23, No. 158.
U. S. Bureau of the Census. 1989. "Population Estimates by Race and Hispanic Origin for States, Metropolitan Areas, and Selected Counties: 1980 to 1985." Current Population Reports. Series P-25, No. 1040-RD-1.
U. S. Bureau of the Census. 1989. "County Population Estimates: July 1, 1988, 1987, and 1986," Current Population Reports. Series P-26, No. 88-A.
U. S. Bureau of the Census. "Population Estimates for Metropolitan Statistical Areas: July 1, 1988, 1987, and 1986," Current Population Reports. Series P-26, No. 88-B.
U. S. Bureau of the Census. 1990. "State Population and Household Estimates: July 1, 1989." Current Population Reports. Series P-25, No. 1058.
U. S. Bureau of the Census. 1990. "1988 Population and 1987 Per Capita Income Estimates for Counties and Incorporated Places," Current Population Reports. Series P-26, No. 88-SC.
van der Vate, Barbara J. 1988. "Methods Used in Estimating the Population of Substate Areas in the United States," Paper presented at the International Symposium on Small Area Statistics, New Orleans, LA, Aug. 26-27.