Total Population Growth Rate Forecasts
The U.S. population tripled between 1900 and 1999 as the nation maintained annual growth rates ranging from a high of approximately 2.0 percent to a low of about 0.6 percent, with current rates leveling off near 0.9 percent (U.S. Census Bureau, 1999). Graph 1 presents the annual growth rate of the total population from 1947 to 1999, the years analyzed in this research. How well the Census Bureau forecast the nation's growth trends is discussed first for the multiple series and then for each individual series. As mentioned earlier, accuracy is assessed from two perspectives: 1) overall error in the series; and 2) duration-specific forecast error. Overall error is analyzed in terms of the direction of error (the tendency of the forecast growth rate to over- or underestimate the observed growth rate, measured with the PE and MPE) and the magnitude of error (measured with the MAPE and RMSE). The duration-specific forecast error describes how the error evolves over the course of the forecast period. Lastly, the naïve and Census Bureau forecast models are compared using the RMSE.
Because previous authors have examined the historical performance of forecast population growth rates (Ascher, 1978; Stoto, 1983; Long, 1987), the following discussion is brief. This research improves on and extends existing work by: 1) evaluating more recent forecasts; 2) using more recent national estimates and vital statistics data for the observed series; 3) comparing individual and multiple series results; 4) increasing the sample size for multiple series error statistics; and 5) calculating several statistics to compare results.
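The text names the PE, MPE, MAPE, MdAPE, and RMSE but does not spell out their formulas. The sketch below uses the standard definitions. It assumes PE = (forecast - observed) / observed × 100, so that negative values denote underestimation as in the discussion here. It also assumes the RMSE is computed over the forecast years in the variable's own units, which means percentage points for growth rates. The function names and sample values are hypothetical and are not taken from Tables 2 or 3.

```python
import math
from statistics import mean, median


def percent_errors(forecast, observed):
    """PE for each forecast year: (F - A) / A * 100.

    Negative values mean the forecast underestimated the observed value.
    """
    return [(f - a) / a * 100.0 for f, a in zip(forecast, observed)]


def error_summary(forecast, observed):
    """Direction (MPE) and magnitude (MAPE, MdAPE, RMSE) of forecast error."""
    pes = percent_errors(forecast, observed)
    abs_pes = [abs(pe) for pe in pes]
    squared = [(f - a) ** 2 for f, a in zip(forecast, observed)]
    return {
        "MPE": mean(pes),          # signed: below zero means net underestimation
        "MAPE": mean(abs_pes),     # average size of error, ignoring sign
        "MdAPE": median(abs_pes),  # median size of error, robust to outliers
        "RMSE": math.sqrt(mean(squared)),  # same units as the series itself
    }


# Hypothetical observed and forecast growth rates (percent) for years 1-5.
observed = [1.00, 0.95, 0.90, 0.92, 0.88]
forecast = [0.90, 0.90, 0.85, 0.80, 0.85]
for name, value in error_summary(forecast, observed).items():
    print(f"{name}: {value:.2f}")
```

Every PE in this example is negative, so the MAPE equals the absolute value of the MPE. When a series switches between over- and underestimation, the two statistics diverge.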
1) Overall Accuracy and Duration-Specific Forecast Error of the Population Growth Rate Forecasts
The multiple series and individual series statistics presented in Table 2 allow an assessment of whether the Census Bureau generally over- or underestimated the total growth rate. As shown in the final column of row (1), the multiple series MPEs for the annual growth rate indicate that the Census Bureau generally underestimated growth rates within the first five years (MPE = -3.8 at the fifth year). Beyond the five-year period, in contrast, growth rates were on average overestimated, as indicated by positive MPEs.
Table 3 presents the percent error at designated points of the forecast period (the 1st, 5th, 10th, 15th, and 20th years). Two features of the data indicate that outliers influence the multiple series error statistics: the wide differences among the MPE, MAPE, and MdAPE (Table 2), and the wide range of individual PEs within each of the four target forecast periods. The PEs range from -26.5 percent (1974) to 6.4 percent (1966) in the first year and from -48.6 percent (1947) to 29.2 percent (1963) in the fifth year (n=17). Consequently, the multiple series error statistics are not representative of the general performance of growth rates forecast between 1947 and 1999. In its more recent forecast publications, the Census Bureau reports multiple series RMSE results for the growth rate of the total population as a way of addressing the uncertainty of its forecasts (U.S. Census Bureau, 1996). The results presented here call into question the validity of such multiple series growth rate statistics and underscore the need to examine individual series.
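A small, made-up set of absolute percent errors illustrates the divergence between the MAPE and MdAPE described above. A single large outlier pulls the mean well above the median. The values are hypothetical and are not the Table 3 PEs.

```python
from statistics import mean, median

# Hypothetical absolute PEs for ten series at one forecast year:
# nine moderate errors plus a single large outlier.
abs_pes = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 40.0]

mape = mean(abs_pes)                  # 8.05: pulled upward by the outlier
mdape = median(abs_pes)               # 4.75: barely affected by the outlier
without_outlier = mean(abs_pes[:-1])  # 4.5 once the outlier is dropped

print(f"MAPE={mape:.2f}  MdAPE={mdape:.2f}  MAPE without outlier={without_outlier:.2f}")
```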
An evaluation of the individual series statistics reveals a more complex pattern of over- and underestimation. Forecasts produced in 1955 and earlier consistently underestimated growth rates. This pattern reversed for series produced between 1957 and 1972, which overestimated growth. After 1972, the growth rate was again underestimated in each series. Of the seven series produced between 1974 and 1994, three produced small overestimates in the first five years (MPE = 3.9, 1.9, and 7.6 percent, respectively). Otherwise, growth rates for those series were underestimated both within and beyond the five-year period.
For series with base years between 1947 and 1957, accuracy within the first five years improved from one series to the next. The 1947 and 1949 series have the largest percent errors at the fifth and tenth years, with five-year MAPEs of 31.2 percent and 18.5 percent, respectively (Table 2). The 1953, 1955, and 1957 series were more accurate overall within ten years, with ten-year MAPEs of 15.6 percent for 1953 and 11.5 percent for 1955. The 1957 series had the lowest five-year MAPE of any series, 2.0 percent, although its accuracy declined over the remainder of the forecast period.
Forecasts for 1963, 1966, 1969, and 1970 were generally no more accurate in the first five years than the 1953, 1955, and 1957 series. The 1972 series improved, but error rose again for the 1974 and 1976 series. Their five-year MAPEs of 20.8 and 21.5 percent, respectively, were well above the 1972 MAPE of 4.1 percent. The increase in error and the pattern of underestimation in the 1974 and 1976 series may result from the error of closure adjustment made to the intercensal estimates, mentioned above. Without adjusting for the error of closure, Long (1987) calculated lower RMSEs for 1974 and 1976. Within the first four years of the forecast period, these were .09 percentage points for 1974 and .18 percentage points for 1976 (Table 1). When accounting for the error of closure, Long obtained RMSEs similar to the results presented in Table 2 (Table 1A).
Accuracy within the first five years improved after the 1970s, with five-year MAPEs ranging from a low of 2.5 percent (1991) to a high of 9.9 percent (1986). The 1982, 1991, and 1994 series had five-year MAPEs below 4 percent. The 1986 and 1992 series had higher five-year MAPEs (9.9 and 7.6 percent) than the other series produced after the 1970s. Even so, their errors were still lower than those of most earlier series.
An analysis of the percent errors in Table 3 and the statistics in Table 2 reveals that the duration-specific forecast error did not increase linearly over the forecast period for each series. On the contrary, certain series both under- and overestimated the growth rate during the period, and the magnitude of error fluctuated. For example, the PE changes direction during the twenty-year forecast period in 11 of the 17 series. Nor does the error generally increase in size as the forecast horizon lengthens: forecasting the growth rate further into the future does not systematically produce larger errors. Both the percent errors and the average error statistics for the individual series demonstrate this pattern. The MAPEs and MdAPEs for the 1953, 1974, and 1976 series, among others, both increase and decrease beyond the five-year period.
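The two properties discussed above can be checked mechanically from a series' PEs. The first is how often the error changes direction. The second is whether the error grows steadily with the horizon. The helper names and the PE values below are hypothetical.

```python
def sign_changes(pes):
    """Count switches between over- and underestimation (exact zeros ignored)."""
    signs = [pe > 0 for pe in pes if pe != 0]
    return sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)


def grows_monotonically(pes):
    """True if |PE| never shrinks from one measured forecast point to the next."""
    abs_pes = [abs(pe) for pe in pes]
    return all(later >= earlier for earlier, later in zip(abs_pes, abs_pes[1:]))


# Hypothetical PEs at years 1, 5, 10, 15, and 20 for one series.
pes = [-3.0, 6.5, 12.0, -4.0, 9.0]
print(sign_changes(pes), grows_monotonically(pes))
```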
2) Comparison of Growth Rate Forecast Models
Table 2 shows the RMSE results for the naïve and Census Bureau forecast models. At the fifth year, the naïve model on average outperformed the forecast model. The forecast RMSE of .30 percentage points exceeds the naïve RMSE of .18 percentage points by .12 percentage points (n=17). This pattern changed over the forecast period. Beyond five years, the disparity between the models diminished as the performance of the naïve model deteriorated more than that of the forecast model. At ten years, the difference had narrowed by .05 percentage points (n=13). At twenty years, the pattern reversed: the naïve RMSE rose to .46 percentage points, compared with a smaller forecast RMSE of .43 percentage points (n=10).
Individual series analysis indicates that the naïve model generally outperformed each forecast model over most of the twenty-year forecast period, with the exception of 1955, 1957, and 1963. Within the first five years, the forecast RMSE was smaller than or equal to the naïve RMSE in 8 of the 17 series (47.1 percent). Of the 51 comparison points for all series combined, the naïve model outperformed the forecast model at 32 (62.7 percent). Nonetheless, the difference between the two models was smaller than .10 of a percentage point at approximately half (51.0 percent) of the 51 comparison points.
Recent forecasts indicate that the Census Bureau forecast model has improved relative to the naïve model in the short term (five years). The 1982, 1991, 1992, and 1994 series outperformed the naïve model within the first five years, with very small RMSEs ranging from .03 to .08 percentage points. Beyond five years, however, the naïve RMSEs are smaller for the 1982 and 1986 series.
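The model comparison above can be sketched as follows. The sketch assumes that the naïve model repeats the observed base-year value every year, a constant benchmark. It also assumes that the RMSE at horizon h is computed over forecast years 1 through h. Both are assumptions about the procedure rather than a description of it, and the function names and data are hypothetical.

```python
import math


def rmse(pred, obs):
    """Root mean square error in the units of the series (percentage points here)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def compare_with_naive(base_value, forecast, observed, horizons=(5, 10, 20)):
    """Return {h: (forecast_rmse, naive_rmse)} for each horizon the series reaches.

    The naïve model simply repeats the observed base-year value every year.
    """
    results = {}
    for h in horizons:
        if h > len(observed):
            continue  # this series' forecast period is too short for horizon h
        naive = [base_value] * h
        results[h] = (rmse(forecast[:h], observed[:h]), rmse(naive, observed[:h]))
    return results


# Hypothetical growth rates (percent); the observed base-year value is 1.00.
observed = [0.98, 0.95, 0.97, 0.93, 0.96, 0.94, 0.95, 0.92, 0.93, 0.90]
forecast = [1.10, 1.15, 1.20, 1.22, 1.25, 1.24, 1.20, 1.18, 1.15, 1.12]

points = compare_with_naive(1.00, forecast, observed)
naive_wins = sum(1 for f, n in points.values() if n < f)
close = sum(1 for f, n in points.values() if abs(f - n) < 0.10)
for h, (f, n) in points.items():
    print(f"h={h}: forecast RMSE={f:.3f}, naive RMSE={n:.3f}")
print(f"naive better at {naive_wins} of {len(points)} points; {close} differ by < .10")
```

Tallying wins and small differences over every series and horizon in this way yields counts like the "32 of 51" and "51.0 percent" figures reported above.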
3) Summary of Forecast Error for Growth Rates
Except for the 1974 and 1976 series, the pattern of under- and overestimation and the level of accuracy of the individual series are closely related to the Census Bureau's fertility assumptions, which are discussed in detail in the following sections. The first two forecast series, 1947 and 1949, greatly underestimated the overall population growth rate as fertility rates rose at the start of the Baby Boom. Short-term (five-year) accuracy improved between 1953 and 1957 as growth rates remained high because of high fertility. After 1957, the growth rate began to decline while the Census Bureau continued to forecast high growth rates. Forecast growth rates for the total population became more accurate in the recent past. As population growth stabilized in the 1980s and 1990s, the average error statistics (excluding the MPE) fell below 10 percent within the first five years for the past five series. Average error generally increased after the five-year forecast period; however, the direction and magnitude of error did not change in a consistent manner. Because of large outlier error terms, the multiple series error statistics do not represent the error actually experienced across the Census Bureau's forecasts. In general, the naïve model outperformed the cohort component forecast, particularly in the latter half of the forecast period. Except for the 1957 series, the naïve model outperformed the forecast model at one or more of the measured points in every series. In contrast, recent cohort component forecasts consistently outperformed the naïve model in the first five years. Overall error remained high relative to a naïve model until the 1980s and 1990s.
Components of Change Forecasts
Fertility Forecasts Error Analysis
Fertility rates in the United States declined throughout the first part of the 1900s until 1946, when they increased dramatically. Graph 2 depicts the U.S. general fertility rate (births per 1,000 women aged 15 to 44) between 1943 and 1998. Following World War II, fertility among American women rose from 85.9 births per 1,000 women of childbearing age in 1945 to 101.9 births in 1946, an increase of 16.0 births (National Center for Health Statistics, 1993). Fertility remained unusually high, peaking at 122.7 births per 1,000 women in 1957, and then declined until the mid-1960s. This historic anomaly in U.S. fertility, referred to as the Baby Boom, occurred between 1946 and 1964. After the Baby Boom, fertility remained relatively stable, apart from small increases from the later part of the 1960s into the early 1990s. After 1973, fertility rates ranged between a low of 65.2 births in 1976 and a high of 70.9 births in 1990, a difference of 5.7 births.
Of the three components of population change, fertility assumptions are subject to the greatest uncertainty. When formulating fertility assumptions as inputs to the cohort component model, demographers must attempt to forecast fertility trends among American women by age and, in the more recent past, by race and Hispanic origin. This requires anticipating changes in many variables that directly or indirectly affect fertility, such as contraceptive prevalence, marital status, and female labor force participation rates. Most importantly, demographers try to anticipate potential turning points and to judge whether current trends will remain stable.
For series produced from 1963 to 1972, the Census Bureau formulated fertility assumptions using a cohort fertility methodology rather than building from estimates of period fertility. That is, series were based on the completed fertility of cohorts of women of childbearing age, further adjusted for timing patterns. Timing patterns were generally based on age-specific fertility rates from past years and the average age of childbearing.3 Assumptions about the expected level of completed fertility and the timing patterns were not consistent across products. The ultimate completed fertility rates were generally estimated using birth expectations data from various surveys and demographic theory, such as stable population theory and replacement-level fertility (U.S. Census Bureau, 1970).4
The 1974, 1976, and 1982 series continued to use the cohort fertility model; however, the timing patterns used previously were replaced with assumptions about short- and long-term fertility trends. These trends were also based on survey-generated birth expectations data as well as theory. The 1986 and 1991 series continued to base fertility assumptions on the cohort fertility method while using Box-Jenkins time series methods to forecast short-term trends. The two most recent series, 1992 and 1994, switched to a period fertility methodology and assumed that current age- and race-specific fertility rates would remain constant throughout the forecast period.
To calculate the number of live births for a given forecast year, age-specific birth rates were applied to the average number of women of childbearing age. The births were then survived forward to account for infant mortality, and the number of births was summed for each calendar year. The crude birth rate is defined as the number of births per 1,000 people in a calendar year.
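Under the cohort fertility approach described above, an assumed completed fertility level is distributed across ages by a timing pattern. The sketch below illustrates that idea under simplifying assumptions. It uses five-year age groups and a hypothetical timing pattern giving each group's share of lifetime births. It also ignores the distinction between cohort and period rates. It is not the Census Bureau's procedure, and the names and values are hypothetical.

```python
def cohort_asfr(completed_fertility, timing_shares, group_width=5):
    """Annual age-specific fertility rates implied by a completed fertility level.

    timing_shares: share of lifetime births falling in each age group
    (normalized here so the shares sum to 1). Each annual rate is the group's
    births per woman divided by the number of single years in the group.
    """
    total = sum(timing_shares)
    return [completed_fertility * s / total / group_width for s in timing_shares]


# Hypothetical timing pattern for age groups 15-19, 20-24, ..., 40-44.
shares = [0.10, 0.25, 0.30, 0.20, 0.10, 0.05]
rates = cohort_asfr(2.1, shares)

# Summing annual rates over all single years of age recovers completed fertility.
implied_cfr = sum(rates) * 5
print([round(r, 3) for r in rates], round(implied_cfr, 2))
```

By contrast, the period approach used for the 1992 and 1994 series would simply hold currently observed age-specific rates fixed for every forecast year.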
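The births and crude birth rate calculation described above can be sketched as follows. This is a simplified illustration, not the Census Bureau's implementation. It assumes five-year age groups, rates expressed as annual births per woman, and the average of start- and end-of-year female counts. It also uses a single total population figure as the CBR denominator and omits the infant-mortality survival step. All names and numbers are hypothetical.

```python
def projected_births(asfr, women_start, women_end):
    """Births in one year: each age group's annual fertility rate applied to the
    average number of women in that group over the year."""
    return sum(rate * (w0 + w1) / 2.0
               for rate, w0, w1 in zip(asfr, women_start, women_end))


def crude_birth_rate(births, population):
    """Births per 1,000 people in the calendar year."""
    return births / population * 1000.0


# Hypothetical five-year age groups 15-19 ... 40-44: annual births per woman,
# and female counts (thousands) at the start and end of the year.
asfr = [0.05, 0.11, 0.12, 0.09, 0.04, 0.01]
women_start = [9000, 9200, 9800, 10500, 11000, 10800]
women_end = [9100, 9150, 9900, 10400, 11100, 10900]

births = projected_births(asfr, women_start, women_end)  # thousands of births
cbr = crude_birth_rate(births, population=275_000)       # population in thousands
print(f"births={births:,.0f} thousand, CBR={cbr:.1f}")
```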
1) Overall Accuracy of Fertility Forecasts
According to the multiple series MPEs, the Census Bureau consistently tended to overestimate the fertility of American women, although the absolute level of error decreased in the 1990s. Tables 4 and 5 show that the multiple series MPEs for the number of births and the crude birth rate never fell below 12 percent. Error does not rise steadily throughout the twenty-year forecast period, however: by the twentieth year, the average error falls below the level reached at ten years. The MAPE for births increased from 13.9 percent within five years to 28.3 percent and 29.4 percent at the tenth and fifteenth years. It then declined to 26.8 percent within 20 years. The average errors for crude birth rates are generally smaller than those for the number of births.
Examination of the individual series forecasts for births and the crude rate reveals a consistent pattern of overestimation until the 1982 series. Graph 3 displays the estimated or actual crude birth rates and the forecast crude rates for each series. According to the average statistics for the number of births (Table 4), the series produced from 1963 to 1972 greatly overestimated the number of births compared with later series. The 1970 series had the largest error during the first five years, with a MAPE of 29.0 percent. Its error increased to 37.1 percent within ten years and 39.4 percent within fifteen years. MAPEs for the remaining series (1963, 1966, 1969, and 1972) ranged between 12.5 and 17.6 percent during the first five years and between 20.3 and 37.1 percent within ten years. Error for the 1972 series, however, did not increase as rapidly, remaining between 18 and 21 percent throughout the period. The 1963 and 1966 series had the largest MAPEs over long-term forecast periods (15 and 20 years), 42.9 percent and 46.2 percent, respectively.
Table 6 shows that the first-year PEs for forecast births and birth rates in the 1966, 1970, and 1972 series were larger than those of other series. The first-year PEs of 10.7 percent for 1972 (CBR = 11.3 percent) and 8.6 percent for 1970 (CBR = 9.1 percent) indicate that these series began with inadequate base data. In addition, 1970 represents a turning point in fertility trends, as the number of births declined from 1970 to 1973. Every forecast with a base date before 1974 failed to incorporate this decline and the subsequent stability in fertility seen throughout the early and mid-1970s.
After 1972, forecast error for the number of births decreased substantially, with continued improvement in the recent past. For series produced after 1972, the MAPE ranged from a low of 0.5 percent (1991) to a high of 8.3 percent (1986) within the first five years. Within ten years, it ranged from 4.0 percent (1982) to 9.3 percent (1986). The 1991 and 1994 series had the lowest error in every period, with five-year MAPEs of 0.5 and 0.9 percent, respectively.
2) Duration-Specific Forecast Error for Fertility
Graph 4 shows the multiple series MAPEs for each component of population change for each single year of the twenty-year forecast period. Here, the MAPE is the average absolute percent error at a specific year of the forecast period. Error for the number of births increased over the first 9 years and began to stabilize after 10 years. The average error for the crude birth rate stabilized and actually declined after ten years. This pattern reflects both the particular series included in the later forecast years and the observed trend in fertility. Specifically, the 1972, 1974, 1976, and 1982 series first overestimated fertility. Later in their forecast periods they underestimated it, as the observed number of births increased in the 1980s. These series forecast an eventual long-term decline, but observed fertility rose during the 1980s, when their later forecast years fell. As a result, their average error statistics decreased later in the forecast period. In contrast, the early series, 1963 to 1969, consistently overestimated fertility during the post-Baby Boom decline.
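A multiple series MAPE by forecast year can be computed by pooling the absolute PEs of every series that reaches that year. Because shorter series drop out, the composition of the pool changes with the horizon. This is one way later-year MAPEs can level off or decline. The following is a minimal sketch with hypothetical series labels and PEs.

```python
from statistics import mean


def mape_by_duration(pes_by_series):
    """Multiple series MAPE for each forecast year d = 1, 2, ...

    Only series whose forecast period reaches year d contribute, so the number
    of series (n) shrinks as d grows and the mix of series changes.
    Returns a list of (year, n, MAPE) tuples.
    """
    longest = max(len(pes) for pes in pes_by_series.values())
    results = []
    for d in range(longest):
        values = [abs(pes[d]) for pes in pes_by_series.values() if len(pes) > d]
        results.append((d + 1, len(values), mean(values)))
    return results


# Hypothetical PEs by forecast year for three series of different lengths.
pes_by_series = {
    "A": [5.0, 10.0, 12.0, 20.0],
    "B": [4.0, 8.0, -2.0],
    "C": [-3.0, 6.0],
}
for year, n, mape in mape_by_duration(pes_by_series):
    print(year, n, round(mape, 2))
```

In this toy example, the pooled MAPE falls from 8.0 in year 2 to 7.0 in year 3. This happens partly because series C drops out of the pool and partly because series B swings from over- to underestimation.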
3) Comparison of Fertility Forecast Models
Analysis of the multiple series RMSEs indicates that the naïve model forecast both the number of births and the crude rate more accurately (Tables 4 and 5). For the number of births, the naïve RMSE remained at least 40 percent smaller than the Census Bureau RMSE throughout the forecast period. During the first ten years, the multiple series RMSE for the forecasts was 1.2 million births (CBR RMSE = 5.0), compared with 495.1 thousand births (CBR RMSE = 3.0) for the naïve model. The large disparity continues throughout the twenty-year period. Over that span, the naïve RMSE remains smaller than the average error of the Census Bureau forecast series in the first five years.
Before the 1974 series, the naïve model outperformed each forecast series for both births and the crude birth rate, even though the naïve RMSEs themselves remained high. They never fell below 84.8 thousand births for any series. Within ten years, the naïve RMSE ranged from a low of 235 thousand to a high of 604 thousand births per year. Beginning with the 1974 series, the forecast model outperformed the naïve model for the number of births. Of the 16 points measured across the seven series produced after 1972, the forecast RMSE was smaller than the naïve RMSE at 11 (68.8 percent). The 1976 series consistently outperformed the naïve model throughout the entire twenty-year period. For the 1986 series, however, a constant forecast of births or birth rates would have performed better. For the crude birth rate, in contrast, the naïve model generally outperformed the Census Bureau forecasts. Of the same 16 points, the naïve crude rate RMSE exceeded the forecast RMSE at only six, compared with eleven for the number of births.
4) Summary of Forecast Error for Fertility
In the series produced from 1963 to 1972, the Census Bureau remained extremely optimistic that fertility would stay at Baby Boom levels, despite the continued decline following the 1957 peak. Error decreased for the 1974 and 1976 series because of two main factors. First, the 1974 series reduced the number of alternative series from four to three, yielding one middle series with a lower completed fertility of 2.1, compared with an average of 2.5 and 2.1 for 1972. Second, the number of births that actually occurred began to increase in the long-term forecast period. The 1976 series improved on the 1974 series by further lowering the short-term assumptions. Beyond a general improvement in accuracy, the 1974 series began a pattern of outperforming the naïve model of constant rates, with the exception of the 1986 series.
In contrast, the 1982 and 1986 series were conservative and underestimated births. The 1982 series continued to use the cohort fertility approach, while the 1986 series used a Box-Jenkins time series model for short-term forecasts. The completed fertility level was further reduced to 1.9 for 1982 and 1.8 for 1986. After the 1990 turning point, the number of births remained stable. Accuracy improved for the 1991 series. Among other changes, that series continued to use the time series model, raised completed fertility to 2.1, and abandoned the racial convergence assumption. This stability, combined with improved assumptions, permitted more accurate forecasts for the series produced during that decade. The 1994 series abandoned the cohort fertility method and assumed constant trends among the largest racial groups. It achieved similarly high short-term accuracy.5
The comparison between forecast models produced different results for the number of births and the crude rate. In the recent past, Census Bureau forecasts of the number of births were more accurate than the naïve model; this is not necessarily true for the crude rate forecasts.
In summary, accuracy for the number of births improved in the recent past. The improvement, however, does not appear to depend on the approach used to derive the short-term forecast assumptions (cohort vs. period).
Mortality Forecasts Error Analysis
Mortality rates decreased consistently throughout the 20th century as life expectancy at birth increased from 47.3 years in 1900 to 77.0 years in 1999, a gain of 29.7 years in approximately 100 years (Anderson, 1999; U.S. Census Bureau, 2000b). Graph 5 displays the observed and forecast crude death rates from 1964 to the present. Crude death rates generally decreased throughout the 1960s and 1970s, falling from 9.4 deaths per 1,000 people in 1964 to 8.6 deaths in 1977, a span of 13 years. After 1977, the rate remained stable, ranging between 8.5 and 8.8 deaths for 21 years. While rates stabilized or decreased, the base population continued to grow. As a result, the number of deaths increased steadily, from approximately 1.8 million in 1964 to 2.4 million in 1999. Graph 6 displays the observed number of deaths from 1964 to 1999. Deaths rose from 1.8 to 2.0 million between 1964 and 1983 and then to 2.4 million by 1999. These trends differ by age, sex, race, and Hispanic origin at the national level (Anderson, 1999). For the purposes of this research, only the forecast number of deaths and the crude death rate for the total population are examined.
To forecast mortality, age-specific death rates and survival rates are used as inputs to the cohort component model to survive the population forward. Rates are generally calculated by single year of age, sex, and, more recently, race and Hispanic origin. Mortality assumptions formulated between 1963 and 1986 relied on life tables created by the Social Security Administration and adapted to the needs of the Census Bureau. Before 1982, a single set of rates was used as input to the model; forecasts produced after 1976 included low, middle, and high mortality series. From the 1991 series onward, the Census Bureau used its own forecast life tables, based primarily on the rate of mortality change experienced in previous decades.
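A single survival step of the kind described above can be sketched as follows. The sketch assumes single-year age groups with an open-ended final group and one-year survival probabilities taken from a life table. It omits sex, race, and Hispanic origin detail. The function name, probabilities, and counts are hypothetical. The deaths implied by the step yield a crude death rate of the kind evaluated in this section.

```python
def survive_one_year(pop, survival, surviving_births):
    """Advance a single-year-of-age population by one year.

    pop[a]: people aged a at the start of the year; the last entry is an
        open-ended group (for example, 85 and over).
    survival[a]: probability that a person aged a is alive one year later.
    surviving_births: births during the year that survive to year end.
    Returns (population at year end, deaths among those alive at year start).
    Infant deaths among the year's births are not included in the deaths count.
    """
    nxt = [0.0] * len(pop)
    nxt[0] = surviving_births
    for a in range(len(pop) - 1):
        nxt[a + 1] += pop[a] * survival[a]  # survivors move up one year of age
    nxt[-1] += pop[-1] * survival[-1]       # survivors stay in the open-ended group
    deaths = sum(pop) - sum(nxt[1:])
    return nxt, deaths


# Hypothetical population (thousands) for ages 0, 1, and 2+.
pop = [100.0, 90.0, 80.0]
survival = [0.99, 0.98, 0.80]
nxt, deaths = survive_one_year(pop, survival, surviving_births=105.0)
# Deaths per 1,000, using the start-of-year population as a simplification.
cdr = deaths / sum(pop) * 1000.0
print([round(x, 1) for x in nxt], round(deaths, 1), round(cdr, 1))
```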
1) Overall Accuracy of Mortality Forecasts
Deaths are less numerous than births and fluctuate less over time. Mortality forecasts therefore involve smaller numeric magnitudes than fertility forecasts and yield smaller summary error statistics. Tables 7 and 8 present the error statistics for the forecast number of deaths and the crude death rates. The multiple series MAPE for the number of deaths is 5.1 percent (CDR = 5.6 percent) at the fifth year of the forecast period. It reaches its highest value, 12.2 percent (CDR = 9.7 percent), at the twentieth year. On average, the error for both the number of deaths and the crude rate increased over the forecast period. Overall, the Census Bureau's mortality forecasts were generally too conservative and failed to anticipate improvements in life expectancy adequately.
As with the individual fertility series, the overall accuracy of the individual mortality series for both the number of deaths and the crude rate improved dramatically in the recent past. Graph 6 displays the forecast number of deaths for each series along with the actual number of deaths. Forecasts produced in 1976 and earlier consistently overestimated deaths. Beginning in 1963, error within the first five years generally increased from one series to the next, peaking with the 1974 series (except for the 1972 and 1974 series beyond the fifteen-year forecast period). Within the first five years, the 1974 series had an MPE and MAPE of 9.9 percent, up from 1.8 percent for the 1963 series. Table 9 displays the PEs for the number of deaths and the crude death rates. Again, the 1974 series had the largest error, with first-year PEs of 8.2 percent for deaths and 9.1 percent for the crude rate.
Accuracy improved after the 1974 series. The five-year MAPE for the number of deaths fell to 4.6 percent for the 1976 series and to 0.91 percent for the 1982 series. Forecast deaths and crude rates produced after the 1976 series were consistently more accurate than those of earlier series. The exception is the 1992 series, which had a MAPE of 3.8 percent within the first five years. For series produced after 1982, excluding 1992, the five-year MAPE ranged between 0.9 and 1.3 percent. The 1982 and 1986 series have forecast periods extending beyond five years. For these series, the MAPE remained near 1.0 percent and 1.1 percent, respectively.
2) Duration-Specific Forecast Error for Mortality
Multiple series error statistics increased over the forecast period for both the number of deaths and the crude death rate. The crude rate, however, accumulated less error over the forecast period (the same holds for the individual series). Graph 4 shows the multiple series MAPEs for each component of population change for each single year of the twenty-year forecast period. The MAPE for both deaths and the crude rate remains stable after ten years. Within ten years, the crude rate had lower average error statistics, and the gap between the MAPEs for the number and the rate of deaths widened as the forecast period lengthened.
The duration-specific forecast error for deaths in the individual series generally increased over the forecast period, with the exception of the 1974 and 1986 series. The crude rate forecasts with periods of fifteen years or longer behaved differently: average error declined at twenty years for the 1966 and 1969 series. The 1974 and 1982 series had smaller averages within fifteen years than within ten years, followed by an increase within 20 years for 1974.
3) Comparison of Mortality Forecast Models
A comparison of the multiple series RMSEs for the forecast and naïve models indicates that the naïve model outperformed the forecast series over most of the forecast period for both the number of deaths and the crude rate. The difference between the two models' RMSEs narrowed over the twenty-year period. By 20 years, the Census Bureau forecast outperformed the naïve model for deaths: the multiple series forecast RMSE of 265.5 thousand deaths is smaller than the naïve RMSE of 278.9 thousand. For the crude rate, in contrast, the naïve multiple series RMSE remained .19 deaths per 1,000 people lower than that of the forecast series at twenty years.
For the individual series, the naïve model of a constant number of deaths and crude rate outperformed every forecast series except 1982, 1986, and 1991, and the long-term forecasts for 1963 and 1966. The naïve models for the 1974, 1976, and 1986 series produced RMSEs below 50 thousand deaths throughout the entire forecast period and were superior to the Census Bureau forecasts. Within five years, the naïve RMSE for 1976 averaged 19.6 thousand deaths, the lowest RMSE reported for deaths.
4) Summary of Forecast Error for Mortality
Beginning in 1963, the Census Bureau generally underestimated improvements in life expectancy. Some forecasts produced after 1976, in contrast, slightly overestimated improvement. Error increased gradually for forecasts produced between 1963 and 1974, highlighting the Census Bureau's historically conservative approach to forecasting improvements in life expectancy. Recent forecasts performed much better. This improvement in accuracy may reflect the stabilization of mortality trends beginning in the late 1970s. In addition, the Census Bureau began producing a middle series mortality assumption with the 1982 series, potentially contributing further to mortality forecast accuracy. As with fertility, errors for the number of deaths are slightly larger throughout the forecast period than those for the crude rate. This is because the number of deaths depends more directly on the size of the forecast population. Multiple series forecast error generally increased over the forecast horizon, stabilizing after the 10th year of the forecast period. Lastly, except for three series, the naïve mortality models outperformed the Census Bureau forecasts. Unlike the fertility results, the most recent mortality forecasts, the 1992 and 1994 series, did not outperform the naïve model.
Net Immigration Forecasts Error Analysis
Net immigration for the United States is largely determined by domestic policy and the type of immigration occurring at any given point in history. For example, over 80 percent of the current number of immigrants entering the U.S. in 1999 were attributable to family reunification policy and of immigrants with refugee status (Kramer, 1999: pg. 2). In addition, the types of immigrants are controlled through bureaucratic and/or political means. During the 1970s, however, research found that the number of undocumented immigrants increased dramatically (Passel and Woodrow, 1987). This increase remains at levels researchers are unable to directly determine. The Census Bureau's current knowledge of net immigration is dependent on legal immigration data from the Immigration and Naturalization Service (INS). Given the limitation of data on the current level of net migration and the inability to predict domestic and international policy, forecasts of this component are especially problematic.
Consequently, the historical forecasts for net immigration have remained conservative. With the exception of the 1986 series and the most recent release in 2000, each series assumed that net immigration would remain constant throughout the forecast period. Graph 7 depicts the observed and forecast crude rate of net immigration for each series produced since 1963. The forecast number of immigrants was applied as the same number each year, with a constant age and sex distribution. Recent products assumed separate distributions by age, sex, race, and Hispanic origin. The forecast distributions were generally based on the characteristics of net immigrants observed around the base year.
Because of these complicating factors, and those discussed above concerning emigration, undocumented immigration, and the change in the universe, the accuracy of net immigration forecasts can be evaluated only with serious limitations. Nevertheless, examining these data may still help clarify how they affect forecast results and what they reveal about trends. This report therefore analyzes the immigration component at a general level.
1) Overall Accuracy of Net Immigration Forecasts
Every forecast consistently underestimated both the number of immigrants and the net immigration rate, and the magnitude of error for both variables is larger than for either of the components of population change discussed previously (Table 10). For multiple series error, the MPE for the number of immigrants is -21.0 percent at the fifth year of the forecast (Table 11), and the RMSE at five years is 189.2 thousand immigrants. The MPE grows to -36.5 percent at the tenth year and -50.2 percent at the twentieth year. The MAPE statistics for both the number of immigrants and the rate correspond closely to the MPE statistics, as expected when nearly all errors are negative.
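The error statistics quoted above can be sketched as follows. This is a minimal illustration, assuming PE = 100 × (forecast - observed) / observed, so that a negative value marks an underestimate as in the text, and assuming that multiple series statistics pool one forecast/observed pair per series at a given forecast year. The function names and all input values are hypothetical; they are not the report's data.

```python
import math


def percent_error(forecast: float, observed: float) -> float:
    # Sign convention assumed from the text: negative PE = underestimate.
    return 100.0 * (forecast - observed) / observed


def error_summary(forecasts: list[float], observed: list[float]) -> dict[str, float]:
    """MPE, MAPE and RMSE for paired forecast/observed values (hypothetical helper)."""
    pes = [percent_error(f, a) for f, a in zip(forecasts, observed)]
    n = len(pes)
    mpe = sum(pes) / n                       # direction of error
    mape = sum(abs(p) for p in pes) / n      # magnitude of error, in percent
    rmse = math.sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, observed)) / n)
    return {"MPE": mpe, "MAPE": mape, "RMSE": rmse}


# Hypothetical fifth-year values (thousands of immigrants), one pair per series.
forecast_yr5 = [400.0, 450.0, 600.0]
observed_yr5 = [500.0, 560.0, 700.0]
print(error_summary(forecast_yr5, observed_yr5))
```

With these made-up values every series underestimates, so the MPE equals the negative of the MAPE; the same mechanism explains why the MPE and MAPE figures for net immigration track each other so closely. Unlike the percentage measures, the RMSE stays in the units of the data (here, thousands of immigrants).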
Among individual series, series 1976 had the worst overall accuracy and series 1966 the best. The recent series for 1991, 1992, and 1994 are more accurate within the first five years than earlier series, and series 1992 had the smallest MAPE over the first five years, at 5.5 percent. The first-year PEs indicate that the base number of immigrants used to create the forecasts is often of poor quality. Table 10 displays the PE for both the crude rates and the number of immigrants. PEs for the number of immigrants range from -0.3 percent for the 1992 series to -24.0 percent for the 1982 series. Of the twelve series, only five have first-year PEs smaller than 10 percent in absolute value.
2) Duration-Specific Forecast Error for Net Immigration
Because the number of immigrants increased from 1963 to 1999, individual series that forecast constant numbers and rates of immigrants produced error that grew throughout the forecast period. As previously stated, the multiple series MAPE was over 20 percent at the fifth year (n=13) and increased to over 50 percent at the twentieth year (n=6). Graph 4 displays the MAPE by single year for each component. The MAPE for both the number and the rate is larger throughout the entire forecast period than the error for fertility and mortality. Much of the growth in error occurred between the first and ninth years, when the MAPE rose from approximately 10 percent to over 35 percent, an increase of roughly 25 percentage points. For individual series, the MAPE within twenty years ranged from a low of 21.9 percent for series 1966 to a high of 41.8 percent for series 1976 (n=6).
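The duration-specific pattern described above, in which error grows with forecast length when observed immigration rises against a constant assumption, can be illustrated with a small simulation. The sketch assumes a forecast held at its base-year level while the observed series grows at a fixed annual rate, and assumes that the multiple series MAPE at year h averages |PE| over only those series observed through year h (consistent with n falling from 13 to 6 in the text). The series labels, growth rates, and helper names are hypothetical.

```python
def constant_level_pes(growth: float, years: int) -> list[float]:
    """PEs for a forecast held at the base-year level while the observed value
    grows by `growth` per year (hypothetical scenario; the base level cancels out)."""
    return [100.0 * ((1.0 + growth) ** -t - 1.0) for t in range(1, years + 1)]


def duration_mape(series_pes: dict[str, list[float]]) -> list[tuple[int, int, float]]:
    """For each forecast year h, average |PE| over the series that reach year h.

    Returns (h, n, MAPE) triples; n shrinks at longer durations because only
    older series have been observed that far ahead.
    """
    max_h = max(len(pes) for pes in series_pes.values())
    results = []
    for h in range(1, max_h + 1):
        vals = [abs(pes[h - 1]) for pes in series_pes.values() if len(pes) >= h]
        results.append((h, len(vals), sum(vals) / len(vals)))
    return results


# Hypothetical series with different observed horizons and growth rates.
series = {
    "series A": constant_level_pes(0.03, 20),
    "series B": constant_level_pes(0.04, 10),
    "series C": constant_level_pes(0.02, 5),
}
for h, n, mape in duration_mape(series):
    print(f"year {h:2d}  n={n}  MAPE={mape:5.1f}")
```

In this toy example the MAPE falls between year 10 and year 11 even though each individual series keeps getting worse, because series B, which has the largest errors, drops out of the average. Changes in n can therefore move a multiple series MAPE on their own, which is worth keeping in mind when comparing figures reported with different sample sizes.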
3) Comparison of Net Immigration Forecast Models
For multiple series error statistics, the naïve model outperformed the Census Bureau forecast model. At the tenth year of the forecast, the naïve model's RMSE of 244.0 thousand immigrants was smaller than the Census Bureau's RMSE of 321.8 thousand. Series 1974, 1991, 1992, and 1994 are the only forecasts that outperformed the naïve model, apart from series 1970, which did so only within the first five years. For crude rates, only three series (1970, 1991, and 1992) outperformed the naïve model, and only within the first five years. The naïve model is based on numbers adjusted for net undocumented immigration and emigration in the 1970s and afterward. Graph 8 displays the multiple series RMSE of both models for the forecast crude rate of net immigration. The naïve RMSE offers a hypothetical indication of what the Census Bureau's RMSE might have been had the base-year error been reduced and the adjustments for undocumented immigrants and emigrants been included. Except in the first three periods, the naïve results suggest that the RMSE for the net immigration rate could have been smaller.
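The model comparison rests on which forecast has the smaller RMSE against the observed series. The sketch below shows that comparison in its simplest form. It assumes both forecasts cover the same years as the observations; the "census" and "naive" values are made-up numbers, and the sketch does not reproduce how the report's naïve model was constructed.

```python
import math


def rmse(forecast: list[float], observed: list[float]) -> float:
    """Root mean squared error, in the units of the data (e.g., thousands)."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must cover the same years")
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, observed)) / len(observed))


def compare_models(observed: list[float], census: list[float], naive: list[float]):
    """Return (census RMSE, naive RMSE, label of the model with the smaller RMSE).

    Ties are reported as "census".
    """
    r_census = rmse(census, observed)
    r_naive = rmse(naive, observed)
    better = "naive" if r_naive < r_census else "census"
    return r_census, r_naive, better


# Hypothetical net immigration values (thousands) over three forecast years.
observed = [600.0, 650.0, 700.0]
census = [500.0, 500.0, 500.0]  # constant-level assumption
naive = [580.0, 600.0, 640.0]   # illustrative values only
print(compare_models(observed, census, naive))
```

Because RMSE squares each error, a forecast with a few very large misses, such as one that starts from a poor base-year figure, is penalized heavily; this is one reason base-year error can weigh so much in this comparison.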
4) Summary of Forecast Error for Net Immigration
Because actual net immigration increased between 1963 and 1999, forecasts that assumed constant levels consistently underestimated it. Error grew over the forecast period and remained higher than for the fertility and mortality forecasts throughout. Because most series begin with large errors in the first year, the base data may account for a large proportion of the error throughout the forecast period. Nonetheless, net immigration forecasts have improved in the recent past. This improvement is also evident when comparing the naïve and Census Bureau forecast models of net immigration: the naïve model consistently outperformed the Census Bureau model for both the number of immigrants and the crude rate, except for the five-year averages of series 1991, 1992, and 1994. Even so, the naïve model is not a dramatic improvement over the Census Bureau forecasts.