Evaluation of Postcensal County Estimates for the 1980s

March 1994

Written by:

Sam T. Davis

Working Paper Number: POP-WP005

Disclaimer

The views expressed are attributable to the author and do not necessarily reflect the views of the U.S. Bureau of the Census.

If you have any questions concerning this report, please e-mail a message to [email protected]. Include the name of this report and author in the body of the message.

INTRODUCTION

One of the continuing programs of the Population Division is the production of postcensal county estimates for July 1 of each year. To evaluate the estimates for the 1980s, we compared estimates for April 1, 1990, with the counts from the April 1, 1990 Census. This paper addresses several aspects of the evaluation of the county estimates. These include 1) their overall performance for the whole nation, 2) which types of counties had better estimates, 3) whether our estimates are any better than the 1980 Census counts, 4) how well the 1990 Postcensal estimates did in comparison to the corresponding 1980 estimates, 5) how the performance of the estimates varied from state to state and among regions, 6) how the methods used affect the quality of the estimates, and 7) whether the quality of the input data has any effect on the estimates.

To do this, we examined the county estimates grouped by population size and estimated growth between 1980 and 1990. We also did the same evaluations by size and growth using the 1980 Census count in place of the 1990 Postcensal estimate. These estimates are produced in conjunction with the Federal-State Cooperative Program for Population Estimates (FSCPE). A part of the data used to make the estimates is obtained on a state by state basis, and the information available varies from state to state. Due to this, different methods are used to create the estimates. We looked at the performance of the estimates when grouped by the method(s) used. One of the methods allows for a range of variables to be incorporated into the process, and we obtained our measures by grouping states according to the variables used to make their county estimates. Lastly, in an attempt to see if there was some other means of grouping states, experienced Bureau staff members developed a subjective rating of the data obtained from each state and we computed our error measures for states grouped by those ratings.

This paper presents the analyses of the difference between the estimates and the census counts which make up the first part of the exploration of the performance of county estimates. We have for the present time largely examined each of the various dimensions independently. Further study will seek to discover interactions among the various factors we have studied. An evaluation of 1980 postcensal estimates appears in U.S. Bureau of the Census, Current Population Reports, Series P-25, No. 984.

Finally, recognizing the uncertainty behind the 1990 census leads us to at times use the term "Measures of Difference" as a semantically less judgmental expression than "Measures of Error", since ultimately we do not really know if the estimates are in error, or if it is the Census which is off the mark. Using the 1990 census as it currently stands offers its own set of problems, but we consider it to be the most appropriate standard to use for measuring the performance of these estimates.

MEASURES OF ESTIMATES PERFORMANCE

To measure the difference between the 1990 final estimates for counties and the 1990 Census, we computed seven measures: Mean Algebraic Percent Error (MALPE), Mean Absolute Percent Error (MAPE), Weighted Mean Absolute Percent Error (WAPE), Root Mean Square Error (RMSE), Index of Dissimilarity (INDISS), Median Percent Difference, and the 90th Percentile, the percent error below which 90 percent of the counties fall. These measures, except for the MALPE, Median and 90th Percentile, are described in Appendix 1. A more rigorous presentation of these and other measures is given by Armstrong (1977). The MALPE is simply the average of all the percent errors, and differs from the MAPE only in that the MAPE involves taking the absolute values of the percent errors, while the MALPE does not. Except for the MALPE and the 90th Percentile, we found all the measures to be highly correlated for most of our tabulations. For this reason, most of the discussion below will deal only with the familiar MAPE. Future analyses may explore the different results we obtain with some of the other measures.
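The measures defined in the text can be sketched in a few lines of Python. This is a minimal illustration, not the Bureau's actual computation: the function names and sample figures are hypothetical, the WAPE is assumed to weight each county by its census count (Appendix 1 may define it differently), the median is taken over absolute percent differences, and the 90th percentile uses a simple nearest-rank rule. RMSE and INDISS are left out because their exact definitions are given only in Appendix 1.

```python
import math
from statistics import mean, median


def percent_errors(estimates, census):
    """Signed percent difference of each estimate from its census count."""
    return [100.0 * (e - c) / c for e, c in zip(estimates, census)]


def percentile_nearest_rank(values, pct):
    """Smallest value with at least pct percent of the values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def difference_measures(estimates, census):
    pe = percent_errors(estimates, census)
    ape = [abs(p) for p in pe]
    total_abs_diff = sum(abs(e - c) for e, c in zip(estimates, census))
    return {
        "MALPE": mean(pe),   # signs kept, so over- and underestimates offset
        "MAPE": mean(ape),   # signs dropped
        # Assumption: weights are the census counts themselves.
        "WAPE": 100.0 * total_abs_diff / sum(census),
        "median_APE": median(ape),
        "p90_APE": percentile_nearest_rank(ape, 90),
    }


# Hypothetical counties, not data from the paper.
census = [1000, 2000, 4000, 10000]
estimates = [1100, 1900, 4000, 10500]
measures = difference_measures(estimates, census)
print(measures)
```

In this toy case the MALPE (2.5) is half the MAPE (5.0) because one underestimate partly cancels the overestimates, which is the distinction the text draws between the two measures.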

ERROR BY SIZE AND GROWTH

For all 3141 counties, the MAPE was 3.6 percent. When considered by population size in 1980, there is a clear trend toward better estimates for larger counties (see Fig. 1). The 105 counties with a population size less than 2,500 had a MAPE of 7.7 while the 414 largest counties (population size greater than or equal to 100,000) had a MAPE of only 2.0. The biggest drop in MAPE was between the smallest group and the next one, the 182 counties between 2,500 and 5,000, which had a MAPE of 4.6.

The differences in MAPE by percent growth (between 1980 and the 1990 estimate) are shown in Figure 2. The lowest MAPE, 3.0, was for the 576 counties which did not change or grew less than 5%. Counties with substantial decline or growth showed greater errors. The fastest growing counties (those with growth >25%) had the highest MAPE (4.9). There are some variations from this pattern when counties are considered by size group. In most size groups, the fastest growing counties had the worst estimates. Also, for most size groups, the best estimates occurred in the counties with the smallest absolute change, either increasing or declining. In the less than 2,500 group, the worst MAPE (16.6) was for the 5% to 10% growth group. Analysis of this size group is somewhat hindered by the very low numbers of counties in each growth group. Complete results are in Table 1.
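The tabulation by size group described above amounts to binning counties by their 1980 population and averaging the absolute percent errors within each bin. The sketch below is illustrative only: the function names and sample counties are hypothetical, and the 5,000 to 10,000 boundary is inferred from the size ranges mentioned in the text rather than taken from Table 1. Growth groups could be handled the same way with a second set of boundaries.

```python
from bisect import bisect_right
from collections import defaultdict

# Lower bounds are inclusive, matching "greater than or equal to 100,000".
SIZE_BOUNDS = [2_500, 5_000, 10_000, 20_000, 50_000, 100_000]
SIZE_LABELS = [
    "under 2,500",
    "2,500-5,000",
    "5,000-10,000",   # assumed boundary, inferred from the text
    "10,000-20,000",
    "20,000-50,000",
    "50,000-100,000",
    "100,000 and over",
]


def size_class(pop_1980):
    """Return the size-group label for a 1980 population."""
    return SIZE_LABELS[bisect_right(SIZE_BOUNDS, pop_1980)]


def mape_by_size(counties):
    """counties: iterable of (pop_1980, estimate_1990, census_1990)."""
    groups = defaultdict(list)
    for pop_1980, estimate, census in counties:
        ape = abs(100.0 * (estimate - census) / census)
        groups[size_class(pop_1980)].append(ape)
    return {label: sum(v) / len(v) for label, v in groups.items()}


# Hypothetical counties, not data from the paper.
sample = [
    (2000, 2200, 2000),
    (2400, 2280, 2400),
    (150000, 153000, 150000),
]
result = mape_by_size(sample)
print(result)
```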

USING THE 1980 CENSUS IN PLACE OF THE ESTIMATE

We have asked ourselves the question, "Are these estimates better than a naive assumption that holds the population constant at the 1980 level?" As an alternative to making estimates of counties, we obtained the same difference measures using the 1980 census instead of the 1990 estimate. These are shown for the same size and change groups in Table 2. A broad analysis using this approach for several levels of geography appears in Long (1993).

For all size categories the estimate produced a lower MAPE than the 1980 census figure (see Fig. 3). This difference in MAPE was the most pronounced for counties of 100,000 and over and counties of 2,500 to 5,000. The difference was the smallest for counties in the middle size ranges of 10,000 to 20,000 and 20,000 to 50,000. Looking at the various growth groups, the estimate again had a lower MAPE than the 1980 census (see Fig. 4). Because the difference between the 1980 census figure and the 1990 Census is essentially the same as the growth between 1980 and 1990, however, examining these patterns by growth group is of limited value.

Table 3 shows the number of counties in each size/growth group which have a lower Absolute Percent Error (APE) from the 1980 census than from the 1990 estimate. Of all 3141 counties, 643, or 20.5% had a lower APE with the 1980 census than with the 1990 estimate. A higher percentage of all counties in the middle size ranges (5,000 to 50,000) do better with the 1980 census than the larger counties (50,000 and over). Also, 22.9% or 24 of the 105 counties under 2,500 have a more favorable APE with the 1980 census.

Higher percentages of the counties which had small absolute percent changes (-5% to 10%) had better results with the 1980 census than those which showed higher growth or decline rates. All of the 42 counties with a 1980 population of 100,000 or more that had a lower APE with the 1980 census had middling growth or decline.

To explore the question of how much better the 1980 census was than the 1990 estimate (in those 643 counties where the 1980 census was actually better), we looked at the mean difference in the APE of the two numbers for those counties where the 1980 census had a lower error. These means are in the last column of Table 3. A higher number means the error for the 1990 estimate was that much greater than the error for the 1980 census, so a higher number here means the 1980 census was relatively better. Among all 643 counties where the 1980 census did better, the 9 counties that grew by at least 25% had the greatest difference in errors. This growth group had the largest "gain" for the 1980 census in several of the size groups, but it should be pointed out that the number of counties in these groups is quite small. It should also be remembered that for all 314 counties in this growth group, the estimate showed the greatest improvement. So while the 9 counties in this growth group which had better results with the 1980 Census did remarkably better when compared to the other growth groups, this is greatly overshadowed by the other 295 counties where the estimate performed better. The smallest gain was experienced by the largest counties, and by the counties with the lowest 1980-90 percent change.
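The comparison against the naive "hold the 1980 census constant" alternative can be sketched as follows. This is an illustration under stated assumptions: the function names and the sample counties are hypothetical, and the "gain" is taken to be the APE of the estimate minus the APE of the 1980 census, averaged over only those counties where the 1980 census had the lower error, as the text describes for the last column of Table 3.

```python
def ape(value, census_1990):
    """Absolute percent error of a value measured against the 1990 census."""
    return abs(100.0 * (value - census_1990) / census_1990)


def naive_comparison(counties):
    """counties: iterable of (census_1980, estimate_1990, census_1990).

    Returns (number of counties where the 1980 census beat the estimate,
    their share of all counties in percent, mean gain in APE for them).
    """
    gains = []
    total = 0
    for census_1980, estimate, census_1990 in counties:
        total += 1
        gain = ape(estimate, census_1990) - ape(census_1980, census_1990)
        if gain > 0:  # the 1980 census had the lower error
            gains.append(gain)
    share = 100.0 * len(gains) / total
    mean_gain = sum(gains) / len(gains) if gains else 0.0
    return len(gains), share, mean_gain


# Hypothetical counties, not data from the paper.
sample = [
    (1000, 1100, 1020),
    (5000, 5400, 5500),
    (20000, 21000, 21000),
    (800, 700, 790),
]
count, share, mean_gain = naive_comparison(sample)
print(count, share, round(mean_gain, 2))
```

Applied to the figures in the text, the same share calculation gives 643 / 3141, or about 20.5 percent of counties.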

COMPARISON OF 1990 ESTIMATES WITH 1980 ESTIMATES

In order to find out how well the 1990 estimates did in relation to the corresponding 1980 estimates, we computed the same error measures for the same size and growth groups using the differences between the 1980 estimates and the 1980 Census. For the 1980 results, the counties were grouped according to their 1970 size and their 1970-80 growth. Table 4 shows the complete results of the evaluation of the 1980 estimates, and Table 5 shows some selected measures for both the 1990 and 1980 estimates. For the whole nation, the 1980 county estimates had a MAPE of 4.1, one-half of a percentage point higher than the 1990 estimates, which had a MAPE of 3.6. Errors for both 1990 and 1980 showed the same general pattern, when grouped both by size and by growth. The 1990 estimates had a lower MAPE for every size group, with the largest improvements appearing in the larger counties (see Fig. 5). Counties in the 100,000 and over population group had a MAPE of 2.7 in 1980 which dropped to 2.0 in 1990. An even larger drop shows up for the next largest group, from 3.7 to 2.8 for the counties in the 50,000 to 100,000 group. A very slight drop of only 0.1 of a percentage point was experienced by the group of smallest counties and by the mid-sized counties in the 10,000 to 20,000 range. It is important to note that a county's membership in a specific size and growth group is determined separately for each decade, and thus the size and growth groups consist of different counties for the two decades.

All growth groups had smaller MAPE's in 1990 than in 1980. The differences were not quite as large as those for the size groups, however. The largest improvement, a little over 0.6 percentage points, was for the group growing 10% to 15% (see Fig. 6). All the other groups had improvements of between approximately 0.3 and 0.4 percentage points, except the group declining by more than 5%, which had a drop of only about 0.1 percentage point.

Figure 7 shows a comparison of 1990 and 1980 estimates by region. The counties in the South had the largest drop in MAPE between the two years while Midwest counties showed little change during the period. These data, and a look at the 1980 and 1990 figures by large size groups, are in Table 6. We can see from this table that the 1990 estimates had lower errors in all regions and broad size groups except three. One large increase in error was experienced by the eight counties in New England under 10,000, and the Midwestern counties between 10,000 and 50,000 had a more modest increase. The mid-sized counties in the West also had a sizable increase. Large counties in the South experienced the greatest decrease in MAPE between the two estimate years.

ERRORS BY STATES

Besides grouping counties by size category and growth class, it is of some interest to group the counties by state and examine the measures of error by state, particularly since the county estimates are produced through the Federal-State Cooperative Program. Table 7 shows some of these statistics for states and regions. The MAPE's for particular states range from a high of 7.2 for Alaska to a low of 1.4 for Maine. A slight geographic pattern emerges for the regions, as the Northeast has a low MAPE of 2.1. The Midwest MAPE of 3.2 is somewhat higher but still below the national figure of 3.6. The South and the West have the highest MAPE's. The Northeast had the tightest range of MAPE's, all being below the national number.

In the Midwest, the lower-populated plains states of Nebraska, North Dakota, and South Dakota had higher MAPE's than other states in the region. More highly-populated states such as Illinois and Ohio had much lower MAPE's. In the South, the highest MAPE's were in Virginia, Louisiana, and Florida. Virginia's high MAPE (5.1, 5th largest in the nation) was probably because of the independent cities which are estimated as counties, and Florida's 7th highest MAPE of 4.9 was probably due to the very high growth rate experienced by many of its counties. The more northerly of the Southern states, North Carolina, Delaware and Maryland, had some of the lowest MAPE's, as did Alabama.

The top four MAPE's were in the West, with Alaska and Hawaii ranking first (7.2) and third (5.8). Wyoming's second-place position is probably due to the small size of its counties, and Arizona's high MAPE of 5.5 is probably due to a mixture of its rapidly growing Maricopa and a few other counties, and its other very sparsely populated counties. An important factor to remember here is the influence of the error of the state estimate, since the county estimates were raked to the state estimates. Wyoming in particular had a very large error as a state, which contributed to the high errors for the counties. The three Pacific states had the lowest MAPE's in the West: Oregon with 3.0, California with 2.8, and Washington with 2.4. There was a sizeable gap between the MAPE's for these states and the next lowest state in the region, Utah, at 3.8.
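The text notes that county estimates were raked to the state estimates but does not spell out the procedure. A minimal sketch, assuming a simple proportional adjustment in which every county in a state is scaled by the same factor, shows why an error in the state estimate carries through to its counties. The function name and the figures are hypothetical.

```python
def rake_to_state(county_estimates, state_estimate):
    """Scale county estimates proportionally so they sum to the state total.

    Assumption: a single proportional factor per state; the procedure
    actually used for the published estimates may differ.
    """
    factor = state_estimate / sum(county_estimates.values())
    return {name: value * factor for name, value in county_estimates.items()}


# Hypothetical two-county state whose independent county estimates
# sum to 4,000 but whose state estimate is 4,400.
unraked = {"County A": 1000.0, "County B": 3000.0}
raked = rake_to_state(unraked, 4400.0)
print(raked)
```

Under this assumption each county keeps its share of the state total, so a state estimate that is 10 percent too high raises every county estimate by 10 percent regardless of how accurate the unraked county figures were.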

Fig. 7: Mean Absolute Percent Error, 1980 and 1990, by Region

The question arises, however, of whether the effects of size and change for the counties of a state are more influential than presence in the state itself. It is still generally true that large counties in all states do better than the smaller counties, as do counties with low growth. The next section will look at the different methods used for the various states, variations within one method, and the states grouped by a subjective rating of the quality of data used in making their estimates.

ERRORS BY STATES GROUPED BY ESTIMATE METHOD

There are three major methods used to produce county estimates: Administrative Records (ADREC), Composite II (COMP II), and Ratio-Correlation¹. Three states had other methods used for their counties. For most states, a combination of two or three methods was used to make county estimates. Only two states had estimates produced using only one method, and in those cases the method used was ADREC. Table 8 shows the measures of difference for the counties when grouped by the method(s) used for their estimates. In the bottom portion of the table are the same measures for counties in states using each of the major methods versus those not using that method. Of the six combinations of methods, counties where all three of the major methods were used had a slightly lower MAPE (3.3) than the largest group of counties from the 32 states where we used only the ADREC and Ratio-Correlation methods (3.5). Counties in states where we used ADREC only or a combination of ADREC, Ratio-Correlation, and some other method, had a MAPE of 3.7. The counties in the three states where ADREC and COMP II were used had a MAPE of 4.4, while the single state where ADREC was not used and only Ratio-Correlation and COMP II were applied, Alaska, had the highest MAPE of 7.2.

The high MAPE for Alaska and the fact that it was the only state with estimates made without ADREC account for the marked difference between the MAPE for the group of counties using ADREC and that for the 25 areas in Alaska where ADREC was not used. The large number of counties in 45 states where we used the Ratio-Correlation method gave estimates with a lower MAPE (3.5 versus 4.1) than the estimates for the 549 counties in the 5 states where we did not use Ratio-Correlation. The reverse was true for the 619 counties in 13 states where COMP II was used. These counties had estimates with a MAPE of 4.0, while the remaining 2521 counties in 37 states where COMP II was not used had estimates with a MAPE of only 3.5.

It is somewhat difficult to draw any conclusions from these results since we are comparing groups of different counties. Comparing estimates made using different methods for the same counties, even if not all counties in the nation were estimated, would provide better grounds for judging the relative performances of the various methods.

EFFECTS OF USE OF VARIABLES IN RATIO-CORRELATION ESTIMATES

There were 45 states which had county estimates produced in part by the Ratio-Correlation method, which entails obtaining a regression equation using a variety of independent variables. A wide range of variables was used from state to state, and the number of variables ranged from two to six. To examine the effects of using different variables in the regression equations for these states, we compared the measures of error for counties in states where the variable was used with those for counties in states where the variable was not used. We only did this for the six variables used most frequently, which were Medicare Enrollment, School Enrollment, Births, Deaths, IRS Returns, and Automobile Registrations. (Some of the other variables used less frequently included Voter Registration, Employment Data, Housing, and Telephone Data.) We looked at these error measures both without regard to the variable's regression weight and taking into consideration the weight of the variable in the regression equation. For the first tabulation, we included a state in the variable group if the variable was in the equation, no matter what its weight was. For the latter analysis we included the state in the variable group only if the weight was at least one-sixth of the total of all the weights. This had the effect of omitting states where a particular variable had a very low weight in relation to the other variables. These results are shown in Table 9.
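The two inclusion rules described above can be sketched as a single filter with an adjustable threshold. The state names and weights below are hypothetical and serve only to show the rule; the function name is likewise an illustration, and the weights are assumed to be non-negative so that a share of the total is meaningful.

```python
def states_using(variable, state_weights, min_share=0.0):
    """Return the states whose regression equation includes `variable`
    with a weight of at least `min_share` of the sum of all its weights.

    state_weights: {state: {variable_name: regression_weight}}
    min_share=0.0 reproduces the first tabulation (any weight counts);
    min_share=1/6 reproduces the weight-restricted tabulation.
    """
    selected = []
    for state, weights in state_weights.items():
        weight = weights.get(variable)
        if weight is None:
            continue  # variable not in this state's equation
        if weight >= min_share * sum(weights.values()):
            selected.append(state)
    return sorted(selected)


# Hypothetical weights, not taken from Table 9.
weights = {
    "State A": {"Births": 0.5, "School Enrollment": 0.4, "Medicare Enrollment": 0.1},
    "State B": {"Births": 0.3, "IRS Returns": 0.7},
    "State C": {"IRS Returns": 0.6, "Deaths": 0.4},
}

print(states_using("Medicare Enrollment", weights))
print(states_using("Medicare Enrollment", weights, min_share=1 / 6))
print(states_using("Births", weights, min_share=1 / 6))
```

In this toy case the hypothetical State A drops out of the Medicare Enrollment group under the one-sixth rule because that variable carries only a tenth of the state's total weight.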

For the variables considered regardless of their weight, Births, Auto Registrations, and School Enrollment show the largest differences in MAPE, with the MAPE always being smaller for the group of counties using the particular variable (Fig. 8). The counties where Medicare and IRS Returns were used had estimates with MAPE's which were closer to those for counties where those variables were not used. The 1331 counties in the 23 states where Deaths data were used had a MAPE slightly higher than the counties where this variable was not used. These differences in MAPE narrowed somewhat for Births and Auto Registrations when we considered only states where the variable's weight was sufficiently large (Fig. 9). No change was seen for School Enrollment or IRS Returns, but only 2 states (107 counties) used School Enrollment with a low weight, and only one state (46 counties) used IRS Returns with a low weight. The 230 counties in the 4 states where Medicare Enrollment had a sufficiently large weight had a higher MAPE than the other counties where this variable was not used or had a weight too low to be included.

Fig. 8: Mean Absolute Percent Error, Ratio-Correlation Variables, No Restriction for Inclusion

Since all states used at least two variables in their regression equations and nearly all included at least three, we obtained the measures of error for the most frequently used groups of two and three variables. These results are shown in Tables 10 and 11. The effect of using pairs of variables is somewhat different from the results obtained when considering single variables. The pairs for which there were the greatest differences in MAPE's between counties using the pair and counties not using it were Deaths and Automobile Registration, School Enrollment and IRS Returns (the pair used in the largest number of counties), and Deaths and IRS Returns. The differences in MAPE's for the counties where Medicare Enrollment and Deaths were used and where Births and Automobile Registration were used were also rather large. Where Medicare Enrollment and IRS Returns were used, the MAPE was slightly higher than in the counties where this pair of variables was not in the regression equation.

Of the twenty most frequently used groups of three variables, four groups stood out with larger reductions in MAPE for the counties where the group was used compared with the counties where it was not. Three of these groups involved Medicare Enrollment: with IRS Returns and Births, with IRS Returns and Automobile Registrations, and with School Enrollment and Automobile Registrations. The other group was School Enrollment, Births, and Automobile Registrations. The most common group of three variables was School Enrollment, Births, and IRS Returns. Counties where this group was used in the estimates also had a MAPE which was notably lower than that for the counties not using that group of variables.

Just as when comparing methods, conclusions from these tabulations for variables used in the various states' regression equations are quite difficult to draw since we are looking at groups with different counties in them. It does appear that certain single variables, and particularly groups of two and three variables, lead to lower errors. The interactive, and strong, effects of county size and rate of growth, however, probably cloud some of these comparisons. Many of the variables and groups of variables are used in overlapping groups of states, which should also have an effect on comparing their errors. Further study could include comparing each unique combination of regression variables, but this would almost have the effect of looking at each state separately. Similarly, if we stratify the counties by size and growth, then look at the different methods and variables, we are almost reduced to case-by-case situations consisting of many small groups of counties. Having several estimates for the same groups of counties using various regression variables would probably let us draw more definitive conclusions. This would be somewhat hindered, however, by the fact that in many cases the selection of variables for a particular state is a result of what data are available.

Fig. 9: Mean Absolute Percent Error, Ratio-Correlation Variables, Variables Included Only If Weight Is Sufficient

SUBJECTIVE GROUPING OF STATES

The notion of "Professional Judgement" has been brought up in the production of estimates elsewhere (Smith and Cody), and has been included in evaluating estimates. In Smith and Cody's study, Professional Judgement meant that some estimates were produced using methods or variables determined with the help of knowledge and judgement of members of the staff producing the estimates. The county estimates we are evaluating here had a rather different element of professional judgement applied to them, in that the methods and variables (for Ratio-Correlation estimates) used for the counties in particular states were determined with the aid of judgement of the Census Bureau staff involved in producing the estimates. We do not have estimates which would have resulted if these decisions had been made differently during the decade, so we cannot directly measure the effect of this judgement as it was applied. We did, however, obtain a consensus "Subjective Grouping" of states indicating the relative quality of data for the various states as recalled by two of the Bureau staff members who had been involved with these estimates throughout the entire decade. They rated each state from one to five, with one being the best data, and five being the worst. We then grouped the states by this rating and obtained the MAPE for each group. While the range of MAPE's for these groups was rather small, less than 1 percentage point, the highest rated group had the lowest MAPE, and the MAPE generally increased as the rating declined (Fig. 10). The MAPE for the group rated 5 was actually slightly lower than that for the group rated 4, but this is probably due to the fact that two of the states in the worst group were the two which had estimates using only the ADREC method. The data used for the ADREC method are IRS returns and are supplied on a nationwide basis rather than state by state. As such, they are the same for all states, so the judgement factor has no effect here. This slight decrease, though, indicates that the decision not to use data from the individual state was a rather sound one.

Fig. 10: Mean Absolute Percent Error, County Estimates, by Subjective Group

CONCLUSIONS

The errors of county estimates are affected heavily by the population size of the county. Larger counties tend to have lower errors than smaller counties. Counties which grew rapidly or declined greatly during the decade had higher errors than those which experienced very low population changes. There is strong evidence that producing county estimates through some sort of procedure gives better estimates than using the previous decennial census figures. Estimates for 1990 had lower errors than those for 1980 in all size and growth groups.

Determining which estimation method produces the best estimates is not a clear-cut process, since county estimates are usually produced through a combination of methods. The states where county estimates were made using a combination of all three methods had the lowest MAPE, but states where only ADREC and Ratio-Correlation were used, where only ADREC was used, or where ADREC, Ratio-Correlation and another method were used, had MAPE's almost as small. Obtaining estimates created using each of the methods alone, for as many states as possible, would lead to more definitive conclusions, since we would then be comparing estimates for the same groups of states.

For the 45 states where Ratio-Correlation was one of the methods used, the use of certain single variables appeared to make a difference in the performance of the estimates. Births, School Enrollment, and Automobile Registrations seemed to make the most difference in the errors of the estimates. Various pairs and groups of three variables also had differential effects on the estimates.

Finally, a subjective rating of the quality of the data supplied by the various states led to a grouping of counties where the performance of the estimates was directly related to the ranking.

More detailed conclusions will be possible after looking at more of the interactions of the various factors we have examined largely on an individual basis. Also, more definitive judgments concerning the different estimation methods will be possible when we have alternate estimates available produced using each of the methods alone, and possibly each of the possible combinations of methods.

¹ The estimates for 1990 were created using the ADREC method for counties in all states. They were based, however, on estimates for earlier years produced using the various methods mentioned in this paper.

REFERENCES

Armstrong, J. Scott. Long-Range Forecasting, from Crystal Ball to Computer. New York, John Wiley & Sons, Inc., 1977. pp. 320-333.

Long, John F. "Postcensal Population Estimates: States, Counties, and Places". U.S. Bureau of the Census, Population Division Technical Working Paper Number 3, 1993. pp. 11-12, Table, p.24.

Smith, Stanley K. and Scott Cody. "Evaluating the Housing Unit Method: A Case Study of Population Estimates in Florida". Paper presented at the annual meeting of the Population Association of America, Cincinnati, April 1-3, 1993.

U.S. Bureau of the Census, Current Population Reports, Series P-25, No. 984, Evaluation of Population Estimation Procedures for Counties: 1980, by Gilbert R. Felton, U.S. Government Printing Office, Washington, D.C., 1986.

U.S. Bureau of the Census, Current Population Reports, Series P-26, No. 87-A, County Population Estimates: July 1, 1987 and 1986, U.S. Government Printing Office, 1988.