U.S. Census Bureau
Washington, D.C. 20233
Population Division Working Paper Series No. 71
This paper reports the results of research and analysis undertaken by U.S. Census Bureau staff. It has undergone a more limited review than official U.S. Census Bureau publications. This report is released to inform interested parties of research and to encourage discussion. Presented at the Annual Meeting of the Southern Demographic Association, 2002, Austin, TX.
Table of Contents
Measures of Accuracy
National and State Level Housing Unit Estimates
County Level Housing Unit Estimates
Errors by Size and Growth
Appendix: Estimates Methodology
List of Figures
1. MAPE of April 1, 2000 County Housing Unit Estimates by Size
2. MAPE of April 1, 2000 County Housing Unit Estimates by Change in Housing Units: 1990 to 2000
3. MAPE of April 1, 2000 County Housing Unit Estimates by Size and Growth Class
4. MALPE of April 1, 2000 County Housing Unit Estimates by Size and Growth Class
5. Accuracy of April 1, 2000 County Estimates versus Alternative Estimates
List of Tables
1. Measures of Error in the April 1, 2000 Housing Unit Estimates by State
2. MAPE of April 1, 2000 County Housing Unit Estimates by Size
3. MAPE of April 1, 2000 County Housing Unit Estimates by Change in Housing Units: 1990 to 2000
4. County Measures of Error in the April 1, 2000 Housing Unit Estimates by State
5. Accuracy of April 1, 2000 County Estimates versus Alternative Estimates
List of Maps
1. Percent Differences between the Census 2000 HU Counts and the 2000 Housing Unit Estimates by State
2. MAPE of April 1, 2000 County Housing Unit Estimates by State
3. Percent Differences between the Census 2000 Housing Unit Counts and the 2000 Housing Unit Estimates by County
Throughout the 1990s the Population Division of the U.S. Census Bureau produced state and county level housing unit estimates. While the state level estimates were released to the public, the county level estimates were considered experimental. The county level estimates were produced using a component method whereby we begin with the 1990 Census housing unit count and update the Census count using administrative records data on building permits, mobile home shipments, and housing unit loss. The county estimates were summed to obtain the state estimates. The availability of Census 2000 data provides the opportunity to evaluate the accuracy of these estimates.
This paper compares Census 2000 results with April 1, 2000 housing unit estimates using a variety of statistical measures, including the Mean Absolute Percent Error and the Mean Algebraic Percent Error. The results of this analysis will be used to inform Census Bureau analysts of ways in which the current housing unit estimates can be improved.
This report presents an evaluation of estimates of total housing units for the Nation, states, and counties produced by the Population Division of the Census Bureau. At the national and state level these estimates were released to the public every year in the 1990s except 1997 and 1999. While the county level estimates were not released to the public, they were used in an evaluation of Census 2000 housing unit coverage designed to improve the accuracy of census data (Barron, 2001). The comparison of the April 1, 2000 estimates to the April 1, 2000 decennial census counts forms the basis for this report.
The housing unit estimates were produced using an administrative records based method by adding data on building permits, mobile home shipments, and housing unit loss to the 1990 Census unadjusted housing unit count. These data were collected and processed at the subcounty level for areas such as cities, boroughs, villages, towns, and townships and then summed to the county, state, and national level. See the Appendix for a more detailed explanation of the methodology.
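The update described above can be sketched in a few lines of Python. This is a simplified illustration, not the production procedure: the class, function, and field names are hypothetical, the figures are made up, and it assumes that every permitted unit and shipped mobile home is completed and counted within the estimate period. The Appendix describes the actual method.

```python
from dataclasses import dataclass


@dataclass
class SubcountyInputs:
    """Hypothetical inputs for one subcounty area (all values are made up)."""
    census_1990_units: int      # 1990 Census unadjusted housing unit count
    permitted_units: int        # units authorized by building permits since 1990
    mobile_home_shipments: int  # mobile homes shipped to the area since 1990
    units_lost: int             # housing units lost since 1990


def estimate_units(area: SubcountyInputs) -> int:
    """Base count plus additions minus losses (simplified component update)."""
    return (area.census_1990_units
            + area.permitted_units
            + area.mobile_home_shipments
            - area.units_lost)


def county_estimate(areas: list[SubcountyInputs]) -> int:
    """Sum the subcounty estimates to obtain the county estimate."""
    return sum(estimate_units(a) for a in areas)


# Two hypothetical subcounty areas within one county.
areas = [
    SubcountyInputs(10_000, 1_200, 150, 90),
    SubcountyInputs(4_000, 300, 60, 40),
]
county_total = county_estimate(areas)
print(county_total)  # 15580
```

State and national totals would be obtained the same way, by summing the county results.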
MEASURES OF ACCURACY
For the purposes of this evaluation, the differences between the estimates and the census counts are assumed to be due to errors in the estimates. The percent error for an area is the estimate minus the census count, expressed as a percentage of the census count, so a negative error indicates an underestimate. This paper considers two aspects of quality, bias and accuracy, as measured by the Mean Algebraic Percent Error (MALPE) and the Mean Absolute Percent Error (MAPE), respectively. The MALPE is the average of all the percent errors; the MAPE is the average of their absolute values. MALPE is a measure of mean bias and can be used as the basis for testing for the presence of significant mean bias (Coleman, 1999 and 2001). MAPE is a measure of accuracy. That is, it provides a measure of how "close" the estimates get to the truth, on average.
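As an illustration of the two measures, the following Python sketch computes them for three hypothetical areas. The function names and sample values are assumptions made for this example; the sign convention (estimate minus census count, divided by the census count) is inferred from the way errors are reported in this paper.

```python
def percent_error(estimate: float, census: float) -> float:
    """Signed percent error; negative when the estimate is below the census count."""
    return (estimate - census) / census * 100.0


def malpe(estimates, censuses) -> float:
    """Mean Algebraic Percent Error: the average of the signed percent errors."""
    errors = [percent_error(e, c) for e, c in zip(estimates, censuses)]
    return sum(errors) / len(errors)


def mape(estimates, censuses) -> float:
    """Mean Absolute Percent Error: the average of the absolute percent errors."""
    errors = [abs(percent_error(e, c)) for e, c in zip(estimates, censuses)]
    return sum(errors) / len(errors)


# Three hypothetical areas with percent errors of -5, +5, and 0.
estimates = [95.0, 210.0, 300.0]
censuses = [100.0, 200.0, 300.0]

print(round(malpe(estimates, censuses), 2))  # offsetting errors cancel
print(round(mape(estimates, censuses), 2))   # absolute errors do not cancel
```

In this example the under- and overestimate offset each other in the MALPE, which is why a MALPE near zero indicates little mean bias but says nothing about accuracy.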
NATIONAL AND STATE LEVEL HOUSING UNIT ESTIMATES
The April 1, 2000 estimate for the Nation of 115,547,749 housing units was 0.3 percent lower than the Census 2000 housing unit count of 115,904,641. The MAPE for all states was 1.5 percent. A comparable analysis of the housing unit estimates for the 1980s showed an error of 0.3 percent for the Nation and a MAPE for all states of 1.8 percent (Prevost, 1994). Half of all states had absolute errors less than 1.0 percent (Table 1, Col. 5). Montana had the highest absolute error, 5.9 percent, while Minnesota had the lowest, 0.008 percent. The error varied with the number of housing units in the state: generally, the higher the 1990 housing unit count, the more accurate the estimate. Of the ten states with the largest number of housing units in 1990, all but two, New York (-2.0 percent) and California (1.9 percent), had absolute percent errors lower than 1 percent. The MALPE for all states was -0.9 percent. Thirty-three states had estimates that were below the Census 2000 count, indicating negative bias. These results can also be compared with the Census Bureau’s independently derived 1990-based population estimates, which had an error of -2.4 percent for the Nation and a MAPE of 2.6 percent for all states (Davis, 2001).
Table 1. Measures of Error in the April 1, 2000 Housing Unit Estimates by State
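|State Name||Percent Change, 1990 to 2000||1990 Census Housing Units||April 1, 2000 Housing Unit Estimate||Census 2000 Housing Units||Percent Error|
|District of Columbia||-1.3||278489||262386||274845||-4.5|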
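As a check on how the columns of Table 1 relate to one another, the national figures quoted in the text and the District of Columbia row can be recomputed directly. The helper name below is hypothetical; the input values are taken from this paper.

```python
def percent_diff(value: float, base: float) -> float:
    """Difference between value and base, as a percentage of base."""
    return (value - base) / base * 100.0


# National estimate and Census 2000 count quoted in the text.
national_error = percent_diff(115_547_749, 115_904_641)

# District of Columbia row of Table 1.
dc_census_1990 = 278_489
dc_estimate_2000 = 262_386
dc_census_2000 = 274_845

dc_change = percent_diff(dc_census_2000, dc_census_1990)   # change, 1990 to 2000
dc_error = percent_diff(dc_estimate_2000, dc_census_2000)  # percent error

print(round(national_error, 1), round(dc_change, 1), round(dc_error, 1))
# -0.3 -1.3 -4.5
```

Both the percent change and the percent error use the earlier or "true" count as the base: the 1990 Census count for change, and the Census 2000 count for error.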
The relationship between growth and accuracy was less clear. Some states, such as Nevada, had high growth but low absolute percent errors while other states such as Arizona had high growth and high absolute percent errors. Table 1 shows the percent change in housing units between 1990 and 2000, the 1990 Census count, the April 1, 2000 housing unit estimate and Census 2000 count, and the percent error for each state. Map 1 shows the percent error for each state.
Map 1. Percent Differences between the Census 2000 HU Counts and the 2000 Housing Unit Estimates by State
COUNTY LEVEL HOUSING UNIT ESTIMATES
A comparison of the Census 2000 housing unit counts with the April 1, 2000 housing unit estimates shows that the overall MAPE for all 3,141 counties is 4.6 percent. The overall MALPE is -1.5 percent, which indicates that, on average, the estimates were below the Census 2000 housing unit counts. On average, the counties experienced a 12.0 percent change in housing units between the 1990 Census and Census 2000. For comparison, the county level population estimates had a MAPE of 3.3 percent and a MALPE of -1.6 percent (Davis, 2001).
Of all the counties, 643 had estimates within one percent of the census count. Only 359 counties had estimates that differed from the Census 2000 count by more than 10 percent. A total of 1,813 counties had housing unit estimates below the Census 2000 count, 1,327 had estimates above it, and one county (Potter County, South Dakota) had an estimate exactly equal to the Census 2000 count. Housing unit loss between 1990 and 2000 occurred in 420 counties. For 374 (89.0 percent) of these counties, the estimates correctly indicated that the county had housing unit loss.
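Tallies of this kind can be sketched as follows. The record layout, function names, and sample values are hypothetical. The sketch also assumes that an estimate "correctly indicated" a loss when the April 1, 2000 estimate fell below the 1990 Census count, which is one reasonable reading of the text rather than a documented rule.

```python
from collections import Counter


def direction(estimate: int, census: int) -> str:
    """Classify an estimate as below, above, or exactly equal to the census count."""
    if estimate < census:
        return "below"
    if estimate > census:
        return "above"
    return "exact"


def loss_agreement(records):
    """Return (counties whose estimate also showed a loss, counties with an actual loss)."""
    losers = [r for r in records if r["census_2000"] < r["census_1990"]]
    agree = [r for r in losers if r["estimate_2000"] < r["census_1990"]]
    return len(agree), len(losers)


# Hypothetical county records (made-up values).
records = [
    {"census_1990": 1000, "estimate_2000": 980, "census_2000": 950},
    {"census_1990": 2000, "estimate_2000": 2010, "census_2000": 1990},
    {"census_1990": 5000, "estimate_2000": 5400, "census_2000": 5600},
    {"census_1990": 800, "estimate_2000": 820, "census_2000": 820},
]

tally = Counter(direction(r["estimate_2000"], r["census_2000"]) for r in records)
agree, losers = loss_agreement(records)
print(dict(tally), agree, losers)

# Consistency checks on the figures reported in the text.
print(1_813 + 1_327 + 1)          # 3141, the total number of counties
print(round(374 / 420 * 100, 1))  # 89.0
```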
ERRORS BY SIZE AND GROWTH
When considered by size, as measured by the 1990 housing unit count (Table 2 and Figure 1), the larger counties were found to have smaller MAPEs. This pattern was also found in prior analyses of county (Davis, 1994) and subcounty (Galdi, 1985; Harper, Devine and Coleman, 2001) population estimates.
Table 2. MAPE of April 1, 2000 County Housing Unit Estimates by Size
|1990 Housing Unit Count||Number of Areas||Mean Absolute Percent Error|
When classified by change in the number of housing units between the 1990 Census and Census 2000 (Table 3 and Figure 2), counties with little change have lower MAPEs than counties with large changes in housing unit counts. This common pattern was also found in prior analyses of subcounty (Galdi, 1985; Harper et al., 2001) and county (Davis, 1994) population estimates.
Table 3. MAPE of County Housing Unit Estimates by Change in Housing Units: 1990 to 2000
|Percent Change||Number of Areas||Mean Absolute Percent Error|
|-4.9 to -0.1||231||3.1|
|0 to 4.9||456||2.7|
|5 to 9.9||565||2.9|
|10 to 14.9||500||3.4|
|15 to 24.9||712||5.5|
Figure 2. MAPE of April 1, 2000 County Housing Unit Estimates by Change in Housing Units: 1990 to 2000
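A tabulation like Table 3 can be sketched as follows. The five middle class labels follow Table 3; the two open-ended classes, the treatment of the classes as half-open intervals, the function names, and the sample records are assumptions made for this example.

```python
from bisect import bisect_right
from collections import defaultdict

# Class boundaries for percent change in housing units, 1990 to 2000.
# Classes are treated as half-open intervals [lower, upper); this is an assumption.
BOUNDS = [-5.0, 0.0, 5.0, 10.0, 15.0, 25.0]
LABELS = [
    "less than -5",   # assumed open-ended class
    "-4.9 to -0.1",
    "0 to 4.9",
    "5 to 9.9",
    "10 to 14.9",
    "15 to 24.9",
    "25 or more",     # assumed open-ended class
]


def growth_class(pct_change: float) -> str:
    """Return the label of the growth class containing pct_change."""
    return LABELS[bisect_right(BOUNDS, pct_change)]


def mape_by_growth_class(records):
    """Group counties by growth class and average their absolute percent errors."""
    errors = defaultdict(list)
    for census_1990, estimate_2000, census_2000 in records:
        change = (census_2000 - census_1990) / census_1990 * 100.0
        error = abs(estimate_2000 - census_2000) / census_2000 * 100.0
        errors[growth_class(change)].append(error)
    return {label: sum(v) / len(v) for label, v in errors.items()}


# Hypothetical counties: (1990 Census count, 2000 estimate, Census 2000 count).
records = [
    (1000, 1020, 1030),
    (2000, 2100, 2060),
    (1000, 1200, 1300),
    (500, 480, 470),
]
result = mape_by_growth_class(records)
for label, value in result.items():
    print(label, round(value, 2))
```

The same grouping, keyed on the 1990 housing unit count instead of the percent change, would give a tabulation by size.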
Figure 3 shows a 3-dimensional graph of MAPEs by size and growth categories, the same categories that are used in Figures 1 and 2. Figure 3 shows that the U-shaped curves vary dramatically by size class, achieving troughs at differing growth classes. Moreover, the 100,000+ housing units size class has two troughs: at the 5-9.9 percent and 15-24.9 percent growth classes. The trend for increased accuracy as the number of housing units increases remains generally true for growing counties, but not for declining counties.
Figure 3: MAPE of April 1, 2000 County Housing Unit Estimates by Size and Growth Class
Figure 4 shows MALPE by size and growth class. Again, these classes are the same as in the preceding Figures. Figure 4 shows the origin of the U-shaped curves: growing areas were systematically underestimated while declining areas were systematically overestimated. In the growth classes of 5 percent and more, the magnitude of MALPE systematically declines in size as the 1990 housing unit count increases, which leads to MAPE’s general declines in these classes. The growth classes under 5 percent do not show this systematic decline, leading one to suspect that they were generated by different processes. These differences account for the breakdown between size and accuracy in these classes. An important interaction effect shows up dramatically: small size interacts with large growth rates to decrease MALPE.
Figure 4: MALPE of April 1, 2000 County Housing Unit Estimates by Size and Growth Class
Map 2. MAPE of April 1, 2000 County Housing Unit Estimates by State
Table 4 and Map 2 show the MAPEs of the county housing unit estimates by state. These MAPEs range from a low of 0.7 percent for Connecticut to a high of 14.8 percent for Hawaii. Connecticut, Rhode Island, New Hampshire, New Jersey, Massachusetts, and Pennsylvania have county MAPEs lower than 2 percent. Montana, Nevada, Tennessee, Arkansas, Alaska, and Hawaii have county MAPEs higher than 7 percent.
Table 4. County Measures of Error in the April 1, 2000 Housing Unit Estimates by State
|State Name||April 1, 2000
|District of Columbia||262386||274845||-4.5||4.5|
Map 3 shows the percent error for each county. Various geographic patterns may be observed. The South contains many counties with underestimates of housing units. The northeastern end of their range is in West Virginia. This range extends westward through Kentucky into Arkansas and eastern Oklahoma and southwestward to the southern tip of Texas, interrupted by a set of overestimates in the Mississippi River valley. The eastern boundary of this region is an arc through Kentucky, Tennessee and Alabama into the Florida Panhandle. Underestimates also predominate in eastern Mississippi and parts of southern Louisiana. Another region of underestimates occurs in the Mountain West, particularly in New Mexico and western Montana. Other areas of this region contain clusters of under- and overestimates. Alaska contains similar clusters of under- and overestimates. The western Great Plains generally contain overestimates. All of these areas reflect various problems with the housing unit estimation process, generally the lack or poor quality of building permit data. The overestimates in the Great Plains may reflect underestimation of demolitions.
Map 3. Percent Error in the April 1, 2000 Housing Unit Estimates by County
Because of the effort required to produce housing unit estimates based on administrative records, mainly building permits, it is worth asking whether these estimates offer any improvement over an easier estimation method. This section compares the housing unit estimates developed from administrative records (building permit method) to two alternative sets of estimates. The first set was produced by applying the vacancy and persons per household rates from the 1990 Census to the April 1, 2000 county population estimates developed using the tax return method (population estimate method). The second set simply used the 1990 Census housing unit count as the estimate. The results of this comparison appear in Table 5 and Figure 5. For each size class, the building permit method is preferable to the other estimates. This demonstrates the value of using a building permit based approach to estimate housing units.
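The two alternative estimates can be sketched as follows. The paper does not give the formula used for the population estimate method, so the conversion below (household population divided by persons per household, then divided by one minus the vacancy rate) is an assumption, as are the function names, the group quarters parameter, and the sample values.

```python
def rates_from_1990(household_pop_1990, occupied_units_1990, total_units_1990):
    """Derive persons per household and the vacancy rate from 1990 Census counts."""
    pph = household_pop_1990 / occupied_units_1990
    vacancy_rate = 1.0 - occupied_units_1990 / total_units_1990
    return pph, vacancy_rate


def population_based_estimate(population_2000, pph, vacancy_rate,
                              group_quarters_pop=0.0):
    """Convert a population estimate to housing units using fixed 1990 rates."""
    households = (population_2000 - group_quarters_pop) / pph
    return households / (1.0 - vacancy_rate)


def no_change_estimate(total_units_1990):
    """Second alternative: carry the 1990 Census housing unit count forward."""
    return total_units_1990


# Hypothetical county (made-up values).
pph, vacancy_rate = rates_from_1990(
    household_pop_1990=22_500, occupied_units_1990=9_000, total_units_1990=10_000
)
alt_population = population_based_estimate(27_000, pph, vacancy_rate)
alt_no_change = no_change_estimate(10_000)

print(round(pph, 2), round(vacancy_rate, 2))
print(round(alt_population), alt_no_change)
```

Because the rates are held at their 1990 values, any change in household size or vacancy over the decade passes directly into the error of the population-based estimate.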
Table 5. Accuracy of April 1, 2000 County Estimates versus Alternative Estimates
|1990 Census Housing Unit Count||Number of Areas||MAPE - Building Permit Method||MAPE - Population Estimate Method||MAPE - 1990 Census Housing Unit Count|
Figure 5. Accuracy of April 1, 2000 County Estimates versus Alternative Estimates
This evaluation has found that the 2000 state and county level housing unit estimates developed from building permit, mobile home shipment, and demolition data performed with a degree of accuracy similar to the state and county April 1, 2000 population estimates produced by the Population Division of the Census Bureau. The housing unit estimates follow a pattern similar to other estimates produced by Population Division in that they tended to be more accurate for larger states and counties. The county level housing unit estimates follow a pattern similar to other estimates in that they are more accurate for areas that experienced the smallest amount of housing unit change throughout the decade. At the state level the relationship between housing unit change and accuracy is less clear.
The housing unit estimates show clear geographic variations. The number of housing units in counties was generally underestimated in large parts of the South and Mountain West, and generally overestimated in the western Great Plains. Some areas, such as Alaska, contain mixtures of under- and overestimates. These areas are the most difficult to estimate. We hypothesize that the problems are due to input data deficiencies.
The comparison with estimates produced using alternative methods indicates that the building permit method performed better than the alternative methods for all size and growth categories. While the building permit based estimates are more accurate than the county population based estimates, the accuracy of the county population based estimates relies heavily on the accuracy of the vacancy and persons per household rates. The county population based method used persons per household and vacancy rates from the 1990 Census. Improvements in our ability to estimate these rates would improve the county population based estimates.
Through this analysis we have begun to look at the discrepancies between our housing unit estimates and Census 2000. Future research should focus on identifying the components of the estimates that contributed the most to these discrepancies.
REFERENCES
Barron, W. J., Jr. 2001. "Recommendation on Adjustment of Census Counts." Memorandum to Donald L. Evans, Secretary of Commerce, March 1.
Breiman, Leo. 1999. "Random Forests-Random Features." Technical Report No. 567, Statistics Department, University of California-Berkeley.
Coleman, Charles D. 1999. "Nonparametric Tests for Bias in Estimates and Forecasts." In American Statistical Association: 1999 Proceedings of the Business and Economic Statistics Section, 251-256.
Coleman, Charles D. 2001. "Non-i.i.d. Generalizations of the Matched-Pairs t-Test." In American Statistical Association: 2001 Proceedings of the Business and Economic Statistics Section.
Davis, S. T. 1994. "Evaluation of Postcensal County Estimates for the 1980s." Technical Working Paper #5.
Davis, Sam T., Josephine D. Baker, Marc J. Perry, Signe Wetrogan, and Carolette Norwood. 2001. "An Early Comparison of Postcensal County Population Estimates with Results from the 2000 Census." Paper presented at the Annual Meetings of the American Statistical Association, Atlanta, GA, August 2001.
Friedman, Jerome. 2001. "A Statistical View of Boosting." Presentation made to the Joint Statistical Meetings 2001, Atlanta, GA.
Galdi, David. 1985. "Evaluation of 1980 Subcounty Population Estimates." U.S. Census Bureau, Current Population Reports, Series P-25, No. 963. U.S. Government Printing Office.
Harper, Greg, Jason Devine, and Charles Coleman. 2001. "Evaluation of 2000 Subcounty Population Estimates." Paper presented at the Annual Meetings of the Southern Demographic Association, Miami, FL, September 2001.
Prevost, Ron. 1994. "State Housing Unit and Household Estimates: April 1, 1980, to July 1, 1993." U.S. Census Bureau, Current Population Reports, Series P-25, No. 1123. U.S. Government Printing Office.
APPENDIX: HOUSING UNIT ESTIMATES METHODOLOGY
State and County Level Housing Unit Estimates Methodology
The Population Estimates Branch produces the housing unit estimates in the following steps: