U.S. Census Bureau
Washington, D.C. 20233
Population Division Working Paper Series No. 74
This paper reports the results of research and analysis undertaken by the U.S. Census Bureau staff. It has undergone a Census Bureau review more limited in scope than that given to Official Census Bureau publications. This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress.
The Department of Education uses school district population estimates in combination with estimates of the numbers of children in poverty to distribute billions of dollars in federal funds to school districts. This report evaluates the synthetic ratio method used to produce postcensal school district estimates of the total population and the school-age population (5 to 17 years). The synthetic ratio method assumes that the ratio of the school district population to the county population, as measured in the most recent decennial census, remains constant throughout the estimate period. To evaluate the method, school district population estimates for April 1, 2000 are produced from the 1990 census population of school districts and counties and the April 1, 2000 county population estimates. The accuracy of the April 1, 2000 estimates is measured by comparison with Census 2000 enumeration data. The Mean Algebraic Percent Error (MALPE), the Mean Absolute Percent Error (MAPE), and the Weighted Mean Absolute Percent Error are calculated for the school district population estimates. The average errors are calculated for regions, states, school district types, school district population size in 1990 and 2000, and the percent population change from 1990 to 2000. The results identify the characteristics of school districts for which the synthetic ratio method performs relatively well and for which the method appears to generate biased estimates.
Table of Contents
Appendix A: Identifying Overlapping School Districts
Appendix B: Cohort-Component and Ratio Methods for Calculating April 1, 2000 County Estimates
Appendix C: Treatment of 1990 Census and Census 2000 School District Files
Each year the United States Department of Education (DOE) distributes more than seven billion dollars to public school districts in order to supplement programs for educationally disadvantaged children. Prior to 1997, distributions to school districts made under Title I of the Elementary and Secondary Education Act were based on the number of school-age children in poverty as enumerated in the most recent decennial census. The funds were distributed to counties and then the county funds were distributed to school districts by state governments. Along with other changes enacted for the 1997-1998 school year, Congress mandated that the DOE distribute Title I funds directly to school districts based on intercensal updates of the estimated number of school-age children in poverty.
The Census Bureau was given the task of conducting research and updating estimates of the number of poor school-age children between decennial censuses. The Population Division is responsible for the production of the population estimates and the Small Area Income and Poverty Estimates (SAIPE) Program produces estimates of the number of poor school-age children by school district. This report evaluates the current methodology used to create intercensal estimates of the school district population.
The 1996 and 1998 school district estimates were produced using a synthetic ratio method. This approach was used to calculate both the total and school-age (5 to 17 years) populations by school district. Though the synthetic ratio method makes efficient use of easily available data, there are some concerns about the accuracy of the estimates it produces and the assumptions it requires. Inherent to the synthetic ratio method is the assumption that the ratio of a school district population to the corresponding county population remains constant over time. To the extent that a school district population grows or declines at different rates than that of the county or counties in which the school district lies, the estimate will be erroneous. The impact of this assumption and related biases becomes greater for dates further from the decennial census used as the base population for the estimates. Inaccuracies in the county population estimates are also incorporated into the school district estimates.
Miller (2001) evaluated the synthetic ratio method by calculating April 1, 1990 school district population estimates from 1980 census data and April 1, 1990 county estimates. The April 1, 1990 school district estimates were compared with the school district populations enumerated in the 1990 census to determine the accuracy and biases of the synthetic ratio method. The largest proportional errors occurred for small school districts, districts with the highest and lowest poverty rates, and districts with the highest and lowest growth rates.
This report continues the effort to evaluate the results of the synthetic ratio method by producing April 1, 2000 school district population estimates based on the 1990 census and April 1, 2000 county estimates. These estimates were compared to the school district populations enumerated in Census 2000. Section II describes the synthetic ratio method and assumptions and the data used to produce and evaluate the school district population estimates. Section III presents the estimate results and calculates the amount of error attributable to the county estimates and the amount of error attributable to the synthetic ratio method and related assumptions. Section IV evaluates the level of error associated with school district size, growth, geographic region, and district type. Section V summarizes the findings from this evaluation, indicating where the synthetic ratio method was fairly accurate and where other methods may need to be developed in order to improve the estimates.
The synthetic ratio method calculates the population in school districts by assuming that the ratio of the school district population to the county population remains constant over time. In other words, the populations of all school districts or portions of school districts within a county (referred to as "school district pieces") change at the same rate. The method requires county and school district level census data as the starting population and county population estimates for the time period of interest. The synthetic ratio method does not employ a time-sensitive rate of change, so the method can be applied to any time interval without special treatment.
The county population estimates used to calculate the changes in the school district populations are produced annually using a cohort-component method. The cohort-component method begins with the most recent census population by county and adds or subtracts estimates of four components of population change as calculated for the estimate time period. Births, deaths, and net internal and international migration are calculated from various administrative records. To the degree that the assumptions associated with county estimates methodology and coverage in the data sources are inaccurate, the school district estimates will be similarly biased.
Using the 1990 census population counts and April 1, 2000 county estimates, the estimated total population in school districts was calculated with the following ratio:
$P_{sd,2000} = P_{sd,1990} \times \dfrac{P_{ct,2000}}{P_{ct,1990}}$
where:
$P_{sd,2000}$ = total population estimated for each school district piece for April 1, 2000
$P_{sd,1990}$ = total population enumerated for each school district piece in the 1990 census
$P_{ct,2000}$ = total estimated county population for April 1, 2000 for the county in which the school district piece lies
$P_{ct,1990}$ = total county population enumerated in the 1990 census for the county in which the school district piece lies
Using the 1990 census population counts and April 1, 2000 county estimates, the estimated school-age (5 to 17 years) population in school districts was calculated with the following ratio:
$C_{sd,2000} = C_{sd,1990} \times \dfrac{C_{ct,2000}}{C_{ct,1990}}$
where:
$C_{sd,2000}$ = estimated school-age population for each school district piece for April 1, 2000
$C_{sd,1990}$ = school-age population enumerated for each school district piece in the 1990 census
$C_{ct,2000}$ = estimated school-age population by county for April 1, 2000 for the county in which the school district piece lies
$C_{ct,1990}$ = school-age population by county enumerated in the 1990 census for the county in which the school district piece lies
The ratios of school district piece populations to county populations in 1990 were applied to each April 1, 2000 county estimate to calculate April 1, 2000 preliminary population estimates for school district pieces. These preliminary estimates were rounded and the sums were controlled to equal the April 1, 2000 county population estimates, as is conventional in the Population Division’s Population Estimates Branch (PEB) products and as described in Appendix A.
To calculate total population estimates for school districts that cross county boundaries, the school district piece estimates were summed by school districts. These sums are essentially weighted averages of the population change in the relevant counties, weighted by the school district piece populations and applied to the school district base populations.
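The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the Census Bureau's production code: the function name, the tuple layout, and the sample figures are hypothetical, and the rounding and controlling to county totals are omitted.

```python
from collections import defaultdict

def synthetic_ratio_estimates(pieces, county_1990, county_2000):
    """Estimate 2000 school district populations from 1990 pieces.

    pieces: list of (district_id, county_id, piece_pop_1990) tuples.
    county_1990, county_2000: dicts mapping county_id -> population.
    """
    totals = defaultdict(float)
    for district_id, county_id, piece_1990 in pieces:
        # Each piece is assumed to change at the same rate as its county.
        piece_2000 = piece_1990 * county_2000[county_id] / county_1990[county_id]
        # Pieces of a district that crosses county lines are summed.
        totals[district_id] += piece_2000
    return dict(totals)

# Hypothetical data: district "D1" spans counties "A" and "B".
pieces = [("D1", "A", 2000), ("D1", "B", 1000),
          ("D2", "A", 8000), ("D3", "B", 4000)]
county_1990 = {"A": 10000, "B": 5000}
county_2000 = {"A": 12000, "B": 5500}

estimates = synthetic_ratio_estimates(pieces, county_1990, county_2000)
print(estimates)
```

In this made-up case county A grows by 20 percent and county B by 10 percent, so the cross-county district D1 grows by a blend of the two rates, weighted by the 1990 populations of its pieces.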
Estimating the population in school districts is complicated by a number of factors. First, school district boundaries usually do not match the boundaries of other governmental units, such as counties or cities, for which data are readily available. School districts may cross county and other place boundaries or serve only portions of counties or places. Second, school districts may annex new territory, lose territory, open, or close, resulting in changes to school district boundaries over time. Third, some school districts overlap and share the same pieces of land because the districts serve only limited grade ranges. Each of these issues is addressed by the treatment of the 1990 census data used as the base population for the April 1, 2000 school district population estimates.
In order to evaluate the synthetic ratio approach with the most consistent data available, the starting school district populations from the 1990 census data were tabulated by the Geography Division according to 1999-2000 school district boundaries. These school district boundaries were combined with county boundaries to create separate population counts for each portion of a school district in each county. These are referred to as "school district pieces" in this report. In addition to portions of school districts in each county, there are some areas of land for which there are no school districts. Population counts for these geographies were also included in the 1990 census data used in this evaluation and were tabulated by state and county.
Where school districts serve only limited grade ranges, SAIPE assigned the school-age population from the 1990 census to a single school district based on the age-to-grade distribution from Current Population Survey (CPS) data (averages of 1988, 1989, and 1990 data). For example, an Elementary School District (ESD) and Secondary School District (SSD) share the same geography. All children ages 5 to 10 would be assigned to the ESD, all children ages 14 to 17 would be assigned to the SSD, and the children ages 11 to 13 would be divided between the two districts according to the CPS age-to-grade distributions. These children, as well as those who reside in non-overlapping school districts, are referred to as "relevant children" in this report (U.S. Census Bureau 2002). The process of assigning relevant children ensures that the 1990 data and subsequent estimates of school-age children sum to match county and state totals for the population ages 5 to 17. School-age children were assigned to overlapping school districts in this manner for 22.3 percent of school districts in the 1990 census data.
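The elementary/secondary example above can be sketched as follows. The share values for ages 11 to 13 are invented for illustration (the report takes them from CPS age-to-grade distributions, which are not reproduced here), and the function and variable names are hypothetical.

```python
def assign_relevant_children(children_by_age, esd_share_by_age):
    """Split school-age children between an overlapping ESD and SSD.

    children_by_age: dict mapping age (5 to 17) -> number of children.
    esd_share_by_age: dict mapping each shared age (11 to 13) -> share
        assigned to the elementary district (hypothetical values).
    Returns (esd_children, ssd_children).
    """
    esd = 0.0
    ssd = 0.0
    for age, count in children_by_age.items():
        if age <= 10:
            share = 1.0          # ages 5 to 10 go entirely to the ESD
        elif age >= 14:
            share = 0.0          # ages 14 to 17 go entirely to the SSD
        else:
            share = esd_share_by_age[age]  # ages 11 to 13 are divided
        esd += count * share
        ssd += count * (1.0 - share)
    return esd, ssd

# Hypothetical area with 100 children at each single year of age.
children = {age: 100 for age in range(5, 18)}
shares = {11: 0.75, 12: 0.5, 13: 0.25}   # assumed, not CPS values

esd_children, ssd_children = assign_relevant_children(children, shares)
print(esd_children, ssd_children)
```

Because every child is assigned to exactly one district, the two counts add back to the area's total school-age population, which is the property that lets the school-age estimates sum to county and state totals.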
This same process of assigning relevant children cannot be used to assign the total population to only one school district as the logic of assigning non-school-age people to a particular grade range does not apply. Consequently, overlapping school districts may count the same total population more than once and the 1990 census data and subsequent estimates of the total population will not sum to match county and state totals.
The April 1, 2000 estimates of the population by counties were produced by the PEB using a cohort-component methodology. These April 1, 2000 estimates were produced in order to evaluate the accuracy of the county estimates produced for the 1990s by comparing the results with Census 2000 data. To calculate postcensal estimates with the cohort-component method, estimates of births and net internal and international migration were added to the starting 1990 census populations by county, and deaths were subtracted from the census populations. These components (births, deaths, and migration) were calculated from Internal Revenue Service, Immigration and Naturalization Service, and Social Security Administration records and from birth and death registration (vital statistics) records for the 10-year time period since the 1990 census. The Administrative Records and Methodology Branch (ARMR) in the Population Division also produced county population estimates by single years of age (0 to 85), sex, race, and Hispanic origin. The detailed county estimates were collapsed by sex, race, and Hispanic origin into the two sets of county populations required by the school district estimates, the total population and school-age children (5 to 17 years).
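The cohort-component bookkeeping described above can be summarized as a balancing equation. The component symbols are notation assumed here for illustration and do not appear in the report: $B_{ct}$ is births, $D_{ct}$ is deaths, $M^{\mathrm{int}}_{ct}$ is net internal migration, and $M^{\mathrm{intl}}_{ct}$ is net international migration for the county between the 1990 census and April 1, 2000.

```latex
P_{ct,2000} = P_{ct,1990} + B_{ct} - D_{ct} + M^{\mathrm{int}}_{ct} + M^{\mathrm{intl}}_{ct}
```

Any error in a component carries directly into $P_{ct,2000}$ and, through the synthetic ratio, into every school district piece in that county.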
A second set of April 1, 2000 county estimates was also used to calculate school district population estimates in the preliminary analyses. These county estimates were created using a ratio methodology. Only the cohort-component method will be used for the post-Census 2000 county estimates and, therefore, only the school district estimates based on the cohort-component county estimates are discussed in Sections III and IV of this report. Appendix B contains a summary of the findings using the ratio-based April 1, 2000 county estimates.
Census 2000 enumeration data were used to evaluate the accuracy of the April 1, 2000 school district population estimates.1 The Census 2000 data were tabulated by 2001-2002 school district boundaries. These boundaries closely matched the 1999-2000 boundaries used to tabulate the 1990 census data and subsequent estimates based on those data. Though the 1990 census and Census 2000 files were not exact matches, they were sufficiently similar for the purposes of this research. However, there were some differences in the data that required special treatment, as described in Appendix C.
In addition to using Census 2000 school district data as a standard to evaluate the accuracy of the synthetic ratio method, Census 2000 county populations were used in place of the April 1, 2000 county estimates to produce a set of school district estimates that do not contain errors due to the county estimates. Comparing the two sets of estimates distinguished the error attributable to the synthetic ratio method and related assumptions from the error attributable to the county estimates.
Both sets of school district estimates (using April 1, 2000 county estimates and Census 2000 county populations) were compared to the Census 2000 school district populations as enumerated using the Mean Algebraic Percent Error (MALPE) and the Mean Absolute Percent Error (MAPE). The MALPE is the sum of the percent differences between each school district estimate and the corresponding Census 2000 population as enumerated divided by the number of school districts. Positive mean algebraic percent errors indicate overestimation of the school district populations on average, and negative errors indicate underestimation of the school district populations. The MAPE is computed in a similar manner, except that the numerator is the sum of the absolute percent differences between the estimates and the Census 2000 values. The MAPE measures the overall accuracy of the estimates. Weighted MAPEs were also calculated to account for the size of the school district populations (total or school-age) in Census 2000. MALPEs and MAPEs were only computed for school districts with total or school-age populations of 30 or more in Census 2000 because population estimates for smaller numbers are particularly unreliable.
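The three error measures can be sketched as below. The function names and the sample figures are hypothetical, and the weighted MAPE is written under the assumption that each district's absolute percent error is weighted by its Census 2000 population; the report does not spell out the exact weighting formula.

```python
def percent_errors(estimates, census, min_pop=30):
    """Return (percent_error, census_pop) pairs for eligible districts.

    Districts with a census population below min_pop are dropped,
    mirroring the 30-or-more rule described in the text.
    """
    return [
        (100.0 * (est - cen) / cen, cen)
        for est, cen in zip(estimates, census)
        if cen >= min_pop
    ]

def malpe(estimates, census):
    """Mean Algebraic Percent Error: signed errors, so it measures bias."""
    pairs = percent_errors(estimates, census)
    return sum(err for err, _ in pairs) / len(pairs)

def mape(estimates, census):
    """Mean Absolute Percent Error: measures overall accuracy."""
    pairs = percent_errors(estimates, census)
    return sum(abs(err) for err, _ in pairs) / len(pairs)

def weighted_mape(estimates, census):
    """MAPE weighted by census population (assumed weighting scheme)."""
    pairs = percent_errors(estimates, census)
    weighted_sum = sum(abs(err) * cen for err, cen in pairs)
    return weighted_sum / sum(cen for _, cen in pairs)

# Hypothetical districts; the last one has fewer than 30 people and is dropped.
estimates = [110, 900, 5000, 40]
census = [100, 1000, 5000, 20]

print(malpe(estimates, census), mape(estimates, census),
      weighted_mape(estimates, census))
```

In this made-up example the over- and underestimates cancel in the MALPE while the MAPE stays positive, and the weighted MAPE is lower than the unweighted MAPE because the largest district has no error.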
As noted above, the school district estimates were compared with the Census 2000 school district populations as enumerated (see Footnote 1) by calculating the Mean Algebraic Percent Errors (MALPEs) and Mean Absolute Percent Errors (MAPEs). The synthetic ratio method produced estimates for a few school districts with very large errors that had very large effects on the MALPEs and MAPEs (e.g., one school district estimate was 52 times larger than the Census 2000 population). Therefore, various subsets of school districts were tested to determine which allowed for the most useful analyses and still allowed for the inclusion of as many school districts as possible.
In addition to excluding school districts with Census 2000 populations (total or school-age) of less than 30 and those with boundary errors from the calculations of the MALPEs and MAPEs (see Appendix C), school districts were excluded for two other reasons. First, 8 extreme outliers, defined as estimates that are 90 percent smaller or 5 times larger than the Census 2000 populations, were omitted from the statistics for the total population because they distorted the unweighted MAPEs and MALPEs to such a degree that they lost meaning. Three extreme outliers were omitted from the MAPE and MALPE calculations for the school-age population (see Table 1). Second, four school districts (two in Alaska and one each in Alabama and Maine) with extremely large errors and population changes associated with military base closures were excluded from the analyses. Omitting these outliers appeared justified since county population estimates may appropriately account for population changes due to military base closures, but they cannot account for population redistribution within counties.
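The extreme-outlier rule can be expressed as a simple ratio test. This sketch assumes that "90 percent smaller" means an estimate at or below one-tenth of the Census 2000 population and that "5 times larger" means an estimate at or above five times that population; the report does not say whether the thresholds are inclusive, and the function name and sample values are hypothetical.

```python
def is_extreme_outlier(estimate, census, low=0.10, high=5.0):
    """Flag an estimate that is far below or far above the census count.

    low and high are the assumed ratio thresholds (estimate / census).
    """
    ratio = estimate / census
    return ratio <= low or ratio >= high

# Hypothetical (estimate, census) pairs.
districts = [(520, 10), (95, 100), (8, 100), (450, 100)]
flags = [is_extreme_outlier(est, cen) for est, cen in districts]
kept = [pair for pair, flag in zip(districts, flags) if not flag]
print(flags, kept)
```

The last pair illustrates the point made in the text: an estimate 4.5 times the census count is a very large error, but it does not meet the strict criteria and would be retained in the analyses.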
The MALPE and MAPE statistics were also calculated for a subset of school districts that contained populations of 100 or more, but this restriction did not noticeably improve the estimates over omitting just the extreme outliers. The latter option was selected since it allowed a larger number of school districts to be retained in the analyses. With these omissions, the analyses were performed on 14,256 (99.6 percent of 14,310) school districts for the total population and 13,876 (97.0 percent of 14,310) school districts for the school-age population. Though the analyses are more meaningful with these omissions, it is still important to consider outliers when developing alternative methods for estimating school district populations because these are the school districts for which improved estimates would be most valuable.
Table 2 shows the MALPEs for the school-age population estimates by state. The school-age population of school districts was underestimated by 3.8 percent on average for the nation, and was underestimated in 48 states and the District of Columbia. The District of Columbia, as a single school district, was underestimated by 13.5 percent, followed by Delaware at 10.0 percent. North Carolina and Rhode Island were the only states for which the school-age population was overestimated on average, but only by 0.2 and 0.3 percent, respectively. Table 2 also includes standard deviations for the MALPEs, which indicate the spread of the errors. The estimates for Texas, California, Arizona, and New Mexico had relatively large standard deviations. The difficulties with estimating the school district populations in these states are likely due to areas with large population growth during the 1990s (see Section IV.C. below). Standard deviations were also relatively large for Montana, Nebraska, and North Dakota, states that contain school districts with relatively small populations, which are associated with large errors (see Sections IV.A. and IV.B. below). States such as Maryland, Florida, Louisiana, and West Virginia had the smallest standard deviations for the errors, probably because most or all of the school district boundaries in these states are coterminous with county boundaries. Consequently, most of the errors for the school districts in these states were identical to the errors in the county population estimates and no additional error was added by the synthetic ratio method. Also, these errors tend to apply to larger populations than in most other school districts, and larger school districts generally have smaller errors (see Sections IV.A. and IV.B. below).
Table 3 shows unweighted and weighted MAPEs for the school-age population. These figures average absolute errors and are consistent with the MALPEs. The MAPE for the United States was 12.9 percent, ranging from 4.5 percent for West Virginia to 25.0 percent for Arizona. As with the MALPEs, the MAPEs were highest for states that experienced rapid growth in the 1990s (Texas, New Mexico, California, and Arizona) and for states with smaller school district populations (North Dakota, Nebraska, and Montana). Texas, New Mexico, and Arizona were also among the states with the largest MAPEs for the county estimates (Blumerman and Christenson 2002), but the MAPE for California was relatively small. This suggests that though the accuracy of the county estimates has an impact on the accuracy of the school district estimates, the synthetic ratio method and assumptions also have relatively large effects.
Weighting the MAPEs by the Census 2000 school-age population generally reduces the levels of the MAPEs, compared with the unweighted MAPEs (see Table 3). The weighted MAPE for the United States was 9.3 percent, ranging from 3.0 percent for Nevada to 17.4 percent for Arizona. Weighting the MAPEs substantially reduces the average errors for states with some of the largest unweighted MAPEs, such as North Dakota, Nebraska, California, Texas, Montana, and Arizona. This supports the conclusion that small school districts in some of these states tend to have relatively large errors.
Table 4 shows the mean percent errors between the estimated April 1, 2000 total school district population and Census 2000 enumerated data. On average, the school district estimates were 0.1 percent too high for the nation. The MALPEs for the total population estimates ranged from -8.6 percent for the District of Columbia to 6.7 percent for California. The total population of school districts was underestimated on average for 33 states and the District of Columbia and overestimated on average for 17 states. As with the school-age population, the standard deviations of the errors for the total population estimates were largest for Arizona, Texas, Montana, Nebraska, and California. Some of the states with the smallest standard deviations, like Rhode Island, North Carolina, Louisiana, and West Virginia, had relatively low population growth in the 1990s, and others, like Virginia, Maryland, and Florida, contain school district boundaries that are coterminous with county boundaries.
For the nation, the unweighted MAPE for the estimates of the total population was 9.1 percent and the weighted MAPE was 6.7 percent (see Table 5). The unweighted MAPEs for the estimates of the total population in school districts ranged from 1.9 percent for Maryland to 23.4 percent for Arizona. California (16.8 percent MAPE) and Arizona contained more outliers than the other states, but many were still retained in the analyses because the school districts did not meet the strict criteria to be excluded as extreme outliers. The average size of the errors was greatly reduced when weighting by the Census 2000 school district population, indicating that the estimates for larger school districts tend to be more accurate than those for smaller school districts (see Sections IV.A. and IV.B.).
One element of the synthetic ratio formula is a postcensal estimate of the county population for the time period of interest. County estimates contain errors from the input data, methodology, and assumptions that are introduced into the school district estimates. To evaluate the impact of the errors from the county estimates, a set of school district population estimates was created using Census 2000 county data in place of April 1, 2000 county estimates. This eliminated the errors attributed to the county estimates from the school district population estimates. The differences between the school district estimates based on Census 2000 county data and the actual Census 2000 school district populations were errors due to either differences in coverage and accuracy for the 1990 census and Census 2000 or the synthetic ratio method. The differences between the school district estimates based on Census 2000 county data and the school district estimates based on April 1, 2000 county estimates were errors due to inaccuracies in the county estimates.
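The decomposition described above can be sketched for a single school district piece. The function names and figures are hypothetical; the sketch only shows how substituting the Census 2000 county count for the county estimate separates the two sources of error.

```python
def ratio_estimate(piece_1990, county_1990, county_2000):
    """Synthetic ratio estimate for one school district piece."""
    return piece_1990 * county_2000 / county_1990

def decompose_error(piece_1990, county_1990, county_estimate_2000,
                    county_census_2000, district_census_2000):
    """Split the total error into method and county-estimate parts."""
    with_county_estimate = ratio_estimate(
        piece_1990, county_1990, county_estimate_2000)
    with_census_county = ratio_estimate(
        piece_1990, county_1990, county_census_2000)
    # Error that remains even when the county total is exactly right.
    method_error = with_census_county - district_census_2000
    # Error introduced by using the county estimate instead of the count.
    county_error = with_county_estimate - with_census_county
    total_error = with_county_estimate - district_census_2000
    return method_error, county_error, total_error

# Hypothetical piece: 1,000 people in 1990 in a county of 10,000.
method_error, county_error, total_error = decompose_error(
    piece_1990=1000, county_1990=10000,
    county_estimate_2000=11000, county_census_2000=12000,
    district_census_2000=1300)
print(method_error, county_error, total_error)
```

In this invented case the district is underestimated by 200 people in total: half because the county estimate was too low and half because the district grew faster than its county.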
Figure 1 shows unweighted and weighted MAPEs for the total and school-age populations in school districts as estimated from the April 1, 2000 county estimates and from Census 2000 county data. For both the total and school-age population estimates, the mean errors for the school district estimates that used Census 2000 county data were at least three-fourths of the mean errors for the estimates based on April 1, 2000 county estimates. This suggests that a relatively large portion of the inaccuracy in the school district population estimates may be attributed to the synthetic ratio method rather than errors in the county estimates.
Miller (2001) evaluated the synthetic ratio method with 1980 Census-based estimates of the school district population for April 1, 1990 using 1990 census data as a standard. For the school-age population, the MAPE of 12.9 percent for the April 1, 2000 estimates was an improvement over the MAPE of 16.9 percent for the April 1, 1990 estimates. Improvements of similar magnitude occurred for weighted MAPEs: 9.3 percent for the 2000 estimates and 12.0 percent for the 1990 estimates.
For the total population, the 2000 estimates had a MAPE of 9.1 percent, more accurate than the 1990 estimates with a MAPE of 13.3 percent (Miller 2001). The weighted MAPE for the 1990 estimates was 9.6 percent and the weighted MAPE for the 2000 estimates was 6.7 percent. The apparent improvement from the 1980 Census-based estimates was at least partly due to the increased accuracy of the county estimates for the 1990s compared with the county estimates for the 1980s.
The evaluation of the 1980 Census-based estimates for 1990 included only 60.4 percent of school districts (Miller 2001), while almost all the school districts were included in the results presented in this report because the 1990 census data were tabulated by 1999-2000 school district boundaries. Though including more school districts may lead to lower MAPEs, when the evaluation was limited to a set of school districts that was more comparable to the Miller (2001) evaluation,2 the mean errors were even smaller than those described above. For the school-age population estimates the MAPE was 11.6 percent for the nation, ranging from 4.9 percent for Louisiana (with only four school districts selected) to 24.7 percent for Arizona (see Table 6). The MAPE of the smaller set of school districts for the total population estimates was 8.1 percent, ranging from 2.2 percent for Hawaii to 24.9 percent for Arizona (see Table 7).
The accuracy of the school district estimates was evaluated for a number of school district characteristics based on prior research on school district poverty and population estimates (Miller 2001; National Research Council 2000). Keeping the characteristics examined consistent allows for comparisons across decades to identify improvements in estimation methods. The accuracy of the school district estimates was examined for the following characteristics of school districts:
See Tables 8 and 9 for summaries of the MALPEs and MAPEs by these characteristics of school districts. Analyzing the accuracy of the school district estimates using these characteristics helps determine the types of school districts for which the synthetic ratio method works relatively well and the types for which this approach is problematic. Where the accuracy of the school district estimates differs by geography, size, or population change, bias in the population estimates may adversely affect the estimates of the percent of children in poverty (when combined with SAIPE’s poverty estimates) and the distribution of Title I federal funds to school districts.
Almost half (46.0 percent) of school districts had a total population under 5,000 in 1990. These school districts accounted for only 5.7 percent of the school-age population. School districts with 20,000 or more people represented 19.3 percent of all school districts, but contained 73.0 percent of the school-age children in 1990 (see Table 10). As larger school districts were more likely to make up larger proportions of the county populations, it may be easier to estimate the school district populations for larger school districts with the synthetic ratio method.
Figure 2 shows the MALPEs for the total and school-age populations by the school district total population size in the 1990 census. The synthetic ratio method consistently underestimated the school-age population for all size categories of school districts. The largest MALPE was for school districts with total populations of 10,000 to 19,999, which were underestimated by 5.4 percent.
For school districts with less than 5,000 people, the synthetic ratio method overestimated the total population by 2.0 percent. The total population was underestimated for all larger school districts by 1.2 to 2.0 percent (MALPEs). This is in contrast to the findings reported for the 1980 Census-based estimates of the 1990 population, where the total population was overestimated for all school districts, particularly the largest and smallest (Miller 2001). The differences were likely due to differences in the errors for the county estimates across decades: the county population estimates were too high for the 1980s and too low for the 1990s (Blumerman and Christenson 2002).
Figure 3 presents the unweighted and weighted MAPEs for the estimates of the school-age populations by the school district size in 1990. Similar to the results found in the previous work (Miller 2001), the unweighted error for school districts with less than 5,000 people (16.2 percent) is about 50 percent higher than the errors for larger school districts (ranging from 9.2 to 11.1 percent). Weighting the MAPEs by the school-age population in Census 2000 somewhat reduced the differences in errors across size categories, but there was still a steady decline in average errors with increasing school district size. Figure 4 shows similar results for the total population estimates: the larger the school districts, the smaller the average errors.
The school-age population estimates had larger errors (MALPEs and MAPEs) than the total population estimates, which suggests that it is more difficult to correctly distribute the population by age within counties (and consequently school districts) than to estimate the total county (and school district) populations.
Of the 14,310 school districts, 6,252 (43.7 percent) had total populations of less than 5,000 in Census 2000, a decline from 46.0 percent in 1990 (see Table 10). These school districts accounted for only 4.5 percent of the school-age population. School districts with 20,000 or more people represented 21.8 percent of all school districts, but contained 77.3 percent of the school-age children in 2000. Both the proportion of large school districts and the proportion of children in those districts were higher in Census 2000 than in the 1990 census.
Not surprisingly, the relationship between the size of the school districts in 2000 and the size of the average errors was similar to that for the size of the school districts in 1990. Figure 5 shows that the school-age population was underestimated in all sizes of school districts, ranging from -2.2 percent for school districts with under 5,000 people to -6.1 percent for school districts with 20,000 to 39,999 people. The total population was overestimated by 3.0 percent for school districts with less than 5,000 people and was underestimated for all other school district size categories by 1.4 percent to 3.2 percent.
The unweighted MAPEs for the school-age population were almost two-thirds higher for school districts with less than 5,000 people than for larger school districts (see Figure 6). Weighting the MAPEs by the school-age population in Census 2000 reduced the average errors for both the largest and smallest school districts, but changed the MAPE values for the other size categories very little. The smallest school districts also had the largest unweighted MAPEs for the total population, 11.5 percent for school districts with populations under 5,000 and 7.0 to 7.5 percent for the larger school districts (see Figure 7). As with the school-age population, the differences in average errors by school district size were substantially reduced when weighted by population. These findings suggest that smaller school districts may need special treatment in future school district estimates and research.
The relationship between school district size and average errors was similar to that found for county size and average errors. When April 1, 2000 county estimates were compared with the Census 2000 results, larger counties tended to have lower MAPEs. This was also true when comparing April 1, 1990 estimates with the 1990 census data (Blumerman and Christenson 2002). The similarities may be due to both the nature of creating population estimates and the inclusion of county estimates in the calculation of school district population estimates.
The differences in the accuracy of the estimates for the total and school-age populations in school districts by the percent of population change from 1990 to 2000 are striking. Figure 8 shows that for school districts with school-age population declines of more than 10 percent, the estimates of the school-age population were on average too high (MALPE of 18.0 percent). The estimates for school districts with population increases of 10 percent or more were on average 12.9 percent too low. The synthetic ratio method overestimated by 25.7 percent (MALPE) the total population for school districts with more than a 10 percent decline in population in the 1990s. The school districts that experienced total population declines of 5 percent to 10 percent were overestimated by 7.7 percent. School districts with population increases of 10 percent or more were underestimated by 8.1 percent. For school districts with more moderate population changes (decreases up to 5 percent through increases up to 10 percent), the synthetic ratio method performed relatively well, with MALPEs of -0.5 percent to 4.5 percent for the total population and -3.7 percent to 0.9 percent for the school-age population.
The unweighted MAPE for school districts with declines in the school-age population of 10 percent or more was 20.4 percent (see Figure 9). The next largest MAPE was for school districts with population increases of 10 percent or more (14.5 percent). For the school districts with changes of 10 percent or less, the MAPEs show that the average errors were about the same (7.8 to 8.6 percent). Weighting the MAPEs again reduced the differences among the categories of population change, but still shows that the largest errors occurred for the school districts with the largest percent changes.
For school districts where the total population declined by more than 10 percent between 1990 and 2000, the MAPE was 26.1 percent for the total population estimates, over twice the mean errors for school districts that experienced population declines of less than ten percent or population growth (ranging from 4.6 percent to 10.3 percent, see Figure 10). The second largest MAPE was for school districts with population increases of ten percent or more (10.3 percent). Weighting the MAPEs with the Census 2000 population reduced the MAPE for school districts with the largest population declines by over half to 12.6 percent.
The synthetic ratio method does not perform well when estimating school districts with extreme population changes, though the errors are attributed partly to errors in the county estimates. This was also true when comparing April 1, 2000 county estimates with Census 2000 data. Counties with the largest percent population change from 1990 to 2000, whether growth or decline, had the largest MAPEs (Blumerman and Christenson 2002). These findings demonstrate how the assumption that school district populations change at the same rates as the counties in which they lie fails to capture large population changes and redistribution within these counties. These results also suggest that small school districts with relatively large population changes are among the most difficult to estimate accurately.
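The constant-share assumption can be illustrated with a minimal sketch. All names and figures here are hypothetical: a county with two districts, a county estimate assumed to be exactly correct, and all of the county's growth assumed to occur in the smaller district.

```python
def synthetic_ratio_estimate(district_base, county_base, county_estimate):
    """Apply the district's base-census share of the county to the county estimate."""
    return district_base * county_estimate / county_base


def percent_error(estimate, actual):
    """Signed percent error relative to the enumerated population."""
    return 100.0 * (estimate - actual) / actual


# Hypothetical county; the county estimate is assumed to match the census exactly.
county_1990 = 10_000
county_2000_estimate = 11_000
districts_1990 = {"A": 8_000, "B": 2_000}
districts_2000 = {"A": 8_000, "B": 3_000}  # all growth occurred in district B

estimates = {
    name: synthetic_ratio_estimate(base, county_1990, county_2000_estimate)
    for name, base in districts_1990.items()
}
errors = {
    name: percent_error(estimates[name], districts_2000[name]) for name in estimates
}

for name in estimates:
    print(name, estimates[name], round(errors[name], 1))
```

Even with a perfect county estimate, the stable district is overestimated and the fast-growing district is underestimated, because the 1990 shares no longer describe how the population is distributed within the county.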
As described above (Sections III.A. and III.B.), the school district population can be difficult to estimate for states with relatively large population changes, such as California, Arizona, New Mexico, and Texas. States with many small school districts and relatively small school district populations, such as Nebraska, North Dakota, and Montana, also had relatively large errors when comparing the estimates to the Census 2000 standard (see Tables 2 through 5).
Tables 8 and 9 summarize differences in the accuracy of the school district population estimates by Census Division (see footnote 3). The estimates produced with the synthetic ratio method underestimated the school-age population in all divisions, particularly in the south and east (West South Central, South Atlantic, Middle Atlantic, New England, and East South Central Divisions), but also in the Mountain Division. The total population was underestimated in the New England and South Atlantic Divisions and overestimated in the West North Central and Pacific Divisions. The mean errors were quite small for the other five Census Divisions. The relatively small differences across Divisions were likely due to the effects of combining many sizes of school districts with large ranges of population changes into single categories.
Consistent with the findings for individual states, the Pacific and Mountain Divisions had the largest MAPEs for the total and school-age population estimates. These Divisions include the states of Arizona, Colorado, Idaho, Nevada, New Mexico, Oregon, Utah, and Washington, some of the fastest growing states during the 1990s. The smallest MAPEs were in the South Atlantic and East South Central Divisions, which contain some of the states with the slowest growth in the past decade, as well as five of the states for which most or all of the school district boundaries were identical to county boundaries. The states with coterminous county and school district boundaries have lower errors on average because the county population estimates were more accurate than the school district estimates. Weighting the errors by population greatly reduced the differences in the errors by Census Division.
Similar to what was found with the bias associated with the percent change in school districts from 1990 to 2000, Census Divisions that experienced the largest population growth from 1990 to 2000 contained school districts whose populations were the most difficult to estimate accurately. These findings suggest that the synthetic ratio method may yield school district population estimates of acceptable accuracy for most states and Census Divisions. The development of alternative methodology could focus on the fastest changing areas.
This report attempts to determine the limits of the synthetic ratio method for estimating the population in school districts, and it may prove useful to determine whether the method produces estimates that differ in accuracy by types of school districts. Of the 13,876 school districts included in statistics for the estimated school-age populations, 17.3 percent were Elementary School Districts (ESDs), 3.4 percent were Secondary School Districts (SSDs), and 79.2 percent were Unified School Districts (USDs, see Table 8). The distribution was similar for the 14,256 school districts included in the statistics for the estimates of the total population: 18.8 percent were ESDs, 3.4 percent were SSDs, and 77.7 percent were USDs (see Table 9).
There were also five areas outside of school districts for which the school-age population size was large enough (30 people or more) to be included in the evaluation statistics discussed above. For the estimates of the total population, there were ten areas outside of school districts that met the minimum size criteria and were included in the analyses presented in this report. However, when MALPEs and MAPEs were calculated separately by school district type, the results for these areas outside school districts were difficult to interpret because they rested on only five or ten observations. For example, the school-age population for areas outside school districts was underestimated by 10.9 percent (MALPE) and the total population was overestimated by 10.1 percent. The unweighted MAPEs were even more extreme: 30.0 percent for the school-age population and 35.9 percent for the total population. As these errors for the areas outside of school districts were relatively large, a series of MALPEs and MAPEs (not shown) were also calculated for the school district characteristics described above with the areas outside of school districts excluded. The differences between the MALPEs and MAPEs presented in this paper and those for the subset with the areas outside school districts excluded were negligible for all the evaluation characteristics.
The MALPEs in Table 8 show that the school-age population was overestimated for ESDs by 1.5 percent, underestimated by 16.5 percent for SSDs, and underestimated by 4.5 percent for USDs. The differences in mean errors were much smaller for the total population which was overestimated by 3.2 percent for ESDs and underestimated for SSDs and USDs by 1.2 percent and 0.7 percent, respectively.
The MAPEs for the estimates of the school-age population were almost 50 percent higher for the ESDs and SSDs than for the USDs (see Figure 11). These differences remained after weighting by the school-age population. In contrast, the MAPE for the ESDs for the total population estimates was about two times higher (14.6 percent) than those for the SSDs (7.2 percent) and the USDs (7.8 percent) (see Figure 12). These differences were smaller for weighted MAPEs (9.5 percent for ESDs and 6.5 percent for both SSDs and USDs), but still show that the estimates were least accurate for the Elementary School Districts.
One reason the errors were higher for the ESDs and SSDs may be that the average population in Census 2000 was about twice as large for USDs as for SSDs and more than three times as large as for ESDs (1,196 for ESDs, 2,206 for SSDs, and 4,406 for USDs). As shown in Sections IV.A. and IV.B. above, estimates for school districts with smaller populations tended to be less accurate. In addition, the SSD mean errors apply to the smallest number of school districts (477 for the school-age population and 491 for the total population), so outliers would have greater effects on the errors. The underestimation of the older population of children (including 12-to-17-year-olds) may have occurred because a relatively large immigrant population in this age group was underestimated in the county estimates for the 1990s.
The results also indicate that there may be some differences in the age-to-grade distributions used to assign relevant children to overlapping school districts. The assignment of relevant children to school districts in the 1990 census starting population was based on averages of 1988-1990 CPS data. In the Census 2000 standard used for the evaluation, relevant children were assigned to school districts based on Census 2000 sample data by sex, race, Hispanic origin, and Census Region. It is possible that the age-to-grade distributions may have changed over the ten-year period and introduced additional error into the estimates of Elementary and Secondary School Districts that do not affect the estimates for children in Unified School Districts.
Finally, the statistics for the estimates for the total population may be somewhat skewed because the errors associated with overlapping school districts may be counted more than once in the computation of the MAPEs and MALPEs. However, there is little evidence of bias created by including overlapping school districts in the statistics more than once. Tables 11 and 12 show MAPEs for a subset of school districts that includes only USDs and school districts with boundaries that were not coterminous with county boundaries. The MAPEs demonstrate similar patterns of errors by school district characteristics with the largest errors for the smallest school districts and for those with the largest percent changes from 1990 to 2000.
One of the major conclusions supported by the findings presented above and in prior work (Miller 2001) is that the synthetic ratio method performs relatively well for larger school districts. To test this finding further, a set of statistics was calculated only for the school districts with total populations of 20,000 or more. Table 13 shows some substantial differences between the MAPEs by school district characteristics. For example, the MAPE for the school-age population estimate for school districts with more than a 10 percent decrease in the total population from 1990 to 2000 was 20.4 percent for all school districts compared with 12.3 percent for the subset of larger school districts. Large differences also occurred for the New England, West North Central, West South Central, Mountain, and Pacific Divisions (which contain many of the smaller school districts) and Elementary School Districts. The errors were smaller with only the larger school districts included for every category except for those with population declines of 5 percent to 10 percent. The ranges of errors across characteristics were smaller when only the school districts with populations of 20,000 or more were included in the MAPEs, supporting the conclusion that the size of the population influences the size of the errors for all other school district characteristics.
Where school district boundaries are identical to county boundaries, the errors for the school district population estimates are due entirely to the errors in the county population estimates by methodological design. To determine if including school districts with boundaries that were coterminous with county boundaries substantially reduced the mean errors for the school district estimates, MAPEs were also calculated with these school districts excluded. All the school districts in Florida, Nevada, Maryland, and West Virginia were coterminous with county boundaries, and Hawaii and the District of Columbia contained single school districts.
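The reasoning for coterminous school districts can be written out as a short derivation. The notation is introduced here for illustration only and does not appear elsewhere in this report: $P^{SD}_{1990}$ and $P^{C}_{1990}$ are the 1990 census populations of the school district and its county, $\hat{P}^{C}_{2000}$ is the April 1, 2000 county estimate, and $P^{SD}_{2000}$ and $P^{C}_{2000}$ are the Census 2000 counts.

```latex
\begin{align*}
\hat{P}^{SD}_{2000} &= \frac{P^{SD}_{1990}}{P^{C}_{1990}} \, \hat{P}^{C}_{2000}
  && \text{(synthetic ratio method)} \\
P^{SD}_{1990} = P^{C}_{1990} \;&\Rightarrow\; \hat{P}^{SD}_{2000} = \hat{P}^{C}_{2000}
  && \text{(coterminous boundaries, ratio equals 1)} \\
P^{SD}_{2000} = P^{C}_{2000} \;&\Rightarrow\;
  \frac{\hat{P}^{SD}_{2000} - P^{SD}_{2000}}{P^{SD}_{2000}}
  = \frac{\hat{P}^{C}_{2000} - P^{C}_{2000}}{P^{C}_{2000}}
  && \text{(identical percent errors)}
\end{align*}
```

Under these assumptions the school district percent error is exactly the county percent error, so these districts contribute no error from the ratio step itself.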
Table 14 shows the differences between MAPEs for the two sets of school districts for the states with the largest differences. While most of the errors increased when coterminous school districts were excluded, the MAPEs for the school-age population for Louisiana and Virginia decreased slightly because only a few school districts were included in the analyses for those two states. There were slight increases in the MAPEs for the school-age and total population by size, percent population change, and school district type when the school districts with boundaries identical to county boundaries were excluded (see Tables 15 and 16). The largest differences were for the largest school district size categories and for the largest percent changes, whether population growth or decline. Not surprisingly, the largest changes were for the categories for which the most school districts were excluded as having boundaries coterminous with county boundaries.
This paper presents school district population estimates created by the synthetic ratio method using 1990 census data and April 1, 2000 county estimates. The estimates of the school-age and total populations in school districts were compared to the Census 2000 school district populations as enumerated to evaluate the accuracy of the estimates. The results show that, nationwide, most of the differences between the school district estimates and Census 2000 can be attributed to the synthetic ratio method rather than the county-level population estimates. The largest mean errors were for school districts with total populations under 5,000, school districts with the highest rates of growth and decline, school districts in the Pacific and Mountain Divisions (containing states with the largest population increases in the 1990s), and Elementary School Districts.
These findings suggest that the synthetic ratio method may produce sufficiently accurate school district population estimates for larger school districts and counties with relatively moderate population changes. Future research should focus on smaller districts and rapidly changing areas. The additional complexity added by the assignment of relevant children to overlapping school districts can be expected to introduce additional inaccuracies into the estimates of the school-age population. Special treatment of overlapping school districts, particularly Elementary School Districts, may also be required to improve the accuracy of the estimates.
When assigning relevant children to school districts in the 1990 census data, SAIPE staff created a variable to indicate whether a county contained overlapping school districts. If a county contained school districts that shared geography and the populations within that geography, the school district populations do not sum to equal the county populations. Knowing when school district populations should sum to match county and state totals is useful for checking the accuracy of the data files as they are manipulated in the production of the estimates and for allocating rounding error.
When population estimates for one level of geography (e.g., school district pieces) are intended to sum exactly to estimates for a higher level of geography (e.g., counties), it is conventional in the Population Estimates Branch to allocate (add or subtract) the rounding error to the area with the largest population so that the change has the least impact on the estimated population distribution. When the sums of school district total populations included people who were counted more than once because school districts contained overlapping geography, it was impossible to calculate the rounding error accurately. For counties that included these populations, the allocation procedure could not be applied to the school district estimates discussed in this report.
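The allocation convention described above can be sketched as follows for a county without overlapping school districts. This is an illustration under stated assumptions rather than the production procedure: the function name and the piece estimates are hypothetical, and Python's built-in rounding is assumed to stand in for whatever rounding rule is used in production.

```python
def round_and_allocate(piece_estimates, county_total):
    """Round each piece, then assign the leftover rounding error to the largest piece."""
    rounded = {name: round(value) for name, value in piece_estimates.items()}
    residual = county_total - sum(rounded.values())
    largest = max(rounded, key=rounded.get)  # the piece least affected by the adjustment
    rounded[largest] += residual
    return rounded


# Hypothetical unrounded estimates for three non-overlapping school district pieces.
pieces = {"A": 6250.4, "B": 2500.4, "C": 1250.2}
county_total = 10001  # rounded county estimate the pieces must sum to

allocated = round_and_allocate(pieces, county_total)
print(allocated)
```

The procedure depends on the pieces summing to the county total in the first place, which is why it cannot be applied where overlapping districts count some people more than once.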
The flag created by SAIPE staff (referred to as the "original flag" in this report) indicated that a county contained overlapping school districts if "relevant children" were assigned to overlapping school districts in the 1990 census data tabulated with 1999-2000 boundaries. When there were overlapping school district pieces in a county, but no actual school-age children enumerated in the 1990 census for those pieces, this flag did not accurately identify the county as containing overlapping school districts. Consequently, a second flag (referred to as the "alternative flag" in this report) was created by comparing the sums of the total population for each school district piece in a county to the independent county totals from the 1990 census. If the sum of the total population in the school district pieces was greater than the county total population, there were people counted in more than one school district and the county was identified as containing overlapping school districts. Otherwise, the sums of the total population in school district pieces equaled the independent county total population and the county did not contain any overlapping school districts. This approach did not depend on the presence of school-age children but rather on the presence of people of any age, which was a critical difference for the very small school district pieces for which the original flag was inaccurate.
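The logic of the alternative flag can be expressed as a minimal sketch. The function name and the population figures are hypothetical; the comparison itself follows the description above.

```python
def has_overlapping_districts(piece_totals, county_total):
    """Alternative-flag logic: pieces summing above the county total imply double counting."""
    return sum(piece_totals) > county_total


# Hypothetical county of 10,000 people split between two unified districts.
no_overlap = has_overlapping_districts([6000, 4000], 10000)

# Hypothetical county where an elementary and a secondary district cover the
# same 4,000 people, so those people appear in two school district pieces.
overlap = has_overlapping_districts([6000, 4000, 4000], 10000)

print(no_overlap, overlap)
```

Because the comparison uses people of all ages, a county is flagged even when the overlapping pieces contained no school-age children in the 1990 census.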
The school district population estimates were calculated using both flags and, though the alternative appeared more accurate, the differences were negligible. The flags were the same for all but 10 counties (0.3 percent of 3,141) in six states (Hawaii, Illinois, Nebraska, North Dakota, Oklahoma, and Texas). The flags generated population estimates that were almost identical to each other (no more than 0.10 percent different). Only the results using the alternative flag are discussed in this report.
Preliminary school district estimates were calculated using two sets of April 1, 2000 county estimates that included the age detail needed to produce the school-age population estimates. The county estimates were based on either a cohort-component method or a ratio method and were produced by the Administrative Records and Methodology Branch (ARMR). The cohort-component method started with the 1990 census county population and used administrative records to add births, subtract deaths, and calculate net migration for each county. The ratio method proportionally applied changes in the county populations since the 1990 census while controlling to the state population estimates by age, sex, race, and Hispanic origin (Sink and Lollock 2002). Both sets of estimates were constrained to equal independently produced total populations by county. Both sets of estimates were also constrained to contain the same number of people by age, sex, race, and Hispanic origin for each state. The only way the two sets of county estimates differed that was relevant to this evaluation was by the number of school-age (5 to 17 years) children in each county.
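Two ideas in this description can be illustrated with a minimal sketch: the demographic balancing equation underlying a cohort-component estimate, and the constraint of detailed estimates to an independent total. The sketch is heavily simplified and rests on assumptions: it omits the aging of cohorts and all age, sex, race, and Hispanic origin detail, it assumes the constraint is applied by proportional scaling, and all names and figures are hypothetical.

```python
def cohort_component_total(base_population, births, deaths, net_migration):
    """Balancing equation: base population plus births, minus deaths, plus net migration."""
    return base_population + births - deaths + net_migration


def control_to_total(values, control_total):
    """Scale values proportionally so they sum to an independent control total."""
    factor = control_total / sum(values)
    return [value * factor for value in values]


# Hypothetical county: 1990 census base and components of change for the decade.
county_2000 = cohort_component_total(10_000, births=1_500, deaths=900, net_migration=400)

# Hypothetical age groups (school-age, all other ages) that sum to 5,000 before
# being controlled to an independent county total of 5,100.
controlled = control_to_total([1_900, 3_100], control_total=5_100)

print(county_2000)
print([round(value) for value in controlled])
```

Two sets of estimates can share the same controlled totals and still distribute those totals differently by age within each county, which is the only difference between the two sets that matters for this evaluation.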
The ratio-based method produced larger average proportions of school-age children in every state than the cohort-component method, 19.5 percent versus 18.3 percent for the nation. There were also differences in the age distributions within the 5 to 17 year age group. On average, the county estimates from the ratio method had higher proportions of 5-to-10-year-olds than the county estimates from the cohort-component method (see footnote 4). As a result, the county estimates created with the ratio method estimated higher average numbers of school-age children in school districts than the cohort-component estimates. The ratio-based county estimates also overestimated the school-age population in school districts for many states while the county estimates created with the cohort-component method on average underestimated the school-age population for all but two states. This was due to differences in the methodology that were most evident in the states with higher population growth, such as California, Colorado, Florida, and Utah.
For the nation, the cohort-component county estimates underestimated the April 1, 2000 school-age population in school districts by 3.8 percent and the ratio-based county estimates overestimated the school-age population by 1.0 percent. The MAPEs for the cohort-component- and ratio-based school district estimates of the school-age population, however, were almost identical. For most states, the differences for the cohort-component- and ratio-based estimates as measured by the MAPEs were relatively small.
Though the MALPE findings suggest that the ratio-based county estimates may yield better school district estimates of the school-age population compared with the cohort-component county estimates, the MAPEs do not show such clear results. Only the results using the April 1, 2000 county estimates produced using the cohort-component method are discussed in this report because this is the only method for producing demographic details for county estimates planned for post-Census 2000 estimates.
The Census 2000 data were tabulated by 2001-2002 school district boundaries. These boundaries closely matched the 1999-2000 boundaries used to tabulate the 1990 census data and subsequent estimates based on those data. There were some differences in the data that required special treatment as follows:
1. Census 2000 population data include modifications as documented in the Count Question Resolution program, updates from the Boundary and Annexation Survey, and geographic program revisions.
2. This subset of school districts excluded Elementary and Secondary School Districts and school districts with boundaries that are coterminous with county boundaries in addition to those described in Appendix C and Section III.
3. Census Regions and Divisions are the Northeast Region: New England - Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, and Connecticut; Middle Atlantic - New York, New Jersey, and Pennsylvania; Midwest Region: East North Central - Ohio, Indiana, Illinois, Michigan, and Wisconsin; West North Central - Minnesota, Iowa, Missouri, North Dakota, South Dakota, Nebraska, and Kansas; South Region: South Atlantic - Delaware, Maryland, District of Columbia, Virginia, West Virginia, North Carolina, South Carolina, Georgia, and Florida; East South Central - Kentucky, Tennessee, Alabama, and Mississippi; West South Central - Arkansas, Louisiana, Oklahoma, and Texas; West Region: Mountain - Montana, Idaho, Wyoming, Colorado, New Mexico, Arizona, Utah, and Nevada; Pacific - Washington, Oregon, California, Alaska, and Hawaii.
4. The total population and total number of people by age in each state were constrained to be equal across both sets of county estimates and, consequently, to have higher proportions of children on average by county, the ratio method must have distributed sufficiently higher proportions of children to a relatively higher number of counties than the cohort-component method.
5. Census 2000 population data include modifications as documented in the Count Question Resolution program, updates from the Boundary and Annexation Survey, and geographic program revisions.
Blumerman, Lisa M. and Matthew Christenson. 2002. "The Population Estimates of the 1990s: Close to the Mark?" Paper Presented at the Annual Meeting of the American Sociological Association, Chicago, IL.
Miller, Esther. 2001. "Evaluation of the 1990 School District Level Population Estimates Based on the Synthetic Ratio Approach." U.S. Census Bureau, Population Division Working Paper No. 54.
National Research Council. 2000. Small-Area Estimates of School-Age Children in Poverty: Evaluation of Current Methodology. Panel on Estimates of Poverty for Small Geographic Areas, Constance F. Citro and Graham Kalton, editors. Committee on National Statistics. Washington, D.C.: National Academy Press.
Sink, Larry and Lisa Lollock. 2002. "An Evaluation of Alternative Methods of Estimating Population by Age, Race, and Hispanic Origin at the County Level." Paper Presented at the Annual Meeting of the Southern Demographic Association, October.
U.S. Census Bureau, Housing and Household Economic Statistics Division, Small Area Estimates Branch. 2002. "Small Area Income and Poverty Estimates: 1999 Overview of School District Estimates." www.census.gov/hhes/www/saipe/schools/sd99over.htmp