U.S. Department of Commerce

State & Local Government Finances

Data Processing: 2011

Editing: Editing is a process that seeks to ensure the accuracy, completeness, and consistency of the survey data. Efforts are made at all phases of collection, processing, and tabulation to minimize reporting, keying, and processing errors.

Although some edits are built into the Internet data collection instrument and the data entry programs, the majority of the edits are performed after collection. Edits consist primarily of four types: (1) consistency edits, (2) historical ratio edits of the current year's reported value to the prior year's value, (3) current year ratio edits, and (4) balance checks.

The consistency edits check the logical relationships of data items reported on the form. For example, if interest on debt is reported, then there must be debt.
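The interest-on-debt example can be sketched in a few lines of Python. This is only an illustration: the function name and the two argument names are hypothetical and do not correspond to actual questionnaire item codes.

```python
def consistency_edit(interest_on_debt, debt_outstanding):
    """Return True when the record passes the check.

    Rule from the text: if interest on debt is reported,
    then there must be debt. Names here are hypothetical.
    """
    if interest_on_debt > 0 and debt_outstanding <= 0:
        return False  # interest reported without any debt: fails the edit
    return True


# A record reporting interest but no debt fails the edit.
print(consistency_edit(interest_on_debt=1200.0, debt_outstanding=0.0))
```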

The historical ratio edits compare data for the current year to data for the prior year or prior census year. If data fall outside of acceptable tolerance levels, the item is flagged for further review. For example, the reported property tax for the current year may be compared against the property tax last year, if the reporting unit was in last year's sample. If it was not in last year's sample, the current year value is compared to the prior census year value.
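A minimal sketch of a historical ratio edit follows. The tolerance bounds (0.5 and 2.0) are placeholders chosen for illustration, since the actual tolerance levels are not given here, and the decision to flag a unit with no usable baseline is likewise an assumption.

```python
def historical_ratio_edit(current, prior_year=None, prior_census=None,
                          low=0.5, high=2.0):
    """Return True when the item should be flagged for review.

    The prior year value is used if the unit was in last year's sample;
    otherwise the prior census year value is used. `low` and `high`
    are hypothetical tolerance bounds.
    """
    baseline = prior_year if prior_year is not None else prior_census
    if baseline is None or baseline == 0:
        return True  # assumption: no usable baseline means manual review
    ratio = current / baseline
    return not (low <= ratio <= high)


# Property tax tripled relative to last year: flagged under these bounds.
print(historical_ratio_edit(current=300.0, prior_year=100.0))
```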

The current year ratio edits compare one data item on the form against a different data item. If data fall outside of acceptable tolerance levels, the item is flagged for further review. For example, airport expenditure to airport revenue is a current year ratio.
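The airport example could look like the sketch below. The tolerance bounds are again hypothetical, and treating a zero denominator as an automatic flag is an assumption rather than a documented rule.

```python
def current_year_ratio_edit(numerator, denominator, low=0.2, high=5.0):
    """Return (ratio, flagged) for two items from the same year's form.

    `low` and `high` are hypothetical tolerance bounds.
    """
    if denominator == 0:
        return None, True  # assumption: an undefined ratio goes to review
    ratio = numerator / denominator
    return ratio, not (low <= ratio <= high)


# Airport expenditure of 9,000 against airport revenue of 1,000.
ratio, flagged = current_year_ratio_edit(9000.0, 1000.0)
print(ratio, flagged)
```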

Balance checks are checks of linear relationships that exist in the data. Debt flow is an example of a balance check. The ending debt must equal the beginning debt plus the debt issued minus the debt retired.
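The debt flow identity translates directly into code. In this sketch the `tolerance` parameter is an added assumption to allow for rounding in reported figures; with the default of zero the check is exact.

```python
def debt_balance_check(beginning, issued, retired, ending, tolerance=0):
    """Return True when ending debt equals beginning + issued - retired.

    `tolerance` is a hypothetical allowance for rounding differences.
    """
    expected_ending = beginning + issued - retired
    return abs(ending - expected_ending) <= tolerance


# 1,000 beginning + 200 issued - 150 retired should end at 1,050.
print(debt_balance_check(beginning=1000, issued=200, retired=150, ending=1050))
```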

After all data are edited and imputed, they are aggregated. A macro-edit, or aggregate-level, review is conducted with current year state aggregates compared to prior year and prior census aggregates. Macro-level ratio edits and tolerance levels were developed using the current year data.

For the ratio edits, consistency edits, balance checks, and macro edits, the edit results are reviewed by analysts and adjusted as needed. When the analyst is unable to resolve or accept the edit failure, contact is made with the respondent to verify or correct the reported data. The results of the action are tracked with a data edit flag.

Imputation: Not all respondents answer every item on the questionnaire. There are also questionnaires that are not returned despite efforts to gain a response. Imputation is the process of filling in missing or invalid data with reasonable values in order to have a complete data set for analytical purposes. For census years, the complete data set is also needed for sample design purposes.

For nonresponding general purpose governments, imputations for missing units are based on recently reported historical data from either a prior year annual survey or the most recent census, adjusted by a growth rate. If no historical data are available, data from a randomly selected similar unit are adjusted by the ratio of the populations of the nonresponding and randomly selected donor governments.
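The two imputation paths for a nonresponding unit can be sketched as follows. This is an illustration under stated assumptions: the dictionary keys are hypothetical, the growth rate is treated as a proportional change (0.05 meaning 5 percent growth), and the population adjustment is taken to be the nonrespondent's population divided by the donor's.

```python
import random


def impute_from_history(historical_value, growth_rate):
    """Carry a previously reported value forward by a growth rate."""
    return historical_value * (1 + growth_rate)


def impute_from_donor(donor_value, donor_population, recipient_population):
    """Scale a donor's value by the ratio of the two populations."""
    return donor_value * recipient_population / donor_population


def impute(unit, similar_units, growth_rate, rng=random):
    """Use historical data if present; otherwise a random similar donor."""
    if unit.get("historical_value") is not None:
        return impute_from_history(unit["historical_value"], growth_rate)
    donor = rng.choice(similar_units)  # randomly selected similar unit
    return impute_from_donor(donor["value"], donor["population"],
                             unit["population"])


# No history: borrow from a donor twice the size, so the value is halved.
unit = {"historical_value": None, "population": 5000}
donors = [{"value": 200.0, "population": 10000}]
print(impute(unit, donors, growth_rate=0.05))
```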

The imputations for nonresponding special districts are done similarly. If prior year reported data are available, the prior year data for the nonrespondent are adjusted by a growth rate that is determined from reporting units that are similar to the nonrespondent. Special districts are similar if they are of the same function code and similar geography, e.g., police protection in a state or water transport in a region. For nonresponding special districts with no recently reported data available, data are used from a randomly selected donor that is similar to the nonrespondent. In cases where secondary data sources exist, the data from those sources are used.

For individual questionnaire items that are not reported by general purpose governments or by dependent and independent school districts, data from another source, pro-rated totals, or prior year data are used to complete the dataset.

Note: From 2002 through 2006, imputed data for individual governments were not released to the public. For 2007 through 2011, individual unit data are available upon request. The data carry imputation and edit flags to help users determine the usability of the data for their purposes.

Estimation: After the data were edited and imputed, the estimates were calculated using a regression estimator for most variables. For capital outlay and debt variables, a Horvitz-Thompson estimator was used. Downloadable files of the final estimates are available on the website.
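For readers unfamiliar with it, the Horvitz-Thompson estimator of a total weights each sampled value by the inverse of its probability of selection. The sketch below shows only that textbook formula; the function and argument names are hypothetical, and it does not reproduce the survey's production estimation or the regression estimator.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i.

    `values` are the sampled units' reported amounts and
    `inclusion_probs` are their selection probabilities (0 < pi_i <= 1).
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    if any(p <= 0 or p > 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must be in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))


# A unit sampled with probability 0.25 represents four units like itself.
print(horvitz_thompson_total([10.0, 20.0], [0.5, 0.25]))
```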

Variance: Data derived from the annual sample survey are subject to sampling error. Statistics in this report that are based wholly or partly on sample data are likely to differ from the results of a census covering all governments. The particular sample used is one of a large number of possible samples of the same size that could have been selected using the same sample design, and each possible sample would yield somewhat different results.

The standard error is a measure of the variation among the estimates from all possible samples and thus is a measure of the precision with which an estimate from a particular sample approximates the average results of all possible samples. A bootstrap variance estimator is used to estimate the variance for the 2011 Annual Survey of Local Government Finances. Each viewable table contains a column that gives the coefficients of variation (or relative standard errors) computed for these estimates. The coefficient of variation is the estimated standard error expressed as a percent of the estimated total or proportion.

State government financial statistics result from a complete canvass of all state government agencies. Consequently, there is no associated measure of sampling error, such as the coefficient of variation. However, these statistics are subject to non-sampling error. Such error includes inaccuracies in classification, coverage, and processing.

Although efforts were made at all phases of collection, processing, and tabulation to minimize errors, the data were still subject to errors from imputing for missing data, errors from miscoding, and errors in coverage. Every effort was made to keep such errors to a minimum through examining, editing, and tabulating the data.

The coefficients of variation (CVs) presented in the tables can be used to derive the standard error of the estimate. The standard error can then be used to derive interval estimates with prescribed levels of confidence that the interval includes the average results of all samples:

    a. intervals defined by one standard error above and below the sample estimate will contain the true value about 68 percent of the time;

    b. intervals defined by 1.6 standard errors above and below the sample estimate will contain the true value about 90 percent of the time;

    c. intervals defined by two standard errors above and below the sample estimate will contain the true value about 95 percent of the time.

The user can calculate the standard error by multiplying the CV presented in the tables by the corresponding estimate. The CVs presented in the tables are in percentage form and must be divided by 100 before being multiplied by the estimate. This standard error estimate can then be used to get a 90 percent interval estimate by multiplying it by 1.6 and adding the result to the estimated total to get the upper bound and subtracting it from the estimated total to get the lower bound.
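The calculation just described can be sketched as a small helper. The function name is hypothetical, and the estimate and CV in the usage line are made-up figures, not values taken from any published table.

```python
def interval_from_cv(estimate, cv_percent, multiplier=1.6):
    """Return (standard_error, lower_bound, upper_bound).

    `cv_percent` is the CV as shown in the tables (percentage form).
    The default multiplier of 1.6 gives an approximate 90 percent
    interval, following the text; use 1 for about 68 percent or
    2 for about 95 percent.
    """
    standard_error = (cv_percent / 100) * estimate
    margin = multiplier * standard_error
    return standard_error, estimate - margin, estimate + margin


# Illustrative figures only: an estimate of 1,000,000 with a CV of 2.5.
se, lower, upper = interval_from_cv(1_000_000, 2.5)
print(round(se), round(lower), round(upper))
```

With these illustrative inputs the standard error is about 25,000, so the 90 percent interval runs from roughly 960,000 to 1,040,000.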


Source: U.S. Census Bureau | State & Local Government Finances | 1 (800) 242-4523 | govs.finstaff@census.gov | Last Revised: September 20, 2013