Statistical Quality Standard C2: Editing and Imputing Data

Purpose: The purpose of this standard is to ensure that methods are established and implemented to promote the accurate correction of missing and erroneous values in survey, census, and administrative records data through editing and imputation.

Scope: The Census Bureau’s statistical quality standards apply to all information products released by the Census Bureau and the activities that generate those products, including products released to the public, sponsors, joint partners, or other customers. All Census Bureau employees and Special Sworn Status individuals must comply with these standards; this includes contractors and other individuals who receive Census Bureau funding to develop and release Census Bureau information products.

In particular, this standard applies to the development and implementation of editing and imputation operations for survey, census, administrative records, and geospatial data.

Exclusions:
In addition to the global exclusions listed in the Preface, this standard does not apply to:

  • Estimation methods, such as nonresponse adjustments, that compensate for missing data. Statistical Quality Standard D1, Providing Direct Estimates from Samples, addresses requirements for estimation methods.

Key Terms: Editing, imputation, outliers, skip pattern, and truth deck.

Requirement C2-1: Throughout all processes associated with editing and imputation, unauthorized release of protected information or administratively restricted information must be prevented by following federal laws (e.g., Title 13, Title 15, and Title 26), Census Bureau policies (e.g., Data Stewardship Policies), and additional provisions governing the use of the data (e.g., as may be specified in a memorandum of understanding or data-use agreement). (See Statistical Quality Standard S1, Protecting Confidentiality.)

Requirement C2-2: A plan must be developed that addresses:

  1. Requirements for the editing and imputation systems.
  2. Verification and testing of the editing and imputation systems.
  3. Monitoring and evaluation of the quality of the editing and imputation operations.

    Note: Statistical Quality Standard A1, Planning a Data Program, addresses overall planning requirements, including estimates of schedule and costs.

Requirement C2-3: Data must be edited and imputed using statistically sound practices, based on available information.

Sub-Requirement C2-3.1: Specifications and procedures for the editing and imputation operations must be developed and implemented to detect and correct errors or missing data in the files.

Examples of issues that specifications and procedures might address include:

  • Checks of data files for missing data, duplicate records, and outliers (e.g., checks for possible erroneous extreme responses in income, price, and other such variables).
  • Checks to verify the correct flow through prescribed skip patterns.
  • Range checks or validity checks (e.g., to determine if numeric data fall within a prespecified range or if discrete data values fall within the set of acceptable responses).
  • Consistency checks across variables within individual records to ensure non-contradictory responses (e.g., if a respondent is recorded as 5 years old and married, the record contains an error).
  • Longitudinal consistency checks for data fields not measuring period-to-period changes.
  • Editing and imputation methods and rules (e.g., internal consistency edits, longitudinal edits, hot deck edits, and analyst corrections).
  • Addition of flags on the data files to clearly identify all imputed and assigned values and the imputation method(s) used.
  • Retention of the unedited values in the file along with the edited or imputed values.
  • Checks for topology errors in geospatial data (e.g., lack of coincidence between boundaries that should align, gaps, overshoots, and floating segments).
  • Checks for address range errors in geographic data (e.g., parity inconsistencies, address range overlaps and duplicates, and address range direction irregularities).
  • Checks for duplicate map features.
  • Standardization of street name information in geographic data (e.g., consistency of abbreviations and directionals, and consistent formatting).
  • Rules for when data not from the data collection qualify as “equivalent-quality-to-reported-data” for establishment data collections.
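Illustration only: the following minimal Python sketch shows how a few of the checks listed above (missing data, range, validity, and cross-variable consistency), together with flagging and retention of unedited values, might be expressed. The variable names, the valid-value set, the age range, and the `MIN_MARRIED_AGE` threshold are hypothetical assumptions made for this example; actual rules come from each program's own edit specifications.

```python
# Hypothetical edit rules for illustration; not taken from the standard.
AGE_RANGE = (0, 120)
VALID_MARITAL = {"never_married", "married", "divorced", "widowed"}
MIN_MARRIED_AGE = 15  # assumed threshold for the consistency edit


def check_record(record):
    """Return the list of edit failures found in one record (a dict)."""
    failures = []
    age = record.get("age")
    marital = record.get("marital_status")

    # Missing-data checks.
    if age is None:
        failures.append("age_missing")
    if marital is None:
        failures.append("marital_missing")

    # Range check on numeric data.
    if age is not None and not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
        failures.append("age_out_of_range")

    # Validity check on discrete data.
    if marital is not None and marital not in VALID_MARITAL:
        failures.append("marital_invalid")

    # Consistency check across variables within the record.
    if age is not None and marital == "married" and age < MIN_MARRIED_AGE:
        failures.append("age_marital_inconsistent")

    return failures


def apply_edit(record, variable, new_value, method):
    """Return a copy with the edited value, the unedited value, and a flag."""
    edited = dict(record)
    edited[f"{variable}_unedited"] = record.get(variable)
    edited[variable] = new_value
    edited[f"{variable}_flag"] = method
    return edited


record = {"id": 1, "age": 5, "marital_status": "married"}
failures = check_record(record)
edited = apply_edit(record, "marital_status", "never_married", "consistency_edit")
```

Here the record for a 5-year-old recorded as married fails only the consistency check, and the edited copy keeps the original response alongside the new value and a flag naming the method used.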

Sub-Requirement C2-3.2: Editing and imputation systems and procedures must be verified and tested to ensure that all components function as intended.

Examples of verification and testing activities include:

  • Verifying that edit and imputation specifications reflect the requirements for the edit and imputation systems.
  • Validating edit and imputation instructions or programming statements against specifications.
  • Verifying that the imputation process is working correctly using test files.
  • Verifying that edit and imputation outcomes comply with the specifications.
  • Verifying that edit and imputation rules are implemented consistently.
  • Verifying that the editing and imputation outcomes are consistent within records and consistent across the full file.
  • Verifying that the editing and imputation outcomes that do not use randomization are repeatable.
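Illustration only: a repeatability check on a test file, as described in the examples above, could be sketched as follows. The sequential hot deck shown here is a simplified stand-in chosen for the example, not the Census Bureau's procedure, and the field names (`region`, `income`) and flag values are hypothetical.

```python
def sequential_hot_deck(records, variable, cell_key):
    """Fill missing values with the most recent reported value in the same cell.

    No randomization is used, so the same input must yield the same output.
    """
    last_donor = {}  # most recent reported value seen in each imputation cell
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the input file is left unchanged
        cell = rec[cell_key]
        if rec[variable] is None:
            if cell in last_donor:
                rec[variable] = last_donor[cell]
                rec[f"{variable}_flag"] = "hot_deck"
            else:
                rec[f"{variable}_flag"] = "not_imputed"  # no donor available yet
        else:
            last_donor[cell] = rec[variable]
            rec[f"{variable}_flag"] = "reported"
        out.append(rec)
    return out


# Small hypothetical test file with known missing values.
test_file = [
    {"id": 1, "region": "A", "income": 40000},
    {"id": 2, "region": "A", "income": None},
    {"id": 3, "region": "B", "income": None},
    {"id": 4, "region": "B", "income": 52000},
    {"id": 5, "region": "B", "income": None},
]

first_run = sequential_hot_deck(test_file, "income", "region")
second_run = sequential_hot_deck(test_file, "income", "region")

# Repeatability check: two runs on the same input must agree exactly.
repeatable = first_run == second_run
```

A test like this also exposes edge cases in the specification, such as record 3, which has no earlier donor in its cell and is therefore left unimputed and flagged accordingly.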

Sub-Requirement C2-3.3: Systems and procedures must be developed and implemented to monitor and evaluate the quality of the editing and imputation operations and to take corrective actions if problems are identified.

Examples of monitoring and evaluation activities include:

  • Monitoring and documenting the distributions of, and reasons for, edit and imputation changes to determine if corrections are needed in the system.
  • Evaluating and documenting editing results for geospatial files (e.g., edits resulting in improvements in boundaries, feature coverage, and feature accuracy) and geographic files (e.g., address ranges, address parity, and geographic entity names and codes).
  • Reviewing and verifying data when edits produce results that differ from the past.
  • Using a truth deck to evaluate the accuracy of the imputed values.
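Illustration only: assuming a truth deck is a file whose true values are known, one way to evaluate imputation accuracy is to blank out some known values, impute them, and compare the results with the truth. The sketch below uses simple mean imputation as a placeholder method and mean absolute error as the accuracy measure; both choices, and all names, are assumptions made for this example.

```python
from statistics import mean


def mean_impute(values):
    """Placeholder imputation: replace each missing value with the reported mean."""
    reported = [v for v in values if v is not None]
    fill = mean(reported)
    return [fill if v is None else v for v in values]


def evaluate_against_truth(truth, blanked_positions, impute):
    """Blank known values, impute them, and return the mean absolute error."""
    blanked = [None if i in blanked_positions else v for i, v in enumerate(truth)]
    imputed = impute(blanked)
    errors = [abs(imputed[i] - truth[i]) for i in sorted(blanked_positions)]
    return mean(errors)


# Hypothetical truth deck: values treated as known to be correct.
truth_deck = [10, 20, 30, 40, 50]

# Blank positions 1 and 3, impute them, and measure the error.
mae = evaluate_against_truth(truth_deck, {1, 3}, mean_impute)
```

Because `impute` is passed in as a function, the same harness can compare several candidate imputation methods against one truth deck.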

Requirement C2-4: Documentation needed to replicate and evaluate the editing and imputation operations must be produced. The documentation must be retained, consistent with applicable policies and data-use agreements, and must be made available to Census Bureau employees who need it to carry out their work. (See Statistical Quality Standard S2, Managing Data and Documents.)

Examples of documentation include:

  • Plans, requirements, specifications, and procedures for the editing and imputation systems, including edit rules.
  • Distributions of changes from edits and imputations.
  • Data files that retain the original responses (before edit/imputation) along with the final edited/imputed responses.
  • Problems encountered and solutions implemented during the editing and imputing operations.
  • Quality measures from monitoring and evaluating the editing and imputation operations (e.g., imputation rates and edit change rates). (See Statistical Quality Standard D3, Providing Measures and Indicators of Nonsampling Error.)
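Illustration only: if each value on the data file carries a flag identifying how it was obtained, quality measures such as imputation rates and edit change rates, and the distribution of changes, can be computed directly from the flags. The flag values and field names below are hypothetical, and the rate definition (flagged items divided by all items carrying a flag) is an assumption; a program's documented definitions take precedence.

```python
from collections import Counter


def flag_rate(records, flag_field, flagged_values):
    """Share of records with a flag whose flag is one of flagged_values."""
    eligible = [r for r in records if r.get(flag_field) is not None]
    if not eligible:
        return 0.0
    hits = sum(1 for r in eligible if r[flag_field] in flagged_values)
    return hits / len(eligible)


# Hypothetical flagged file: one flag per record for the income item.
flagged_file = [
    {"id": 1, "income_flag": "reported"},
    {"id": 2, "income_flag": "hot_deck"},
    {"id": 3, "income_flag": "consistency_edit"},
    {"id": 4, "income_flag": "reported"},
]

imputation_rate = flag_rate(flagged_file, "income_flag", {"hot_deck"})
edit_change_rate = flag_rate(flagged_file, "income_flag", {"consistency_edit"})

# Distribution of outcomes by method, for inclusion in the documentation.
change_distribution = Counter(r["income_flag"] for r in flagged_file)
```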

Notes:

  1. The documentation must be released on request to external users, unless the information is subject to legal protections or administrative restrictions that would preclude its release. (See Data Stewardship Policy DS007, Information Security Management Program.)
  2. Statistical Quality Standard F2, Providing Documentation to Support Transparency in Information Products, contains specific requirements about documentation that must be readily accessible to the public to ensure transparency of information products released by the Census Bureau.

Page Last Revised - October 8, 2021