www.census.gov

http://www.census.gov/sipp/index.html



Processing SIPP Data

There are two phases to the processing of SIPP data. At the conclusion of each wave of interviewing, the data collected during that wave are processed, creating the core wave and topical module files. That is the first phase of processing. Then, at the conclusion of the final wave of interviews, core data from all waves are linked and a new set of edit and imputation procedures is applied to the resulting full panel file. That is the second phase of processing.

Figure 4-1 illustrates the steps that generate the Census Bureau's internal core wave and full panel files.

Figure 4-1. Sequence of Cross-Sectional Imputation and Longitudinal Editing Procedures

Figure 4-1 is not reproduced here; it appears on page 4-4 of the SIPP Users' Guide.

(a) Most Type Z records in the 1996 Panel were not handled in a separate process.

Phase 1 Summary

There are six steps in the first phase of SIPP data processing:

  1. As each wave of interviewing is completed, core data collected during the wave are edited for internal consistency.
  2. Following data editing, the statistical matching and hot-deck procedures described later in this chapter are used to impute missing data from the core wave file.
  3. A public use version of the core wave file is then created from the resulting internal core wave file. The public use file is the same as the Census Bureau's internal file except that it has certain information suppressed or topcoded to protect the confidentiality of survey respondents (see sections on Topcoding and Suppression of Geographic Information, at the end of this chapter).
  4. On a separate production track from the core data, data from the topical module file administered with the wave are edited for internal consistency. The extent of data editing varies across the topical modules, and some topical modules receive almost no editing.
  5. Next, hot-deck procedures are used to impute missing data in the topical module. The extent of imputation varies across the topical modules; some topical modules have no missing data imputed.
  6. A public use version of the topical module file is created from the resulting internal file. As with the public use core wave files, the public use topical module files have certain information suppressed to protect the confidentiality of survey respondents.
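Step 2 relies on hot-deck imputation. The sketch below shows a generic sequential hot deck of the kind the Census Bureau has traditionally used: records are read in order, the most recent reported value in each imputation cell is stored, and a record with a missing item receives the stored value from its cell. It is illustrative only. The `Record` type, the earnings variable, the age-group-by-sex cells, and the cold-deck starting values are all hypothetical and are not SIPP's actual imputation specifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    person_id: int
    age_group: str              # hypothetical classification variable
    sex: str                    # hypothetical classification variable
    earnings: Optional[float]   # None marks a missing (nonresponse) item
    imputed: bool = False       # flag set when a value is filled in


def sequential_hot_deck(records: List[Record],
                        cold_deck: Dict[Tuple[str, str], float]) -> List[Record]:
    """Fill missing earnings with the latest reported value in the same cell.

    cold_deck supplies a starting value for every (age_group, sex) cell,
    used until a reported donor has been seen in that cell.
    """
    deck = dict(cold_deck)  # current donor value per imputation cell
    for rec in records:
        cell = (rec.age_group, rec.sex)
        if rec.earnings is None:
            rec.earnings = deck[cell]  # donate the stored value
            rec.imputed = True
        else:
            deck[cell] = rec.earnings  # a reported value becomes the new donor
    return records


people = [
    Record(1, "25-44", "F", 30000.0),
    Record(2, "25-44", "F", None),
    Record(3, "45-64", "M", None),
    Record(4, "45-64", "M", 52000.0),
    Record(5, "45-64", "M", None),
]
cold = {("25-44", "F"): 25000.0, ("45-64", "M"): 40000.0}
for r in sequential_hot_deck(people, cold):
    print(r.person_id, r.earnings, r.imputed)
# 1 30000.0 False
# 2 30000.0 True
# 3 40000.0 True
# 4 52000.0 False
# 5 52000.0 True
```

Because donors are taken from nearby records within the same cell, the order in which records are processed affects which value each missing case receives. Keeping an imputation flag alongside each filled value lets analysts identify imputed data later.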

These steps are repeated at the conclusion of each wave of interviews. Prior to the 1996 Panel, each wave was processed independently of other waves of data. Thus, when multiple core wave files are linked, apparent changes in a respondent's status could be due to different applications of data edits and imputations to the files being combined (file linkage is the subject of Chapter 13 of the SIPP Users' Guide). With the 1996 data, the hot-deck procedure was redesigned to rely on historical information reported in prior waves. In addition, other forms of longitudinal imputation, such as carryover methods, were adapted.
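Carryover imputation, one of the longitudinal methods mentioned above, can be sketched as follows: when an item is missing in the current wave, the value the same person reported in the prior wave is carried forward, and only cases with no usable history are left for the cross-sectional hot deck. This is a minimal sketch under assumed inputs. The function name, the dictionaries keyed by a person identifier, and the industry-style values are hypothetical, not SIPP's actual file layout or rules.

```python
from typing import Dict, List, Optional, Tuple


def carryover_impute(current: Dict[int, Optional[str]],
                     prior: Dict[int, Optional[str]]
                     ) -> Tuple[Dict[int, Optional[str]], List[int]]:
    """Fill missing current-wave values from the same person's prior-wave report.

    Both inputs map a person identifier to a reported value; None marks a
    missing item. Returns the filled values and the ids that are still missing.
    """
    filled: Dict[int, Optional[str]] = {}
    still_missing: List[int] = []
    for pid, value in current.items():
        if value is None and prior.get(pid) is not None:
            filled[pid] = prior[pid]       # carry the prior-wave value forward
        else:
            filled[pid] = value
            if value is None:
                still_missing.append(pid)  # no history: leave for the hot deck
    return filled, still_missing


wave2 = {101: "retail", 102: None, 103: None}
wave1 = {101: "retail", 102: "manufacturing"}
filled, todo = carryover_impute(wave2, wave1)
print(filled)  # {101: 'retail', 102: 'manufacturing', 103: None}
print(todo)    # [103]
```

Drawing on a person's own history first tends to reduce the spurious wave-to-wave changes that independent cross-sectional imputation can introduce when files are linked.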


Phase 2 Summary

At the conclusion of the panel, the Census Bureau creates a full panel file containing core data from all waves. There are four steps to this process.

  1. Core data from all waves are linked. Those data have already been subjected to the Phase 1 edit and imputation procedures.
  2. A series of longitudinal edits are applied to the full panel file. Unlike the core wave edit procedures, these edits are designed to create longitudinally consistent records for each person. Both reported values and values that were imputed during the first phase of processing are subject to change. Thus, the data in a full panel file may differ from the data in the core wave files from which the full panel file was constructed.
  3. A missing wave imputation procedure is then applied. Data are imputed when a sample member was absent for one or two consecutive waves but was present for the two adjacent waves. Data for the missing wave(s) are interpolated on the basis of information from the fourth month of the prior wave and the first month of the subsequent wave. The missing wave imputation procedure was introduced with the 1991 Panel. Earlier panels were not subjected to this procedure.
  4. A public use version of the full panel file is created from the resulting internal file. The public use file has certain information suppressed to protect the confidentiality of survey respondents.
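The missing wave rule in step 3 has two parts: deciding which gaps qualify (one or two consecutive missing waves with a present wave on each side) and filling the missing months from the fourth month of the prior wave and the first month of the subsequent wave. The sketch below illustrates both parts for a single monthly amount. It assumes four reference months per wave and uses straight-line interpolation between the two bracketing months; that interpolation rule, the function names, and the numbers are hypothetical and should not be read as the Census Bureau's actual procedure, which may handle different items differently.

```python
from typing import List, Tuple


def eligible_gaps(present: List[bool]) -> List[Tuple[int, int]]:
    """Return (start_wave, length) for runs of 1-2 missing waves bounded by present waves."""
    gaps = []
    i = 0
    while i < len(present):
        if present[i]:
            i += 1
            continue
        j = i
        while j < len(present) and not present[j]:
            j += 1                     # j ends at the first present wave after the run
        length = j - i
        # Require a present wave before (i > 0) and after (j < len) the run.
        if i > 0 and j < len(present) and length <= 2:
            gaps.append((i, length))
        i = j
    return gaps


def interpolate_missing_waves(last_month_before: float,
                              first_month_after: float,
                              missing_waves: int,
                              months_per_wave: int = 4) -> List[float]:
    """Linearly interpolate monthly values across one or two missing waves."""
    if missing_waves not in (1, 2):
        raise ValueError("imputation applies only to one or two missing waves")
    gap = missing_waves * months_per_wave    # number of months to fill
    step = (first_month_after - last_month_before) / (gap + 1)
    return [last_month_before + step * (m + 1) for m in range(gap)]


print(eligible_gaps([True, False, True, False, False, False, True, False]))
# [(1, 1)]  (the 3-wave gap and the trailing gap do not qualify)
print(interpolate_missing_waves(1000.0, 1500.0, missing_waves=1))
# [1100.0, 1200.0, 1300.0, 1400.0]
```

Separating eligibility from filling mirrors the two conditions in the text: a sample member who drops out for three or more waves, or who never returns, is not eligible under this rule.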

The balance of Chapter 4 of the SIPP Users' Guide describes in greater detail the full sequence of data edit and imputation procedures applied to SIPP data files. Most of the material in that chapter is taken from Pennell (1993).




Page Last Modified: May 9, 2006

