www.census.gov

http://www.census.gov/sipp/index.html

Linking Core Wave Files to Longitudinal Research Files

When the text applies to both 1996 and pre-1996 panel files, pre-1996 variable names appear in parentheses following the 1996 variable names.

There are relatively few circumstances in which the core wave and full panel files need to be linked because, for the most part, they contain the same information.[10] In general, if the same information is available from both the core wave and longitudinal research files, the information from the longitudinal research files is preferable because the edit and imputation procedures used for the longitudinal research files are believed to introduce less error than the procedures used for the core wave files.[11] However, some core information is contained only on the core wave files, so at times it will be necessary to merge the core wave and longitudinal research files.

The following steps are necessary to link data from the core wave files with data from the full panel files:

  1. Create data extracts from the core wave and longitudinal research files;
  2. Put the two extracts into the same format (either person-month format or person-record format);
  3. Sort the extracts into the same order; and
  4. Merge the extracts, creating the final file.

The variables that uniquely identify people in the core wave and longitudinal research files have different names. Table 13-3 shows the names for the three variables needed to match people across those files for panels prior to 1996.[12]

Table 13-3. Variables Identifying People in the Core Wave and Longitudinal Research Files for Panels Prior to 1996

Variable            Core Wave Files    Longitudinal Research Files
----------------    ---------------    ---------------------------
Sample Unit ID      SUID               PP-ID
Entry Address ID    ENTRY              PP-ENTRY
Person Number       PNUM               PP-PNUM

If the final file will be in person-record format, these are the only variables needed for the sort and merge operations (steps 3 and 4, above). If the final file will be in person-month format, then WAVE and REFMTH are also needed.

Figure 13-2 shows the SAS code to transform data from the longitudinal research files in wide-record format into the person-month format used in the core wave files. The program creates a person-month format file from the 1993 longitudinal research file.

Because SAS does not allow variable names with embedded dashes, the dash (-) characters in the variable names have been replaced with underscore (_) characters. The 1993 Panel had 10 waves, so the output file will have up to 40 monthly records for each person; no records are written for months in which pp_mis is not equal to 1. The program creates a data set with seven variables: SUID (renamed from PP_ID), ENTRY (renamed from PP_ENTRY), PNUM (renamed from PP_PNUM), REFMTH (which ranges from 1 to 4), WAVE (which ranges from 1 to 10), AGE, and TOTINC.

The REFMTH variable is computed as mod(i, 4), where i is the month of the panel (1 to 40), unless that value is 0, in which case REFMTH is set to 4. The modulus is the remainder from the division of i by 4, so in month six of the panel the quantity is mod(6, 4) = 2, in month seven it is mod(7, 4) = 3, and in month eight REFMTH is 4 (since the remainder from the division of 8 by 4 is 0).

The wave is computed as the first integer greater than or equal to i/4. For month one, i/4 = 0.25, so wave = 1. For month four, i/4 = 1, so wave = 1. For month 17, 17/4 = 4.25, so wave = 5.
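
The arithmetic above can be checked with a short DATA step. This is a minimal sketch, not part of the original program: it writes REFMTH and WAVE to the SAS log for a handful of panel months. The inverse mapping back to the panel month, 4*(wave - 1) + refmth, is an added identity used only as a consistency check; it is not stated in the text.

```sas
/* No output data set is needed, so use data _null_ */
data _null_;
   do i = 1, 4, 6, 7, 8, 17, 40;        /* selected panel months */
      j = mod(i, 4);                     /* remainder of i / 4 */
      if j eq 0 then refmth = 4;         /* remainder 0 means month 4 of the wave */
      else refmth = j;
      wave = ceil(i / 4);                /* first integer >= i/4 */
      month = 4 * (wave - 1) + refmth;   /* inverse mapping (added check) */
      put i= refmth= wave= month=;       /* write the values to the log */
   end;
run;
```

For month 17 the log line should read i=17 refmth=1 wave=5 month=17, which agrees with the worked example above.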

The file created by the program in Figure 13-2 could be merged with an extract from the core wave files from the 1993 Panel, using SUID, ENTRY, PNUM, WAVE, and REFMTH as the match keys. If the longitudinal research file was in its original sort order, the file created by the program in Figure 13-2 will already be sorted by this set of match keys.

Figure 13-2. Sample SAS Code to Change the Longitudinal Research Files
from Person-Record Format to Person-Month Format for Panels Prior to 1996

data pmonth
   (keep =
      pp_id
      pp_entry
      pp_pnum
      refmth
      wave
      age
      totinc
    rename =
      (pp_id = suid
       pp_entry = entry
       pp_pnum = pnum
      )
   );
/*
  this example works with the 1993 SIPP panel - 10 waves
*/
set sipp93fp
   (keep =
      pp_id
      pp_entry
      pp_pnum
      pp_mis1 - pp_mis40
      age1 - age40
      totinc1 - totinc40
   );
/*
  define arrays to ease the programming burden
*/
array ages {40} age1 - age40;
array totincs {40} totinc1 - totinc40;
array pp_mis {40} pp_mis1 - pp_mis40;

do i = 1 to 40;                       /* for each month */
   if (pp_mis{i} eq 1) then do;       /* if pp_mis is 1, use the data */
      age = ages{i};                  /* the age in this month */
      totinc = totincs{i};            /* total income this month */

      j = mod(i, 4);                  /* remainder of month / 4 */
      if (j eq 0) then refmth = 4;    /* the reference month */
      else refmth = j;

      wave = ceil(i / 4);             /* the wave */
      output;                         /* write out the record */
   end;
end;
run;


Values for AGE and TOTINC from the core wave and longitudinal research files will not match for all people in all months because the core wave files and the longitudinal research files are subjected to different edit and imputation procedures.

In addition, beginning with the 1991 Panel, a missing wave imputation procedure has been applied to the longitudinal research files: people who had missing data from one wave but complete data from the two adjacent waves had data imputed for the missing wave in the longitudinal research files.[13] This means that some people will have data in the longitudinal research files for months in which they have no records in the associated core wave files (those who were not Type Z nonrespondents).
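
The sort and merge operations (steps 3 and 4) could be sketched as follows. This is an illustration under stated assumptions, not code from the original guide. The data set name core93 is hypothetical: it stands for a person-month extract from the 1993 core wave files containing SUID, ENTRY, PNUM, WAVE, REFMTH, and the core-only items of interest. The sketch also assumes that the ID variables have the same type and length in both files. The data set pmonth is the file created in Figure 13-2. The IN= flags are an addition that records which file contributed each person-month, which helps identify the records discussed above that appear in only one source.

```sas
/* core93 is a hypothetical person-month extract from the core wave files */
proc sort data = core93;
   by suid entry pnum wave refmth;
run;

/* pmonth may already be in this order; sorting again is harmless */
proc sort data = pmonth;
   by suid entry pnum wave refmth;
run;

data linked;
   merge pmonth (in = in_lrf)
         core93 (in = in_core);
   by suid entry pnum wave refmth;
   from_lrf  = in_lrf;     /* 1 if the longitudinal file has this person-month */
   from_core = in_core;    /* 1 if the core wave file has this person-month */
run;
```

If the core wave extract carries variables with the same names as those in pmonth (for example, AGE), the values from the data set listed last in the MERGE statement replace the earlier ones for matched records, so it is safer to drop or rename duplicate variables before merging.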

Linking Topical Module Files to Longitudinal Research Files from Pre-1996 Panels

While topical module files can be linked with data from the core wave files, there are many times when it will be necessary or desirable to use the longitudinal research files instead.[19] For example, if the full panel weights[20] are needed for the planned analysis, they must come from the longitudinal research files. When the same core items are available from the core wave and the longitudinal research files, analysts may prefer to use the longitudinal research files because the edit and imputation procedures used for them are believed to introduce less error than the procedures used for the core wave files.

The steps involved are as follows:
  1. Create an extract from the longitudinal research file.
  2. If a file in the person-month format is desired, apply the algorithm described in the section above, Linking Core Wave Files to Longitudinal Research Files. The example in Figure 13-2 can be adapted to that purpose, but the ID variables would need to be renamed to match those used in the topical module files rather than in the core wave files (Table 13-5).
  3. Sort the full panel extract; use PP-ID, PP-ENTRY, and PP-PNUM as the sort keys. These three variables uniquely identify people in the longitudinal research files. If the full panel extract is in the person-month format, include WAVE and REFMTH as the final sort keys.
  4. Create an extract from the topical module file of interest. Sort the extract; use ID (the variable name for the sample unit ID in the topical module files), ENTRY, and PNUM as the sort keys.
  5. Merge the full panel extract with the topical module extract based on the sort keys described here and shown in Table 13-5.

Table 13-5. Variables Identifying People in the Topical Module and Longitudinal Research Files Prior to the 1996 Panel

Variable            Topical Module Files    Longitudinal Research Files
----------------    --------------------    ---------------------------
Sample Unit ID      ID                      PP-ID
Entry Address ID    ENTRY                   PP-ENTRY
Person Number       PNUM                    PP-PNUM

Because the longitudinal research files contain a record for every person who was ever a member of a SIPP household, every person with a record in a topical module file should have a record in the longitudinal research file. However, analysts working with a person-month-format file containing records only for months when PP-MIS = 1 may find nonmatches.
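
For a person-record format file, the steps above could be sketched as follows. The data set names lrf_ext (an extract from the longitudinal research file, one record per person) and tm_ext (an extract from a topical module file) are hypothetical, and the sketch assumes the ID variables have matching types and lengths in the two extracts. Topical module records with no match are written to a separate data set so they can be inspected.

```sas
/* lrf_ext and tm_ext are hypothetical extracts, one record per person */
proc sort data = lrf_ext;
   by pp_id pp_entry pp_pnum;
run;

proc sort data = tm_ext;
   by id entry pnum;
run;

data tm_linked tm_nomatch;
   /* rename the longitudinal IDs to the topical module names (Table 13-5) */
   merge tm_ext  (in = in_tm)
         lrf_ext (in = in_lrf
                  rename = (pp_id = id
                            pp_entry = entry
                            pp_pnum = pnum));
   by id entry pnum;
   if in_tm and in_lrf then output tm_linked;   /* matched people */
   else if in_tm then output tm_nomatch;        /* topical module only */
run;
```

People who appear only in the longitudinal research extract are not written to either output data set in this sketch. With a person-record extract, tm_nomatch would be expected to be empty; a person-month extract restricted to months with PP-MIS = 1 may leave some records there.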

__________
10 Because the 1996 longitudinal research file is not complete yet, the discussion in this section pertains only to files for earlier panels. A revised version of this chapter will be available on the Census Bureau SIPP Web site (/sipp/) when the 1996 longitudinal research file is completed.
11 Even when the same variables are on both the core wave and longitudinal research files, the data may not be the same. Different edit and imputation procedures are used for these two types of files. Prior to the 1996 Panel, all edit and imputation procedures applied to the core wave files worked entirely within the given file. Information from previous waves or later waves was not used. Beginning with the 1996 Panel, edit and imputation procedures applied to the core wave files make greater use of information from previous waves. However, because the core wave files are processed as the data become available, it is not possible to make use of information from future waves. The edit and imputation procedures applied to the longitudinal research files, however, make use of each person's full longitudinal record. There are many times when the preferred data for a study will be on the longitudinal research files but the weights will be on the core wave files.
12 Current plans call for using consistent variable names across all files from the 1996 Panel.

19 Because the full panel longitudinal research file for the 1996 SIPP was still under development at the time this chapter was written, it is not yet possible to describe procedures for using that file. A revised version of this chapter will be available once the longitudinal research file for the 1996 Panel is released to the public.
20 Chapter 8 discusses the SIPP weights, their derivation, and use.


Page Last Modified: May 9, 2006

