Linking Core Wave Files to Longitudinal Research Files
When text applies to both 1996 and pre-1996 panel files, pre-1996 variable names appear in parentheses following the 1996 variable names.
There are relatively few circumstances in which the core wave and full panel files need to be linked because, for the most part, they contain the same information.10 In general, if the same information is available from both the core wave and longitudinal research files, the information from the longitudinal research files is preferable because the edit and imputation procedures used for the longitudinal research files are believed to introduce less error than the procedures used for the core wave files.11 However, some core information is contained only on the core wave files, and, therefore, at times it will be necessary to merge the core wave and longitudinal research files.
The following steps are necessary to link data from the core wave files with data from the full panel files:
| Variable | Core Wave Files | | Longitudinal Research Files |
| Sample Unit ID | SUID | is matched to | PP-ID |
| Entry Address ID | ENTRY | is matched to | PP-ENTRY |
| Person Number | PNUM | is matched to | PP-PNUM |
If the final file will be in person-record format, these are the only variables needed for the sort and merge operations (steps 3 and 4, above). If the final file will be in person-month format, then WAVE and REFMTH are also needed.
Figure 13-2 shows the SAS code to transform data from the longitudinal research files in wide-record format into the person-month format used in the core wave files. The program creates a person-month format file from the 1993 longitudinal research file.
Because SAS does not allow variable names with embedded dashes, the "-" characters in the variable names have been replaced with underscore ("_") characters. The 1993 Panel had 10 waves, so the output file will have up to 40 monthly records for each person; no records are written for months in which pp_mis is not equal to 1. The program creates a data set with seven variables: SUID (renamed from PP_ID), ENTRY (renamed from PP_ENTRY), PNUM (renamed from PP_PNUM), REFMTH (which ranges from 1 to 4), WAVE (which ranges from 1 to 10), AGE, and TOTINC.
The REFMTH variable is computed as mod(i, 4), the remainder when the panel month i is divided by 4, unless that remainder is 0, in which case REFMTH is set to 4. In month six of the panel the remainder is mod(6, 4) = 2, in month seven it is mod(7, 4) = 3, and in month eight REFMTH is 4 (because the remainder from dividing 8 by 4 is 0).
The wave is computed as the smallest integer greater than or equal to i/4. For month one, i/4 = 0.25, so wave = 1. For month four, i/4 = 1, so wave = 1. For month 17, i/4 = 4.25, so wave = 5.
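The arithmetic in the two preceding paragraphs can be checked with a short stand-alone DATA step. This is only an illustrative sketch: it reads no SIPP file and writes the computed values to the SAS log, using the same MOD and CEIL calls that appear in Figure 13-2.

```sas
/* Illustrative check only: list REFMTH and WAVE for each panel month i. */
/* No input file is read; results are written to the SAS log.            */
data _null_;
  do i = 1 to 40;                /* 10 waves x 4 reference months         */
    j = mod(i, 4);               /* remainder when i is divided by 4      */
    if j = 0 then refmth = 4;    /* months 4, 8, 12, ... are month 4      */
    else refmth = j;
    wave = ceil(i / 4);          /* smallest integer >= i/4               */
    put i= refmth= wave=;
  end;
run;
```

For i = 6 the log line should show refmth=2 and wave=2, and for i = 17 it should show refmth=1 and wave=5, in agreement with the examples in the text. The expression mod(i - 1, 4) + 1 yields the same REFMTH values without the separate test for a zero remainder.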
The file created by the program in Figure 13-2 could be merged with an extract from the core wave files from the 1993 Panel, using SUID, ENTRY, PNUM, WAVE, and REFMTH as the match keys. If the longitudinal research file was in its original sort order, the file created by the program in Figure 13-2 will already be sorted by this set of match keys.
data pmonth
     (keep =
        pp_id
        pp_entry
        pp_pnum
        refmth
        wave
        age
        totinc
      rename =
        (pp_id = suid
         pp_entry = entry
         pp_pnum = pnum
        )
     );
  /*
   this example works with the 1993 SIPP panel - 10 waves
  */
  set sipp93fp
      (keep =
         pp_id
         pp_entry
         pp_pnum
         pp_mis1 - pp_mis40
         age1 - age40
         totinc1 - totinc40
      );
  /*
   define arrays to ease the programming burden
  */
  array ages {40} age1 - age40;
  array totincs {40} totinc1 - totinc40;
  array pp_mis {40} pp_mis1 - pp_mis40;
  do i = 1 to 40;                   /* for each month */
    if (pp_mis{i} eq 1) then do;    /* if pp_mis is 1, use the data */
      age = ages{i};                /* the age in this month */
      totinc = totincs{i};          /* total income this month */
      j = mod(i,4);
      if (j eq 0) then refmth = 4;  /* the reference month */
      else refmth = j;
      wave = ceil(i/4);             /* the wave */
      output;                       /* write out the record */
    end;
  end;
run;
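The merge described before the figure might be sketched as follows. The data set name core93 and the variable corevar are hypothetical placeholders for a core wave extract and a core-only item; they do not come from the SIPP documentation. The sketch also assumes that the five match keys have the same type and length in both data sets.

```sas
/* Sketch only: core93 and corevar are hypothetical names for a core    */
/* wave extract and a core-only variable. pmonth is the person-month    */
/* file created by the program in Figure 13-2.                          */
proc sort data=pmonth;
  by suid entry pnum wave refmth;
run;

proc sort data=core93;
  by suid entry pnum wave refmth;
run;

data linked;
  merge pmonth (in=in_long)
        core93 (in=in_core keep=suid entry pnum wave refmth corevar);
  by suid entry pnum wave refmth;
  if in_long and in_core;      /* keep person-months found in both files */
run;
```

The first PROC SORT is redundant if pmonth is already in key order, as it will be when the longitudinal research file was read in its original sort order. The subsetting IF keeps only matched person-months; dropping it retains the unmatched records from either file as well.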
Values for AGE and TOTINC from the core wave and longitudinal research files will not match for all people in all months because the core wave files and the longitudinal research files are subjected to different edit and imputation procedures.
In addition, beginning with the 1991 Panel, a missing wave imputation procedure has been applied to the longitudinal research files: people who had missing data from one wave but complete data from the two adjacent waves had data imputed for the missing wave in the longitudinal research files.13 This means that some people will have data in the longitudinal research files for months in which they have no records in the associated core wave files (those who were not Type Z nonrespondents).
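One way to see how many person-months appear in only one of the two files is to merge the key variables alone and tabulate the source of each record. In this sketch, pmonth is the person-month file from Figure 13-2, core93 is a hypothetical name for a core wave extract, and both are assumed to be sorted by the five match keys.

```sas
/* Sketch only: core93 is a hypothetical core wave extract. Both data   */
/* sets are assumed to be sorted by suid entry pnum wave refmth.        */
data match_status;
  merge pmonth (in=in_long keep=suid entry pnum wave refmth)
        core93 (in=in_core keep=suid entry pnum wave refmth);
  by suid entry pnum wave refmth;
  length source $ 9;
  if in_long and in_core then source = 'both';
  else if in_long then source = 'long only';   /* e.g., imputed waves   */
  else source = 'core only';
run;

proc freq data=match_status;
  tables source;               /* count of person-months by source      */
run;
```

Records flagged "long only" are candidates for months filled in by the missing wave imputation, although other causes of nonmatches are possible and the flag by itself does not distinguish among them.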
While topical module files can be linked with data from the core wave files, there are many times when it will be necessary or desirable to use the longitudinal research files instead.19 For example, if the full panel weights20 are needed for the planned analysis, they must come from the longitudinal research files. When the same core items are available from the core wave and the longitudinal research files, analysts may prefer to use the longitudinal research files because the edit and imputation procedures used for them are believed to introduce less error than the procedures used for the core wave files.
The steps involved are as follows:
Table 13-5. Variables Identifying People in the Topical Module and Longitudinal Research Files Prior to the 1996 Panel
| Variable | Topical Module Files | | Longitudinal Research Files |
| Sample Unit ID | ID | is matched to | PP-ID |
| Entry Address ID | ENTRY | is matched to | PP-ENTRY |
| Person Number | PNUM | is matched to | PP-PNUM |
Because the longitudinal research files contain a record for every person who was ever a member of a SIPP household, every person with a record in a topical module file should have a record in the longitudinal research file. However, analysts working with a person-month-format file containing records only for months when PP-MIS = 1 may find nonmatches.
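A match on the three identifiers in Table 13-5 could look like the sketch below. The data set name tm93 and the variable tmvar are hypothetical placeholders for a pre-1996 topical module extract and one of its items; sipp93fp is the longitudinal research file name used in Figure 13-2, assumed here to be in its wide-record (one record per person) format.

```sas
/* Sketch only: tm93 and tmvar are hypothetical names. The topical      */
/* module identifiers are renamed to the longitudinal file names        */
/* (with underscores in place of dashes) before sorting.                */
proc sort data=tm93 (keep=id entry pnum tmvar
                     rename=(id=pp_id entry=pp_entry pnum=pp_pnum))
          out=tm_sorted;
  by pp_id pp_entry pp_pnum;
run;

proc sort data=sipp93fp out=fp_sorted;
  by pp_id pp_entry pp_pnum;
run;

data tm_linked;
  merge tm_sorted (in=in_tm)
        fp_sorted (in=in_fp);
  by pp_id pp_entry pp_pnum;
  if in_tm;                    /* keep everyone in the topical module   */
  nomatch = not in_fp;         /* 1 = no longitudinal record was found  */
run;
```

With the wide-record longitudinal file, nomatch would be expected to equal 0 for every record. If a person-month file restricted to months with PP-MIS = 1 is used instead, the merge also needs WAVE and REFMTH handling, and some records may show nomatch = 1.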
__________
10 Because the 1996 longitudinal research file is not complete yet, the discussion in this section pertains only to files
for earlier panels. A revised version of this chapter will be available on the Census Bureau SIPP Web site
(/sipp/) when the 1996 longitudinal research file is completed.
11 Even when the same variables are on both the core wave and longitudinal research files, the data may not be the
same. Different edit and imputation procedures are used for these two types of files. Prior to the 1996 Panel, all edit
and imputation procedures applied to the core wave files worked entirely within the given file. Information from
previous waves or later waves was not used. Beginning with the 1996 Panel, edit and imputation procedures applied
to the core wave files make greater use of information from previous waves. However, because the core wave files
are processed as the data become available, it is not possible to make use of information from future waves. The edit
and imputation procedures applied to the longitudinal research files, however, make use of each person's full
longitudinal record. There are many times when the preferred data for a study will be on the longitudinal research
files but the weights will be on the core wave files.
12 Current plans call for using consistent variable names across all files from the 1996 Panel.
19 Because the full panel longitudinal research file for the 1996 SIPP was still under development at the time this
chapter was written, it is not yet possible to describe procedures for using that file. A revised version of this chapter
will be available once the longitudinal research file for the 1996 Panel is released to the public.
20 Chapter 8 discusses the SIPP weights, their derivation, and use.
Page Last Modified: May 9, 2006