www.census.gov

http://www.census.gov/sipp/index.html

Identifying

When text copy applies to both 1996 and pre-1996 panel files, pre-1996 variable names appear in parentheses following 1996 variable names.

Identifying Persons

There are many occasions when a user may need to identify which records belong to which individual in the SIPP data files. This need arises, for example, when:

USING THE CORE WAVE FILES

Table 10-1. Person-Month File Structure for the Core Wave Files

1996 Panel

Sample Unit ID  Current Address ID  Person Number  Rotation Group  Reference Month  Calendar Month
(SSUID)         (SHHADID)           (EPPPNUM)      (SROTATION)     (SREFMON)        (RHCALMN)
123451000123    011                 0101           2               1                2
123451000123    011                 0101           2               2                3
123451000123    011                 0101           2               3                4
123451000123    011                 0101           2               4                5
123451000123    011                 0102           2               1                2
123451000123    011                 0102           2               2                3
123451000123    011                 0102           2               3                4
123451000123    011                 0102           2               4                5
123451000123    021                 0201           2               1                2
123451000123    021                 0201           2               2                3
123451000123    022                 0202           2               4                5

Prior to the 1996 Panel

Sample Unit ID  Current Address ID  Person Number  Rotation Group  Reference Month  Calendar Month
(SUID)          (ADDID)             (PNUM)         (ROT)           (REFMTH)         (MONTH)
123451000       11                  101            2               1                2
123451000       11                  101            2               2                3
123451000       11                  101            2               3                4
123451000       11                  101            2               4                5
123451000       11                  102            2               1                2
123451000       11                  102            2               2                3
123451000       11                  102            2               3                4
123451000       11                  102            2               4                5
123451000       21                  201            2               1                2
123451000       21                  201            2               2                3
123451000       22                  202            2               4                5
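
The person-month layout in Table 10-1 can be illustrated with a small sketch. The Python below (standard library only) holds the 1996 Panel rows as tuples and counts how many monthly records each person contributes. The tuple layout and the function name `months_per_person` are illustrative assumptions, not part of any SIPP software. The key (SSUID, EPPPNUM) follows the 1996 Panel identification rule described in this section.

```python
from collections import Counter

# 1996 Panel rows copied from Table 10-1, in column order:
# (SSUID, SHHADID, EPPPNUM, SROTATION, SREFMON, RHCALMN).
# IDs are kept as strings so that leading zeros survive.
ROWS = [
    ("123451000123", "011", "0101", 2, 1, 2),
    ("123451000123", "011", "0101", 2, 2, 3),
    ("123451000123", "011", "0101", 2, 3, 4),
    ("123451000123", "011", "0101", 2, 4, 5),
    ("123451000123", "011", "0102", 2, 1, 2),
    ("123451000123", "011", "0102", 2, 2, 3),
    ("123451000123", "011", "0102", 2, 3, 4),
    ("123451000123", "011", "0102", 2, 4, 5),
    ("123451000123", "021", "0201", 2, 1, 2),
    ("123451000123", "021", "0201", 2, 2, 3),
    ("123451000123", "022", "0202", 2, 4, 5),
]


def months_per_person(rows):
    """Count monthly records for each person, keyed by (SSUID, EPPPNUM)."""
    return Counter((row[0], row[2]) for row in rows)


counts = months_per_person(ROWS)
for (ssuid, pnum), n_months in sorted(counts.items()):
    print(ssuid, pnum, n_months)
```

Each record is one person in one reference month, so a person present for the full wave contributes four records and a person present for part of the wave contributes fewer.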

To uniquely identify a person in the core wave files, analysts should employ the three variables shown in Table 10-2. Users should note that in the 1996 Panel, the entry address ID is no longer needed for unique identification. Its continued use will not create any problems; it is simply redundant information. That is a change from earlier panels in which the entry address ID was key to uniquely identifying persons.


1996 Panel
Variable Name     Description
SSUID (SUID)      Sample unit ID
EENTAID (ENTRY)   Entry address ID (not required for identification in the 1996 Panel)
EPPPNUM (PNUM)    Person number

The variables in Table 10-2 have the following characteristics:

  • SSUID (SUID) uniquely identifies each initially sampled dwelling unit (see footnote 6). Every person in a core wave file was either a member of one of those units (an original sample member) or lives with someone who was a member of an initially sampled dwelling unit. A person’s connection to that unit is an attribute of that person and does not change over time (see footnote 7). This means that as people move from address to address, their SSUID (SUID) stays the same. As new people join the homes of original sample members, they receive the SSUID (SUID) of the original sample members.
  • EENTAID (ENTRY) identifies the address where the person lived at the time she or he was first interviewed. It does not change even if the person moves (see footnote 8). Prior to the 1996 Panel, it was used in conjunction with the person number and sample unit ID to uniquely identify persons within the sampling unit. It is not needed to uniquely identify persons in the 1996 Panel. Values for this variable are unique only within sample units. The entry address ID has two components. The first part of the ID number (two digits in the 1992 and 1996 Panels, and one digit in all others) identifies the wave in which SIPP interviews were first conducted at the address. The second part of the number (one digit in all panels) sequentially numbers addresses within a sample unit [SSUID (SUID)] that enter the sample in the same wave. See Chapter 9 for a more complete discussion.
  • Prior to the 1996 Panel, PNUM uniquely identified a person within the sample unit and entry address ID. In the 1996 Panel, EPPPNUM uniquely identifies a person within the sample unit. EPPPNUM (PNUM) does not change even if the person moves (see footnote 9). The first part of EPPPNUM (PNUM) (two digits in the 1992 and 1996 Panels, one digit in all others) indicates the wave in which the person was first interviewed (see footnote 10). The remaining two digits are sequentially assigned within the household. Thus, original sample members are assigned person numbers ranging from 101 to 199. Individuals who enter the SIPP sample in Wave 2 are assigned a person number ranging from 201 to 299. Those who enter in Wave 10 are assigned person numbers ranging from 1001 to 1099.
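
The two-part structure of the person number and the entry address ID can be sketched as follows. This is a minimal illustration in Python, assuming the IDs arrive either as strings (with leading zeros, as in the 1996 Panel) or as integers (as in earlier panels). The function names `split_person_number` and `split_entry_address` are hypothetical helpers, not SIPP-supplied routines.

```python
def split_person_number(pnum):
    """Return (entry_wave, sequence) for an EPPPNUM (PNUM) value.

    The last two digits are the sequence number within the household;
    the digits before them give the wave of first interview.
    Assumes the value has at least three digits.
    """
    digits = str(pnum)
    return int(digits[:-2]), int(digits[-2:])


def split_entry_address(entry_id):
    """Return (first_wave, sequence) for an EENTAID (ENTRY) value.

    The last digit numbers the address within the sample unit; the
    digits before it give the wave in which the address was first
    interviewed. Assumes the value has at least two digits.
    """
    digits = str(entry_id)
    return int(digits[:-1]), int(digits[-1:])


print(split_person_number("0301"))  # 1996 Panel style value
print(split_person_number(1001))    # Wave 10 entrant, earlier panel style
print(split_entry_address("011"))
print(split_entry_address(21))
```

Splitting from the right rather than the left lets the same helper handle both the one-digit and two-digit wave prefixes described above.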

Table 10-3 illustrates how the combination of SSUID (SUID), EENTAID (ENTRY), and EPPPNUM (PNUM) uniquely identifies people and provides information about when they first entered the SIPP sample. In this example, there are eight individuals: five are original sample members, one person joined the SIPP sample in Wave 3, one joined in Wave 4, and another joined in Wave 7. Note that the person who joined the sample in Wave 3 (pre-1996 Panel) was assigned a person number of 301, but an entry address ID of 21 (not 31). That is because the first part of the entry address ID indicates the wave in which that address was first occupied by any SIPP sample member, which is not necessarily the wave in which a given member entered the sample.

1996 Panel

Sample Unit ID  Entry Address ID  Person Number  Notes
(SSUID)         (EENTAID)         (EPPPNUM)
123456789123    011               0101           Original sample member
123456789123    011               0102           Original sample member
123456789123    022               0301           Enters SIPP sample in Wave 3
123456789123    011               0401           Enters SIPP sample in Wave 4
123456789123    071               0701           Enters SIPP sample in Wave 7
321456789123    011               0101           Original sample member
321456789123    011               0102           Original sample member
321456789123    011               0103           Original sample member

Prior to the 1996 Panel

Sample Unit ID  Entry Address ID  Person Number  Notes
(SUID)          (ENTRY)           (PNUM)
123456789       11                101            Original sample member
123456789       11                102            Original sample member
123456789       21                301            Enters SIPP sample in Wave 3
123456789       11                401            Enters SIPP sample in Wave 4
123456789       71                701            Enters SIPP sample in Wave 7
321456789       11                101            Original sample member
321456789       11                102            Original sample member
321456789       11                103            Original sample member
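
One way to turn the identification rules above into a lookup key is sketched below in Python. The function `person_key` and its `panel_1996` flag are illustrative assumptions. The rows are the eight people from Table 10-3, and the sketch only checks that each panel's key distinguishes all eight.

```python
def person_key(sample_unit, entry_address, person_number, panel_1996):
    """Build a tuple that identifies one person within a panel."""
    if panel_1996:
        # 1996 Panel: SSUID + EPPPNUM suffice; EENTAID is redundant.
        return (sample_unit, person_number)
    # Earlier panels: SUID + ENTRY + PNUM are all part of the key.
    return (sample_unit, entry_address, person_number)


# (sample unit ID, entry address ID, person number) from Table 10-3.
TABLE_1996 = [
    ("123456789123", "011", "0101"),
    ("123456789123", "011", "0102"),
    ("123456789123", "022", "0301"),
    ("123456789123", "011", "0401"),
    ("123456789123", "071", "0701"),
    ("321456789123", "011", "0101"),
    ("321456789123", "011", "0102"),
    ("321456789123", "011", "0103"),
]
TABLE_PRE_1996 = [
    ("123456789", "11", "101"),
    ("123456789", "11", "102"),
    ("123456789", "21", "301"),
    ("123456789", "11", "401"),
    ("123456789", "71", "701"),
    ("321456789", "11", "101"),
    ("321456789", "11", "102"),
    ("321456789", "11", "103"),
]

keys_1996 = {person_key(s, e, p, panel_1996=True) for s, e, p in TABLE_1996}
keys_pre = {person_key(s, e, p, panel_1996=False) for s, e, p in TABLE_PRE_1996}
print(len(keys_1996), len(keys_pre))
```

Including the entry address ID in a 1996 Panel key would also work, since the text notes that its continued use is merely redundant.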

Identifying Households

The term household, as used in Census Bureau publications, refers to a group of persons who occupy a housing unit. A house, an apartment or other group of rooms, or a single room is regarded as a housing unit if it is occupied or intended for occupancy as separate living quarters. That is, the occupants do not live and eat with any other persons in the structure and there is direct access from the outside or through a common hall. A group of friends sharing an apartment constitutes a household. Noninstitutional group quarters, such as rooming and boarding houses, college dormitories, convents, and monasteries, are classified as group quarters rather than households.

To uniquely identify a household or group quarters in the core wave files, analysts should use the two variables shown in Table 10-4.

Variable Name     Description
SSUID (SUID)      Sample unit ID
SHHADID (ADDID)   Current address ID

People with the same SSUID (SUID) and SHHADID (ADDID) values live in the same household (or group quarters). The six individuals in Table 10-5 make up three households: the first contains the first four individuals, and the second and third each contain one person.

1996 Panel

Sample Unit ID  Current Address ID  Person Number  Notes
(SSUID)         (SHHADID)           (EPPPNUM)
123456789123    071                 0101           Four persons in this household
123456789123    071                 0102
123456789123    071                 0401
123456789123    071                 0701
321456789123    031                 0101           One person in this household
321456789123    032                 0102           One person in this household

Prior to the 1996 Panel

Sample Unit ID  Current Address ID  Person Number  Notes
(SUID)          (ADDID)             (PNUM)
123456789       71                  101            Four persons in this household
123456789       71                  102
123456789       71                  401
123456789       71                  701
321456789       31                  101            One person in this household
321456789       32                  102            One person in this household
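
Grouping people into households by the two-variable key can be sketched as follows. This is a minimal Python illustration using the 1996 Panel rows of Table 10-5. The function name `group_households` is an assumption made for the example.

```python
from collections import defaultdict

# (SSUID, SHHADID, EPPPNUM) from the 1996 Panel rows of Table 10-5.
ROWS = [
    ("123456789123", "071", "0101"),
    ("123456789123", "071", "0102"),
    ("123456789123", "071", "0401"),
    ("123456789123", "071", "0701"),
    ("321456789123", "031", "0101"),
    ("321456789123", "032", "0102"),
]


def group_households(rows):
    """Map each (SSUID, SHHADID) pair to the person numbers living there."""
    households = defaultdict(list)
    for ssuid, shhadid, epppnum in rows:
        households[(ssuid, shhadid)].append(epppnum)
    return dict(households)


households = group_households(ROWS)
for key, members in households.items():
    print(key, len(members))
```

Note that the last two people share an SSUID but differ in SHHADID, so they fall into separate households even though they trace back to the same originally sampled unit.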

Each household contains one reference person. The household reference person is the person in whose name the home is owned or rented. If the house is owned or rented jointly by more than one person (such as a married couple or some roommate situations), any of those people may be listed as the "reference person." Users may find it helpful to refer to Figure 2-1 (pp. 2-10 to 2-14), which illustrates the concepts of household and changes in household composition.

Identifying Families

The term family, as used in Census Bureau publications, refers to a group of two or more people related by birth, marriage, or adoption who reside together; all such individuals are considered members of one family.

There are several types of families that the Census Bureau distinguishes:

  • A primary family is a family containing the household reference person and all of his or her relatives. This means that a household composed of a husband and wife, their son, and their son’s wife (i.e., the daughter-in-law) is classified as a primary family containing four people.
  • A related subfamily is a nuclear family that is related to but does not include the household reference person. For example, the son and his wife (i.e., the daughter-in-law) in the preceding example are a related subfamily.
  • An unrelated subfamily (sometimes called a secondary family) is a nuclear family that is not related to the household reference person. Thus, a husband and wife who live in a friend’s house are classified as an unrelated subfamily. A mother and daughter who live in the mother’s boyfriend’s apartment are classified as an unrelated subfamily.
  • A primary individual is a household reference person who lives alone or lives with only nonrelatives. Primary individuals are sometimes treated by the Census Bureau as families with only one person and are referred to as pseudo-families.
  • A secondary individual is not a household reference person and is not related to any other people in the household. Secondary individuals are sometimes treated by the Census Bureau as families with only one person and are referred to as pseudo-families.

To uniquely identify a family, analysts should use the variables shown in Table 10-6.

Table 10-6. Variables Used to Uniquely Identify a Family in the Core Wave Files


Variable Name     Description
SSUID (SUID)      Sample unit ID
SHHADID (ADDID)   Current address ID
and one of the following:
RFID (FID)        Family ID
RFID2 (FID2)      Family ID, excluding related subfamily members
RSID (SID)        Family ID, for both related and unrelated subfamilies

The Census Bureau has two principal methods for distinguishing families.

  • The first method defines a family as all persons who are related and living together. The family ID variable RFID is used with this definition. RFID groups the household reference person with all related household members by assigning them the same ID number. This family group corresponds to the Census Bureau’s definition of a primary family. RFID groups members of each unrelated subfamily (and primary and secondary individuals) separately.
  • The second method is similar to the first in defining a family, but the family excludes members of related subfamilies. The family ID variable RFID2 is used with this definition. RFID2 equals zero for members of related subfamilies. RFID2 groups members of each unrelated subfamily (and primary and secondary individuals) in the same way as RFID— each group has a unique number.

Analysts who want to analyze multigenerational families would use RFID2 (FID2) and the variable RSID (SID). RSID (SID) treats related subfamilies as distinct family units by assigning members of related subfamilies nonzero values. Analysts can easily distinguish unrelated subfamilies from other family units when they use these variables and numbering schemes.

Table 10-7 illustrates the difference between the RFID (FID), RFID2 (FID2), and RSID (SID) variables. Those variables are set to new numbers in each month. For example, a mother, a father, and a child would be family 1 with RFID (FID) = 1 in month 1, RFID (FID) = 2 in month 2, RFID (FID) = 3 in month 3, and RFID (FID) = 4 in month 4, even though family composition remains the same.

The first household in the table contains a primary family of five people. The primary family contains two related subfamilies. RFID (FID) and RFID2 (FID2) mask the fact that there are two related subfamilies; only RSID (SID) provides that information: RSID (SID) has nonzero values for those related subfamilies. The second "household" is actually a group of three households, each containing a primary family, that originally formed one household. The third household contains a primary family and two unrelated subfamilies. The fourth household contains a primary individual and an unrelated subfamily. The fifth household contains only a primary individual. The sixth household is a group quarters containing two people.

The needs of the analysis will help to determine which family classification to use. The following guide may prove helpful:

  • To group people into families in the same way that the Census Bureau does, use SSUID (SUID), SHHADID (ADDID), and RFID (FID).
  • To analyze people in related subfamilies, include only those records with RSID (SID) greater than zero and ESFTYPE (FTYPE) equal to 2.
  • To analyze all families and to keep subfamilies separate from primary families, use SSUID (SUID), SHHADID (ADDID), RFID2 (FID2), and RSID (SID) to uniquely identify each family.
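
The three choices in the guide above can be sketched in Python as follows. The rows are the first and third households from the 1996 Panel portion of Table 10-7. The function names (`census_family_key`, `split_family_key`, `related_subfamily_members`) are hypothetical and serve only to show how each key groups the same records differently.

```python
from collections import Counter

# (SSUID, SHHADID, EPPPNUM, RFID, RFID2, RSID, EFTYPE, ESFTYPE)
ROWS = [
    ("110011111123", "011", "0101", 1, 1, 0, 1, 0),
    ("110011111123", "011", "0102", 1, 0, 2, 1, 2),
    ("110011111123", "011", "0103", 1, 0, 2, 1, 2),
    ("110011111123", "011", "0104", 1, 0, 3, 1, 2),
    ("110011111123", "011", "0105", 1, 0, 3, 1, 2),
    ("122210000123", "011", "0101", 1, 1, 0, 1, 0),
    ("122210000123", "011", "0104", 1, 1, 0, 1, 0),
    ("122210000123", "011", "0305", 2, 2, 0, 3, 0),
    ("122210000123", "011", "0306", 2, 2, 0, 3, 0),
    ("122210000123", "011", "0307", 3, 3, 0, 3, 0),
    ("122210000123", "011", "0308", 3, 3, 0, 3, 0),
]


def census_family_key(row):
    """Key that keeps related subfamilies inside the primary family."""
    ssuid, shhadid, _pnum, rfid, *_ = row
    return (ssuid, shhadid, rfid)


def split_family_key(row):
    """Key that separates related subfamilies from the primary family."""
    ssuid, shhadid, _pnum, _rfid, rfid2, rsid, *_ = row
    return (ssuid, shhadid, rfid2, rsid)


def related_subfamily_members(rows):
    """Keep records with RSID > 0 and ESFTYPE equal to 2."""
    return [row for row in rows if row[5] > 0 and row[7] == 2]


census_families = Counter(census_family_key(r) for r in ROWS)
split_families = Counter(split_family_key(r) for r in ROWS)
print(len(census_families), len(split_families))
print(len(related_subfamily_members(ROWS)))
```

With these rows, the first key yields fewer family units than the second because the two related subfamilies in the first household are absorbed into the primary family.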

Table 10-7. Uniquely Identifying Families in the Core Wave Files

1996 Panel

Sample        Current    Person     Family ID,  Family ID,  Related    Family     Related    Notes
Unit ID       Address    Number     Including   Excluding   Subfamily  Type       Subfamily
(SSUID)       ID         (EPPPNUM)  Related     Related     ID         (EFTYPE)a  Type
              (SHHADID)             Subfamily   Subfamily   (RSID)                (ESFTYPE)
                                    (RFID)      (RFID2)

110011111123  011        0101       1           1           0          1          0          This household contains a primary family of five
110011111123  011        0102       1           0           2          1          2          people. The primary family contains two subfamilies.
110011111123  011        0103       1           0           2          1          2
110011111123  011        0104       1           0           3          1          2
110011111123  011        0105       1           0           3          1          2

110077777723  011        0101       1           1           0          1          0          Three households formed by people who were originally
110077777723  021        0102       1           1           0          1          0          members of the same originally sampled household
110077777723  021        0103       1           1           0          1          0          (SSUID of 110077777723). Two subfamilies split off from
110077777723  022        0104       1           1           0          1          0          the original household to become two new primary
110077777723  022        0105       1           1           0          1          0          families at addresses 21 and 22.

122210000123  011        0101       1           1           0          1          0          This household contains a primary family and two
122210000123  011        0104       1           1           0          1          0          unrelated subfamilies.
122210000123  011        0305       2           2           0          3          0
122210000123  011        0306       2           2           0          3          0
122210000123  011        0307       3           3           0          3          0
122210000123  011        0308       3           3           0          3          0

555555555123  021        0101       1           1           0          4          0          This household contains a primary individual and an
555555555123  021        0201       2           2           0          3          0          unrelated subfamily.
555555555123  021        0202       2           2           0          3          0
555555555123  021        0203       2           2           0          3          0

610000000123  032        0101       1           1           0          4          0          Primary individual.

897454644123  011        0101       1           1           0          5          0          Group quarters with two secondary individuals.
897454644123  011        0102       2           2           0          5          0

Pre-1996 Panel

Sample        Current    Person     Family ID,  Family ID,  Related    Family     Related    Notes
Unit ID       Address    Number     Including   Excluding   Subfamily  Type       Subfamily
(SUID)        ID         (PNUM)     Related     Related     ID         (FAMTYP)b  Type
              (ADDID)               Subfamily   Subfamily   (SID)                 (ESFTYPE)
                                    (FID)       (FID2)

110011111     11         101        1           1           0          1                     This household contains a primary family of five
110011111     11         102        1           0           2          1                     people. The primary family contains two subfamilies.
110011111     11         103        1           0           2          1
110011111     11         104        1           0           3          1
110011111     11         105        1           0           3          1

110077777     011        101        1           1           0          1          0          Three households formed by people who were originally
110077777     021        102        1           1           0          1          0          members of the same originally sampled household
110077777     021        103        1           1           0          1          0          (SUID of 110077777). Two subfamilies split off from
110077777     022        104        1           1           0          1          0          the original household to become two new primary
110077777     022        105        1           1           0          1          0          families at addresses 21 and 22.

122210000     33         101        1           1           0          1                     This household contains a primary family and two
122210000     33         104        1           1           0          1                     unrelated subfamilies.
122210000     33         305        2           2           0          3
122210000     33         306        2           2           0          3
122210000     33         307        3           3           0          3
122210000     33         308        3           3           0          3

555555555     21         101        1           1           0          4                     This household contains a primary individual and an
555555555     21         201        2           2           0          3                     unrelated subfamily.
555555555     21         202        2           2           0          3
555555555     21         203        2           2           0          3

610000000     11         101        1           1           0          4                     Primary individual.

897454644     11         101        1           1           0          5                     Group quarters with two secondary individuals.
897454644     11         102        2           2           0          5

__________
a EFTYPE = 1 means the person belongs to a primary family (including related subfamily members). EFTYPE = 3 means the person belongs to an unrelated subfamily. EFTYPE = 4 means the person is a primary individual. EFTYPE = 5 means the person is a secondary individual.

b FAMTYP = 1 means the person belongs to a primary family (including related subfamily members). FAMTYP = 3 means the person belongs to an unrelated subfamily. FAMTYP = 4 means the person is a primary individual. FAMTYP = 5 means the person is a secondary individual.
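
The family type codes in notes a and b can be applied as a simple lookup. The Python sketch below tabulates the EFTYPE column of the 1996 Panel rows in Table 10-7. The dictionary `FAMILY_TYPE_LABELS` and the function `tabulate_family_types` are illustrative names, and only the four codes documented in the notes are included.

```python
from collections import Counter

# Labels taken from table note a (the same codes apply to FAMTYP in note b).
FAMILY_TYPE_LABELS = {
    1: "primary family (including related subfamily members)",
    3: "unrelated subfamily",
    4: "primary individual",
    5: "secondary individual",
}

# EFTYPE values from the 1996 Panel rows of Table 10-7, household by household.
EFTYPE_VALUES = (
    [1, 1, 1, 1, 1]         # SSUID 110011111123
    + [1, 1, 1, 1, 1]       # SSUID 110077777723
    + [1, 1, 3, 3, 3, 3]    # SSUID 122210000123
    + [4, 3, 3, 3]          # SSUID 555555555123
    + [4]                   # SSUID 610000000123
    + [5, 5]                # SSUID 897454644123
)


def tabulate_family_types(values):
    """Count persons by family type label; raises KeyError on an undocumented code."""
    counts = Counter(values)
    return {FAMILY_TYPE_LABELS[code]: n for code, n in counts.items()}


table = tabulate_family_types(EFTYPE_VALUES)
for label, n in table.items():
    print(label, n)
```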

    __________
    3 Although an attempt was made in the 1996 Panel to give all variables meaningful names, the eight-character limitation imposed by many software packages places severe constraints on the degree to which this can be done. Prior to the 1996 Panel, the situation was more pronounced since numeric sequencing was used to name variables (e.g., in the paper survey, SE22318 is the variable that indicates the total number of employees working for the second business; in CAI, that variable is TEMPB2). In the 1996 Panel, variable names beginning with a "T" have been topcoded to protect respondent confidentiality.
    4 The universe definitions included in the data dictionaries prior to the 1996 Panel were not always accurate. Users of pre-1996 SIPP Panels should check the skip patterns in the actual survey questionnaire to determine which subset of respondents was asked each question.
    5 Prior to the 1990 Panel, core wave files had one record per person. Each record contained four occurrences of each monthly variable. For more information, see earlier editions of the SIPP Users’ Guide.
    6 The SSUID (SUID) is a random recode of three other variables in the Census Bureau’s internal (not public use) files: the respondent’s sampling area (PSU), the cluster of housing units within that area (called the "segment"), and a sequentially assigned serial number. Those variables are omitted from the public use files to protect the confidentiality of the respondents.
    7 There is one rare exception to this rule for Panels prior to 1996, which is described in the section entitled "Identifying Movers" later in this chapter.
    8 See footnote 7.
    9 See footnote 7.
    10 For Wave 10 of the 1992 Panel and for the 1996 Panel, the first two digits of PNUM instead of the first digit identify the wave in which the person entered sample.


    Page Last Modified: May 9, 2006

