Source of Data.
The data were collected during the first seven waves of the 1992 panel of the Survey of Income and Program Participation (SIPP). The SIPP universe is the noninstitutionalized resident population living in the United States. This population includes persons living in group quarters, such as dormitories, rooming houses, and religious group dwellings. Crew members of merchant vessels, Armed Forces personnel living in military barracks, and institutionalized persons, such as correctional facility inmates and nursing home residents, were not eligible to be in the survey. Also, United States citizens residing abroad were not eligible to be in the survey. Foreign visitors who work or attend school in this country and their families were eligible; all others were not eligible to be in the survey. With the exceptions noted above, persons who were at least 15 years of age at the time of the interview were eligible to be in the survey.
The 1992 SIPP panel sample is located in 284 Primary Sampling Units (PSUs) each consisting of a county or a group of contiguous counties. Within these PSUs, expected clusters of 2 or 4 living quarters (LQs) were systematically selected from lists of addresses prepared for the 1980 decennial census to form the bulk of the sample. To account for LQs built within each of the sample areas after the 1980 census, a sample was drawn of permits issued for construction of residential LQs up until shortly before the beginning of the panel. In jurisdictions that do not issue building permits, small land areas were sampled and the LQs within were listed by field personnel and then subsampled. In addition, sample LQs were selected from supplemental frames that included LQs identified as missed in the 1980 census and group quarters.
At the time of the initial visit, the occupants of about 19,600 living quarters were interviewed. This accounts for approximately 72% of the living quarters originally designated for sample. Approximately 21% of the designated living quarters were found to be vacant, demolished, converted to nonresidential use, or otherwise ineligible for the survey. The remainder, approximately 2000 living quarters, were not interviewed because the occupants refused to be interviewed, could not be found at home, were temporarily absent, or were otherwise unavailable. Thus, occupants of about 91% of all eligible living quarters participated in the first interview of the survey.
For later interviews, only original sample persons (those in Wave 1 sample households and interviewed in Wave 1) and persons living with them were eligible to be interviewed. With certain restrictions, original sample persons were to be followed even if they moved to a new address. When original sample persons moved without leaving a forwarding address or moved to extremely remote parts of the country and no telephone number was available, additional noninterviews resulted.
Sample households within the panel are divided into four subsamples of nearly equal size. These subsamples are called rotation groups 1, 2, 3, and 4, and one rotation group is interviewed each month. Each household in the sample was scheduled to be interviewed at 4-month intervals over a period of roughly 2 2/3 years beginning in February 1992. The reference period for the questions is the 4-month period preceding the interview month. In general, one cycle of four interviews covering the entire sample, using the same questionnaire, is called a wave.
The period covered by the 1992 7 wave longitudinal file consists of 28 interview months (seven interviews) conducted from February 1992 to May 1994. Data for up to 28 reference months are available for persons on the file. Specific months available depend on the person's rotation group and his/her sample entry or exit date. However, data from all four rotation groups (i.e., the full sample) are available only for reference months January 1992 through January 1994, inclusive. Also note that the availability of data on household composition begins with the first interview month of a rotation group.
Table 1 indicates the reference months and interview months for the collection of data from each rotation group of the 1992 7 wave longitudinal file. For example, rotation group 2 was first interviewed in February 1992 and data for the reference months October 1991 through January 1992 were collected. This rotation group was interviewed for the seventh time in February 1994 to collect data for October 1993 through January 1994. Table 1 also shows that 1992 calendar year (92CY) data were collected in interview months February 1992 to April 1993 and that 1993 calendar year (93CY) data were collected exactly one year later. Data from all four rotation groups are available for each reference month of the 1992 and 1993 calendar years.
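The interview and reference month pattern described above can be sketched in a few lines of Python. This is an illustration only, not a substitute for Table 1. The first interview month of rotation group 2 (February 1992) is stated in the text; the order 2, 3, 4, 1 for the remaining groups is an assumption inferred from the examples in this document, and the function names are hypothetical.

```python
# Assumed first interview month (Wave 1) for each rotation group.
# Group 2 in February 1992 is stated in the text; the order 2, 3, 4, 1
# is inferred from the examples and should be checked against Table 1.
FIRST_INTERVIEW = {2: (1992, 2), 3: (1992, 3), 4: (1992, 4), 1: (1992, 5)}


def add_months(year, month, k):
    """Return (year, month) shifted by k months (k may be negative)."""
    idx = year * 12 + (month - 1) + k
    return idx // 12, idx % 12 + 1


def schedule(rotation_group, wave):
    """Interview month and the four reference months for one wave."""
    year, month = FIRST_INTERVIEW[rotation_group]
    interview = add_months(year, month, 4 * (wave - 1))
    reference = [add_months(interview[0], interview[1], -k) for k in (4, 3, 2, 1)]
    return interview, reference


def groups_with_data(year, month, waves=7):
    """Rotation groups whose reference periods cover the given month."""
    return sorted(
        g
        for g in FIRST_INTERVIEW
        if any((year, month) in schedule(g, w)[1] for w in range(1, waves + 1))
    )
```

Under these assumptions, groups_with_data(1991, 12) lists rotation groups 2, 3, and 4, and every month from January 1992 through January 1994 is covered by all four groups.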
In the 1984-1990 panels, the longitudinal weighting process treated persons with at least one missing interview as noninterviewed and assigned them zero weights. This procedure resulted in the loss of a large amount of collected survey data. To increase the reliability of longitudinal estimates and make more use of collected data, we introduced a "missing wave imputation" procedure.
The 1992 panel is the second panel to benefit from the new imputation procedure. We now impute missing wave data for persons who miss an interview (wave) and have completed interviews before and after the missing wave. For example, persons who were not interviewed in wave 3 but interviewed in waves 2 and 4 will have their wave 3 data imputed based on waves 2 and 4. There is an imputation flag field on the 1992 7 wave longitudinal panel file named "WAVFLG" to identify the noninterview cases that were imputed.
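As an illustration of the eligibility rule for missing wave imputation, the following Python sketch flags the waves that would qualify for one person. It assumes that "before and after" means the immediately adjacent waves, which is an interpretation and not a statement of the production procedure. The function name and the input layout are hypothetical, and the sketch does not reproduce the coding of the WAVFLG field.

```python
def imputable_waves(interviewed):
    """Return 1-based wave numbers that qualify for missing wave imputation.

    `interviewed` holds one boolean per wave, in wave order. A wave
    qualifies when it was missed and the waves immediately before and
    after it were both completed (an assumed reading of the rule).
    """
    return [
        i + 1
        for i in range(1, len(interviewed) - 1)
        if not interviewed[i] and interviewed[i - 1] and interviewed[i + 1]
    ]
```

For a person interviewed in every wave except wave 3, the function returns [3]. A person who missed the first wave, the last wave, or two consecutive waves has no qualifying wave under this reading.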
For panel, 92CY, and 93CY weighting procedures, a person was classified as interviewed or noninterviewed based on the following definitions. (NOTE: A person may be classified differently for calculating different weights). Interviewed sample persons (including children) were defined to be:
1) those for whom self, proxy, or imputed responses were obtained for each month of the appropriate longitudinal period, or
2) those for whom self or proxy responses were obtained for the first month of the appropriate longitudinal period and self, proxy, or imputed responses exist for each subsequent month until they were known to have died or moved to an ineligible address (foreign living quarters, institutions, or military barracks).
The months for which persons were deceased or residing in an ineligible address were identified on the file. Noninterviewed persons were defined to be those for whom neither self nor proxy responses were obtained for one or more months of the appropriate longitudinal period (excluding imputed persons and persons who died or moved to an ineligible address).
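The two definitions of an interviewed person can be expressed as a short Python sketch. The monthly status codes used here ("self", "proxy", "imputed", "none", "deceased", "ineligible") are hypothetical labels chosen for illustration; they are not the codes on the file. The sketch also assumes that a death or a move to an ineligible address ends the person's eligibility for the rest of the period.

```python
RESPONDED = {"self", "proxy"}
COVERED = RESPONDED | {"imputed"}
EXITED = {"deceased", "ineligible"}


def is_interviewed(monthly_status):
    """Classify one person for one longitudinal period.

    `monthly_status` lists one hypothetical status code per month of
    the period, in calendar order.
    """
    if not monthly_status:
        return False
    # Definition 1: a self, proxy, or imputed response for every month.
    if all(s in COVERED for s in monthly_status):
        return True
    # Definition 2: a self or proxy response for the first month, then a
    # response for every month until death or a move to an ineligible address.
    if monthly_status[0] not in RESPONDED:
        return False
    for status in monthly_status[1:]:
        if status in EXITED:
            return True
        if status not in COVERED:
            return False
    return True
```

Because a person may be classified differently for different weights, the function would be applied separately to the months of the panel, 92CY, and 93CY periods.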
It is estimated that roughly 56,300 persons were initially designated in the sample. Approximately 51,100 persons were interviewed in wave 1, while the balance, residing in the 2000 living quarters not interviewed at wave 1, remained anonymous and became the initial source of person nonresponse in the weighting procedures. For the panel and 92CY weighting procedures, the eligible sample is considered to be all persons initially designated for sample. In the panel weighting procedure, approximately 42,000 persons were classified as interviewed with a person nonresponse rate of 25%. The 92CY weighting procedure classified about 45,900 persons as interviewed and had a person nonresponse rate of 18%. The longitudinal file contains approximately 59,700 persons in all. This includes the wave 1 interviewed persons and about 8,600 persons who entered survey households during the panel through births, marriages, and other reasons. Approximately one-half of the newcomers were considered eligible for the 93CY weighting procedure, increasing the eligible sample size to roughly 55,400 persons. The 93CY weighting procedure classified about 43,600 persons as interviewed with a person nonresponse rate of 28%. Some respondents did not respond to some of the questions; therefore, item nonresponse rates, especially for sensitive income and money related items, are higher than the person nonresponse rates given above.
In the estimation procedure described below, all persons classified as interviewed for a given longitudinal period, i.e., panel, 92CY, or 93CY, are assigned positive weights for that period, while those classified as noninterviewed are assigned zero weights.
Essentially the same estimation procedure was used to derive each of the three sets of SIPP longitudinal person weights. Several stages of weight adjustments were involved. Each person received a base weight equal to the inverse of his/her probability of selection. Two noninterview adjustment factors were applied. One adjusted the weights of interviewed persons in interviewed households to account for persons who were eligible for the sample but could not be interviewed at the first interview. The second was applied to compensate for person noninterviews occurring in subsequent interviews.
An additional stage of adjustment to longitudinal person weights was performed to reduce the mean square error of the survey estimates. This was accomplished by bringing the sample estimates into agreement with monthly Current Population Survey (CPS) type estimates of the civilian (and some military) noninstitutional population of the United States by age, sex, race, Hispanic ethnicity, and householder/not householder status as of the specified control date. The control dates for the panel, 92CY, and 93CY weights were March 1, 1992, January 1, 1992, and January 1, 1993, respectively. The CPS estimates were themselves brought into agreement with estimates from the 1990 decennial census which have been adjusted for undercount and to reflect births, deaths, immigration, emigration, and changes in the Armed Forces since 1990.
Users should be forewarned to apply the appropriate weights given on this file before attempting to calculate estimates. The weights vary between units because of the weighting adjustments and the following of movers. If analysis is done for the general population without applying the appropriate weights, the results will be erroneous. Each person on the 1992 7 wave longitudinal file has three longitudinal person weights (some of which may be zero) for estimation of panel, 92CY, and 93CY person characteristics and two longitudinal household factors to be used only for exploratory estimates of household and family characteristics. We strongly recommend that all nonexploratory analysis be confined to person analysis using the longitudinal person weights. For example, using 92CY person weights, one can estimate the number of persons receiving food stamps from January through March of 1992. Also, we recommend the use of longitudinal person weights for person characteristics based on household attributes. For example, using panel person weights, one can estimate the number of persons living in households which received food stamps during the period covered by the 1992 panel.
This file was created for purposes of survey research and evaluation, and the Bureau of the Census will continue to examine the data, correcting and improving the computer processing and estimation procedures where appropriate. We welcome and appreciate any research on your part that will help us achieve this goal.
All estimates may be divided into two broad categories: longitudinal and cross-sectional. Longitudinal estimates require that data records for each person be linked across interviews, whereas cross-sectional estimates do not. For example, annual income estimates obtained by summing the 12 monthly income amounts for each person would require linking records and so would be longitudinal estimates. Because there is no linkage between interviews, cross-sectional estimates can combine data from different interviews only at the aggregate level. Longitudinal person weights were developed for longitudinal estimation, but may be used for cross-sectional estimation as well. However, note that wave files with cross-sectional weights are also produced for the SIPP. Because of the larger sample size available on the wave files, it is recommended that these files be used for cross-sectional estimation, if possible.
In this section it is assumed that all four rotation groups are used for estimation. If an estimate covers a time period for which data from some rotation groups are unavailable, refer to the section "Adjusting Estimates Which Use Less Than the Full Sample."
Some basic types of longitudinal and cross-sectional estimates which can be constructed using longitudinal person weights are described below in terms of estimated numbers. Of course, more complex estimates, such as percents, averages, ratios, etc., can be constructed from the estimated numbers. Longitudinal person weights can be used to construct the following types of longitudinal estimates:
1. The number of persons who have ever experienced a characteristic during a given time period.
To construct such an estimate, use the longitudinal person weight (panel, 92CY, or 93CY) for the shortest time period which covers the time period of interest, summing the weights over all persons who possessed the characteristic of interest at some point during the time period of interest. For example, to estimate the number of persons who ever received food stamps during the last six months of 1992 use the 92CY longitudinal person weight.
2. The amount of a characteristic accumulated by persons during a given time period.
To construct such an estimate, use the longitudinal person weight for the shortest time period which covers the time period of interest. Then compute the product of the weight times the amount of the characteristic and sum this product over all appropriate persons. For example, to estimate the aggregate 1992 annual income of persons who were employed during all 12 months of the year use the 92CY longitudinal person weight.
3. The average number of consecutive months of possession of a characteristic (i.e., the average spell length for a characteristic) during a given time period.
For example, one could estimate the average length of each spell of receiving food stamps during 1992. Also, one could estimate the average spell of unemployment that elapsed before a person found a new job. To construct such an estimate, first identify the persons who possessed the characteristic at some point during the time period of interest. Then, create two sums of these persons' appropriate longitudinal weights: (1) sum the product of the weight times the number of months the spell lasted and (2) sum the weights only. Now, the estimated average spell length in months is given by (1) divided by (2). A person who experienced two spells during the time period of interest would be treated as two persons and appear twice in sums (1) and (2). An alternate method of calculating the average can be found in the section "Standard Error of a Mean or Aggregate."
4. The number of month-to-month changes in the status of a characteristic (i.e., number of transitions) summed over every set of two consecutive months during the time period of interest.
To construct such an estimate, sum the appropriate longitudinal person weight each time a change is reported between two consecutive months during the time period of interest. For example, to estimate the number of persons who changed from receiving food stamps in July 1992 to not receiving in August 1992, add together the 92CY longitudinal person weights of each person who had such a change. To estimate the number of changes in monthly salary income during the third quarter of 1992, sum together the estimates of the number of persons who made a change between July 1 and August 1, between August 1 and September 1, and between September 1 and October 1.
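The four longitudinal estimate types can be sketched in Python as follows. The record layout is hypothetical: each person is a dictionary holding the appropriate longitudinal person weight ("weight"), a list of monthly true/false indicators for the characteristic ("status"), and, for estimate type 2, a list of monthly amounts ("amount"). Field names on the actual file differ, and the caller is assumed to have already restricted the records to the appropriate persons and months.

```python
def ever_count(persons):
    """Type 1: weighted number of persons with the characteristic in any month."""
    return sum(p["weight"] for p in persons if any(p["status"]))


def aggregate_amount(persons):
    """Type 2: weighted total of an amount accumulated over the period."""
    return sum(p["weight"] * sum(p["amount"]) for p in persons)


def spell_lengths(status):
    """Lengths, in months, of each run of consecutive True months."""
    lengths, run = [], 0
    for has_characteristic in status:
        if has_characteristic:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def average_spell_length(persons):
    """Type 3: sum (1) divided by sum (2), with one term per spell."""
    weighted_months = total_weight = 0.0
    for p in persons:
        for length in spell_lengths(p["status"]):
            weighted_months += p["weight"] * length  # sum (1)
            total_weight += p["weight"]              # sum (2)
    return weighted_months / total_weight if total_weight else float("nan")


def transition_count(persons):
    """Type 4: weighted number of status changes between consecutive months."""
    return sum(
        p["weight"]
        for p in persons
        for before, after in zip(p["status"], p["status"][1:])
        if before != after
    )
```

Note that transition_count counts changes in either direction. To count only changes in one direction, such as receiving to not receiving, the condition would be replaced with a test for that specific pair of values.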
Note that spell and transition estimates should be used with caution because of the biases that are associated with them. Sample persons tend to report the same status of a characteristic for all four months of a reference period. This tendency results in a bias toward reported spell lengths that are multiples of four months. This tendency also affects transition estimates in that, for many characteristics, the number of month-to-month transitions reported between the last month of one reference period and the first month of the next reference period is much greater than the number of reported transitions between any two months within a reference period. Additionally, spells extending before or after the time period of interest are cut off (censored) at the boundaries of the time period. If these censored spells are used in estimating average spell length, a downward bias will result.
Also using longitudinal person weights one can construct the following type of cross-sectional estimate:
5. Monthly estimates of a characteristic averaged over a number of consecutive months.
For example, one could estimate the monthly average number of food stamp recipients over the months July through December 1992. To construct such an estimate, first form an estimate for each month in the time period of interest. Use the longitudinal 92CY person weight, summing over all persons who possessed the characteristic of interest during the month of interest. Then, sum the monthly estimates and divide by the number of months.
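A minimal Python sketch of this cross-sectional average follows. As before, the record layout (a "weight" and a list of monthly "status" indicators per person) is hypothetical and is used only to show the order of operations: form each monthly estimate first, then average the monthly estimates.

```python
def monthly_average(persons, months):
    """Type 5: average of weighted monthly counts over the given month indexes."""
    monthly_estimates = [
        sum(p["weight"] for p in persons if p["status"][m]) for m in months
    ]
    return sum(monthly_estimates) / len(monthly_estimates)
```

For July through December 1992, months would hold the six positions of those months in each person's status list, and the weight would be the 92CY longitudinal person weight.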
Estimation of Household Characteristics. The Census Bureau has not developed household and family weights for longitudinal analysis. However, to facilitate exploratory research based upon the Census Bureau's provisional longitudinal household definition, two different longitudinal household weights, termed adjustment factor 1 and adjustment factor 2, were created for each longitudinal household each month. These factors were then assigned to every member of the longitudinal household each month. The primary difference between the factors is that for married-couple households adjustment factor 1 was derived jointly from the panel longitudinal person weights of the householder and spouse, while adjustment factor 2 was derived solely from the panel longitudinal person weight of the householder.
For each month, five data fields are included on the longitudinal panel file to facilitate creation of household level estimates: (1) current household type, (2) key person, (3) other household member, (4) adjustment factor 1, (5) adjustment factor 2. Definitions of fields (1) through (3) as well as the provisional definitions of longitudinal household, original household, and successor household are provided below. In this section "month" refers to reference month unless stated otherwise.
LONGITUDINAL HOUSEHOLD: A longitudinal household is a household which exists during at least one month, but which may continue to exist for more than one month. A longitudinal household continues from one month to the next, if it has the same householder (and spouse, if present in the household), and if it is the same household type, where household type is defined below.
CURRENT HOUSEHOLD TYPE: Households are classified by type in the current month where household types are: (1) married-couple household, (2) other family household, male householder, (3) other family household, female householder, (4) non-family household, male householder, (5) non-family household, female householder.
ORIGINAL HOUSEHOLD: A household existing at the beginning of the survey, i.e., a household which exists during the first interview month of the rotation group.
SUCCESSOR HOUSEHOLD: A household which is not an original household but which does exist during at least one month as an off-shoot of an original household. A successor household must exist during at least one month succeeding the first interview month of the rotation group, and must have a key person (see definition below) who was a member of an original household.
KEY PERSON: In married-couple longitudinal households both the householder and the householder's spouse are key persons. In all other types of longitudinal households, there is only one key person - the householder. In married-couple households at least one key person must have entered the sample at Wave 1. In all other household types, the key person must have entered the sample at Wave 1.
OTHER HOUSEHOLD MEMBER: A person who, during a specific month, is a member of a longitudinal household but is not a key person.
Adjustment factors 1 and 2 are presented in figure 1. In examining figure 1, keep the following principles in mind: Adjustment factors 1 and 2 are always derived from the panel longitudinal person weight(s) of an original householder (and/or key person). For every successor household, where the current month householder (and/or spouse) was a member of an original household, it is the householder (and/or spouse) of the original household who supplies the panel longitudinal person weight from which the adjustment factors are derived.
|Household|HHer entered sample in|Other KP entered sample in|AF1|AF2|
|Original, married-couple|-|-|mean LPW of two key persons|LPW of HHer|
|Original, other|-|-|LPW of HHer|LPW of HHer|
|Successor, married-couple|Wave 1|Wave 1|first monthly value of AF1|first monthly value of AF2|
|Successor, married-couple|Wave 1|Wave 2+|½ first monthly value of AF1|first monthly value of AF2|
|Successor, married-couple|Wave 2+|Wave 1|½ first monthly value of AF1|Zero|
|Successor, married-couple|Wave 2+|Wave 2+|Zero1|Zero1|
|Successor, other|Wave 1|-|first monthly value of AF1|first monthly value of AF2|
|Successor, other|Wave 2+|-|Zero1|Zero1|
AF1 = Adjustment factor 1;
AF2 = Adjustment factor 2;
LPW = Panel longitudinal person weight;
Wave 2+ = Wave 2 or later wave
HHer = Current month householder;
KP = Current month key person
Note: The situation where a successor household is formed by the merging of two Wave 1 households is not covered in figure 1. Original sample persons who move into another sample household cannot be linked to their original household and so are treated as if they entered the sample in Wave 2+.
Use of Household Weights. Adjustment factor 1, adjustment factor 2, and the related data fields are intended to provide the basis for exploratory household and family estimates. For example, by using adjustment factor fields for key persons (in married couple households, one key person must be selected) with additional variables, estimates pertaining to longitudinal households can be derived for statements equivalent to the following: "During the period from month 'A' to month 'B', there were 'C' households with characteristics 'D'." An example of such a statement would be: "During the period from January to December 1992, there were 'C' households which received food stamps for 10 or more months." All such estimates should be considered exploratory, because the adjustment factors do not explicitly take into account several possible sources of bias, including differential attrition from the sample, with the result that the estimates may, even as national estimates, be subject to substantial bias. The purpose of including these data fields on the longitudinal panel file is to facilitate analyses that may be useful in developing improved longitudinal household weights. Although the exploratory adjustment factors may be useful for other purposes, the Census Bureau intends that these factors be used for only this one purpose.
Exploratory household (family) estimates can be formed using either adjustment factor 1 or adjustment factor 2. At present, there is insufficient evidence to recommend one factor over the other in any given situation. To form exploratory household (family) estimates, use the adjustment factor deemed appropriate, summing over all households (families) possessing the characteristic of interest. Note that both adjustment factors for a household will remain the same for each month the household exists. Therefore, the appropriate adjustment factor for a household can be taken from any month of a household's existence. Also, note that the adjustment factors assigned to each member of a household actually apply to the entire household. As an example of the use of these adjustment factors, suppose one had an independent estimate of the number of households which received food stamps for 10 months or more during 1992 and wanted to compare it to the SIPP estimate. To construct the SIPP estimate, first, using appropriate data fields (e.g., current household type, key person), identify all households which existed for exactly 10, 11, and 12 months during 1992; then sum adjustment factor 1 or adjustment factor 2 over all of the identified households which received food stamps for the appropriate time period.
Adjusting Estimates Which Use Less Than the Full Sample. All four rotation groups of data are not available for reference months October through December 1991 and February through April 1994 (see table 1). If the time period of interest for a given estimate (of person or household characteristics) includes these months, the estimate may need to be adjusted in some way to account for the missing rotation groups. For longitudinal estimates (types 1-4) this adjustment factor equals four divided by the number of rotation groups contributing data. For example, if the time period of interest for a given estimate is December 1991, then data will be available only from rotation groups 2, 3, and 4. Therefore, a factor of 4/3 = 1.3333 will be applied. To estimate the number of persons ever unemployed in the fourth quarter of 1991, only data from rotation group 2 are available. Thus, a factor of 4/1 = 4 will be applied.
Note that, if the given estimate is an average of monthly estimates (estimate type 5), then the number of rotation groups and the factor used will be determined independently for each month in the average and the adjusted monthly estimates will be averaged together in the usual way. For example, to estimate the average number of persons unemployed per month in the fourth quarter of 1991, the October, November, and December data will be multiplied by 4/1, 4/2, and 4/3 respectively before being summed together and divided by three.
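The adjustment for missing rotation groups can be sketched in Python as follows. The function names are hypothetical; the arithmetic follows the rule stated above, in which the factor equals four divided by the number of rotation groups contributing data and, for averages of monthly estimates, is applied month by month before averaging.

```python
def rotation_factor(groups_available, total_groups=4):
    """Adjustment factor: total rotation groups over groups contributing data."""
    return total_groups / groups_available


def adjusted_monthly_average(monthly_estimates, groups_available):
    """Average of monthly estimates, each adjusted by its own factor."""
    adjusted = [
        estimate * rotation_factor(groups)
        for estimate, groups in zip(monthly_estimates, groups_available)
    ]
    return sum(adjusted) / len(adjusted)
```

For the fourth quarter of 1991, groups_available would be [1, 2, 3] for October, November, and December, giving factors of 4/1, 4/2, and 4/3.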
ACCURACY OF ESTIMATES
SIPP estimates are based on a sample; they may differ somewhat from the figures that would have been obtained if a complete census had been taken using the same questionnaire, instructions, and enumerators. There are two types of errors possible in an estimate based on a sample survey: nonsampling and sampling. We are able to provide estimates of the magnitude of SIPP sampling error, but this is not true of nonsampling error. Found in the next sections are descriptions of sources of SIPP nonsampling error, followed by a discussion of sampling error, its estimation, and its use in data analysis.
Note that estimates from this sample for individual states are subject to very high sampling errors and are not recommended. The state codes on the file are primarily of use for linking respondent characteristics with appropriate contextual variables (e.g., state-specific welfare criteria) and for tabulating data by user-defined groupings of states.
Nonsampling Errors. Nonsampling errors can be attributed to many sources, e.g., inability to obtain information about all cases in the sample; definitional difficulties; differences in the interpretation of questions; inability or unwillingness on the part of the respondents to provide correct information; inability to recall information; errors made in collection, such as in recording or coding the data; errors made in processing the data; errors made in estimating values for missing data; biases resulting from the differing recall periods caused by the rotation pattern used; and undercoverage. Quality control and edit procedures were used to reduce errors made by respondents, coders and interviewers. More detailed discussions of the existence and control of nonsampling errors in the SIPP can be found in the SIPP Quality Profile.
Undercoverage in SIPP results from missed living quarters and missed persons within sample households. It is known that undercoverage varies with age, race, and sex. Generally, undercoverage is larger for males than for females and larger for Blacks than for Nonblacks. Ratio estimation to independent age-race-sex population controls partially corrects for the bias due to survey undercoverage. However, biases exist in the estimates to the extent that persons in missed households or missed persons in interviewed households have characteristics different from those of interviewed persons in the same age-race-sex group. Further, the independent population controls used have not been adjusted for undercoverage in the decennial census. The Bureau has used complex techniques to adjust the weights for nonresponse. For an explanation of the techniques used, see "Nonresponse Adjustment Methods for Demographic Surveys at the U.S. Bureau of the Census," November 1988, Working Paper 8823, by R. Singh and R. Petroni. An example of successfully avoiding bias can be found in "Current Nonresponse Research for the Survey of Income and Program Participation" (paper by Petroni, presented at the Second International Workshop on Household Survey Nonresponse, October 1991).
Comparability with Other Estimates. Caution should be exercised when comparing data from this file with data from other SIPP publications or with data from other surveys. The comparability problems are caused by such sources as the seasonal patterns for many characteristics, different nonsampling errors, and different concepts and procedures. Refer to the SIPP Quality Profile for known differences with data from other sources and further discussion.
Sampling Variability. Standard errors indicate the magnitude of the sampling error. They also partially measure the effect of some nonsampling errors in response and enumeration, but do not measure any systematic biases in the data. The standard errors for the most part measure the variations that occurred by chance because a sample rather than the entire population was surveyed.
USES AND COMPUTATION OF STANDARD ERRORS
Confidence Intervals. The sample estimate and its standard error enable one to construct confidence intervals, ranges that would include the average result of all possible samples with a known probability. For example, if all possible samples were selected, each of these being surveyed under essentially the same conditions and using the same sample design, and if an estimate and its standard error were calculated from each sample, then:
1. Approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average result of all possible samples.
2. Approximately 95 percent of the intervals from 1.960 standard errors below the estimate to 1.960 standard errors above the estimate would include the average result of all possible samples.
The average estimate derived from all possible samples is or is not contained in any particular computed interval. However, for a particular sample, one can say with a specified confidence that the average estimate derived from all possible samples is included in the confidence interval.
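As a simple illustration, the interval computation can be written in Python as follows. The function name is hypothetical, and the estimate and standard error are supplied by the user; the multipliers 1.645 and 1.960 are the ones given above.

```python
def confidence_interval(estimate, standard_error, level=90):
    """Return the (lower, upper) bounds of a 90 or 95 percent interval."""
    multiplier = {90: 1.645, 95: 1.960}[level]
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width
```

For an estimate of 1,000,000 with a standard error of 50,000, the 90 percent interval runs from about 917,750 to about 1,082,250.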
Hypothesis Testing. Standard errors may also be used for hypothesis testing, a procedure for distinguishing between population characteristics using sample estimates. The most common types of hypotheses tested are 1) the population characteristics are identical versus 2) they are different. Tests may be performed at various levels of significance, where a level of significance is the probability of concluding that the characteristics are different when, in fact, they are identical.
To perform the most common test, compute the difference XA - XB, where XA and XB are sample estimates of the characteristics of interest. A later section explains how to derive an estimate of the standard error of the difference XA - XB. Let that standard error be sDIFF. If XA - XB is between -1.645 times sDIFF and +1.645 times sDIFF, no conclusion about the characteristics is justified at the 10 percent significance level. If, on the other hand, XA - XB is smaller than -1.645 times sDIFF or larger than +1.645 times sDIFF, the observed difference is significant at the 10 percent level. In this event, it is commonly accepted practice to say that the characteristics are different. We recommend that users report only those differences that are significant at the 10 percent level or better. Of course, sometimes this conclusion will be wrong. When the characteristics are, in fact, the same, there is a 10 percent chance of concluding that they are different.
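The test just described reduces to a single comparison, sketched here in Python. The function name and argument names are hypothetical; the standard error of the difference must be derived separately, and the default multiplier of 1.645 corresponds to the 10 percent significance level.

```python
def significant_difference(x_a, x_b, s_diff, multiplier=1.645):
    """True when XA - XB lies outside plus or minus multiplier times sDIFF."""
    return abs(x_a - x_b) > multiplier * s_diff
```

With estimates of 1,200 and 1,000 and a standard error of the difference of 100, the difference of 200 exceeds 164.5 and is significant at the 10 percent level; a difference of 100 with the same standard error is not.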
Note that as more tests are performed, more erroneous significant differences will occur. For example, at the 10 percent significance level, if 100 independent hypothesis tests are performed in which there are no real differences, it is likely that about 10 erroneous differences will occur. Therefore, the significance of any single test should be interpreted cautiously.
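The decision rule and the caution about repeated testing can be sketched in a few lines of Python. This is only an illustration and is not part of the SIPP documentation: the function names are hypothetical, and the inputs in the usage lines are made-up values.

```python
Z_10_PERCENT = 1.645  # critical value for the 10 percent significance level


def is_significant(x_a: float, x_b: float, s_diff: float,
                   z: float = Z_10_PERCENT) -> bool:
    """Return True when x_a - x_b falls outside the band +/- z * s_diff."""
    return abs(x_a - x_b) > z * s_diff


def expected_false_positives(n_tests: int, level: float = 0.10) -> float:
    """Expected number of erroneous 'significant' results when all
    n_tests independent tests involve no real difference."""
    return n_tests * level


# Hypothetical estimates and standard error of the difference.
print(is_significant(x_a=5_200_000, x_b=4_900_000, s_diff=150_000))
print(expected_false_positives(100))
```

Reporting the expected count of erroneous differences next to a batch of test results is one simple way to keep the caution above in view.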
Note Concerning Small Estimates and Small Differences. Because of the large standard errors involved, there is little chance that estimates will reveal useful information when computed on a base smaller than 200,000. Also, nonsampling error in one or more of the small number of cases providing the estimate can cause large relative error in that particular estimate. Therefore, care must be taken in the interpretation of small differences since even a small amount of nonsampling error can cause a borderline difference to appear significant or not, thus distorting a seemingly valid hypothesis test.
Standard Error Parameters. Most SIPP estimates have greater standard errors than those obtained through a simple random sample because clusters of living quarters are sampled for the SIPP. To derive standard errors that would be applicable to a wide variety of estimates and could be prepared at a moderate cost, a number of approximations were required. Estimates with similar standard error behavior were grouped together and two parameters (denoted "a" and "b") were developed to approximate the standard error behavior of each group of estimates. Because the actual standard error behavior was not identical for all estimates within a group, the standard errors computed from these parameters provide an indication of the order of magnitude of the standard error for any specific estimate. These "a" and "b" parameters vary by characteristic and by demographic subgroup to which the estimate applies.
Computation of Standard Error Parameters. In this section we discuss the adjustment of base "a" and "b" parameters to provide "a" and "b" parameters appropriate for each type of longitudinal and cross-sectional estimate described in the section "Use of Person Weights." Later sections will discuss the use of the adjusted parameters in various formulas to compute standard errors of estimated numbers, percents, averages, etc. Tables 4, 5 and 6 provide the base "a" and "b" parameters needed to compute the approximate standard errors for estimates using panel, 92CY, and 93CY weights, respectively. (Users should be aware that these parameters are preliminary and may be revised in the future.) Table 7 provides additional factors to be used for averages of monthly cross-sectional estimates. These factors are needed for two reasons: the monthly estimates are correlated and averaging over a greater number of monthly estimates will produce an average with a smaller standard error. Table 8 gives correlations between quarterly and yearly averages of cross-sectional estimates. These correlations are used in the formula for the standard error of a difference (formula (11)). If household estimates have been produced using the adjustment factor 1 or adjustment factor 2, then follow the procedures described below, but use the household "a" and "b" parameters in table 4.
The creation of appropriate "a" and "b" parameters for the previously discussed types of estimates is described below. Again, it is assumed that all four rotation groups are used in estimation. If not, refer to the section "Adjusting Standard Errors of Estimates Which Use Less Than the Full Sample."
1. The number of persons who have ever experienced a characteristic during a given time period.
The appropriate "a" and "b" parameters are taken directly from table 4, 5 or 6. The choice of parameter depends on whether panel, 92CY, or 93CY weights were used, on the characteristic of interest, and on the demographic subgroup of interest.
2. Amount of a characteristic accumulated by persons during a given time period.
The appropriate "b" parameters are also taken directly from table 4, 5 or 6.
3. The average number of consecutive months of possession of a characteristic per spell (i.e., the average spell length for a characteristic) during a given time period.
Start with the appropriate base "a" and "b" parameters from table 4, 5 or 6. The parameters are then inflated by an additional factor, g, to account for persons who experience multiple spells during the time period of interest. This factor is computed by:
$$ g = \frac{\sum_{i=1}^{n} m_i^2}{\sum_{i=1}^{n} m_i} \qquad (1) $$
where there are n persons with at least one spell and mi is the number of spells experienced by person i during the time period of interest.
4. The number of month-to-month changes in the status of a characteristic (i.e., number of transitions) summed over every set of two consecutive months during the time period of interest.
Obtain a set of adjusted "a" and "b" parameters exactly as just described in 3, then multiply these parameters by an additional factor. Use 1.0000 if the time period of interest is two months and 2.0000 for a longer time period. (The factor of 2.0000 is based on the conservative assumption that each spell produces two transitions within the time period of interest.)
5. Monthly estimates of a characteristic averaged over a number of consecutive months.
Appropriate base "a" and "b" parameters are taken from table 4, 5 or 6. If more than one longitudinal weight has been used in the monthly average, then there is a choice of parameters from tables 4, 5 and 6. Choose the table which gives the largest parameter. Next multiply the base "a" and "b" parameters by the factor from table 7 corresponding to the number of months in the average.
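As a rough sketch, the adjustments described in items 3 and 4 could be coded as follows. The function names are hypothetical. The form of the factor g (the sum of squared spell counts divided by the sum of spell counts) is an assumption about formula (1), and the spell counts in the usage lines are invented for illustration.

```python
from typing import Sequence, Tuple


def spell_factor(spells_per_person: Sequence[int]) -> float:
    """Factor g for persons with multiple spells.

    Assumed form of formula (1): g = sum(m_i**2) / sum(m_i), where m_i is
    the number of spells for person i (every person has at least one).
    """
    if not spells_per_person or min(spells_per_person) < 1:
        raise ValueError("each person must have at least one spell")
    return sum(m * m for m in spells_per_person) / sum(spells_per_person)


def transition_factor(months_in_period: int) -> float:
    """Additional factor for counts of transitions (item 4):
    1.0 for a two-month period, 2.0 for any longer period."""
    if months_in_period < 2:
        raise ValueError("a transition needs at least two consecutive months")
    return 1.0 if months_in_period == 2 else 2.0


def scale_parameters(a: float, b: float, factor: float) -> Tuple[float, float]:
    """Multiply both parameters by the same adjustment factor."""
    return a * factor, b * factor


# Hypothetical spell counts for five sample persons.
g = spell_factor([1, 1, 2, 1, 3])
a_adj, b_adj = scale_parameters(-0.0001025, 17_457, g * transition_factor(12))
print(g, a_adj, b_adj)
```

When no person has more than one spell, g equals 1 and the base parameters are left unchanged.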
Adjusting Standard Error Parameters for Estimates which Use Less Than the Full Sample. If some rotation groups are unavailable to contribute data to a given estimate, then the estimate and its standard error need to be adjusted. The adjustment of the estimate is described in a previous section. The standard error of longitudinal estimates (types 1-4) is adjusted by multiplying the appropriate "a" and "b" parameters by a factor equal to four divided by the number of rotation groups contributing data to the estimate. Note that the parameters for the standard error of an average must still be adjusted according to this rule, even though the average itself is unaffected by the adjustment for missing rotation groups.
For the standard error of cross-sectional estimates which cover only one month, the factor can be computed as just described or it can be taken from table 3 where the factor is given for each single reference month, October 1991 to April 1994. For the standard error of quarterly averages of monthly estimates which use less than the full sample, special factors are used, also given in table 3 for the fourth quarter of 1991 to the first quarter of 1994.
As an example, suppose we want a standard error for the estimated number of females who have ever received food stamps during the fourth quarter of 1991. The appropriate "a" and "b" parameters are -0.0002109 and 18,863, respectively (from table 4). Because only one rotation group is available for this estimate (see table 1), a factor of 4/1 = 4.000 would be applied to obtain final "a" and "b" parameters of -0.0008436 and 75,452, respectively. Suppose instead that we were interested in the cross-sectional estimate of the average monthly number of female food stamp recipients for the fourth quarter of 1991. In that case a factor of 1.8519 (from table 3) would be applied to obtain final "a" and "b" parameters of -0.0003906 and 34,932, respectively. Note that only panel "a" and "b" parameters will be affected by this adjustment; no such adjustment is ever needed for 92CY and 93CY parameters since the full sample is available for all months in calendar years 1992 and 1993.
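The arithmetic in this example can be checked with a short script. The sketch below is illustrative only: the function names are hypothetical, and the factor 1.8519 is simply copied from the text rather than looked up in table 3.

```python
from typing import Tuple


def rotation_group_factor(groups_available: int, total_groups: int = 4) -> float:
    """Four divided by the number of rotation groups contributing data."""
    if not 1 <= groups_available <= total_groups:
        raise ValueError("groups_available must be between 1 and total_groups")
    return total_groups / groups_available


def scale_parameters(a: float, b: float, factor: float) -> Tuple[float, float]:
    """Apply one adjustment factor to both the "a" and "b" parameters."""
    return a * factor, b * factor


BASE_A, BASE_B = -0.0002109, 18_863  # panel parameters quoted in the example

# Longitudinal estimate with one rotation group available.
a_long, b_long = scale_parameters(BASE_A, BASE_B, rotation_group_factor(1))

# Quarterly average of cross-sectional estimates, factor quoted from the text.
a_qtr, b_qtr = scale_parameters(BASE_A, BASE_B, 1.8519)

print(a_long, b_long)
print(round(a_qtr, 7), round(b_qtr))
```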
Standard Errors of Estimated Numbers. The approximate standard error of an estimated number can be obtained by using formula (2):
$$ s_x = \sqrt{a x^2 + b x} \qquad (2) $$
Here x is the estimated number and "a" and "b" are the parameters associated with the particular type of characteristic for the appropriate longitudinal time period, i.e., panel, 92CY, or 93CY.
Illustration. Suppose the SIPP estimate of the number of persons ever receiving Social Security during the first three months of 1992 is 34,122,000. (This estimate is obtained using the 92CY weights.) The appropriate "a" and "b" parameters to use in calculating a standard error for the estimate are obtained from table 5. They are a = -0.0001025 and b = 17,457. Using formula (2), the approximate standard error is
$$ s_x = \sqrt{(-0.0001025)(34{,}122{,}000)^2 + (17{,}457)(34{,}122{,}000)} \approx 690{,}000 $$
The 90-percent confidence interval as shown by the data is from 32,986,950 to 35,257,050. Therefore, a conclusion that the average estimate derived from all possible samples lies within a range computed in this way would be correct for roughly 90 percent of all samples. Similarly, the 95-percent confidence interval as shown by the data is from 32,769,600 to 35,474,400 and we could conclude that the average estimate derived from all possible samples lies within this interval.
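A minimal Python sketch of this calculation follows. It assumes that formula (2) has the form sqrt(a*x^2 + b*x), which reproduces the 90-percent interval quoted in the illustration; the function names are hypothetical.

```python
import math
from typing import Tuple


def se_number(x: float, a: float, b: float) -> float:
    """Approximate standard error of an estimated number x.

    Assumed form of formula (2): sqrt(a * x**2 + b * x).
    """
    variance = a * x * x + b * x
    if variance < 0:
        raise ValueError("parameters give a negative variance for this x")
    return math.sqrt(variance)


def confidence_interval(estimate: float, se: float,
                        z: float = 1.645) -> Tuple[float, float]:
    """Interval from z standard errors below to z above the estimate."""
    return estimate - z * se, estimate + z * se


x = 34_122_000
se = se_number(x, a=-0.0001025, b=17_457)
se_rounded = round(se, -4)  # the illustration works with a rounded value
low_90, high_90 = confidence_interval(x, se_rounded)
print(se_rounded, low_90, high_90)
```

Passing z=1.960 to `confidence_interval` gives the corresponding 95-percent interval.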
Standard Error of a Mean or Aggregate. A mean is defined here to be the average quantity of some characteristic (other than the number of persons, families, or households) per person, family, or household. An aggregate is defined to be the total quantity of some characteristic summed over all units in a subpopulation. For example, a mean could be the average annual income of females age 25 to 34; an aggregate, the total annual income for that subpopulation. The standard error of a mean can be approximated by formula (3) below and the standard error of an aggregate can be approximated by formula (4). Because of the approximations used in developing formulas (3) and (4), an estimate of the standard error of the mean or aggregate obtained from these formulas will generally underestimate the true standard error.
The formula used to estimate the standard error of a mean, $\bar{x}$, is
$$ s_{\bar{x}} = \sqrt{\frac{b}{y}\, s^2} \qquad (3) $$
where y is the base, $s^2$ is the estimated population variance of the characteristic and b is the "b" parameter associated with the particular type of characteristic. The standard error of an aggregate k is estimated by:
$$ s_k = \sqrt{b\, y\, s^2} \qquad (4) $$
The population variance, s2, may be estimated by one of two methods: the first method uses data that has been grouped into intervals, the second method uses ungrouped data. The second method is recommended because it is more precise. However, the first method will be easier to implement if grouped data is already being used as part of the analysis. In both methods it is assumed xi is the value of the characteristic for person i.
To use the first method, the range of values for the characteristic is divided into c intervals, where the lower and upper boundaries of interval j are $Z_{j-1}$ and $Z_j$, respectively. Each person is placed into one of the c groups such that the value of the characteristic is between $Z_{j-1}$ and $Z_j$. The estimated population variance, $s^2$, is then given by:
$$ s^2 = \sum_{j=1}^{c} p_j m_j^2 - \bar{x}^2 \qquad (5) $$
where $p_j$ is the estimated proportion of persons in group j (based on weighted data), and $m_j = (Z_{j-1} + Z_j)/2$. The most representative value of the characteristic in group j is assumed to be $m_j$. If group c is open-ended, i.e., no upper interval boundary exists, then an approximate value for $m_c$ is
$$ m_c = \frac{3}{2}\, Z_{c-1} $$
The mean, $\bar{x}$, can be obtained using the following formula:
$$ \bar{x} = \sum_{j=1}^{c} p_j m_j \qquad (6) $$
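In the second method, the estimated population variance is given by
$$ s^2 = \frac{\sum_{i=1}^{n} w_i x_i^2}{\sum_{i=1}^{n} w_i} - \bar{x}^2 \qquad (7) $$
where there are n sample persons with the characteristic of interest and $w_i$ is the final weight for person i (note that $\sum_{i=1}^{n} w_i = y$). The mean, $\bar{x}$, can be obtained from the formula
$$ \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \qquad (8) $$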
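The two variance methods and the standard error formulas can be sketched in Python as below. This is an illustration under stated assumptions, not the Census Bureau's implementation: the function names are hypothetical, the formulas are assumed to be the weighted second moment minus the squared mean for (5) and (7), sqrt((b/y)*s^2) for (3), and sqrt(b*y*s^2) for (4), and the open-ended group is assumed to use 1.5 times its lower boundary as its representative value.

```python
import math
from typing import Optional, Sequence, Tuple


def grouped_mean_and_variance(
    bounds: Sequence[Tuple[float, Optional[float]]],
    proportions: Sequence[float],
) -> Tuple[float, float]:
    """Method 1: bounds are (lower, upper) pairs; upper is None when the
    top group is open-ended. proportions are weighted shares summing to 1."""
    midpoints = []
    for lower, upper in bounds:
        if upper is None:
            midpoints.append(1.5 * lower)  # assumed rule for an open-ended group
        else:
            midpoints.append((lower + upper) / 2)
    mean = sum(p * m for p, m in zip(proportions, midpoints))
    variance = sum(p * m * m for p, m in zip(proportions, midpoints)) - mean ** 2
    return mean, variance


def weighted_mean_and_variance(
    values: Sequence[float], weights: Sequence[float]
) -> Tuple[float, float]:
    """Method 2: ungrouped data with final person weights."""
    total = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total
    variance = sum(w * x * x for x, w in zip(values, weights)) / total - mean ** 2
    return mean, variance


def se_mean(b: float, base: float, variance: float) -> float:
    """Assumed form of formula (3)."""
    return math.sqrt(b / base * variance)


def se_aggregate(b: float, base: float, variance: float) -> float:
    """Assumed form of formula (4)."""
    return math.sqrt(b * base * variance)


# Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
mean_g, var_g = grouped_mean_and_variance([(0, 10), (10, 20)], [0.5, 0.5])
mean_u, var_u = weighted_mean_and_variance([1.0, 3.0], [1.0, 1.0])
print(mean_g, var_g, mean_u, var_u, se_mean(4, 100, var_g))
```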
Illustration of Method 1. Suppose that the 1992 distribution of annual incomes is given in table 2 for persons aged 25 to 34 who were employed for all 12 months of 1992.
The mean annual cash income from formula (6) is $26,717.
Using formula (5) and the mean annual cash income of $26,717 the estimated population variance, s2, is
The appropriate "b" parameter from table 5 is 5,951. Now, using formula (3), the estimated standard error of the mean is
Illustration of Method 2. Suppose that we are interested in estimating the average length of spells of food stamp recipiency during the calendar year 1992 for a given subpopulation. Also, suppose there are only 10 sample persons in the subpopulation who were food stamp recipients. (This example is for illustrative purposes only; actually, 10 sample cases would be too few for a reliable estimate.) The number of consecutive months of food stamp recipiency during 1992 and the 92CY weights are given below for each sample person:
Using formula (8), the average spell of food stamp recipiency is estimated to be
The standard error will be computed by formula (3). First, the estimated population variance can be obtained by formula (7):
Next, the base "b" parameter of 17,457 is taken from table 5 and multiplied by the factor computed from formula (1):
Therefore, the final "b" parameter is 29,851 and the standard error of the mean is
Standard Errors of Estimated Percentages. This section refers to the percentages of a group of persons, families, or households possessing a particular attribute and to percentages of money or related concepts. The reliability of an estimated percentage, computed using sample data for both numerator and denominator, depends upon both the size of the percentage and the size of the total upon which the percentage is based. Estimated percentages are relatively more reliable than the corresponding estimates of the numerators of the percentages, particularly if the percentages are over 50 percent. For example, the percent of employed persons is more reliable than the estimated number of employed persons. When the numerator and denominator of the percentage have different parameters, use the parameter of the numerator. If proportions are presented instead of percentages, note that the standard error of a proportion is equal to the standard error of the corresponding percentage divided by 100.
There are two types of percentages commonly estimated. The first type is the percentage of persons sharing a particular characteristic such as the percentage of persons owning their own home or the percentage of January food stamp recipients who were also receiving food stamps in July. The second type is the percentage of money or some similar concept held by a particular group of persons or held in a particular form. Examples are the percentage of wealth held by persons with high income and the percentage of annual income received by females.
For the percentage of persons, the approximate standard error, s(x,p), of the estimated percentage, p, can be obtained by the formula:
$$ s_{(x,p)} = \sqrt{\frac{b}{x}\, p\,(100 - p)} \qquad (9) $$
Here x is the base of the percentage, p is the percentage (0<p<100), and b is the "b" parameter for the numerator.
Illustration. Suppose that an estimated 46,023,000 males were employed in July 1992 and an estimated 2.4 percent of them became unemployed in August 1992. The base "b" parameter is 5,951 (from table 5). Using formula (9) and the appropriate "b" parameter, the approximate standard error is
$$ s_{(x,p)} = \sqrt{\frac{5{,}951}{46{,}023{,}000} \times 2.4 \times (100 - 2.4)} \approx 0.17 \text{ percent} $$
Consequently, the 90-percent confidence interval as shown by these data is from 2.1 to 2.7 percent.
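The illustration can be reproduced with the short sketch below. It assumes that formula (9) has the form sqrt((b/x) * p * (100 - p)), which yields the interval of 2.1 to 2.7 percent stated above; the function name is hypothetical.

```python
import math


def se_percentage(p: float, base: float, b: float) -> float:
    """Approximate standard error (in percentage points) of a percentage p
    computed on the given base. Assumed form of formula (9)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    return math.sqrt(b / base * p * (100 - p))


p = 2.4
se = se_percentage(p, base=46_023_000, b=5_951)
low_90, high_90 = p - 1.645 * se, p + 1.645 * se
print(round(se, 2), round(low_90, 1), round(high_90, 1))
```

Dividing the result by 100 gives the standard error of the corresponding proportion.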
For percentages of money, a more complicated formula is required. A percentage of money will usually be estimated in one of two ways. It may be the ratio of two aggregates:
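$$ p_M = \frac{X_A}{X_N} \times 100 $$
or it may be the ratio of two means with an adjustment for different bases:
$$ p_M = \hat{p}_A\, \frac{\bar{x}_A}{\bar{x}_N} $$
where $X_A$ and $X_N$ are aggregate money figures, $\bar{x}_A$ and $\bar{x}_N$ are mean money figures, and $\hat{p}_A$ is the estimated number in group A divided by the estimated number in group N, expressed as a percentage. In either case, we estimate the standard error as
$$ s_M = \sqrt{\left(\frac{\hat{p}_A\, \bar{x}_A}{\bar{x}_N}\right)^2 \left[\left(\frac{s_p}{\hat{p}_A}\right)^2 + \left(\frac{s_A}{\bar{x}_A}\right)^2 + \left(\frac{s_N}{\bar{x}_N}\right)^2\right]} \qquad (10) $$
where $s_p$ is the standard error of $\hat{p}_A$, $s_A$ is the standard error of $\bar{x}_A$ and $s_N$ is the standard error of $\bar{x}_N$. To calculate $s_p$, use formula (9). The standard errors of $\bar{x}_A$ and $\bar{x}_N$ are calculated using formula (3).
Note that there is frequently some correlation between the characteristics estimated by $\hat{p}_A$, $\bar{x}_A$, and $\bar{x}_N$. These correlations, if present, will cause a tendency towards overestimates or underestimates, depending on the relative sizes of the correlations and whether they are positive or negative.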
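Illustration. Suppose that in October 1992 an estimated 8.8% of males 16 years and over were black, the mean monthly earnings of these black males were $1,288, the mean monthly earnings of all males 16 years and over were $1,911, and the corresponding standard errors are 0.37%, $36, and $27. Then, the percent of male earnings made by blacks in October 1992 is:
$$ p_M = 8.8 \times \frac{1{,}288}{1{,}911} \approx 5.93 \text{ percent} $$
Using formula (10), the approximate standard error is:
$$ s_M = \sqrt{(5.93)^2 \left[\left(\frac{0.37}{8.8}\right)^2 + \left(\frac{36}{1{,}288}\right)^2 + \left(\frac{27}{1{,}911}\right)^2\right]} \approx 0.31 \text{ percent} $$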
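The percentage-of-money calculation in this illustration can be sketched as follows. The code assumes that the percentage is the group share times the ratio of the two means, and that formula (10) combines the three relative standard errors in quadrature; both forms, and the function names, are assumptions made for illustration.

```python
import math


def percent_of_money(p_a: float, mean_a: float, mean_n: float) -> float:
    """Share of money held by group A, with p_a given in percent."""
    return p_a * mean_a / mean_n


def se_percent_of_money(p_a: float, mean_a: float, mean_n: float,
                        s_p: float, s_a: float, s_n: float) -> float:
    """Assumed form of formula (10): the estimate times the square root of
    the summed squared relative standard errors (correlations ignored)."""
    relative_variance = (s_p / p_a) ** 2 + (s_a / mean_a) ** 2 + (s_n / mean_n) ** 2
    return percent_of_money(p_a, mean_a, mean_n) * math.sqrt(relative_variance)


# Figures quoted in the illustration.
p_m = percent_of_money(8.8, 1288, 1911)
s_m = se_percent_of_money(8.8, 1288, 1911, s_p=0.37, s_a=36, s_n=27)
print(round(p_m, 2), round(s_m, 2))
```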
Standard Error of a Difference. The standard error of a difference between two sample estimates, x and y, is equal to
$$ s_{(x-y)} = \sqrt{s_x^2 + s_y^2 - 2\, r\, s_x s_y} \qquad (11) $$
where sx and sy are the standard errors of the estimates x and y. The estimates can be numbers, averages, percents, ratios, etc. The correlation between x and y is represented by r. Some estimated correlations are given in table 8. These correlations apply only to cross-sectional estimates of the same characteristic at two points of time. The cross-sectional estimates must be monthly estimates averaged over quarters or years (see the section "Use of Person Weights" for a discussion of cross-sectional estimates). Correlations are given for both person and household characteristics. If no correlation has been provided for a given set of x and y estimates, then assume r = 0. If r is assumed to be zero and the true correlation is really positive (negative), then this assumption will result in a tendency towards overestimates (underestimates) of the true standard error.
Illustration. Suppose that we are interested in the difference in the average monthly number of males vs. females with monthly cash income above $5,000 in 1992. An estimate of the number of persons in this income bracket has been obtained for each month for both males and females. Averaging the 12 monthly estimates for 1992 produces an estimate of 1,619,000 for the average number of females in this monthly income bracket during 1992 (based on 92CY weights). The similar estimate for males is 2,000,000 (based on 92CY weights). The difference in estimates is 381,000.
The standard error of the female estimate is computed next. Base "a" and "b" parameters from table 5 for females are -0.0000665 and 5,951, respectively. Because 12 monthly estimates were used in the average, these parameters are multiplied by a factor of 0.86 from table 7 to yield final parameters of -0.0000572 and 5,118. Using formula (2), the standard error of the female estimate is
$$ s_x = \sqrt{(-0.0000572)(1{,}619{,}000)^2 + (5{,}118)(1{,}619{,}000)} \approx 90{,}000 $$
In a similar manner, using parameters from table 5, the standard error of the male estimate is 90,000. Now, the standard error of the difference is computed using the above two standard errors. The correlation r for this example is 0. The standard error of the difference is computed by formula (11):
$$ s_{(x-y)} = \sqrt{(90{,}000)^2 + (90{,}000)^2} \approx 127{,}000 $$
Suppose that it is desired to test at the 10 percent significance level whether the average number of males and females with monthly cash income above $5,000 were different in 1992. To perform the test, compare the difference of 381,000 to the product 1.645 x 127,000 = 209,000. Since the difference is larger than 1.645 times the standard error of the difference, the data show that the two sexes are significantly different at the 10 percent level.
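The steps of this illustration can be traced with the sketch below. It assumes that formula (2) is sqrt(a*x^2 + b*x) and that formula (11) is sqrt(sx^2 + sy^2 - 2*r*sx*sy), which reproduce the rounded figures in the text; the function names are hypothetical.

```python
import math


def se_number(x: float, a: float, b: float) -> float:
    """Assumed form of formula (2)."""
    return math.sqrt(a * x * x + b * x)


def se_difference(s_x: float, s_y: float, r: float = 0.0) -> float:
    """Assumed form of formula (11); r is the correlation between estimates."""
    return math.sqrt(s_x ** 2 + s_y ** 2 - 2 * r * s_x * s_y)


# Female estimate with the adjusted parameters quoted in the text.
se_female = se_number(1_619_000, a=-0.0000572, b=5_118)

# The text rounds both standard errors to 90,000 and uses r = 0.
s_diff = se_difference(90_000, 90_000)
difference = 2_000_000 - 1_619_000
significant = abs(difference) > 1.645 * s_diff
print(round(se_female, -3), round(s_diff, -3), significant)
```

Supplying a positive r from table 8 would shrink the standard error of the difference, which is why assuming r = 0 tends to overstate it when the true correlation is positive.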
Standard Error of a Median. The median quantity of some item such as income for a given group of persons, families, or households is that quantity such that at least half the subpopulation have as much or more and at least half the group have as much or less. The sampling variability of an estimated median depends upon the form of the distribution of the item as well as the size of the subpopulation.
The median, like the mean, can be estimated using either data which has been grouped into intervals or ungrouped data. If grouped data are used, the median is estimated using formulas (12) or (13) with p = 0.5. If ungrouped data are used, the data records are ordered based on the value of the characteristic, then the estimated median is the value of the characteristic such that the weighted estimate of 50 percent of the subpopulation falls at or below that value and 50 percent is at or above that value. Note that the method of standard error computation which is presented here requires the use of grouped data. Therefore, it should be easier to compute the median by grouping the data and using formulas (12) or (13).
An approximate method for measuring the reliability of an estimated median is to determine a confidence interval about it. (See the section "Confidence Intervals".) The following procedure may be used to estimate the 68-percent confidence limits and hence the standard error of a median based on sample data.
1. Determine, using formula (9), the standard error of an estimate of 50 percent of the group;
2. Add to and subtract from 50 percent the standard error determined in step 1;
3. Using the distribution of the item within the group, calculate the quantity of the item such that the percent of the group owning more is equal to the smaller percentage found in step 2. This quantity will be the upper limit for the 68-percent confidence interval. In a similar fashion, calculate the quantity of the item such that the percent of the group owning more is equal to the larger percentage found in step 2. This quantity will be the lower limit for the 68-percent confidence interval (note that a median computed from ungrouped data may or may not fall in this confidence interval);
4. Divide the difference between the two quantities determined in step 3 by two to obtain the standard error of the median.
To perform step 3, it will be necessary to interpolate. Different methods of interpolation may be used. The most common are simple linear interpolation and Pareto interpolation. The appropriateness of the method depends on the form of the distribution around the median. We recommend Pareto interpolation in most instances. Interpolation is used as follows. The quantity of the item such that "p" percent own more is
$$ X_{pN} = A_1 \exp\left[\frac{\mathrm{Ln}(pN/N_1)}{\mathrm{Ln}(N_2/N_1)}\, \mathrm{Ln}(A_2/A_1)\right] \qquad (12) $$
if Pareto interpolation is indicated and
$$ X_{pN} = A_1 + \frac{pN - N_1}{N_2 - N_1}\,(A_2 - A_1) \qquad (13) $$
if linear interpolation is indicated, where
N is the size of the group,
A1 and A2 are the lower and upper bounds, respectively, of the interval in which XpN falls,
N1 and N2 are the estimated number of group members owning more than A1 and A2, respectively,
exp refers to the exponential function and
Ln refers to the natural logarithm function.
It should be noted that a mathematically equivalent result is obtained by using common logarithms (base 10) and antilogarithms.
Illustration. To illustrate the calculations for the standard error of a median, we return to the first example used to illustrate the standard error of a mean. The median annual income for this group is computed by formula (12) to be $18,317. The size of the group is 39,851,000.
1. Using formula (9) and the appropriate "b" parameter of 5,951, the standard error of 50 percent on a base of 39,851,000 is about 0.6 percentage points.
2. Following step (2), the two percentages of interest are 49.4 and 50.6.
3. By examining table 2, we see that the percentage 49.4 falls in the income interval from $17,500 to $19,999. (Since 55.5 percent receive $17,500 or more per year, but only 40.9 percent receive $20,000 or more per year, the quantity that exactly 49.4 percent receive more than must be between $17,500 and $19,999.) Thus
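A1 = $17,500, A2 = $19,999, N1 = 22,117,000, and N2 = 16,299,000. Implementing Pareto interpolation, the upper bound of a 68-percent confidence interval for the median is
$$ 17{,}500 \exp\left[\frac{\mathrm{Ln}(0.494 \times 39{,}851{,}000 / 22{,}117{,}000)}{\mathrm{Ln}(16{,}299{,}000 / 22{,}117{,}000)}\, \mathrm{Ln}(19{,}999 / 17{,}500)\right] \approx \$18{,}414 $$
Also by examining table 2, we see that the percentage of 50.6 falls in the same income interval. Thus, A1, A2, N1, and N2 are the same as above. The lower bound of a 68-percent confidence interval for the median is
$$ 17{,}500 \exp\left[\frac{\mathrm{Ln}(0.506 \times 39{,}851{,}000 / 22{,}117{,}000)}{\mathrm{Ln}(16{,}299{,}000 / 22{,}117{,}000)}\, \mathrm{Ln}(19{,}999 / 17{,}500)\right] \approx \$18{,}222 $$
and the 68-percent confidence interval on the estimated median of $18,317 is from $18,222 to $18,414. An approximate standard error is
$$ \frac{18{,}414 - 18{,}222}{2} = \$96 $$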
If linear interpolation is used, the median is estimated using formula (13) to be $18,441 and the 68-percent confidence interval of the estimated median is from $18,338 to $18,544. The approximate standard error is $103.
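The median illustration can be reproduced with the sketch below. It assumes the Pareto and linear forms of formulas (12) and (13) with p entered as a proportion, and it assumes, as in the illustration, that both percentages from step 2 fall in the same interval. The function names and the INTERVAL dictionary are hypothetical.

```python
import math
from typing import Callable, Tuple


def pareto_interpolate(p: float, n: float, a1: float, a2: float,
                       n1: float, n2: float) -> float:
    """Quantity such that proportion p of the group owns more
    (assumed form of formula (12))."""
    return a1 * math.exp(math.log(p * n / n1) / math.log(n2 / n1) * math.log(a2 / a1))


def linear_interpolate(p: float, n: float, a1: float, a2: float,
                       n1: float, n2: float) -> float:
    """Same quantity by linear interpolation (assumed form of formula (13))."""
    return a1 + (p * n - n1) / (n2 - n1) * (a2 - a1)


def median_interval(se_pct: float, interpolate: Callable[..., float],
                    **interval: float) -> Tuple[float, float, float]:
    """Steps 2 to 4: return (lower limit, upper limit, standard error)."""
    upper = interpolate((50.0 - se_pct) / 100.0, **interval)  # fewer own more
    lower = interpolate((50.0 + se_pct) / 100.0, **interval)  # more own more
    return lower, upper, (upper - lower) / 2.0


# Values quoted in the illustration (interval $17,500 to $19,999).
INTERVAL = dict(n=39_851_000, a1=17_500, a2=19_999,
                n1=22_117_000, n2=16_299_000)

print(round(pareto_interpolate(0.5, **INTERVAL)))
print(median_interval(0.6, pareto_interpolate, **INTERVAL))
print(median_interval(0.6, linear_interpolate, **INTERVAL))
```

Because intermediate values are not rounded, the printed limits can differ from the published figures by up to a dollar.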
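Standard Errors of Ratios of Means or Medians. The standard error for a ratio of means or medians is approximated by formula (14):
$$ s_{(x/y)} = \sqrt{\left(\frac{x}{y}\right)^2 \left[\left(\frac{s_y}{y}\right)^2 + \left(\frac{s_x}{x}\right)^2\right]} \qquad (14) $$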
where x and y are the means or medians, and sx and sy are their associated standard errors. Formula (14) assumes that the means or medians are not correlated. If the correlation between the population means or medians estimated by x and y is actually positive (negative), then this procedure will tend to produce overestimates (underestimates) of the true standard error for the ratio of means or medians.
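A minimal sketch of the ratio calculation is given below. It assumes that formula (14) multiplies the ratio by the square root of the summed squared relative standard errors, with no correlation term. The function name and the input values are hypothetical.

```python
import math


def se_ratio(x: float, y: float, s_x: float, s_y: float) -> float:
    """Approximate standard error of x / y for uncorrelated x and y
    (assumed form of formula (14))."""
    if x == 0 or y == 0:
        raise ValueError("x and y must be nonzero")
    return abs(x / y) * math.sqrt((s_x / x) ** 2 + (s_y / y) ** 2)


# Hypothetical medians and standard errors.
ratio = 24_000 / 30_000
s_ratio = se_ratio(24_000, 30_000, s_x=480, s_y=600)
print(ratio, s_ratio)
```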
Page Last Modified: May 9, 2006