SOURCE AND ACCURACY STATEMENT
FOR THE 1993 PUBLIC USE FILES
FROM THE SURVEY OF INCOME AND PROGRAM PARTICIPATION (SIPP)
SOURCE OF DATA
The SIPP universe is the noninstitutionalized resident population living in the United States. The population includes persons living in group quarters, such as dormitories, rooming houses, and religious group dwellings. Not eligible to be in the survey are crew members of merchant vessels, Armed Forces personnel living in military barracks, and institutionalized persons, such as correctional facility inmates and nursing home residents. United States citizens residing abroad are also not eligible. Foreign visitors who work or attend school in this country, and their families, are eligible; other foreign visitors are not. With the exceptions noted above, field representatives interview eligible persons who are at least 15 years of age at the time of the interview.
The 1993 panel of the SIPP sample is located in 284 Primary Sampling Units (PSUs) each consisting of a county or a group of contiguous counties. Within these PSUs, we systematically selected expected clusters of two living quarters (LQs) from lists of addresses prepared for the 1980 decennial census to form the bulk of the sample. To account for LQs built within each of the sample areas after the 1980 census we selected a sample containing clusters of four LQs from permits issued for construction of residential LQs up until shortly before the beginning of the panel.
In jurisdictions that have incomplete addresses or don't issue building permits, we sampled small land areas, listed expected clusters of four LQs, and then subsampled. In addition, we selected a sample of LQs from a supplemental frame that included LQs identified as missed in the 1980 census.
Approximately 27,300 living quarters were originally designated for the 1993 panel. For Wave 1 of the panel, we obtained interviews from occupants of about 19,900 of the 27,300 designated living quarters. We found most of the remaining 7,400 living quarters in the panel to be vacant, demolished, converted to nonresidential use, or otherwise ineligible for the survey. However, we did not interview approximately 2,000 of the 7,400 living quarters in the panel because the occupants refused to be interviewed, could not be found at home, were temporarily absent, or were otherwise unavailable. Thus, occupants of about 91 percent of all eligible living quarters participated in the first interview of the panel.
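The 91 percent figure can be reproduced from the counts above. Eligible living quarters are the roughly 19,900 interviewed plus the roughly 2,000 eligible ones not interviewed; the rest of the 7,400 are ineligible and drop out of the denominator. A minimal Python check (the names are illustrative only):

```python
# Approximate Wave 1 counts for the 1993 panel, as stated in the text.
DESIGNATED = 27_300      # living quarters originally designated
INTERVIEWED = 19_900     # living quarters whose occupants were interviewed
NONINTERVIEWED = 2_000   # eligible living quarters not interviewed

def response_rate(interviewed: int, noninterviewed: int) -> float:
    """Share of eligible living quarters that participated.

    Eligible = interviewed + eligible-but-not-interviewed; vacant, demolished,
    and otherwise ineligible units are excluded from the denominator.
    """
    return interviewed / (interviewed + noninterviewed)

ineligible = DESIGNATED - INTERVIEWED - NONINTERVIEWED
rate = response_rate(INTERVIEWED, NONINTERVIEWED)
print(f"ineligible ~ {ineligible:,}; response rate ~ {rate:.1%}")
# ineligible ~ 5,400; response rate ~ 90.9%
```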
For subsequent interviews, only original sample persons (those in Wave 1 sample households and interviewed in Wave 1) and persons living with them are eligible to be interviewed. We followed original sample persons if they moved to a new address; if the new address was more than 100 miles from a SIPP sample area, we attempted telephone interviews instead. When original sample persons moved to remote parts of the country and were unreachable by telephone, moved without leaving a forwarding address, or refused the interview, additional noninterviews resulted.
The Bureau divides sample households within a given panel into four subsamples of nearly equal size. We call these subsamples rotation groups 1, 2, 3, and 4 and interview one rotation group each month. Beginning in February 1993, we schedule interviews for each household in the sample at 4-month intervals over a period of roughly 2½ years. The reference period for the questions is the 4-month period preceding the interview month. A wave is one cycle of four interviews covering the entire sample, using the same questionnaire.
A unique feature of the SIPP design is overlapping panels. The overlapping design allows panels to be combined, which essentially doubles the sample size. It is possible to combine selected interviews from the 1993 panel with interviews from the 1992 panel. We include the information necessary to do this later in this statement.
The public use files include core and supplemental (topical module) data. Field representatives repeat core questions at each interview over the life of the panel. Topical modules include questions which are asked only in certain waves. The 1993 and 1992 panel topical modules are shown in tables 1 and 2 respectively.
Tables 3 and 4 indicate the reference months and interview months for the collection of data from each rotation group for the 1993 and 1992 panels respectively. For example, Wave 1 rotation group 2 of the 1993 panel was interviewed in February 1993 and data for the reference months October 1992 through January 1993 were collected.
Estimation. We derived SIPP person weights in each panel from several stages of weight adjustments. In the first wave, we gave each person a base weight equal to the inverse of his/her probability of selection. For each subsequent interview, the Bureau gave each person a base weight that accounted for following movers.
We applied a factor to each interviewed person's weight to account for the SIPP sample areas not having the same population distribution as the strata they are from.
We applied a noninterview adjustment factor to the weight of every occupant of interviewed households to account for persons in noninterviewed occupied households which were eligible for the sample. (The Bureau treated individual nonresponse within partially interviewed households with imputation. We made no special adjustment for noninterviews in group quarters.)
The Bureau used complex techniques to adjust the weights for nonresponse. For a further explanation of the techniques used, see "Nonresponse Adjustment Methods for Demographic Surveys at the U.S. Bureau of the Census," by R. Singh and R. Petroni, Working Paper 8823, November 1988. The success of these techniques in avoiding bias is unknown. An example of successfully avoiding bias can be found in "Current Nonresponse Research for the Survey of Income and Program Participation" (paper by Petroni, presented at the Second International Workshop on Household Survey Nonresponse, October 1991).
We performed an additional stage of adjustment to persons' weights to reduce the mean square errors of the survey estimates. We accomplished this by ratio adjusting the sample estimates to agree with monthly Current Population Survey (CPS) type estimates of the civilian (and some military) noninstitutional population of the United States at the national level by demographic characteristics including age, sex, and race as of the specified date. The Bureau brought CPS estimates by age, sex, and race into agreement with adjusted estimates from the 1990 decennial census. Adjustments to the 1990 decennial census estimates include an adjustment for undercount(1) and also reflect births, deaths, immigration, emigration, and changes in the Armed Forces since 1990. The 1991 panel wave 6 is the first panel and wave to use the 1990 census based controls in the weighting. Weights for earlier waves were based on independent population estimates derived by updating the 1980 decennial census counts. For information about the effect of the new population controls on various person and household characteristics, refer to tables 5 through 10 from the January 10, 1994 memorandum for Turner from Waite, titled "SIPP 91: Source and Accuracy Statement for 1991 Wave 6+ Panel Public Use Files." In addition, we controlled SIPP estimates to independent Hispanic controls and made an adjustment to assign equal weights to husbands and wives within the same household. We implemented all of the above adjustments for each reference month and the interview month.
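The weighting stages above can be pictured with a deliberately simplified Python sketch. It is not the Bureau's procedure (the actual nonresponse methods are complex, as noted above): it assumes a weighting-class noninterview adjustment within hypothetical adjustment cells and a single ratio adjustment to hypothetical group controls, and it omits the first-stage factor, the Hispanic controls, and the husband-wife equalization.

```python
from collections import defaultdict

def base_weights(selection_probs):
    """Base weight: inverse of each unit's probability of selection."""
    return [1.0 / p for p in selection_probs]

def noninterview_adjust(weights, cells, interviewed):
    """Weighting-class noninterview adjustment (a simplifying assumption):
    within each cell, the weight of eligible noninterviewed households is
    moved to the interviewed ones. Noninterviewed units end with weight 0."""
    cell_total = defaultdict(float)
    cell_resp = defaultdict(float)
    for w, c, ok in zip(weights, cells, interviewed):
        cell_total[c] += w
        if ok:
            cell_resp[c] += w
    return [w * cell_total[c] / cell_resp[c] if ok else 0.0
            for w, c, ok in zip(weights, cells, interviewed)]

def ratio_adjust(weights, groups, controls):
    """Ratio adjustment: scale weights so each group's weighted total equals
    its independent control (e.g., an age-sex-race population estimate)."""
    group_total = defaultdict(float)
    for w, g in zip(weights, groups):
        group_total[g] += w
    return [w * controls[g] / group_total[g] for w, g in zip(weights, groups)]

# Hypothetical example: four sampled units, each selected with probability 1/1000.
w = base_weights([0.001] * 4)
w = noninterview_adjust(w, ["A", "A", "B", "B"], [True, False, True, True])
w = ratio_adjust(w, ["m", "m", "f", "f"], {"m": 2_500, "f": 1_800})
print(w)
```

Each stage only rescales weights, so after the last stage the weights in each group add up to that group's control.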
Use of Weights. Each household and each person within each household on each wave tape has five weights. Four of these weights are reference month specific and therefore can be used only to form reference month estimates. Average reference month estimates to form estimates of monthly averages over some period of time. For example, using the proper weights, one can estimate the monthly average number of households in a specified income range over November and December 1993. To estimate monthly averages of a given measure (e.g., total, mean) over a number of consecutive months, sum the monthly estimates and divide by the number of months.
The remaining weight is interview month specific. Use this weight to form estimates that specifically refer to the interview month (e.g., total persons currently looking for work), as well as estimates referring to the time period including the interview month and all previous months (e.g., total persons who have ever served in the military).
To form an estimate for a particular month, use the reference month weight for the month of interest, summing over all persons or households with the characteristic of interest whose reference period includes the month of interest. Multiply the sum by a factor to account for the number of rotations contributing data for the month. This factor equals four divided by the number of rotations contributing data for the month. For example, December 1992 data is only available from rotations 2, 3, and 4 for Wave 1 of the 1993 panel (see table 3), so apply a factor of 4/3. To form an estimate for an interview month, use the procedure discussed above using the interview month weight provided on the file.
Factors greater than 1 apply when constructing estimates for months with fewer than four rotations' worth of data from a wave file. However, when using core data from consecutive waves together, data from all four rotations may be available, in which case the factors are equal to 1.
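The rotation factor and the monthly average can be sketched in Python. The interview schedule below is an assumption consistent with the examples in the text (Wave 1 rotation group 2 interviewed in February 1993, then one rotation per month); table 3 is the authoritative source. The function and dictionary names are hypothetical, and the weighted sums are made-up numbers.

```python
# Assumed Wave 1 interview months (year, month) by rotation group.
WAVE1_INTERVIEW = {2: (1993, 2), 3: (1993, 3), 4: (1993, 4), 1: (1993, 5)}

def month_index(year, month):
    """Months since year 0, so consecutive months differ by 1."""
    return year * 12 + (month - 1)

def rotations_covering(year, month, schedule=WAVE1_INTERVIEW):
    """Rotation groups whose reference period (the 4 months before the
    interview month) includes the given month."""
    target = month_index(year, month)
    covering = []
    for rotation, (iy, im) in schedule.items():
        first_ref = month_index(iy, im) - 4
        if first_ref <= target < first_ref + 4:
            covering.append(rotation)
    return sorted(covering)

def monthly_estimate(weighted_sum, n_rotations):
    """Sum of reference-month weights times 4 / (number of contributing rotations)."""
    return weighted_sum * 4 / n_rotations

def monthly_average(monthly_estimates):
    """Average of monthly estimates over consecutive months."""
    return sum(monthly_estimates) / len(monthly_estimates)

# Hypothetical weighted sums for December 1992 and January 1993.
dec = monthly_estimate(300_000.0, len(rotations_covering(1992, 12)))
jan = monthly_estimate(410_000.0, len(rotations_covering(1993, 1)))
avg = monthly_average([dec, jan])
```

Under this schedule, December 1992 is covered by rotations 2, 3, and 4 (the 4/3 factor cited above), October 1992 by rotation 2 alone, and the first quarter of 1993 by 9 rotation months in total.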
These tapes contain no weight for characteristics that involve a person's or household's status over two or more months (e.g., number of households with a 50 percent increase in income between November and December 1992).
Producing Estimates for Census Regions and States. The total estimate for a region is the sum of the state estimates in that region. Using this sample, estimates for individual states are subject to very high variance and are not recommended. The state codes on the file are primarily of use for linking respondent characteristics with appropriate contextual variables (e.g., state-specific welfare criteria) and for tabulating data by user-defined groupings of states.
Producing Estimates for the Metropolitan Population. For Washington, DC and 18 states, we identify metropolitan or non-metropolitan residence (variable H*-METRO). In 28 additional states, where the non-metropolitan population in the sample was small enough to present a disclosure risk, we recoded a fraction of the metropolitan sample to be indistinguishable from non-metropolitan cases (H*-METRO=2). In these states, therefore, the cases coded as metropolitan (H*-METRO=1) represent only a subsample of that population.
In producing state estimates for a metropolitan characteristic, multiply the individual, family, or household weights by the metropolitan inflation factor for that state, presented in table 5. (This inflation factor compensates for the subsampling of the metropolitan population and is 1.0 for the states with complete identification of the metropolitan population.)
The same procedure applies when creating estimates for particular identified MSA's or CMSA's--apply the factor appropriate to the state. For multi-state MSA's, use the factor appropriate to each state part. For example, to tabulate data for the Washington, DC-MD-VA MSA, apply the Virginia factor of 1.0321 to weights for residents of the Virginia part of the MSA; Maryland and DC residents require no modification to the weights (i.e., their factors equal 1.0).
In producing regional or national estimates of the metropolitan population, it is also necessary to compensate for the fact that we don't identify a metropolitan subsample within one state (West Virginia). Thus, use factors in the right-hand column of table 11 for regional and national estimates. The results of regional and national tabulations of the metropolitan population will be biased slightly. However, less than one-half of one percent of the metropolitan population is not represented.
Producing Estimates for the Non-Metropolitan Population. State, regional, and national estimates of the non-metropolitan population cannot be computed directly, except for Washington, DC and the 18 states where the factor for state tabulations in table 5 is 1.0. In all other states, the cases identified as not in the metropolitan subsample (H*-METRO=2) are a mixture of non-metropolitan and metropolitan households. Only an indirect method of estimation is available: first compute an estimate for the total population, then subtract the estimate for the metropolitan population. The results of these tabulations will be slightly biased.
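A minimal sketch of both procedures, assuming person records that carry a state code, the H*-METRO value, and a weight (the field names are hypothetical). Only the factors quoted in the text for the Washington, DC-MD-VA MSA are filled in; other states' factors would come from table 5, and regional or national tabulations would use the factors intended for those estimates.

```python
# Metropolitan inflation factors quoted in the text (hypothetical dict layout).
METRO_FACTOR = {"DC": 1.0, "MD": 1.0, "VA": 1.0321}

def metro_estimate(records, factors=METRO_FACTOR):
    """Weighted count of cases coded metropolitan (H*-METRO == 1), with each
    weight inflated by the factor for the case's state."""
    return sum(r["weight"] * factors[r["state"]]
               for r in records if r["metro"] == 1)

def nonmetro_estimate(total_estimate, metro_est):
    """Indirect non-metropolitan estimate: total minus metropolitan.
    As the text notes, the result is slightly biased."""
    return total_estimate - metro_est

# Hypothetical person records for residents of the Washington, DC-MD-VA MSA.
msa = [
    {"state": "VA", "metro": 1, "weight": 1_000.0},
    {"state": "MD", "metro": 1, "weight": 2_000.0},
    {"state": "DC", "metro": 1, "weight": 1_500.0},
]
print(round(metro_estimate(msa), 1))  # 4532.1
```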
Combined Panel Estimates. Both the 1993 and 1992 panels provide data for October 1992-April 1995. Thus, obtain estimates for these time periods by combining the corresponding panels. However, since the Wave 1 questionnaire differs from the subsequent waves' questionnaire and since the procedures changed between the 1992 and 1993 panels, we recommend that estimates not be obtained by combining Wave 1 data of the 1993 panel with data from another panel. In this case, use the estimate obtained from either panel. Additionally, even for other waves, take care when combining data from two panels, since the questionnaires for the two panels differ somewhat and since the length of time in sample for interviews from the two panels differs.
Obtain combined panel estimates either (1) by combining estimates derived separately for the two panels or (2) by first combining data from the two files and then producing an estimate.
1. Combining Separate Estimates
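Combine corresponding estimates from two consecutive year panels to create joint estimates by using the formula

$$\hat{X} = W\,\hat{X}_{E} + (1 - W)\,\hat{X}_{L} \qquad \text{(A)}$$

where $\hat{X}_{E}$ is the estimate from the earlier panel, $\hat{X}_{L}$ is the estimate from the later panel, and W is the weighting factor.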
To combine the 1992 and 1993 panels, use a W value of 0.517 unless one of the panels contributes no information to the estimate. In that case, assign a factor of 1 to the panel contributing information and a factor of zero to the other.
2. Combining Data from Separate Files
First, create a file containing the data from the two panel files. Apply the weighting factor, W, to the weight of each person from the earlier panel and apply (1-W) to the weight of each person from the later panel. Then produce estimates using the same methodology as used to obtain estimates from a single panel.
Illustration for computing combined panel estimate.
Suppose SIPP estimates for Wave 5 of the 1992 panel show that 441,000 households had monthly income above $6,000 in May. Also, suppose SIPP estimates for Wave 2 of the 1993 panel show that 435,000 households had monthly income above $6,000 in May. Using formula (A), the joint level estimate is

$$\hat{X} = 0.517(441{,}000) + 0.483(435{,}000) \approx 438{,}000$$
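Both ways of combining panels can be checked in a few lines of Python. For a simple total, scaling each person's weight by W or (1-W) before pooling (method 2) gives the same result as applying formula (A) to the separate estimates (method 1). The function names are illustrative.

```python
W = 0.517  # weighting factor for the earlier (1992) panel when combining with 1993

def combine_estimates(x_earlier, x_later, w=W):
    """Method 1, formula (A): weighted average of the two panel estimates."""
    return w * x_earlier + (1 - w) * x_later

def combine_weights(weights_earlier, weights_later, w=W):
    """Method 2: scale each person's weight before pooling the two files."""
    return [w * x for x in weights_earlier] + [(1 - w) * x for x in weights_later]

# Illustration from the text: 441,000 (1992 panel) and 435,000 (1993 panel).
print(round(combine_estimates(441_000, 435_000)))  # 438102

# If one panel contributes no information, give it weight 0 and the other 1.
only_1993 = combine_estimates(0.0, 435_000, w=0.0)
```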
ACCURACY OF ESTIMATES
We base SIPP estimates on a sample. The sample estimates may differ somewhat from the values obtained from administering a complete census using the same questionnaire, instructions, and enumerators. The difference occurs because with an estimate based on a sample survey two types of errors are possible: nonsampling and sampling. We can provide estimates of the magnitude of the SIPP sampling error, but this is not true of nonsampling error. The next few sections describe SIPP nonsampling error sources, followed by a discussion of sampling error, its estimation, and its use in data analysis.
Nonsampling Variability. We attribute nonsampling errors to many sources, including:
inability to obtain information about all cases in the sample,
definitional difficulties,
differences in the interpretation of questions,
inability or unwillingness on the part of the respondents to provide correct information,
inability to recall information,
errors made in collection (e.g., recording or coding the data),
errors made in processing the data,
errors made in estimating values for missing data,
biases resulting from the differing recall periods caused by the interviewing pattern used,
undercoverage.
We used quality control and edit procedures to reduce errors made by respondents, coders and interviewers. More detailed discussions of the existence and control of nonsampling errors in the SIPP are in the SIPP Quality Profile.
Undercoverage in SIPP resulted from missed living quarters and missed persons within sample households. It is known that undercoverage varies with age, race, and sex. Generally, undercoverage is larger for males than for females and larger for Blacks than for Nonblacks. Ratio estimation to independent age-race-sex population controls partially corrects for the bias due to survey undercoverage. However, biases exist in the estimates when persons in missed households or missed persons in interviewed households have characteristics different from those of interviewed persons in the same age-race-sex group.
A common measure of survey coverage is the coverage ratio, the estimated population before ratio adjustment divided by the independent population control. Table 6 shows CPS coverage ratios for age-sex-race groups for 1992. The CPS coverage ratios can exhibit some variability from month to month, but these are a typical set of coverage ratios. Other Census Bureau household surveys like the SIPP experience similar coverage.
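As a sketch, the coverage ratio and the ratio-adjustment factor it implies (its reciprocal) can be written as follows; the group totals are hypothetical.

```python
def coverage_ratio(estimate_before_ratio_adjustment, independent_control):
    """Estimated population before ratio adjustment divided by the independent
    population control; a value below 1 indicates undercoverage."""
    return estimate_before_ratio_adjustment / independent_control

def ratio_adjustment_factor(estimate_before_ratio_adjustment, independent_control):
    """Factor that brings the estimate to the control: the reciprocal of the
    coverage ratio."""
    return independent_control / estimate_before_ratio_adjustment

# Hypothetical age-sex-race group: 8.8 million estimated vs. a 10 million control.
cr = coverage_ratio(8_800_000, 10_000_000)
```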
Comparability with Other Estimates. Exercise caution when comparing data from this report with data from other SIPP publications or with data from other surveys. Comparability problems arise from varying seasonal patterns for many characteristics, different nonsampling errors, and different concepts and procedures. Refer to the SIPP Quality Profile for known differences with data from other sources and further discussion.
Sampling Variability. Standard errors indicate the magnitude of the sampling error. They also partially measure the effect of some nonsampling errors in response and enumeration, but do not measure any systematic biases in the data. The standard errors mostly measure the variations that occurred by chance because we surveyed a sample rather than the entire population.
USES AND COMPUTATION OF STANDARD ERRORS
Confidence Intervals. The sample estimate and its standard error enable one to construct confidence intervals, ranges that would include the average result of all possible samples with a known probability. For example, if we selected all possible samples and surveyed each of these under essentially the same conditions and with the same sample design, and if we calculated an estimate and its standard error from each sample, then:
1. Approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average result of all possible samples.
2. Approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average result of all possible samples.
3. Approximately 95 percent of the intervals from 1.960 standard errors below the estimate to 1.960 standard errors above the estimate would include the average result of all possible samples.
Any particular computed interval either does or does not contain the average estimate derived from all possible samples. However, for a particular sample, one can say with a specified confidence that the confidence interval includes the average estimate derived from all possible samples.
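A minimal sketch of the interval construction, using the multipliers listed above. For illustration it borrows the estimate of 472,000 households and the standard error of 46,000 from the illustration later in this statement; the names are illustrative.

```python
# Multipliers quoted in the text for each approximate confidence level (percent).
Z_MULTIPLIER = {68: 1.0, 90: 1.645, 95: 1.960}

def confidence_interval(estimate, std_error, level=90):
    """Interval from z standard errors below to z standard errors above the estimate."""
    z = Z_MULTIPLIER[level]
    return estimate - z * std_error, estimate + z * std_error

low, high = confidence_interval(472_000, 46_000, level=90)
```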
Hypothesis Testing. One may also use standard errors for hypothesis testing. Hypothesis testing is a procedure for distinguishing between population characteristics using sample estimates. The most common type of hypothesis tested is 1) the population characteristics are identical versus 2) they are different. One can perform tests at various levels of significance, where a level of significance is the probability of concluding that the characteristics are different when, in fact, they are identical.
Unless noted otherwise, all statements of comparison in the report passed a hypothesis test at the 0.10 level of significance or better. This means that, for differences cited in the report, the estimated absolute difference between parameters is greater than 1.645 times the standard error of the difference.
To perform the most common test, compute the difference XA - XB, where XA and XB are sample estimates of the characteristics of interest. A later section explains how to derive an estimate of the standard error of the difference XA - XB. Let that standard error be sDIFF. If XA - XB is between -1.645 times sDIFF and +1.645 times sDIFF, no conclusion about the characteristics is justified at the 10 percent significance level. If, on the other hand, XA - XB is smaller than -1.645 times sDIFF or larger than +1.645 times sDIFF, the observed difference is significant at the 10 percent level. In this event, it is commonly accepted practice to say that the characteristics are different. Of course, sometimes this conclusion will be wrong. When the characteristics are, in fact, the same, there is a 10 percent chance of concluding that they are different.
Note that as we perform more tests, more erroneous significant differences will occur. For example, at the 10 percent significance level, if we perform 100 independent hypothesis tests in which there are no real differences, it is likely that about 10 erroneous differences will occur. Therefore, interpret the significance of any single test cautiously.
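The test can be sketched in Python. The standard error of the difference is derived in a later section; here it is assumed, for illustration, to take the common form sqrt(sA^2 + sB^2 - 2 r sA sB), where r is the correlation between the two estimates (0 if they are treated as independent). The estimates below are hypothetical.

```python
import math

def std_error_of_difference(s_a, s_b, r=0.0):
    """Assumed standard error of X_A - X_B given each estimate's standard
    error and their correlation r (0 when treated as independent)."""
    return math.sqrt(s_a ** 2 + s_b ** 2 - 2 * r * s_a * s_b)

def significant_at_10_percent(x_a, x_b, s_diff, z=1.645):
    """True if X_A - X_B lies outside [-z * s_diff, +z * s_diff]."""
    return abs(x_a - x_b) > z * s_diff

def expected_false_positives(n_independent_tests, alpha=0.10):
    """Expected number of spurious significant results when no real differences exist."""
    return n_independent_tests * alpha

s_diff = std_error_of_difference(30_000, 25_000)   # about 39,051
result = significant_at_10_percent(500_000, 440_000, s_diff)
```

Here the difference of 60,000 is smaller than 1.645 × 39,051 ≈ 64,239, so no conclusion is justified at the 10 percent level.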
Note Concerning Small Estimates and Small Differences. We show summary measures in the report only when the base is 200,000 or greater. Because of the large standard errors involved, there is little chance that estimates will reveal useful information when computed on a base smaller than 200,000. Also, nonsampling error in one or more of the small number of cases providing the estimate can cause large relative error in that particular estimate. We show estimated numbers, however, even though the relative standard errors of these numbers are larger than those for the corresponding percentages. We provide smaller estimates primarily to permit such combinations of the categories as serve each user's needs. Therefore, be careful in the interpretation of small differences since even a small amount of nonsampling error can cause a borderline difference to appear significant or not, thus distorting a seemingly valid hypothesis test.
Standard Error Parameters and Tables and Their Use. Most SIPP estimates have greater standard errors than those obtained through a simple random sample because we sampled clusters of living quarters for the SIPP. To derive standard errors at a moderate cost and applicable to a wide variety of estimates, we made a number of approximations. We grouped estimates with similar standard error behavior and developed two parameters (denoted "a" and "b") to approximate the standard error behavior of each group of estimates. Because the actual standard error behavior was not identical for all estimates within a group, the standard errors we computed from these parameters provide an indication of the order of magnitude of the standard error for any specific estimate. These "a" and "b" parameters vary by characteristic and by demographic subgroup to which the estimate applies. Use base "a" and "b" parameters found in table 7 for 1993 panel estimates. For estimates that include data from Wave 5 and beyond, multiply the "a" and "b" parameters by 1.09 to account for sample attrition.
The factors provided in table 8, when multiplied by the base parameters of table 7 for a given subgroup and type of estimate, give the "a" and "b" parameters for that subgroup and estimate type for the specified reference period. For example, the base "a" and "b" parameters for total number of households are -0.0000702 and 6,715, respectively. For Wave 1, the factor for October 1992 is 4 since only 1 rotation month of data is available. So the "a" and "b" parameters for total number of households in October 1992 based on Wave 1 are -0.0002808 and 26,860, respectively. Also for Wave 1, the factor for the first quarter of 1993 is 1.2222 since 9 rotation months of data are available (rotations 1 and 4 provide 3 rotation months each, while rotations 2 and 3 provide 1 and 2 rotation months, respectively). So the "a" and "b" parameters for total number of households in the first quarter of 1993 are -0.0000858 and 8,207, respectively, for Wave 1.
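The scaling in this example is a simple multiplication, sketched below in Python with the values quoted above (the function name is illustrative). The first-quarter products round to -0.0000858 and 8,207.

```python
ATTRITION_FACTOR = 1.09  # for estimates that include Wave 5 or later data

def scaled_parameters(a_base, b_base, period_factor, wave5_or_later=False):
    """Multiply base table 7 parameters by the table 8 reference-period
    factor and, when applicable, by the attrition factor."""
    k = period_factor * (ATTRITION_FACTOR if wave5_or_later else 1.0)
    return a_base * k, b_base * k

# Total number of households, Wave 1 (values from the text).
a_oct, b_oct = scaled_parameters(-0.0000702, 6_715, 4)       # October 1992
a_q1, b_q1 = scaled_parameters(-0.0000702, 6_715, 1.2222)    # first quarter 1993
```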
Use the "a" and "b" parameters to calculate the standard error for estimated numbers and percentages. The following sections give methods for using these parameters to compute approximate standard errors.
For users who wish further simplification, we also provide general standard errors in tables 9 through 12. Note that these standard errors apply only when data from all four rotations are used, and they must be adjusted by the "f" factor from table 7. The standard errors resulting from this simplified approach are less accurate.
For the 1992, 1993 combined panel parameters, multiply the parameters in table 7 by the appropriate factor from table 16. The factors provided in table 17 adjust parameters for the number of rotation months available for a given estimate. These factors, when multiplied by the combined panel parameters derived from table 7 for a given subgroup and type of estimate, give the "a" and "b" parameters for that subgroup and estimate type for the specified combined reference period.
Table 13 provides base "a" and "b" parameters for calculating 1993 topical module variances. Table 14 provides base "a" and "b" parameters for computing the 1992, 1993 combined panel topical module variances.
Described below are procedures for calculating standard errors for the types of estimates most commonly used. Note specifically that these procedures apply only to reference month estimates or averages of reference month estimates. Refer to the section "Use of Weights" for a more detailed discussion of the construction of estimates. We included stratum codes and half sample codes on the tapes so users can compute variances directly by methods such as balanced repeated replications (BRR). William G. Cochran provides a list of references discussing the application of this technique. (See Sampling Techniques, 3rd Ed., New York: John Wiley and Sons, 1977, p. 321.)
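As a rough illustration of balanced repeated replication (the general technique named above, as described in Cochran), the Python sketch below forms half-sample replicates from stratum and half-sample codes and averages the squared deviations of the replicate estimates. It assumes two half-samples per stratum, stratum codes recoded to 0, 1, ..., H-1, and half-sample codes 1 and 2, and it uses a Sylvester (power-of-two) Hadamard matrix for simplicity. The actual code layout on the tapes and any Bureau-specific replicate conventions are not described here, so treat this as a textbook sketch.

```python
def sylvester_hadamard(n):
    """Hadamard matrix of order n (n a power of 2), entries +1/-1."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def brr_replicate_weights(weights, strata, half, n_strata):
    """One list of replicate weights per Hadamard row. In replicate r, the
    half-sample picked by the sign in column (stratum + 1) gets double weight
    and the other half-sample gets weight 0."""
    size = 1
    while size < n_strata + 1:      # room to skip the all-ones first column
        size *= 2
    h = sylvester_hadamard(size)
    replicates = []
    for r in range(size):
        rep = []
        for w, s, hs in zip(weights, strata, half):
            sign = h[r][s + 1]
            chosen = (hs == 1) if sign == 1 else (hs == 2)
            rep.append(2.0 * w if chosen else 0.0)
        replicates.append(rep)
    return replicates

def brr_variance(full_estimate, replicate_estimates):
    """Mean squared deviation of replicate estimates from the full-sample estimate."""
    return sum((e - full_estimate) ** 2 for e in replicate_estimates) / len(replicate_estimates)

# Hypothetical design: 2 strata, one unit in each half-sample.
weights = [1.0, 1.0, 1.0, 1.0]
strata = [0, 0, 1, 1]
half = [1, 2, 1, 2]
y = [10.0, 14.0, 20.0, 26.0]
reps = brr_replicate_weights(weights, strata, half, n_strata=2)
full = sum(w * v for w, v in zip(weights, y))
rep_estimates = [sum(w * v for w, v in zip(rep, y)) for rep in reps]
variance = brr_variance(full, rep_estimates)
```

For a total under this design, the result equals the sum over strata of the squared half-sample differences, (10 - 14)^2 + (20 - 26)^2 = 52.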
Standard errors of estimated numbers. Obtain the approximate standard error, sx, of an estimated number of persons, households, families, unrelated individuals, and so forth, in one of two ways. Both apply when data from all four rotations are used to make the estimate. However, use only the second method when fewer than four rotations of data are available for the estimate. Note that neither method should be applied to dollar values.
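The standard error may be obtained by the use of the formula

$$s_x = f\,s \qquad (1)$$

where f is the appropriate "f" factor from table 7 and s is the general standard error of the estimate from table 9 or 10. Alternatively, approximate the standard error using the formula

$$s_x = \sqrt{a x^2 + b x} \qquad (2)$$

where x is the size of the estimate and a and b are the parameters from table 7 associated with the particular type of characteristic.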
Illustration.
Suppose SIPP estimates for Wave 1 of the 1993 panel show that there were 472,000 Black households with monthly household income above $6,000. The appropriate parameters and factor from table 7 and the appropriate general standard error from table 9 are
a = -0.0004187 b = 4,640 f = 0.83 s = 55,000
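Using formula 1, the approximate standard error is

$$s_x = 0.83 \times 55{,}000 \approx 46{,}000$$

Using formula 2, the approximate standard error is

$$s_x = \sqrt{(-0.0004187)(472{,}000)^2 + (4{,}640)(472{,}000)} \approx 46{,}000$$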
Illustration for computing standard errors for combined panel estimates.
Suppose the combined SIPP estimate of the total number of males (16+ Income and Labor Force) from Wave 6 of the 1992 panel and Wave 3 of the 1993 panel was 92,398,000. The combined panel parameters for total males are obtained by multiplying the appropriate "a" and "b" values from table 7 by the appropriate factors from tables 16 and 17. The 1993 parameters are a = -0.0000580 and b = 5,433, and the factors are g = 1.0000 (table 16) and 1.0000 (table 17). Thus, the combined panel parameters are a = -0.0000580 and b = 5,433. Using formula 2, the approximate standard error is

$$s_x = \sqrt{(-0.0000580)(92{,}398{,}000)^2 + (5{,}433)(92{,}398{,}000)} \approx 82{,}600$$
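Formula 1 (the "f" factor times the general standard error from the tables) and formula 2 (the square root of a x^2 + b x) translate directly into Python; the sketch below reproduces the two illustrations above (function names are illustrative).

```python
import math

def se_number_from_table(f, s):
    """Formula 1: f factor times the general standard error from the tables.
    Valid only when all four rotations contribute."""
    return f * s

def se_number(x, a, b):
    """Formula 2: sqrt(a * x^2 + b * x) for an estimated number x."""
    return math.sqrt(a * x * x + b * x)

print(round(se_number_from_table(0.83, 55_000), -3))          # 46000.0
print(round(se_number(472_000, -0.0004187, 4_640), -3))       # 46000.0
print(round(se_number(92_398_000, -0.0000580, 5_433), -2))    # 82600.0
```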
Standard Error of a Mean. Define a mean as the average quantity of some item (other than persons, families, or households) per person, family, or household. For example, it could be the average monthly household income of females age 25 to 34. Use the formula below to approximate the standard error of a mean. Because of the approximations used in developing formula 3, an estimate of the standard error of the mean obtained from this formula will generally underestimate the true standard error. The formula used to estimate the standard error of a mean $\bar{x}$ is

$$s_{\bar{x}} = \sqrt{\frac{b}{y}\,s^2} \qquad (3)$$

where y is the size of the base, $s^2$ is the estimated population variance of the item, and b is the parameter associated with the particular type of item.
Estimate the population variance $s^2$ by one of two methods. In both methods we assume $x_i$ is the value of the item for unit i. (Unit may be person, family, or household.) To use the first method, divide the range of values for the item into c intervals. The lower and upper boundaries of interval j are $Z_{j-1}$ and $Z_j$, respectively. Place each unit into one of c groups such that $Z_{j-1} < x_i \le Z_j$.
The estimated population variance, $s^2$, is given by the formula

$$s^2 = \sum_{j=1}^{c} p_j m_j^2 - \bar{x}^2 \qquad (4)$$

where $p_j$ is the estimated proportion of units in group j, and $m_j = (Z_{j-1} + Z_j)/2$. We assume the most representative value of the item in group j is $m_j$. If group c is open-ended, i.e., no upper interval boundary exists, then an approximate value for $m_c$ is

$$m_c = \tfrac{3}{2}\,Z_{c-1}$$

Compute the mean, $\bar{x}$, using the following formula:

$$\bar{x} = \sum_{j=1}^{c} p_j m_j$$
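In the second method, the estimated population variance is given by

$$s^2 = \frac{\sum_{i=1}^{n} w_i x_i^2}{\sum_{i=1}^{n} w_i} - \bar{x}^2 \qquad (5)$$

where there are n units with the item of interest and $w_i$ is the final weight for unit i. Compute the mean, $\bar{x}$, using the formula

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$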
When forming combined estimates using formula (A) from the section on combined panel estimates, calculate $s^2$, given by formula (4), by forming a distribution for each panel. Divide the range of values for the item into intervals. Obtain combined estimates for each interval using formula (A). Apply formula (4) to the combined distribution. To calculate $\bar{x}$ and $s^2$ given by formula (5), replace $w_i$ by $W w_i$ for units from the earlier panel and by $(1-W) w_i$ for units from the later panel.
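Formulas (3), (4), and (5) can be sketched in Python as follows. For an open-ended top group, the sketch takes the representative value as 1.5 times that group's lower boundary; the toy distributions are hypothetical and the function names are illustrative.

```python
import math

def grouped_mean_variance(boundaries, proportions):
    """Formula (4) and the grouped mean.
    boundaries: [Z0, Z1, ..., Zc], with None as the last entry for an
    open-ended top group; proportions: p_j for each of the c groups."""
    mids = []
    for j in range(1, len(boundaries)):
        lo, hi = boundaries[j - 1], boundaries[j]
        mids.append(1.5 * lo if hi is None else (lo + hi) / 2)
    mean = sum(p * m for p, m in zip(proportions, mids))
    var = sum(p * m * m for p, m in zip(proportions, mids)) - mean ** 2
    return mean, var

def weighted_mean_variance(values, weights):
    """Formula (5) and the weighted mean from unit-level data."""
    total_w = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total_w
    var = sum(w * x * x for w, x in zip(weights, values)) / total_w - mean ** 2
    return mean, var

def se_mean(s2, y, b):
    """Formula (3): sqrt((b / y) * s^2)."""
    return math.sqrt(b / y * s2)

mean_g, var_g = grouped_mean_variance([0, 10, 20], [0.5, 0.5])        # 10.0, 25.0
mean_o, var_o = grouped_mean_variance([0, 1_000, None], [0.5, 0.5])    # open-ended top group
mean_w, var_w = weighted_mean_variance([1.0, 3.0], [1.0, 1.0])         # 2.0, 1.0
```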
Illustration.
Suppose that based on Wave 1 data, the distribution of monthly cash income for persons age 25 to 34 during the month of January 1993 is given in table 15.
Using formula 4 and the mean monthly cash income of $2,530, the approximate population variance, $s^2$, is


Standard error of an aggregate. We define an aggregate as the total quantity of an item summed over all the units in a group. Approximate the standard error of an aggregate using formula 6.
Because of the approximations used in developing formula (6), it will generally underestimate the true standard error. Let y be the size of the base, $s^2$ be the estimated population variance of the item obtained using formula (4) or (5), and b be the parameter associated with the particular type of item. The standard error of an aggregate is:

$$s_x = \sqrt{b\,y\,s^2} \qquad (6)$$
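Since an aggregate is the base y times a mean, the aggregate standard error sqrt(b y s^2) equals y times the standard error of the mean, y × sqrt((b / y) s^2). A short Python sketch with hypothetical inputs:

```python
import math

def se_aggregate(s2, y, b):
    """Standard error of an aggregate: sqrt(b * y * s^2)."""
    return math.sqrt(b * y * s2)

s2, y, b = 25.0, 100.0, 4.0          # hypothetical variance, base, and parameter
agg_se = se_aggregate(s2, y, b)      # 100.0
mean_se_times_y = y * math.sqrt(b / y * s2)
```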
Standard Errors of Estimated Percentages. The reliability of an estimated percentage, computed using sample data for both numerator and denominator, depends on the size of the percentage and its base. Estimated percentages are relatively more reliable than the corresponding estimates of the numerators of the percentages, particularly if the percentages are 50 percent or more, e.g., the percent of people employed is more reliable than the estimated number of people employed. When the numerator and denominator of the percentage have different parameters, use the parameter (and appropriate factor) of the numerator. If proportions are presented instead of percentages, note that the standard error of a proportion is equal to the standard error of the corresponding percentage divided by 100.
We commonly estimate two types of percentages. The first is the percentage of persons, families or households sharing a particular characteristic such as the percent of persons owning their own home. The second type is the percentage of money or some similar concept held by a particular group of persons or held in a particular form. Examples are the percent of total wealth held by persons with high income and the percent of total income received by persons on welfare.
For the percentage of persons, families, or households, calculate the approximate standard error, s(x,p), of an estimated percentage p using the formula
$$s(x,p) = f \cdot s \qquad (7)$$
when estimating p using data from all four rotations.
In this formula, f is the appropriate "f" factor from table 7 and s is the standard error of the estimate from table 11 or 12.
Alternatively, approximate it by the formula
$$s(x,p) = \sqrt{\frac{b}{x} \, p \, (100 - p)} \qquad (8)$$
from which we calculated the standard errors in tables 11 and 12. Here x is the size of the subclass of social units which is the base of the percentage, p is the percentage (0 < p < 100), and b is the parameter associated with the characteristic in the numerator. Using this formula gives more accurate results than using formula 7 above. Use this formula to compute the standard error of p when p is estimated from data covering fewer than four rotations.
Illustration.
Suppose that, in the month of January 1993, 6.7 percent of the 16,812,000 persons in nonfarm households with a mean monthly household cash income of $4,000 to $4,999 were black. Using formula 8, the "b" parameter of 7,310 from table 7, and a factor of 1 for the month of January 1993 from table 8, the approximate standard error is
$$s(x,p) = \sqrt{\frac{7{,}310}{16{,}812{,}000} \times 6.7 \times (100 - 6.7)} \approx 0.52 \text{ percent}$$
Consequently, the 90-percent confidence interval shown by these data, 6.7 ± 1.645 × 0.52 percent, is from 5.8 to 7.6 percent.
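The illustration can be checked with a short Python sketch of formula 8 and the resulting 90-percent interval. The function name is hypothetical, and multiplying b by the table 8 factor is an assumption about how that factor enters (with a factor of 1, as here, it makes no difference).

```python
import math

def se_percent(b: float, x: float, p: float, factor: float = 1.0) -> float:
    """Formula 8: approximate standard error of a percentage p (0 < p < 100) on base x.

    Assumes the table 8 factor is applied by multiplying the b parameter.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    return math.sqrt(factor * b / x * p * (100 - p))

# Illustration: 6.7 percent of 16,812,000 persons, b = 7,310, factor 1.
p = 6.7
se = se_percent(b=7_310, x=16_812_000, p=p)
z90 = 1.645  # multiplier for a 90-percent confidence interval
low, high = p - z90 * se, p + z90 * se
print(f"SE = {se:.2f} points; 90% interval {low:.1f} to {high:.1f} percent")
```

Running it reproduces the standard error of about 0.52 percentage points and the interval from 5.8 to 7.6 percent.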
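Percentages of money require a more complicated formula. A percentage of money is usually estimated in one of two ways. It may be the ratio of two aggregates,
$$p_I = 100 \times \frac{X_A}{X_B}$$
where $X_A$ and $X_B$ are aggregate money figures, or it may be the ratio of two means with an adjustment for different bases,
$$p_I = \hat{p} \times \frac{\bar{x}_A}{\bar{x}_B}$$
where $\bar{x}_A$ and $\bar{x}_B$ are mean money figures and $\hat{p}$ is the estimated percentage of the base that belongs to group A. In either case, approximate the standard error of $p_I$ by
$$s_{p_I} = p_I \sqrt{\left(\frac{s_p}{\hat{p}}\right)^2 + \left(\frac{s_A}{\bar{x}_A}\right)^2 + \left(\frac{s_B}{\bar{x}_B}\right)^2} \qquad (9)$$
where $s_p$ is the standard error of $\hat{p}$, $s_A$ is the standard error of $\bar{x}_A$, and $s_B$ is the standard error of $\bar{x}_B$. To calculate $s_p$, use formula 8. Calculate the standard errors of $\bar{x}_A$ and $\bar{x}_B$ using formula 3.
Note that there is frequently some correlation between $\hat{p}$, $\bar{x}_A$, and $\bar{x}_B$. Depending on the magnitude and sign of the correlations, the standard error will be over- or underestimated.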
Illustration.
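Suppose that in January 1993, 9.8 percent of households own rental property, the mean value of rental property is $72,121, the mean value of assets is $78,734, and the corresponding standard errors are 0.31 percent, $5,799, and $2,867. In total there are 86,790,000 households. Then the percent of all household assets held in rental property is
$$p_I = 9.8 \times \frac{72{,}121}{78{,}734} \approx 9.0 \text{ percent}$$
Using formula (9), the appropriate standard error is
$$s_{p_I} = 9.0 \times \sqrt{\left(\frac{0.31}{9.8}\right)^2 + \left(\frac{5{,}799}{72{,}121}\right)^2 + \left(\frac{2{,}867}{78{,}734}\right)^2} \approx 0.8 \text{ percent}$$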
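A minimal Python sketch reproducing the rental-property illustration with formula 9. It assumes the percentage of money is formed as p̂ × x̄_A / x̄_B with p̂ expressed in percent and, like the text, ignores correlations among the three estimates. The function names are hypothetical.

```python
import math

def percent_of_money(p_hat: float, mean_a: float, mean_b: float) -> float:
    """Ratio of two means adjusted for different bases; p_hat is a percentage."""
    return p_hat * mean_a / mean_b

def se_percent_of_money(p_hat, s_p, mean_a, s_a, mean_b, s_b) -> float:
    """Formula 9: the squared relative errors of the three components add."""
    p_i = percent_of_money(p_hat, mean_a, mean_b)
    rel_var = (s_p / p_hat) ** 2 + (s_a / mean_a) ** 2 + (s_b / mean_b) ** 2
    return p_i * math.sqrt(rel_var)

# Illustration: 9.8% own rental property; mean rental property $72,121; mean assets $78,734.
p_i = percent_of_money(9.8, 72_121, 78_734)
se = se_percent_of_money(9.8, 0.31, 72_121, 5_799, 78_734, 2_867)
print(f"{p_i:.1f}% of assets held in rental property, SE about {se:.1f} points")
```

The relative error of the mean value of rental property dominates the sum, which is why the standard error is close to 9 percent of the estimate itself.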
Standard Error of a Difference. The standard error of a difference between two sample estimates, x and y, is approximately equal to
$$s_{(x-y)} = \sqrt{s_x^2 + s_y^2} \qquad (10)$$
where $s_x$ and $s_y$ are the standard errors of the estimates x and y.
The estimates can be numbers, percents, ratios, etc. The above formula assumes that the correlation coefficient between the characteristics estimated by x and y is zero. If the correlation is really positive (negative), then this assumption will tend to cause overestimates (underestimates) of the true standard error.
Illustration.
Suppose that SIPP estimates show that the number of persons age 35-44 years with monthly cash income of $4,000 to $4,999 was 3,186,000 in the month of January 1993 and that the number of persons age 25-34 years with monthly cash income of $4,000 to $4,999 in the same time period was 2,619,000. Then, using parameters from table 7 and formula 2, the standard errors of these numbers are approximately 130,000 and 118,000, respectively. The difference in sample estimates is 567,000 and, using formula 10, the approximate standard error of the difference is
$$s_{(x-y)} = \sqrt{(130{,}000)^2 + (118{,}000)^2} \approx 176{,}000$$
Suppose that it is desired to test at the 10 percent significance level whether the number of persons with monthly cash income of $4,000 to $4,999 was different for persons age 35-44 years than for persons age 25-34 years. To perform the test, compare the difference of 567,000 to the product 1.645 × 176,000 = 290,000. Since the difference is greater than 1.645 times the standard error of the difference, the data show that the two age groups are significantly different at the 10 percent significance level.
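The difference test can be sketched in Python as follows, using formula 10 (zero correlation assumed, as in the text) and the 1.645 multiplier for a two-sided test at the 10 percent level. The function names are hypothetical.

```python
import math

def se_difference(s_x: float, s_y: float) -> float:
    """Formula 10: standard error of x - y, assuming x and y are uncorrelated."""
    return math.sqrt(s_x ** 2 + s_y ** 2)

def differs_at_10_percent(x: float, y: float, s_x: float, s_y: float) -> bool:
    """Two-sided test at the 10 percent level: is |x - y| > 1.645 * SE(x - y)?"""
    return abs(x - y) > 1.645 * se_difference(s_x, s_y)

# Illustration: persons age 35-44 vs. 25-34 with monthly cash income of $4,000 to $4,999.
x, y = 3_186_000, 2_619_000
s = se_difference(130_000, 118_000)
print(round(s, -3), differs_at_10_percent(x, y, 130_000, 118_000))
```

If x and y are positively correlated, the same code overstates the standard error, so the test becomes conservative rather than anticonservative.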
Standard Error of a Median. The median quantity of some item such as income for a given group of persons, families, or households is that quantity such that at least half the group have as much or more and at least half the group have as much or less. The sampling variability of an estimated median depends upon the form of the distribution of the item as well as the size of the group. Use the procedure described below to calculate standard errors on medians.
An approximate method for measuring the reliability of an estimated median is to determine a confidence interval about it. (See the section on sampling variability for a general discussion of confidence intervals.) Use the following procedure to estimate the 68-percent confidence limits and hence the standard error of a median based on sample data.
1. Determine, using either formula 7 or formula 8, the standard error of an estimate of 50 percent of the group;
2. Add to and subtract from 50 percent the standard error determined in step 1;
3. Using the distribution of the item within the group, calculate the quantity of the item such that the percent of the group with more of the item is equal to the smaller percentage found in step 2. This quantity will be the upper limit for the 68-percent confidence interval. In a similar fashion, calculate the quantity of the item such that the percent of the group with more of the item is equal to the larger percentage found in step 2. This quantity will be the lower limit for the 68-percent confidence interval;
4. Divide the difference between the two quantities determined in step 3 by two to obtain the standard error of the median.
To perform step 3, you must interpolate. You may use different methods of interpolation; the most common are simple linear interpolation and Pareto interpolation. The appropriateness of the method depends on the form of the distribution around the median. If density is declining in the area, then we recommend Pareto interpolation. If density is fairly constant in the area, then we recommend linear interpolation. Never use Pareto interpolation if the interval contains zero or negative measures of the item of interest. Use interpolation as follows. The quantity of the item such that "p" percent have more of the item is
$$X_{pN} = \exp\left[\frac{\ln(pN/N_1)}{\ln(N_2/N_1)} \, \ln\left(\frac{A_2}{A_1}\right)\right] A_1 \qquad (11)$$
if Pareto interpolation is indicated and
$$X_{pN} = \frac{pN - N_1}{N_2 - N_1} \, (A_2 - A_1) + A_1 \qquad (12)$$
if linear interpolation is indicated, where
N is the size of the group (p enters the formulas as a proportion, so pN is a number of group members),
$A_1$ and $A_2$ are the lower and upper bounds, respectively, of the interval in which $X_{pN}$ falls,
$N_1$ and $N_2$ are the estimated number of group members owning more than $A_1$ and $A_2$, respectively,
exp refers to the exponential function and
ln refers to the natural logarithm function.
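Formulas 11 and 12 translate directly into Python. In this sketch (function names hypothetical) p is a proportion, and the usage line applies both methods at p = 0.5 to the table 15 figures quoted in the illustration: N = 39,851,000, A1 = $2,000, A2 = $2,500, N1 = 22,106,000, N2 = 16,307,000.

```python
import math

def pareto_interp(p, n, a1, a2, n1, n2):
    """Formula 11: quantity such that a proportion p of a group of size n has more.

    a1 < a2 bound the interval containing the answer; n1 and n2 are the numbers
    of group members with more than a1 and a2. Requires a1 > 0 and a2 > 0.
    """
    if a1 <= 0 or a2 <= 0:
        raise ValueError("Pareto interpolation needs positive interval bounds")
    return a1 * math.exp(math.log(p * n / n1) / math.log(n2 / n1) * math.log(a2 / a1))

def linear_interp(p, n, a1, a2, n1, n2):
    """Formula 12: straight-line interpolation between (n1, a1) and (n2, a2)."""
    return a1 + (p * n - n1) / (n2 - n1) * (a2 - a1)

# Table 15 figures from the median illustration.
table15 = dict(n=39_851_000, a1=2_000, a2=2_500, n1=22_106_000, n2=16_307_000)
print(round(pareto_interp(0.5, **table15)), round(linear_interp(0.5, **table15)))
```

This prints 2158 and 2188. The Pareto value matches the $2,158 median quoted in the illustration, and the $30 gap to the linear value shows that the choice of method matters when density is declining across the interval.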
Illustration.
To illustrate the calculations for the sampling error on a median, we return to table 15. The median monthly income for this group is $2,158. The size of the group is 39,851,000.
1. Using formula 8, the standard error of 50 percent on a base of 39,851,000 is about 0.6 percentage points.
2. Following step 2, the two percentages of interest are 49.4 and 50.6.
3. By examining table 15, we see that 49.4 percent falls in the income interval from $2,000 to $2,499. (Since 55.5 percent receive more than $2,000 per month, the dollar value corresponding to 49.4 percent must be between $2,000 and $2,500.) Thus, A1 = $2,000, A2 = $2,500, N1 = 22,106,000, and N2 = 16,307,000.
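In this case, we decided to use Pareto interpolation. Therefore, the upper bound of a 68-percent confidence interval for the median is
$$2{,}000 \times \exp\left[\frac{\ln\left(\frac{0.494 \times 39{,}851{,}000}{22{,}106{,}000}\right)}{\ln\left(\frac{16{,}307{,}000}{22{,}106{,}000}\right)} \, \ln\left(\frac{2{,}500}{2{,}000}\right)\right] \approx \$2{,}177$$
Also by examining table 15, we see that 50.6 percent falls in the same income interval. Thus, A1, A2, N1, and N2 are the same. We also use Pareto interpolation for this case. So the lower bound of a 68-percent confidence interval for the median is
$$2{,}000 \times \exp\left[\frac{\ln\left(\frac{0.506 \times 39{,}851{,}000}{22{,}106{,}000}\right)}{\ln\left(\frac{16{,}307{,}000}{22{,}106{,}000}\right)} \, \ln\left(\frac{2{,}500}{2{,}000}\right)\right] \approx \$2{,}139$$
Thus, the 68-percent confidence interval on the estimated median is from $2,139 to $2,177. An approximate standard error is
$$\frac{\$2{,}177 - \$2{,}139}{2} = \$19$$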
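Putting the four steps together, here is a minimal Python sketch of the median calculation for the table 15 illustration. It takes the 0.6-point standard error from step 1 as given (the b parameter behind it is not stated in the text), uses Pareto interpolation as the illustration does, and assumes both bounds fall in the same income interval; the helper names are hypothetical.

```python
import math

def pareto_interp(p, n, a1, a2, n1, n2):
    """Formula 11 (Pareto interpolation); p is a proportion."""
    return a1 * math.exp(math.log(p * n / n1) / math.log(n2 / n1) * math.log(a2 / a1))

def median_ci_and_se(se_50, n, a1, a2, n1, n2):
    """Steps 2-4 of the median procedure.

    se_50 is the step-1 standard error of 50 percent, in percentage points.
    Assumes both bounds fall in the interval [a1, a2], as in this example.
    """
    smaller = (50 - se_50) / 100  # smaller percentage -> upper limit
    larger = (50 + se_50) / 100   # larger percentage -> lower limit
    upper = pareto_interp(smaller, n, a1, a2, n1, n2)
    lower = pareto_interp(larger, n, a1, a2, n1, n2)
    return lower, upper, (upper - lower) / 2

lower, upper, se = median_ci_and_se(0.6, 39_851_000, 2_000, 2_500, 22_106_000, 16_307_000)
print(f"68% interval ${lower:,.0f} to ${upper:,.0f}; standard error about ${se:,.0f}")
```

If the two percentages from step 2 fell in different income intervals, each bound would need its own A1, A2, N1, and N2, which this sketch does not handle.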
Standard Errors of Ratios of Means and Medians. Approximate the standard error for a ratio of means or medians by
$$s_{(x/y)} = \frac{x}{y} \sqrt{\left(\frac{s_y}{y}\right)^2 + \left(\frac{s_x}{x}\right)^2} \qquad (13)$$
where x and y are the means or medians, and $s_x$ and $s_y$ are their associated standard errors. Formula 13 assumes that the means (or medians) are not correlated. If the correlation between the population means estimated by x and y is actually positive (negative), then this procedure will tend to produce overestimates (underestimates) of the true standard error for the ratio of means.
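Formula 13 is a sum of squared relative errors scaled by the ratio, as this short Python sketch shows. The function name is hypothetical, and the means and standard errors in the usage line are made-up values chosen only to show the arithmetic.

```python
import math

def se_ratio(x: float, s_x: float, y: float, s_y: float) -> float:
    """Formula 13: approximate standard error of x / y, assuming x and y are uncorrelated."""
    return (x / y) * math.sqrt((s_y / y) ** 2 + (s_x / x) ** 2)

# Hypothetical means with 2 percent relative errors each.
print(round(se_ratio(x=3_000, s_x=60, y=2_500, s_y=50), 4))
```

With equal relative errors of 2 percent, the ratio's relative error is about 2.8 percent (2 × √2), so the standard error of the 1.2 ratio is about 0.034.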
1. See "The 1990 Post-Enumeration Survey: Operations and Results" by Howard Hogan in the 1993 Proceedings of the Undercount in the 1990 Census Section, American Statistical Association.
Requests for Special Tabulations