
# Direct Variance Estimation

The primary sampling unit (PSU) plays a key role in variance estimation with a multistage sample design. SIPP PSUs are mostly counties, groups of counties, or independent cities (SIPP Quality Profile, 3rd Ed. [U.S. Census Bureau, 1998a, Chapter 3]), which are sampled with probability proportional to size within strata. The PSUs are sampled without replacement so that no PSU is selected more than once for the sample. Some PSUs are so large that they are included in the sample with certainty. Because no sampling is involved, those PSUs are, in fact, not PSUs but strata. The actual PSUs for those certainty selections are the enumeration districts and other units selected within them.

Although the SIPP PSUs are selected without replacement (as is the case with most multistage designs), for the purpose of variance estimation they are treated as if they were sampled with replacement. The with-replacement assumption greatly simplifies variance estimation: variance estimates can be computed from the PSUs and strata alone, without considering the complexities of the subsequent stages of sample selection. This widely used simplifying assumption leads to some overestimation of variances, but the overestimation is generally small.
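To make the with-replacement simplification concrete, the following minimal sketch estimates the variance of a weighted total from PSU-level totals within strata. It is an illustration under stated assumptions, not the Census Bureau's implementation: the function name `wr_variance_of_total`, the tuple layout, and the toy data are all hypothetical.

```python
from collections import defaultdict


def wr_variance_of_total(records):
    """Variance of a weighted total, treating PSUs as sampled with replacement.

    records: iterable of (stratum, psu, weight, y) tuples (hypothetical layout).
    Only the PSU totals within each stratum are used; later sampling stages
    are ignored, which is the point of the with-replacement assumption.
    """
    # Accumulate the weighted total of y for each PSU within each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, y in records:
        psu_totals[stratum][psu] += weight * y

    variance = 0.0
    for totals in psu_totals.values():
        n = len(totals)
        if n < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean = sum(totals.values()) / n
        # Between-PSU variability within the stratum, scaled by n / (n - 1).
        variance += n / (n - 1) * sum((t - mean) ** 2 for t in totals.values())
    return variance


# Toy data (made up for illustration): two strata with two PSUs each.
sample = [
    ("A", 1, 10.0, 3.0), ("A", 1, 10.0, 5.0), ("A", 2, 12.0, 4.0),
    ("B", 1, 8.0, 2.0), ("B", 2, 8.0, 6.0),
]
estimated_variance = wr_variance_of_total(sample)
```

Note that nothing below the PSU level enters the calculation; observations are simply summed within each PSU before the between-PSU variability is measured.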

Several software packages are available for computing variances of a wide range of survey estimates (e.g., means and proportions for the total sample and for subclasses, for differences in means and proportions between subclasses, and for regression and logistic regression coefficients) from complex sample designs. Many of these packages are listed on the Web: http://www.fas.harvard.edu/~stats/survey-soft/survey-soft.html. Lepkowski and Bowles (1996) examined eight of the packages.

These packages use a variety of methods for variance estimation. Some use the Taylor-series approximation, or linearization, method. Others use a replication method, such as jackknife repeated replication or balanced repeated replication. Although some methods have advantages in particular situations, there is generally little to recommend one method over another. The variance estimates they produce are not identical, but the differences are usually small. See Wolter (1985) and Rust (1985) for discussions of these methods.

## Variance Units and Variance Strata, 1990–1993 Panels

For the 1990–1993 SIPP Panels, the sample member record contains information concerning the PSU and stratum within which the member was sampled. This information is needed as input for all of the specialized software packages. The original PSU and strata codes are not included in the SIPP public use data files, however, to avoid potential identification of small geographic areas and sampled individuals. Instead, sets of PSUs are combined across strata to produce variance units and variance strata, with two variance units in each variance stratum. Variance units and variance strata may be treated as PSUs and strata for variance estimation purposes. Their use does not give rise to any bias in the variance estimates. The variance estimates are somewhat less precise, however, than those obtained from the use of the PSUs and strata that have not been combined.

Under the complex sample design, the number of degrees of freedom for variance estimation depends on the number of variance strata. The 1984 SIPP Panel consists of 142 variance units in 71 variance strata; the panels between 1985 and 1991 have 144 variance units and 72 variance strata; and the 1992–1993 Panels have 198 variance units and 99 variance strata. As a rough approximation, the number of degrees of freedom for a variance estimate is the number of variance strata. Thus, for national estimates, the variance estimates have about 71 degrees of freedom for the 1984 Panel, 72 degrees of freedom for the 1985–1991 Panels, and 99 degrees of freedom for the 1992–1993 Panels. Regional estimates will have fewer degrees of freedom because such estimates include only some of the variance strata.

Table 7-1 displays the variable names for the variance stratum and variance unit codes in the SIPP core wave files and the SIPP full panel files. These codes can be employed as stratum and PSU codes in any of the software packages for variance estimation with complex sample designs.

Table 7-1. Variance Stratum Code and Variance Unit Code in SIPP Files, 1990–1993

| Variable for Variance Estimation | SIPP Core Wave File | SIPP Full Panel File |
|---|---|---|
| Variance stratum code | HSTRAT | VARSTRAT |
| Variance unit (or half-sample) code | HHSC | HALFSAMP |
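Because each variance stratum contains exactly two variance units, the with-replacement variance of a weighted total reduces to the sum, over variance strata, of the squared difference between the two unit totals. The sketch below shows how the core wave file codes `HSTRAT` and `HHSC` could be plugged in. The field names `WEIGHT` and `Y`, the function name, and the records are hypothetical placeholders, not actual SIPP variable names or data.

```python
from collections import defaultdict


def paired_variance_of_total(records, weight_key="WEIGHT", value_key="Y"):
    """Variance of a weighted total with two variance units per variance stratum.

    records: iterable of dicts carrying HSTRAT (variance stratum code),
    HHSC (variance unit code), a weight, and an analysis variable.
    weight_key and value_key are hypothetical field names.
    """
    # Weighted total of the analysis variable for each (stratum, unit) cell.
    totals = defaultdict(lambda: defaultdict(float))
    for rec in records:
        totals[rec["HSTRAT"]][rec["HHSC"]] += rec[weight_key] * rec[value_key]

    variance = 0.0
    for stratum, units in totals.items():
        if len(units) != 2:
            raise ValueError(f"variance stratum {stratum!r} does not have two units")
        first, second = units.values()
        # With two units per stratum the estimator is a squared difference.
        variance += (first - second) ** 2
    return variance


# Made-up records for illustration only.
records = [
    {"HSTRAT": 1, "HHSC": 1, "WEIGHT": 10.0, "Y": 3.0},
    {"HSTRAT": 1, "HHSC": 1, "WEIGHT": 10.0, "Y": 5.0},
    {"HSTRAT": 1, "HHSC": 2, "WEIGHT": 12.0, "Y": 4.0},
    {"HSTRAT": 2, "HHSC": 1, "WEIGHT": 8.0, "Y": 2.0},
    {"HSTRAT": 2, "HHSC": 2, "WEIGHT": 8.0, "Y": 6.0},
]
paired_variance = paired_variance_of_total(records)
```

For the full panel files, the same logic would apply with `VARSTRAT` and `HALFSAMP` in place of `HSTRAT` and `HHSC`.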

## Replication Weights for the 1996 Panel

Analysts should use Fay’s method for estimating variances for the 1996 SIPP Panel. Fay’s method is a modified balanced repeated replication (BRR) method of variance estimation. The difference between the basic BRR method and Fay’s method is that the BRR method uses replicate factors of 0 and 2, whereas Fay’s method uses one factor, k, which is in the range (0, 1), with the other factor equal to 2 – k. In Fay’s method, the introduction of the perturbation factor (1 – k) allows the use of both halves of the sample. Thus, Fay’s method has the advantage that no subset of the sample units in a particular classification will be totally excluded. The variance formula for Fay’s method is
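The contrast between the two sets of replicate factors can be sketched as follows. This is an illustrative construction only: it assumes a Sylvester-type Hadamard matrix (order a power of two) to assign half-samples, which is a common textbook choice and not necessarily how the SIPP replicate weights were built. The function names are hypothetical.

```python
def sylvester_hadamard(order):
    """Return a Hadamard matrix (entries +1/-1) whose order is a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    matrix = [[1]]
    while len(matrix) < order:
        # Sylvester doubling step: [[H, H], [H, -H]].
        matrix = ([row + row for row in matrix]
                  + [row + [-x for x in row] for row in matrix])
    return matrix


def fay_replicate_factors(n_strata, k=0.5):
    """Return factors[r][h] = (factor for unit 1, factor for unit 2).

    One list entry per replicate r and variance stratum h. With k = 0 this
    gives the basic BRR factors 2 and 0; with 0 < k < 1 it gives Fay's
    factors 2 - k and k, so both halves of the sample are always retained.
    """
    # Column 0 of the matrix is all ones and is skipped, so the order must
    # exceed the number of strata.
    order = 1
    while order <= n_strata:
        order *= 2
    hadamard = sylvester_hadamard(order)
    return [
        [(2 - k, k) if row[h + 1] == 1 else (k, 2 - k) for h in range(n_strata)]
        for row in hadamard
    ]


factors = fay_replicate_factors(3, k=0.5)
```

A replicate weight would then be the full-sample weight multiplied by the factor for the record's variance unit in that replicate. Within each stratum the two factors always sum to 2, so no sample unit is ever given a weight of zero when k is greater than 0.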

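$$
\mathrm{Var}(\theta_0) = \frac{1}{G(1-k)^2} \sum_{i=1}^{G} (\theta_i - \theta_0)^2 \qquad \text{(7-1)}
$$

where

- $G$ = the number of replicates,
- $1 - k$ = the perturbation factor,
- $\theta_i$ = the $i$th estimate of the parameter $\theta$, based on the observations included in the $i$th replicate ($i = 1, \ldots, G$), and
- $\theta_0$ = the survey estimate of the parameter $\theta$, based on the full sample.

The 1996 SIPP Panel uses 108 replicate weights, which are calculated on the basis of a perturbation factor of 0.5 (k = 0.5). Inserting those values into Equation (7-1) results in the 1996 SIPP Panel variance formula of

$$
\mathrm{Var}(\theta_0) = \frac{1}{108(0.5)^2} \sum_{i=1}^{108} (\theta_i - \theta_0)^2 = \frac{4}{108} \sum_{i=1}^{108} (\theta_i - \theta_0)^2
$$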
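As a hedged illustration of the computation, the standard form of Fay's estimator divides the sum of squared deviations of the replicate estimates from the full-sample estimate by G(1 - k)^2, where G is the number of replicates. With G = 108 and k = 0.5 the multiplier is 4/108. The function name and the numbers below are hypothetical; the short list of replicate estimates stands in for the 108 estimates an analyst would obtain by re-running the analysis with each replicate weight.

```python
def fay_variance(full_estimate, replicate_estimates, k=0.5):
    """Fay's modified BRR variance of an estimate.

    full_estimate: estimate computed with the full-sample weight.
    replicate_estimates: estimates computed with each replicate weight.
    k: Fay factor; 1 - k is the perturbation factor (0.5 for the 1996 Panel).
    """
    if not 0 <= k < 1:
        raise ValueError("k must satisfy 0 <= k < 1")
    g = len(replicate_estimates)
    if g == 0:
        raise ValueError("at least one replicate estimate is required")
    squared_deviations = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    return squared_deviations / (g * (1 - k) ** 2)


# Made-up values for illustration: four replicates instead of 108.
toy_variance = fay_variance(10.0, [9.0, 11.0, 10.5, 9.5], k=0.5)
toy_standard_error = toy_variance ** 0.5
```

The same function applies to any statistic (a mean, a proportion, a regression coefficient), because only the full-sample estimate and the replicate estimates enter the formula.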

The Census Bureau used VPLX software to compute the replicate weights that are available through DataFerrett.