METHODS
SAMPLE DESIGN
The sampling frame was constructed from files of truck registrations identified as being active as of July 1, 2002. Due to difficulty in obtaining up-to-date vehicle registration records for New Hampshire, the sample for this state was drawn as of September 1, 2001. The frame was stratified by geography and truck characteristics. The 50 states and the District of Columbia made up the 51 geographic strata. Body type and gross vehicle weight (GVW) determined the following five truck strata: 1) pickups; 2) minivans, other light vans, and sport utilities; 3) light single-unit trucks (GVW < 26,000 lb); 4) heavy single-unit trucks (GVW >= 26,000 lb); and 5) truck-tractors. Therefore, the sampling frame was partitioned into 255 geographic-by-truck strata. Within each stratum, a simple random sample of truck registrations was selected without replacement. This produced a total sample of approximately 136,000 truck registrations.
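The design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stratum labels, the `frame` and `sample_sizes` structures, and the function names are hypothetical, and the source does not say how the sample size was allocated across strata.

```python
import random
from itertools import product

# Hypothetical labels for the five truck strata named in the text.
TRUCK_STRATA = [
    "pickup",
    "minivan_light_van_sport_utility",
    "light_single_unit",
    "heavy_single_unit",
    "truck_tractor",
]


def build_strata(geographies, truck_strata=TRUCK_STRATA):
    """Cross every geographic stratum with every truck stratum."""
    return list(product(geographies, truck_strata))


def draw_stratified_sample(frame, sample_sizes, seed=0):
    """Draw a simple random sample without replacement in each stratum.

    frame:        dict mapping stratum -> list of registration ids
    sample_sizes: dict mapping stratum -> number of registrations to draw
                  (the allocation itself is an assumption, not from the source)
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, registrations in frame.items():
        n = min(sample_sizes.get(stratum, 0), len(registrations))
        # random.Random.sample selects n distinct elements, i.e. without replacement.
        sample[stratum] = rng.sample(registrations, n)
    return sample
```

With 51 geographic strata, `build_strata` returns 51 x 5 = 255 geographic-by-truck strata, matching the count given in the text.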
An estimate of the number of trucks for a particular state and truck characteristic was computed in the following manner. Weighted estimates of the number of trucks having the characteristic of interest were computed for each of the five truck strata. The weight for a given truck was the product of two factors: the reciprocal of the truck’s probability of selection and a nonresponse adjustment factor. (See the Nonsampling Error section for a description of the nonresponse adjustment procedure.) The truck stratum estimates were summed to form a state-level estimate.
Two types of truck miles estimates are provided. Distributed truck miles estimates, as shown in Table 8, were computed by apportioning each truck’s annual miles into the appropriate category based on the percent of miles driven in the category as reported by the respondent. Truck miles estimates presented in all other tables were computed by attributing 100 percent of an individual truck’s annual miles to the category with the greatest reported percentage. For example, suppose a particular truck was driven 50,000 miles in the survey year and the respondent indicated 80 percent of the trips were between 201 and 500 miles from the home base, while 20 percent of the trips were between 101 and 200 miles from the home base. In Table 8, 40,000 miles would be tabulated in the “201 to 500 miles” category and 10,000 miles would be tabulated in the “101 to 200 miles” category. In all other tables, 50,000 miles would be tabulated in the “201 to 500 miles” category. To compute an estimate of the average miles per truck, the total miles estimate was divided by the number of trucks estimate for the characteristic of interest.
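The two ways of tabulating truck miles can be illustrated with a short Python sketch for a single truck. The function names are hypothetical, survey weights are left out for clarity, and the handling of ties between categories is an assumption, since the text does not address it.

```python
def distributed_miles(annual_miles, percent_by_category):
    """Apportion annual miles across categories by reported percent (Table 8 style)."""
    return {
        category: annual_miles * percent / 100
        for category, percent in percent_by_category.items()
    }


def dominant_category_miles(annual_miles, percent_by_category):
    """Attribute all annual miles to the category with the greatest reported percent.

    Assumption: on a tie, the first category listed wins; the source does not
    say how ties were resolved.
    """
    top = max(percent_by_category, key=percent_by_category.get)
    return {top: annual_miles}


def average_miles_per_truck(total_miles_estimate, trucks_estimate):
    """Total miles estimate divided by the number of trucks estimate."""
    return total_miles_estimate / trucks_estimate
```

For the truck in the example (50,000 miles, 80 percent in “201 to 500 miles” and 20 percent in “101 to 200 miles”), `distributed_miles` yields 40,000 and 10,000 miles in the two categories, while `dominant_category_miles` assigns all 50,000 miles to “201 to 500 miles”.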
RELIABILITY OF THE ESTIMATES
Estimates in published tables are based on data from the 2002 VIUS and administrative records. To maintain confidentiality, no estimates are published that would disclose the operations of an individual truck. The total error of a published estimate may be considered to be comprised of sampling error and nonsampling error. Individuals who use the VIUS estimates to create new estimates should cite the Census Bureau as the source of only the original estimates.
The total error of an estimate based on a sample survey is the difference between the estimate and the population parameter that it estimates. This error may be considered to be comprised of sampling error and nonsampling error. Sampling error is the difference between the estimate and the result that would be obtained from a complete enumeration of the sampling frame conducted under the same survey conditions. This error occurs because characteristics differ among sampling units and because only a subset of the entire population is measured in a sample survey. Nonsampling error encompasses all other factors that contribute to the total error of a sample survey estimate. The accuracy of a survey result may be affected by these two types of errors.
Sampling and nonsampling errors are often measured by two quantities: bias and variance. The bias of an estimator of a population parameter is the difference, averaged over all possible samples of the same size and design, between the estimator and the population parameter being estimated. (The population parameter is usually unknown.) Any systematic error, or inaccuracy that affects all samples of a specified design in a similar way, may bias the resulting estimates. The variance of an estimator is the squared difference, averaged over all possible samples of the same size and design, between the estimator and its average value.
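In symbols, these two definitions can be written as follows. The notation is an assumption introduced here for illustration: \theta denotes the population parameter, \hat{\theta} the estimator, and \mathrm{E}[\cdot] the average over all possible samples of the same size and design.

```latex
\[
\operatorname{Bias}(\hat{\theta}) = \mathrm{E}[\hat{\theta}] - \theta,
\qquad
\operatorname{Var}(\hat{\theta}) = \mathrm{E}\!\left[\left(\hat{\theta} - \mathrm{E}[\hat{\theta}]\right)^{2}\right]
\]
```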
Measures of Sampling Variability
Because the estimates are based on a sample, exact agreement with the results that would be obtained from a complete enumeration of the truck registrations on the sampling frame is not expected. However, because each truck included on the sampling frame has a known probability of being selected into the sample, it is possible to estimate the sampling variability of the survey estimates.
The particular sample used in this survey is one of a large number of samples of the same size that could have been selected using the same design. If all possible samples had been surveyed under the same conditions, an estimate of the population parameter of interest could have been obtained from each sample. These samples give rise to a distribution of estimates for the population parameter. A statistical measure of the variability among these estimates is the standard error, which can be approximated from any one sample. The standard error is defined as the square root of the variance. The coefficient of variation (or relative standard error) of an estimator is the standard error of the estimator divided by the estimator. Note that measures of sampling variability, such as the standard error and coefficient of variation, are estimated from the sample and are also subject to sampling variability. (Technically, we should refer to the estimated standard error or the estimated coefficient of variation of an estimator. However, for the sake of brevity, we have omitted this detail.) It is important to note that the standard error and coefficient of variation only measure sampling variability. They do not measure any systematic biases in the estimates. The U.S. Census Bureau recommends that individuals using estimates contained in this report incorporate this information into their analyses, as sampling error could affect the conclusions drawn from these estimates.
An estimate from a particular sample and the standard error associated with the estimate can be used to construct a confidence interval. A confidence interval is a range about a given estimator that has a specified probability of containing the result of a complete enumeration of the sampling frame conducted under the same survey conditions. Associated with each interval is a percentage of confidence, which is interpreted as follows. If, for each possible sample, an estimate of a population parameter and its approximate standard error were obtained, then:
- For approximately 90 percent of the possible samples, the interval from 1.645 standard errors below to 1.645 standard errors above the estimate would include the population parameter as obtained from a complete enumeration of the sampling frame conducted under the same survey conditions.
- For approximately 95 percent of the possible samples, the interval from 1.96 standard errors below to 1.96 standard errors above the estimate would include the population parameter as obtained from a complete enumeration of the sampling frame conducted under the same survey conditions.
To illustrate the computation of a confidence interval for an estimate of the number of trucks, assume that an estimate of trucks is 3,377.8 thousand and the coefficient of variation for this estimate is 2.9 percent, or 0.029. First obtain the standard error of the estimate by multiplying the number of trucks estimate by its coefficient of variation. For this example, multiply 3,377.8 thousand by 0.029. This yields a standard error of 97.9562 thousand. The upper and lower bounds of the 90-percent confidence interval are computed as 3,377.8 thousand plus or minus 1.645 times 97.9562 thousand. Consequently, the 90-percent confidence interval is 3,216.7 thousand to 3,538.9 thousand. If corresponding confidence intervals were constructed for all possible samples of the same size and design, approximately 9 out of 10 (90 percent) of these intervals would contain the result obtained from a complete enumeration of all trucks on the sampling frame.
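The arithmetic in this example can be reproduced with a minimal Python sketch. The function name and signature are hypothetical; the multipliers 1.645 and 1.96 are the ones given in the text for 90-percent and 95-percent intervals.

```python
def confidence_interval(estimate, cv, z=1.645):
    """Return (lower, upper, standard_error) for an estimate.

    estimate: the published estimate (here, thousands of trucks)
    cv:       coefficient of variation as a proportion (2.9 percent -> 0.029)
    z:        1.645 for a 90-percent interval, 1.96 for a 95-percent interval
    """
    standard_error = estimate * cv
    margin = z * standard_error
    return estimate - margin, estimate + margin, standard_error


# Worked example from the text: 3,377.8 thousand trucks with a CV of 0.029.
lower, upper, se = confidence_interval(3377.8, 0.029)
```

Rounded to one decimal place, `lower` and `upper` are 3,216.7 and 3,538.9 thousand, and `se` is 97.9562 thousand, in agreement with the figures above.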
Nonsampling Error
Nonsampling error encompasses all other factors that contribute to the total error of a sample survey estimate and may also occur in censuses. It is often helpful to think of nonsampling error as arising from deficiencies or mistakes at some point in the survey process. Nonsampling error can be attributed to many sources:
- Inability to obtain information about all trucks in the sample,
- Response errors,
- Differences in the interpretation of the questions,
- Mistakes in coding or keying the data obtained, and
- Other errors of collection, response, coverage, and processing.
Although no direct measurement of the potential biases due to nonsampling error has been obtained, precautionary steps were taken in all phases of the collection, processing, and tabulation of the data in an effort to minimize its influence.
A potential source of bias in the estimates is nonresponse. Nonresponse is defined as the failure to obtain all the intended measurements or responses about all the trucks in the sample. Two types of nonresponse are often distinguished. Unit nonresponse is used to describe the failure to obtain any of the substantive measurements about a sampled truck. In most cases of unit nonresponse, the questionnaire was never returned to the Census Bureau after several attempts to elicit a response. Item nonresponse occurs either when a question is unanswered or the response to the question fails computer or analyst edits. The procedures used to account for unit and item nonresponse are discussed below.
Unit nonresponse is handled in the estimation procedure by reweighting. To apply this method of nonresponse adjustment, we make the assumption that the population of trucks can be divided into a finite number of mutually exclusive adjustment cells so that within each cell, all the population elements possess similar characteristics and share a similar probability of responding, if selected into the sample. The adjustment cells for the 2002 Vehicle Inventory and Use Survey (VIUS) are identical to the sampling strata. A nonresponse adjustment factor is computed for each adjustment cell and is equal to the ratio of the number of truck registrations selected into the sample to the number of responses received within each cell. In this sense, reweighting allocates characteristics to the nonrespondents in proportion to the characteristics observed for the respondents within each adjustment cell. The amount of bias introduced by this nonresponse adjustment procedure depends on the extent to which the nonrespondents differ, characteristically, from the respondents in each adjustment cell.
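The reweighting step can be sketched as follows, assuming a single adjustment cell in which every truck has the same probability of selection. The function names and the toy numbers in the usage note are hypothetical and are not taken from the survey.

```python
def nonresponse_factor(n_sampled, n_responded):
    """Ratio of registrations sampled to responses received within one cell."""
    if n_responded <= 0:
        raise ValueError("cell has no respondents; the factor is undefined")
    return n_sampled / n_responded


def final_weight(selection_probability, n_sampled, n_responded):
    """Weight = reciprocal of the selection probability x nonresponse factor."""
    base_weight = 1.0 / selection_probability
    return base_weight * nonresponse_factor(n_sampled, n_responded)
```

As a toy illustration, a cell with 1,000 registrations on the frame, 100 sampled (selection probability 0.1), and 80 responses gives a factor of 1.25 and a final weight of 12.5 per respondent, so the 80 respondents together represent all 1,000 registrations in the cell.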
For item nonresponse, a missing value is replaced by a predicted value obtained from an appropriate model for nonresponse. This procedure is called imputation. To impute annual miles and lifetime miles, we divide the sample into a finite number of mutually exclusive cells based on state of registration and related vehicle characteristics. For each cell, estimates of average annual miles and average lifetime miles are computed based on those trucks in the cell for which annual miles and lifetime miles have been reported. Missing values are then replaced with the appropriate average values. A slightly different imputation procedure is used to impute length and average weight (empty weight plus cargo weight). For these data items, we replace a missing value with data from a truck with similar characteristics for which length and average weight have been reported.
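The cell-mean procedure described for annual and lifetime miles can be illustrated with the Python sketch below. The record layout and field names are hypothetical, and the use of unweighted cell averages is an assumption, since the text does not say whether the averages were weighted. The donor-based procedure used for length and average weight is not shown.

```python
from collections import defaultdict


def impute_cell_means(records, cell_fields, value_field):
    """Replace missing values (None) with the mean of reported values in the same cell.

    records:     list of dicts, one per sampled truck
    cell_fields: names of the fields that define the imputation cell
    value_field: name of the field to impute, e.g. "annual_miles"
    Returns a new list; the input records are left unchanged.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        value = rec.get(value_field)
        if value is not None:
            cell = tuple(rec[f] for f in cell_fields)
            sums[cell] += value
            counts[cell] += 1

    imputed = []
    for rec in records:
        new_rec = dict(rec)
        if new_rec.get(value_field) is None:
            cell = tuple(rec[f] for f in cell_fields)
            # A cell with no reported values has no average, so the value stays missing.
            if counts.get(cell, 0) > 0:
                new_rec[value_field] = sums[cell] / counts[cell]
        imputed.append(new_rec)
    return imputed
```

For instance, if two pickups in a cell report 10,000 and 14,000 annual miles and a third reports nothing, the third receives the cell average of 12,000 miles.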
For all other data items, no imputation is performed. Instead, separate estimates are published in a “Not reported” category. For example, a respondent who did not indicate the type of business in which his/her truck was used would be included in the estimate for the “Not reported” category. Users of the estimates should exercise caution when allocating the estimate for the “Not reported” category to the estimates for the reported categories in the proportions observed for the reported categories. This is because the characteristics of the trucks for which we obtained information may differ significantly from those trucks for which we obtained no information.
Additional statistics not shown in the tables are obtainable by tabulating records on a CD-ROM containing the survey microdata. These additional estimates have not been included in the published reports because of high sampling variability, poor response, or other factors that may make them potentially misleading. It should be noted that some unpublished estimates can be derived directly from these reports by subtracting published estimates from their respective totals. However, the estimates obtained by such subtraction would be subject to the poor response rates or high sampling variability as previously described. Data users should take into account the magnitude of "Not Reported" categories when assessing estimates computed using data contained in the CD-ROM.
Individuals who use estimates from the published reports to create new estimates should cite the Census Bureau as the source of only the original estimates. Individuals who use the CD-ROM microdata to create estimates not published by the Census Bureau should cite the Census Bureau as the source of only the microdata used, and not as the source of the new estimates.
CONFIDENTIALITY AND DISCLOSURE
Title 13 of the United States Code authorizes the Census Bureau to conduct censuses and surveys. Section 9 of the same Title requires that any information collected from the public under the authority of Title 13 be maintained as confidential. Section 214 of Title 13 and Sections 3559 and 3571 of Title 18 of the United States Code provide for the imposition of penalties of up to five years in prison and up to $250,000 in fines for wrongful disclosure of confidential census information. In accordance with Title 13, no estimates are published that would disclose the operations of an individual firm.
The Census Bureau’s internal Disclosure Review Board sets the confidentiality rules for all data releases. A checklist approach is used to ensure that all potential risks to the confidentiality of the data are considered and addressed.
A disclosure of data occurs when an individual can use published statistical information to identify either an individual or firm that has provided information under a pledge of confidentiality. Disclosure limitation is the process used to protect the confidentiality of the survey data provided by an individual or firm. Using disclosure limitation procedures, the Census Bureau modifies or removes the characteristics that put confidential information at risk for disclosure. Although it may appear that a table shows information about a specific individual or business, the Census Bureau has taken steps to disguise or suppress the original data while making sure the results are still useful. The techniques used by the Census Bureau to protect confidentiality in tabulations vary, depending on the type of data.