Sampling and Estimation Methodologies
The estimates in this report are based on a stratified simple random sample. The ICTS sample consists of 47,818 companies with paid employees (determined by the presence of payroll) in 2006.
The scope of the survey was defined to include all private, nonfarm, domestic companies. Major exclusions from the frame were government-owned operations (including the U.S. Postal Service), foreign-owned operations of domestic companies, establishments located in U.S. Territories, establishments engaged in agricultural production (not agricultural services), and private households.
The 2006 Business Register (BR) was used to develop the 2007 ICTS sample frame. The BR is the U.S. Census Bureau's establishment-based database. The database contains records for each physical business entity with payroll located in the United States, including company ownership information and current-year administrative data. In creating the ICTS frame, establishment data in the BR file were consolidated to create company-level records. Employment and payroll information was maintained for each six-digit North American Industry Classification System¹ (NAICS) industry in which the company had activity. Next, payroll data for each company-level record were run through an algorithm that assigned the company first to an industry sector (e.g., manufacturing or construction), then to a subsector (three-digit NAICS code), then to an industry group (four-digit NAICS code), then to an industry (five-digit NAICS code), and finally to an ICTS industry code based on that industry. The resulting sample frame contained slightly more than 6.3 million companies.
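The consolidation and top-down assignment can be sketched in Python. The report does not state the assignment rule, so this sketch assumes a plurality-payroll rule: at each level of the NAICS hierarchy, the company is placed in the branch that carries the largest share of its payroll. The function names and the input layout are hypothetical, and the final mapping from a five-digit NAICS industry to an ICTS industry code is not shown.

```python
from collections import defaultdict

# NAICS sectors that span more than one two-digit code.
MULTI_CODE_SECTORS = {"31": "31-33", "32": "31-33", "33": "31-33",
                      "44": "44-45", "45": "44-45",
                      "48": "48-49", "49": "48-49"}


def consolidate(establishments):
    """Sum establishment payroll into company-level records keyed by six-digit NAICS code.

    `establishments` is an iterable of (company_id, naics6, payroll) tuples (hypothetical layout).
    """
    companies = defaultdict(lambda: defaultdict(float))
    for company_id, naics6, payroll in establishments:
        companies[company_id][naics6] += payroll
    return companies


def _level_key(code: str, level: int) -> str:
    # Level 2 is the sector (multi-code sectors merged); deeper levels are code prefixes.
    if level == 2:
        return MULTI_CODE_SECTORS.get(code[:2], code[:2])
    return code[:level]


def assign_industry(payroll_by_naics6: dict) -> str:
    """Hypothetical top-down rule: sector, then 3-, 4-, 5-digit NAICS, each by largest payroll."""
    candidates = dict(payroll_by_naics6)
    chosen = ""
    for level in (2, 3, 4, 5):
        totals = defaultdict(float)
        for code, pay in candidates.items():
            totals[_level_key(code, level)] += pay
        chosen = max(totals, key=totals.get)
        # Keep only the six-digit industries inside the chosen branch.
        candidates = {c: p for c, p in candidates.items() if _level_key(c, level) == chosen}
    return chosen  # five-digit NAICS industry


firms = consolidate([("A", "311111", 60.0), ("A", "311119", 50.0),
                     ("A", "445110", 40.0), ("A", "445110", 40.0)])
print(assign_industry(firms["A"]))  # 31111
```

Under this rule the company lands in manufacturing even though its single largest six-digit industry (445110) is in retail trade: a top-down rule compares whole branches of the hierarchy, not individual industries.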
The 2007 ICTS sampling frame consists of a certainty portion and a noncertainty portion. The 17,688 companies with 500 or more employees were selected with certainty. The remaining companies, those with 1 to 499 employees, were grouped into 135 industry categories, and each industry was further divided into four strata. Because noncapitalized expenditures data were not available on the sampling frame, 2006 payroll was used as the stratification variable. The stratification was designed to minimize the sample size subject to a desired level of reliability for each industry; the expected relative standard errors (RSEs) ranged from 1 to 3 percent.
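The report does not give its allocation formula. As an illustration of minimizing sample size subject to a reliability target, the sketch below uses the textbook Neyman-allocation result for a stratified estimate of a total; this is an assumed, swapped-in technique, not necessarily the Census Bureau's procedure. With stratum sizes N_h and stratum standard deviations S_h of the stratification variable, the smallest total sample achieving a target relative standard error c for a total Y is n = (Σ N_h S_h)² / ((cY)² + Σ N_h S_h²), allocated as n_h proportional to N_h S_h. All stratum inputs are hypothetical.

```python
import math


def neyman_min_sample(N, S, total, target_rse):
    """Smallest total sample size meeting a target RSE for an estimated total, with its Neyman allocation.

    N: stratum population sizes; S: stratum standard deviations of the stratification
    variable (payroll here); total: population total of that variable;
    target_rse: desired relative standard error as a fraction (0.02 means 2 percent).
    """
    sum_ns = sum(Nh * Sh for Nh, Sh in zip(N, S))
    sum_ns2 = sum(Nh * Sh ** 2 for Nh, Sh in zip(N, S))
    target_var = (target_rse * total) ** 2
    n = sum_ns ** 2 / (target_var + sum_ns2)
    allocation = [n * Nh * Sh / sum_ns for Nh, Sh in zip(N, S)]  # n_h proportional to N_h * S_h
    return n, allocation


def rse_of_total(N, S, n_alloc, total):
    """RSE (fraction) of the expansion estimator of a total under simple random sampling within strata."""
    var = sum(Nh ** 2 * Sh ** 2 / nh - Nh * Sh ** 2 for Nh, Sh, nh in zip(N, S, n_alloc))
    return math.sqrt(var) / total


N = [1000, 500]    # hypothetical stratum sizes
S = [10.0, 40.0]   # hypothetical payroll standard deviations
n, alloc = neyman_min_sample(N, S, total=100_000, target_rse=0.02)
print(round(n, 1), [round(a, 1) for a in alloc])  # 183.7 [61.2, 122.4]
print(round(rse_of_total(N, S, alloc, 100_000), 4))  # 0.02
```

In practice the allocation would be rounded up to whole companies and capped at N_h. Payroll stands in for the expenditure variables here, as the text explains, so the achieved RSE for expenditures would differ from the payroll target.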
Each company selected for the survey has a sample weight that is the inverse of its probability of selection. All sampled companies within the same stratum and industry grouping have the same weight. Weights were increased to adjust for nonresponse. The coverage rate for all companies was 89.8 percent. The coverage rate is 100 times the ratio of (a) the total noncapitalized and capitalized expenditures of all reporting companies, weighted by the original sample weights, to (b) the same total weighted by the nonresponse-adjusted sample weights. Weight adjustment and publication estimation are described in the following subsections.
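The coverage rate definition translates directly into code. This is a minimal sketch with a hypothetical function name and made-up inputs. Because nonresponse adjustment only raises weights, the ratio is at most 100 percent; the shortfall is the share of the weighted total that comes from the nonresponse adjustment rather than directly from reported data.

```python
def coverage_rate(expenditures, original_weights, adjusted_weights):
    """Coverage rate (percent) as defined in the text.

    Each list runs over reporting companies; `expenditures` holds each company's
    noncapitalized plus capitalized expenditures.
    """
    numerator = sum(w * x for w, x in zip(original_weights, expenditures))
    denominator = sum(w * x for w, x in zip(adjusted_weights, expenditures))
    return 100.0 * numerator / denominator


# Hypothetical three-company example.
rate = coverage_rate([10.0, 20.0, 5.0], [2.0, 3.0, 1.0], [2.5, 3.5, 1.0])
print(round(rate, 1))  # 85.0
```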
¹ North American Industry Classification System (NAICS) – United States, 2002. For sale by National Technical Information Service (NTIS), Springfield, VA 22161. Call NTIS at 1-800-553-6847.
For estimation purposes, each company was placed into one of four response-related categories:
1. Respondent.
2. Nonrespondent.
3. Not in business.
4. Known duplicates.
A company was considered a respondent or nonrespondent based on whether the company provided sufficient data in items 1, 2 or 3 of the survey form. Companies that went out of business prior to 2007 and duplicates were dropped from the survey. Companies that went out of business during the survey year were kept in the sample and efforts were made to collect data for the period the company was active.
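The four-way classification can be written as a small decision function. The `Company` fields and the test for "sufficient data" (any of items 1 to 3 reported) are hypothetical simplifications; the survey's actual editing rules are not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Company:
    # Hypothetical fields, for illustration only.
    reported_items: set          # item numbers (1, 2, 3, ...) with sufficient data
    ceased_year: Optional[int]   # year operations ended, or None if still active
    known_duplicate: bool = False


def response_category(c: Company, survey_year: int = 2007) -> str:
    if c.known_duplicate:
        return "known duplicate"   # dropped from the survey
    if c.ceased_year is not None and c.ceased_year < survey_year:
        return "not in business"   # dropped from the survey
    # Companies that closed during the survey year stay in the sample.
    if c.reported_items & {1, 2, 3}:
        return "respondent"
    return "nonrespondent"
```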
ICTS segment. The following discussion assumes 658 strata (h = 1, 2, ..., 658). These are based on 135 industries, each normally containing five strata (the certainty stratum plus four noncertainty strata), for a maximum of 135 × 5 = 675 strata. Where the sample size was insufficient to justify distinct strata, strata were collapsed together. In 2007, 34 strata were collapsed into 17, reducing the count by 17 to 658.
The original stratum weights ($W_h$) were adjusted to compensate for nonresponse. The adjusted weight is computed as follows:
$$W_h^{(\mathrm{adj})} = W_h \times \frac{P_{hr} + P_{hn}}{P_{hr}}, \qquad W_h = \frac{N_h}{n_h}$$
where
$W_h^{(\mathrm{adj})}$ is the adjusted stratum weight of the hth stratum
$W_h$ is the original stratum weight of the hth stratum
$N_h$ is the population size of the hth stratum
$n_h$ is the sample size of the hth stratum
$P_{hr}$ is the sum of total company payroll for respondent companies in stratum h
$P_{hn}$ is the sum of total company payroll for nonrespondent companies in stratum h
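A minimal sketch of the stratum-level adjustment, assuming the ratio form W_h(adj) = (N_h / n_h) × (P_hr + P_hn) / P_hr, reconstructed from the definitions listed above. The function name and the example stratum are hypothetical. In a certainty stratum N_h = n_h, so the base weight is 1.

```python
def adjusted_weight(N_h: int, n_h: int, payroll_resp: float, payroll_nonresp: float) -> float:
    """Payroll-based nonresponse adjustment of a stratum weight (assumed ratio form)."""
    if payroll_resp <= 0:
        raise ValueError("no respondent payroll in stratum; adjustment is undefined")
    base_weight = N_h / n_h  # inverse of the selection probability under SRS within the stratum
    # Scale up so respondents also represent the payroll share of the nonrespondents.
    return base_weight * (payroll_resp + payroll_nonresp) / payroll_resp


# Hypothetical stratum: 400 companies, 40 sampled; respondents carry 75% of sampled payroll.
w = adjusted_weight(400, 40, payroll_resp=300.0, payroll_nonresp=100.0)
print(round(w, 2))
```

Weighting the adjustment by payroll rather than by company counts means that losing a large nonrespondent raises the respondents' weights more than losing a small one.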
Publication cell estimates were computed by obtaining a weighted sum of reported values for companies treated as respondents. For those strata undergoing nonresponse adjustment, the estimates may be biased, since this method assumes that nonresponse is a purely random event. No attempt was made to estimate the magnitude of this bias.
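ICTS segment. The ICTS estimates were derived as follows. Each estimated cell total, $\hat{X}^{(j)}$, is of the form
$$\hat{X}^{(j)} = \sum_{h=1}^{658} W_h^{(\mathrm{adj})} \sum_{i \in R_h} X_{i,h}^{(j)}$$
where
$W_h^{(\mathrm{adj})}$ is the adjusted weight of the hth stratum
$R_h$ is the set of companies in stratum h treated as respondents
$X_{i,h}^{(j)}$ is the value attributed to the ith company of stratum h, where j is the publication cell of interest.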
Note: Although a company was assigned to and sampled in one ICTS industry, it could report expenditures in multiple ICTS industries. When this occurred, the data reported for every industry were multiplied by the company's weight from the industry in which it was sampled.
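A minimal sketch of the cell-total estimator, assuming respondent records are already tagged with the stratum each company was sampled in and that each stratum's nonresponse-adjusted weight is known. The record layout and function name are hypothetical. Because each record carries the company's sampling stratum rather than the industry it reports in, a company reporting in several ICTS industries is weighted by its sample-industry weight in every cell, as the note above describes.

```python
from collections import defaultdict


def cell_totals(records, adjusted_weights):
    """Weighted cell totals over respondents.

    records: iterable of (company_id, sample_stratum, cell, value) for respondents only;
             a company reporting in several ICTS industries contributes one record per cell,
             each tagged with the stratum it was sampled in.
    adjusted_weights: mapping stratum -> nonresponse-adjusted weight.
    """
    totals = defaultdict(float)
    for _company, stratum, cell, value in records:
        totals[cell] += adjusted_weights[stratum] * value
    return dict(totals)


weights = {1: 10.0, 2: 2.0}                  # hypothetical stratum weights
records = [("A", 1, "retail", 5.0),          # company A, sampled in stratum 1 ...
           ("A", 1, "manufacturing", 3.0),   # ... also reports manufacturing spending
           ("B", 2, "retail", 4.0)]
print(cell_totals(records, weights))  # {'retail': 58.0, 'manufacturing': 30.0}
```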
The values shown in this report are estimates from a sample and will differ from the data which would have been obtained from a different sample or a complete census. Two types of possible errors are associated with estimates based on data from sample surveys: sampling errors and nonsampling errors. The accuracy of a survey result depends not only on the measurable sampling errors but also on the nonsampling errors that are not explicitly measured. For any particular estimate, the total error may considerably exceed the measured sampling error.
The sample used in this survey is one of many possible samples that could have been selected using the sampling methodology described earlier. Each of these possible samples would likely yield different results. The relative standard error (RSE) is a measure of the variability among the estimates from all possible samples using this methodology. The RSEs were calculated using a delete-a-group jackknife replicate variance estimator. The RSE accounts only for sampling variability, and does not account for any nonsampling error or systematic biases in the estimates. A bias is the difference, averaged over all possible samples of the same design and size, between the estimate and the true value being estimated.
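The report names the delete-a-group jackknife but not its details (number of groups, how groups are formed, whether the nonresponse adjustment is redone in each replicate). The sketch below is the basic textbook form, not necessarily the Bureau's implementation. Noncertainty units are randomly split into G groups; replicate g drops group g and multiplies the remaining noncertainty weights by G/(G − 1); certainty units keep their weights in every replicate because they contribute no sampling variance; and the variance is (G − 1)/G × Σ_g (t_g − t)². The default G = 15 and the function name are assumptions.

```python
import math
import random


def dagjk_total(values, weights, certainty, groups=15, seed=0):
    """Weighted total, its delete-a-group jackknife variance, and RSE in percent."""
    n = len(values)
    full = sum(w * y for w, y in zip(weights, values))

    # Randomly assign noncertainty units to `groups` groups of (nearly) equal size.
    rng = random.Random(seed)
    noncert = [i for i in range(n) if not certainty[i]]
    rng.shuffle(noncert)
    group_of = {i: k % groups for k, i in enumerate(noncert)}

    factor = groups / (groups - 1)
    replicates = []
    for g in range(groups):
        t = 0.0
        for i in range(n):
            if certainty[i]:
                t += weights[i] * values[i]           # certainty units: same weight in every replicate
            elif group_of[i] != g:
                t += weights[i] * factor * values[i]  # reweight units outside the deleted group
        replicates.append(t)

    variance = (groups - 1) / groups * sum((t - full) ** 2 for t in replicates)
    rse = 100.0 * math.sqrt(variance) / full if full else float("nan")
    return full, variance, rse
```

With one unit per group and equal weights N/n, this reduces to the ordinary delete-one jackknife, whose variance for a total equals N²s²/n (simple random sampling without the finite population correction).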
The RSEs presented in the tables can be used to derive the standard error (SE) of each estimate. The SE can in turn be used to construct interval estimates with prescribed levels of confidence that the interval includes the average result of all possible samples:
a. intervals defined by one SE above and below the sample estimate will contain the average result of all possible samples about 68 percent of the time.
b. intervals defined by 1.6 SEs above and below the sample estimate will contain it about 90 percent of the time.
c. intervals defined by two SEs above and below the sample estimate will contain it about 95 percent of the time.
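The coverage levels in the list follow from a normal approximation to the sampling distribution of the estimate, an assumption that is reasonable for large samples. This short check uses the standard library's `statistics.NormalDist`; the helper name is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def coverage(z: float) -> float:
    """Probability that estimate +/- z*SE covers its target under a normal approximation."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


for z in (1.0, 1.6, 2.0):
    print(f"+/- {z} SE: {coverage(z):.3f}")

# Exact two-sided 90-percent multiplier.
print(round(std_normal.inv_cdf(0.95), 3))
```

Under this approximation, ±1.6 SE covers about 89 percent and the exact 90-percent multiplier is about 1.645, so the report's 1.6 is a rounded value; ±2 SE covers about 95.4 percent.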
The SE of an estimate is calculated by dividing the RSE presented in the tables by 100 (RSEs in this publication are expressed in percent) and multiplying the result by the corresponding estimate. The RSE is the measure of variability presented for all estimates in this publication except the estimates of percent change in Table 2a, for which the SE itself is provided (see Table 2b).
Examples of Calculating a Confidence Interval:
a. For a data value: using data from Table 4a and Table 4b, the SE for 2007 total nondurable manufacturing noncapitalized equipment expenditures would be calculated as follows:
SE = $2,494 million × (RSE from Table 4b ÷ 100) ≈ $27.4 million
The 90-percent confidence interval can be constructed by multiplying 1.6 by the SE, adding this value to the estimate to create the upper bound, and subtracting it from the estimate to create the lower bound.
Using data from Table 4a, for 2007 total nondurable manufacturing noncapitalized equipment expenditures, a 90-percent confidence interval would be calculated as:
$2,494 million ± 1.6 × $27.4 million = $2,494 million ± $44 million
This implies 90 percent confidence that the interval $2,450 million to $2,538 million contains the actual total for nondurable manufacturing noncapitalized equipment expenditures, subject to further nonsampling errors.
b. For percent change: using data from Table 2a and Table 2b, the 90-percent confidence interval is constructed by multiplying the SE of the percent change by 1.6, adding this value to the estimated percent change to obtain the upper bound, and subtracting it from the estimate to obtain the lower bound. For example, for noncapitalized expenditures in the Health care and social assistance sector, the estimated percent change from 2006 to 2007 is 15.5 percent (Table 2a), and the standard error of this estimate is 12.9 percentage points (Table 2b).
15.5 percent ± 1.6 × 12.9 percentage points = 15.5 percent ± 20.6 percentage points
This implies 90 percent confidence that the interval −5.1 percent to +36.1 percent contains the actual percent change for noncapitalized expenditures in the Health care and social assistance sector. Because this interval contains zero, there is not sufficient evidence to conclude that the percent change differs from 0; that is, the percent change is not statistically significant.
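The two examples above can be reproduced in a few lines of Python. The function names are illustrative, and z = 1.6 is the report's rounded multiplier for 90 percent. The estimate ($2,494 million), its SE ($27.4 million), and the percent-change figures are taken from the text.

```python
def se_from_rse(estimate: float, rse_percent: float) -> float:
    """SE = estimate * RSE / 100, since published RSEs are in percent."""
    return estimate * rse_percent / 100.0


def interval(estimate: float, se: float, z: float = 1.6) -> tuple:
    """Estimate +/- z * SE; z = 1.6 is the report's multiplier for a 90-percent interval."""
    return estimate - z * se, estimate + z * se


# Example a: data value ($ millions).
low, high = interval(2494, 27.4)
print(f"{low:.0f} to {high:.0f}")  # 2450 to 2538

# Example b: percent change, whose SE is published directly.
low, high = interval(15.5, 12.9)
print(f"{low:.1f} to {high:.1f}")  # -5.1 to 36.1
print("significant" if low > 0 or high < 0 else "not significant")  # not significant
```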
Data for the current year, along with revised data for the prior year, are presented in this publication. Two quantities of interest to many data users are the absolute difference between the prior-year and current-year estimates and the percent change from the prior year to the current year.
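The absolute difference is calculated as:
$$D = X_{2007} - X_{2006}$$
where $X_{2007}$ is the current-year estimate and $X_{2006}$ is the revised prior-year estimate, and a 90-percent confidence interval on this difference is estimated as:
$$D \pm 1.6\sqrt{\left(\frac{RSE_{2007}}{100}\,X_{2007}\right)^2 + \left(\frac{RSE_{2006}}{100}\,X_{2006}\right)^2}$$
where $RSE_{2007}$ and $RSE_{2006}$ are the published RSEs (in percent) of the two estimates. This interval treats the two estimates as independent; it includes no covariance term.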
As an example, consider capitalized expenditures for computer and peripheral equipment in the Retail trade sector. From Table 4c, the 2007 estimate is $6,165 million, and Table 4d gives its RSE as 1.7; the revised 2006 estimate from Table 4c is $5,806 million, and Table 4d gives its RSE as 1.5. The above calculations would be:
$$D = 6{,}165 - 5{,}806 = 359$$
And the 90-percent confidence interval is estimated as:
$$359 \pm 1.6\sqrt{(6{,}165 \times 0.017)^2 + (5{,}806 \times 0.015)^2} = 359 \pm 1.6\sqrt{104.8^2 + 87.1^2} \approx 359 \pm 1.6 \times 136.3 \approx 359 \pm 218$$
so the 90-percent confidence interval is $359 +/- $218 million, or $141 million to $577 million.
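The percent change is calculated as 100 times the ratio of the difference to the prior-year estimate:
$$P = 100 \times \frac{X_{2007} - X_{2006}}{X_{2006}}$$
where $X_{2007}$ and $X_{2006}$ are the 2007 and revised 2006 estimates, with RSEs (in percent) $RSE_{2007}$ and $RSE_{2006}$. Continuing the example above,
$$P = 100 \times \frac{359}{5{,}806} \approx 6.18$$
and a 90-percent confidence interval on this percent change is estimated as:
$$P \pm 1.6 \times 100 \times \frac{X_{2007}}{X_{2006}}\sqrt{\left(\frac{RSE_{2007}}{100}\right)^2 + \left(\frac{RSE_{2006}}{100}\right)^2} = 6.18 \pm 1.6 \times 100 \times \frac{6{,}165}{5{,}806}\sqrt{0.017^2 + 0.015^2} \approx 6.18 \pm 3.85$$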
so the 90-percent confidence interval is 6.18 percent +/- 3.85 percent or 2.33 percent to 10.03 percent.
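The difference and percent-change intervals can be sketched as follows. The formulas are the ones that reproduce the report's worked figures (±218 and ±3.85): an independence assumption for the difference (no covariance term) and a first-order ratio approximation for the percent change. The function names are hypothetical; the inputs are the Retail trade figures from Tables 4c and 4d quoted in the text.

```python
import math


def difference_ci(x_cur, rse_cur, x_prev, rse_prev, z=1.6):
    """Difference of two estimates and its CI half-width, treating the estimates as independent."""
    se_cur = x_cur * rse_cur / 100.0
    se_prev = x_prev * rse_prev / 100.0
    return x_cur - x_prev, z * math.sqrt(se_cur ** 2 + se_prev ** 2)


def percent_change_ci(x_cur, rse_cur, x_prev, rse_prev, z=1.6):
    """Percent change and its CI half-width via the ratio approximation."""
    pct = 100.0 * (x_cur - x_prev) / x_prev
    ratio = x_cur / x_prev
    se_pct = 100.0 * ratio * math.sqrt((rse_cur / 100.0) ** 2 + (rse_prev / 100.0) ** 2)
    return pct, z * se_pct


# Retail trade, computer and peripheral equipment ($ millions; RSEs in percent).
d, hw = difference_ci(6165, 1.7, 5806, 1.5)
print(f"{d:.0f} +/- {hw:.0f}")  # 359 +/- 218
p, hw = percent_change_ci(6165, 1.7, 5806, 1.5)
print(f"{p:.2f} +/- {hw:.2f}")  # 6.18 +/- 3.85
```

Both intervals exclude zero, so the 2006 to 2007 increase in this cell is statistically significant at the 90-percent level under these approximations.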
All surveys and censuses are subject to nonsampling errors. Nonsampling errors can be attributed to many sources: inability to obtain information about all companies in the sample; inability or unwillingness on the part of respondents to provide correct information; response errors; definition difficulties; differences in the interpretation of questions; mistakes in recording or coding the data; and other errors of collection, response, coverage, and estimation for nonresponse.
Explicit measures of the effects of these nonsampling errors are not available. However, to minimize nonsampling error, all reports were reviewed for reasonableness and consistency, and every effort was made to achieve accurate response from all survey participants.
Coverage errors may have a significant effect on the accuracy of estimates for this survey. The BR, which forms the basis of our survey universe frame, may not contain all businesses. Also, businesses that are contained in the BR may have their payroll misreported.