U.S. Department of Commerce

Information & Communication Technology Survey




Appendix B.
Sampling and Estimation Methodologies

The estimates in this report are based on a stratified simple random sample. The ICTS sample consists of 47,818 companies with paid employees (determined by the presence of payroll) in 2006.

The scope of the survey was defined to include all private, nonfarm, domestic companies. Major exclusions from the frame were government-owned operations (including the U.S. Postal Service), foreign-owned operations of domestic companies, establishments located in U.S. Territories, establishments engaged in agricultural production (not agricultural services), and private households.

The 2006 Business Register (BR) was used to develop the 2007 ICTS sample frame. The BR is the U.S. Census Bureau's establishment-based database. The database contains records for each physical business entity with payroll located in the United States, including company ownership information and current-year administrative data. In creating the ICTS frame, establishment data in the BR file were consolidated to create company-level records. Employment and payroll information was maintained for each six-digit North American Industry Classification System¹ (NAICS) industry in which the company had activity. Next, payroll data for each company-level record were run through an algorithm to assign the company first to an industry sector (e.g., manufacturing or construction), then to a subsector (three-digit NAICS code), then to an industry group (four-digit NAICS code), then to an industry (five-digit NAICS code), and finally to an ICTS industry code based on that industry. The resulting sample frame contained slightly more than 6.3 million companies.
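The level-by-level industry assignment can be illustrated with a short sketch. The source does not state the decision rule used at each level; the code below assumes the company goes to whichever code prefix carries the largest payroll among the codes that survived the previous level. It also treats the first two NAICS digits as the sector, which is a simplification (some sectors span several two-digit codes). The function name `assign_naics` and the sample payroll figures are hypothetical.

```python
from collections import defaultdict


def assign_naics(payroll_by_naics6):
    """Descend the NAICS hierarchy one level at a time.

    ASSUMPTION (not stated in the source): at each level the company is
    assigned to the code prefix with the largest total payroll among the
    six-digit codes still under consideration.
    """
    candidates = dict(payroll_by_naics6)
    prefix = ""
    # 2 digits ~ sector (simplified), 3 = subsector,
    # 4 = industry group, 5 = industry.
    for digits in (2, 3, 4, 5):
        totals = defaultdict(float)
        for code, payroll in candidates.items():
            totals[code[:digits]] += payroll
        prefix = max(totals, key=totals.get)
        # Keep only the six-digit codes under the winning prefix.
        candidates = {
            code: payroll
            for code, payroll in candidates.items()
            if code.startswith(prefix)
        }
    return prefix


# Hypothetical company with payroll in three six-digit industries.
company = {"334111": 500.0, "334112": 300.0, "541511": 700.0}
five_digit_industry = assign_naics(company)
```

Under this assumed rule the company lands in the industry under prefix 33 even though its single largest six-digit code is 541511, because payroll is compared sector by sector before any finer level is considered.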

The 2007 ICTS sampling frame consists of a certainty portion and a noncertainty portion. The 17,688 companies with 500 or more employees were selected with certainty. The remaining companies with 1 to 499 employees were then grouped into 135 industry categories. Each industry was then further divided into four strata. Since noncapitalized expenditures data were not available on the sampling frame, 2006 payroll was used as the stratification variable. The stratification methodology resulted in minimizing the sample size subject to a desired level of reliability for each industry. The expected relative standard errors (RSEs) ranged from 1 to 3 percent.

ESTIMATION

Each company selected for the survey has a sample weight which is the inverse of its probability of selection. All sampled companies within the same stratum and industry grouping have the same weight. Weights were increased to adjust for nonresponse. The coverage rate for all companies was 89.8 percent. The coverage rate is calculated by multiplying 100 by the ratio of noncapitalized and capitalized expenditures of all reporting companies weighted by the original sample weights, to the noncapitalized and capitalized expenditures of all reporting companies weighted by the adjusted-for-nonresponse sample weights. Weight adjustment and publication estimation are described in the following subsections.
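As a rough illustration of the coverage rate defined above, the sketch below computes 100 times the ratio of expenditures weighted by the original weights to expenditures weighted by the nonresponse-adjusted weights, over reporting companies only. The function name, the tuple layout, and the sample figures are assumptions made for the example, not survey data.

```python
def coverage_rate(reporters):
    """Coverage rate in percent.

    reporters: iterable of (original_weight, adjusted_weight, expenditures)
    tuples, one per reporting company, where expenditures is the sum of
    noncapitalized and capitalized expenditures.
    """
    reporters = list(reporters)
    weighted_original = sum(w0 * x for w0, _, x in reporters)
    weighted_adjusted = sum(wa * x for _, wa, x in reporters)
    if weighted_adjusted == 0:
        raise ValueError("adjusted weighted expenditures sum to zero")
    return 100.0 * weighted_original / weighted_adjusted


# Hypothetical stratum: original weight 10, raised to 12.5 for nonresponse.
example = [(10.0, 12.5, 100.0), (10.0, 12.5, 300.0)]
rate = coverage_rate(example)
```

Because the adjusted weights are never smaller than the original ones, the rate cannot exceed 100 percent; the further it falls below 100, the more of the estimate rests on the nonresponse adjustment.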

¹ North American Industry Classification System (NAICS) – United States, 2002. For sale by National Technical Information Service (NTIS), Springfield, VA 22161. Call NTIS at 1-800-553-6847.

Weight Adjustment

For estimation purposes, each company was placed into 1 of 4 response-related categories:

1. Respondents.

2. Nonrespondents.

3. Not in business.

4. Known duplicates.

A company was considered a respondent or nonrespondent based on whether the company provided sufficient data in items 1, 2 or 3 of the survey form. Companies that went out of business prior to 2007 and duplicates were dropped from the survey. Companies that went out of business during the survey year were kept in the sample and efforts were made to collect data for the period the company was active.

ICTS segment. The following discussion assumes 658 strata (strata designation h = 1, 2, . . ., 658) which are based on 135 industries, each normally containing five strata (including the certainty stratum), which would be a maximum of 675 strata. Where there is insufficient sample size to justify distinct strata, they were collapsed together. In 2007, 34 strata were collapsed into 17 strata.

The original stratum weights (Wh) were adjusted to compensate for nonresponse. The adjusted weight is computed as follows:

$$W_{h(adj)} = W_h \times \frac{P_{hr} + P_{hn}}{P_{hr}}$$

where,

Wh(adj) is the adjusted stratum weight of the hth stratum
Wh = Nh / nh is the original stratum weight of the hth stratum
Nh is the population size of the hth stratum
nh is the sample size of the hth stratum
Phr is the sum of total company payroll for respondent companies in stratum h
Phn is the sum of total company payroll for nonrespondent companies in stratum h
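A minimal sketch of the adjustment follows, assuming it inflates the original weight Nh / nh by the ratio of total payroll (respondents plus nonrespondents) to respondent payroll, which is the form the definitions above suggest. The function name and the numbers in the example are illustrative only.

```python
def adjusted_weight(pop_size, sample_size, payroll_resp, payroll_nonresp):
    """Nonresponse-adjusted stratum weight.

    ASSUMPTION: adjusted weight = (Nh / nh) * (Phr + Phn) / Phr, the form
    implied by the symbol definitions in the text.
    """
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    if payroll_resp <= 0:
        raise ValueError("respondent payroll must be positive")
    original = pop_size / sample_size  # Wh, inverse selection probability
    return original * (payroll_resp + payroll_nonresp) / payroll_resp


# Hypothetical stratum: 1,000 companies on the frame, 50 sampled,
# respondents hold 800 of the 1,000 payroll units in the sample.
w_adj = adjusted_weight(1000, 50, 800.0, 200.0)
```

With full response the nonrespondent payroll is zero and the adjusted weight equals the original weight.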

Publication Estimation

Publication cell estimates were computed by obtaining a weighted sum of reported values for companies treated as respondents. For those strata undergoing nonresponse adjustment, the estimates may be biased, since this method assumes that nonresponse is a purely random event. No attempt was made to estimate the magnitude of this bias.

ICTS segment. The ICTS estimates were derived as follows. Each estimated cell total, $\hat{X}^{(j)}$, is of the form

$$\hat{X}^{(j)} = \sum_{h=1}^{658} W_{h(adj)} \sum_{i \,\in\, \text{respondents in } h} X^{(j)}_{i,h}$$

where,

Wh(adj) is the adjusted weight of the hth stratum
X(j)i,h is the value attributed to the ith company of stratum h,
where j is the publication cell of interest.

Note: Although a company was assigned to and sampled in one ICTS industry, it could report expenditures in multiple ICTS industries. When this occurred, the reported data for all industries were inflated by the weight in the sample industry.
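The weighted sum and the note about multi-industry reporting can be sketched together. Each record below carries the adjusted weight of the stratum in which the company was sampled, and that single weight inflates every cell the company reports in. The record layout, function name, and figures are hypothetical.

```python
def cell_total(respondents, cell):
    """Weighted sum of reported values for one publication cell.

    respondents: list of dicts with
        "adj_weight": adjusted weight of the company's sample stratum
        "values":     mapping of publication cell -> reported value
    A company contributes to every cell it reports in, always using the
    weight from the industry in which it was sampled.
    """
    return sum(
        rec["adj_weight"] * rec["values"].get(cell, 0.0)
        for rec in respondents
    )


# Hypothetical respondents: the first reports in two cells, the second
# (a certainty company with weight 1) reports in one.
respondents = [
    {"adj_weight": 25.0, "values": {"cell_A": 10.0, "cell_B": 4.0}},
    {"adj_weight": 1.0, "values": {"cell_A": 100.0}},
]
total_a = cell_total(respondents, "cell_A")
total_b = cell_total(respondents, "cell_B")
```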

RELIABILITY OF THE ESTIMATES

The values shown in this report are estimates from a sample and will differ from the data which would have been obtained from a different sample or a complete census. Two types of possible errors are associated with estimates based on data from sample surveys: sampling errors and nonsampling errors. The accuracy of a survey result depends not only on the measurable sampling errors but also on the nonsampling errors that are not explicitly measured. For any particular estimate, the total error may considerably exceed the measured sampling error.

Sampling Variability

The sample used in this survey is one of many possible samples that could have been selected using the sampling methodology described earlier. Each of these possible samples would likely yield different results. The relative standard error (RSE) is a measure of the variability among the estimates from all possible samples using this methodology. The RSEs were calculated using a delete-a-group jackknife replicate variance estimator. The RSE accounts only for sampling variability, and does not account for any nonsampling error or systematic biases in the estimates. A bias is the difference, averaged over all possible samples of the same design and size, between the estimate and the true value being estimated.
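For readers unfamiliar with the delete-a-group jackknife, a simplified sketch follows. The source does not say how many groups were used or how they were formed within strata, so this version makes several assumptions: units arrive in random order, they are dealt into groups by position, strata are ignored, and each replicate reweights the retained units by G / (G - 1). It illustrates the general technique and is not the survey's production estimator.

```python
import math


def dag_jackknife_rse(weights, values, n_groups=15):
    """RSE (in percent) of a weighted total via a delete-a-group jackknife.

    ASSUMPTIONS: units are already in random order, groups are formed by
    position (index modulo n_groups), and strata are ignored.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups")
    full = sum(w * x for w, x in zip(weights, values))
    if full == 0:
        raise ValueError("full-sample estimate is zero")
    groups = [i % n_groups for i in range(len(values))]
    factor = n_groups / (n_groups - 1)  # reweight the retained units
    squared_devs = 0.0
    for g in range(n_groups):
        # Replicate estimate: drop group g, inflate everything else.
        replicate = sum(
            w * factor * x
            for w, x, k in zip(weights, values, groups)
            if k != g
        )
        squared_devs += (replicate - full) ** 2
    variance = (n_groups - 1) / n_groups * squared_devs
    return 100.0 * math.sqrt(variance) / abs(full)


# Hypothetical data: 30 equally weighted units with varying values.
rse = dag_jackknife_rse([2.0] * 30, [float(v) for v in range(1, 31)])
```

When every unit reports the same value and the groups are of equal size, every replicate reproduces the full-sample total and the estimated RSE is zero, which is a quick sanity check on the reweighting factor.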

The RSEs presented in the tables can be used to derive the standard error (SE) of the estimate. The SE can be used to derive interval estimates with prescribed levels of confidence that the interval includes the average results of all samples:

a. intervals defined by one SE above and below the sample estimate will contain the true value about 68 percent of the time.

b. intervals defined by 1.6 SE above and below the sample estimate will contain the true value about 90 percent of the time.

c. intervals defined by two SEs above and below the sample estimate will contain the true value about 95 percent of the time.

The SE of the estimate can be calculated by multiplying the RSE presented in the tables by the corresponding estimate. Note that the RSE is the measure of variability presented for all estimates in this publication except the estimates of percent change presented in Table 2a, for which we provide the SE as the measure of variability (refer to Table 2b). Also note that RSEs in this publication are in percentage form and must be divided by 100 before being multiplied by the corresponding estimate.

Examples of Calculating a Confidence Interval:
a. For a data value: using data from Table 4a and Table 4b, the SE for 2007 total nondurable manufacturing noncapitalized expenditures would be calculated as follows:

$$SE = \frac{RSE}{100} \times \text{estimate} = \frac{1.1}{100} \times \$2{,}494 \text{ million} = \$27.4 \text{ million}$$

The 90-percent confidence interval can be constructed by multiplying 1.6 by the SE, adding this value to the estimate to create the upper bound, and subtracting it from the estimate to create the lower bound.

$$\text{estimate} \pm 1.6 \times SE$$

Using data from Table 4a, for 2007 total nondurable manufacturing noncapitalized equipment expenditures, a 90-percent confidence interval would be calculated as:

$2,494 million ± 1.6 × $27.4 million = $2,494 million ± $44 million

This implies 90 percent confidence that the interval $2,450 million to $2,538 million contains the actual total for nondurable manufacturing noncapitalized equipment expenditures, subject to further nonsampling errors.
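The same arithmetic can be written as a small helper. The multiplier 1.6 is the one this appendix uses for 90-percent intervals. The RSE of 1.1 percent passed in the example is an assumption: it is inferred from the SE of $27.4 million quoted above rather than read from Table 4b. The function name is illustrative.

```python
def total_interval(estimate, rse_percent, multiplier=1.6):
    """Return (SE, lower bound, upper bound) for an estimated total.

    rse_percent is in percentage form, as published, so it is divided
    by 100 before being applied to the estimate.
    """
    se = (rse_percent / 100.0) * estimate
    half_width = multiplier * se
    return se, estimate - half_width, estimate + half_width


# $2,494 million estimate; RSE of 1.1 percent is inferred, not quoted.
se, lower, upper = total_interval(2494.0, 1.1)
```

Rounded to the precision used in the text, this gives an SE of 27.4 and bounds of 2,450 and 2,538 (all in millions of dollars).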

b. For percent change: using data from Table 2a and Table 2b, the 90-percent confidence interval can be constructed by multiplying 1.6 by the SE of the percent change, adding this value to the estimated percent change to create the upper bound, and subtracting it from the estimate to create the lower bound. For example, for the noncapitalized expenditures in the Health care and social assistance sector, the estimated percent change from 2006 to 2007 is 15.5 percent (from Table 2a), and the standard error of this estimate is 12.9 percent (from Table 2b).

$$15.5\% \pm 1.6 \times 12.9\% = 15.5\% \pm 20.6\%$$

This implies 90 percent confidence that the interval –5.1 percent to +36.1 percent contains the actual percent change for noncapitalized expenditures in the Health care and social assistance sector. Since this interval contains zero (0), we do not have sufficient evidence to conclude that the estimated percent change was statistically different from 0, i.e., the percent change is not statistically significant.
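When the SE is published directly, as it is for percent changes, the interval and the zero check reduce to a few lines. This is a sketch using the 1.6 multiplier from the text; the function name and return layout are assumptions made for the example.

```python
def interval_from_se(estimate, se, multiplier=1.6):
    """Return (lower, upper, significant) for an estimate with a known SE.

    significant is True only when the interval excludes zero.
    """
    half_width = multiplier * se
    lower = estimate - half_width
    upper = estimate + half_width
    significant = not (lower <= 0.0 <= upper)
    return lower, upper, significant


# Health care and social assistance: 15.5 percent change, SE 12.9 percent.
lower, upper, significant = interval_from_se(15.5, 12.9)
```

For the figures quoted above the bounds round to -5.1 and 36.1 percent, and `significant` is False because the interval contains zero.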

Examples of Calculating Absolute Differences and Percent Changes

Data for the current year along with revised data for the prior year are presented in this publication. Two numbers of interest for many data users may be the absolute difference between the prior year and the current year, and the percent change from the prior year to the current year.

The absolute difference is calculated as:

$$\text{difference} = \text{current-year estimate} - \text{prior-year estimate}$$

and a 90-percent confidence interval on this difference is estimated as:

$$\text{difference} \pm 1.6 \times \sqrt{SE_{\text{current}}^{2} + SE_{\text{prior}}^{2}}$$

As an example, for the capitalized equipment expenditures for computer and peripheral equipment in the Retail trade sector, the 2007 estimate from Table 4c is $6,165 million with an RSE of 1.7 percent from Table 4d, and the revised 2006 estimate from Table 4c is $5,806 million with an RSE of 1.5 percent from Table 4d. The above calculations would be:

$$\$6{,}165 \text{ million} - \$5{,}806 \text{ million} = \$359 \text{ million}$$

And the 90-percent confidence interval is estimated as:

$$359 \pm 1.6 \times \sqrt{(0.017 \times 6{,}165)^{2} + (0.015 \times 5{,}806)^{2}} = 359 \pm 218$$


so the 90-percent confidence interval is $359 million ± $218 million, or $141 million to $577 million.
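The difference calculation can be reproduced with the sketch below. It assumes, as the quoted figures imply, that the two years' estimates are treated as independent, so their squared SEs simply add. The function name is illustrative.

```python
import math


def difference_interval(current, rse_current, prior, rse_prior,
                        multiplier=1.6):
    """Return (difference, half-width) for current minus prior.

    RSEs are in percentage form. ASSUMPTION: the two estimates are
    treated as independent, so no covariance term is subtracted.
    """
    se_current = (rse_current / 100.0) * current
    se_prior = (rse_prior / 100.0) * prior
    difference = current - prior
    half_width = multiplier * math.sqrt(se_current ** 2 + se_prior ** 2)
    return difference, half_width


# Retail trade example: $6,165 million (RSE 1.7) vs. $5,806 million (RSE 1.5).
diff, half = difference_interval(6165.0, 1.7, 5806.0, 1.5)
```

Rounded to whole millions, the difference is 359 with a half-width of 218, matching the interval of $141 million to $577 million.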

The percent change is calculated as 100 multiplied by the ratio of the absolute difference to the prior-year estimate.

So continuing with the example from above,

$$100 \times \frac{359}{5{,}806} = 6.18 \text{ percent}$$

and a 90-percent confidence interval on this percent change is estimated as:

$$6.18 \pm 1.6 \times 100 \times \frac{6{,}165}{5{,}806} \times \sqrt{0.017^{2} + 0.015^{2}} = 6.18 \pm 3.85$$

so the 90-percent confidence interval is 6.18 percent ± 3.85 percent, or 2.33 percent to 10.03 percent.
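A sketch of the percent-change interval follows. It uses a ratio-style approximation that reproduces the figures quoted above: the half-width is 1.6 × 100 × (current / prior) × the square root of the sum of the squared RSEs, with the RSEs expressed as proportions. Treating the two years as independent is an assumption, and the function name is illustrative.

```python
import math


def percent_change_interval(current, rse_current, prior, rse_prior,
                            multiplier=1.6):
    """Return (percent change, half-width), both in percentage points.

    RSEs are given in percentage form and converted to proportions.
    ASSUMPTION: the two estimates are treated as independent.
    """
    pct_change = 100.0 * (current - prior) / prior
    combined_rse = math.sqrt(
        (rse_current / 100.0) ** 2 + (rse_prior / 100.0) ** 2
    )
    half_width = multiplier * 100.0 * (current / prior) * combined_rse
    return pct_change, half_width


# Retail trade example continued.
pct, half = percent_change_interval(6165.0, 1.7, 5806.0, 1.5)
```

Rounded to two decimals, this returns a change of 6.18 percent with a half-width of 3.85 percentage points.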

Nonsampling Error

All surveys and censuses are subject to nonsampling errors. Nonsampling errors can be attributed to many sources: inability to obtain information about all companies in the sample; inability or unwillingness on the part of respondents to provide correct information; response errors; definition difficulties; differences in the interpretation of questions; mistakes in recording or coding the data; and other errors of collection, response, coverage, and estimation for nonresponse.

Explicit measures of the effects of these nonsampling errors are not available. However, to minimize nonsampling error, all reports were reviewed for reasonableness and consistency, and every effort was made to achieve accurate response from all survey participants.

Coverage errors may have a significant effect on the accuracy of estimates for this survey. The BR, which forms the basis of our survey universe frame, may not contain all businesses. Also, businesses that are contained in the BR may have their payroll misreported.


Source: U.S. Census Bureau | Information and Communication Technology | (301) 763-3324 |  Last Revised: June 20, 2011