Sampling and Estimation Methodologies
The estimates are based on two distinct stratified simple random samples. The first sample, which received the ACE-1 form, consists of 46,427 companies with paid employees, as determined by nonzero payroll in the previous year, 2007. The second sample, which received the ACE-2 form, consists of 14,981 businesses without paid employees. Appendix D has examples of each type of survey form.
The survey's scope includes all private, nonfarm, domestic companies. Major exclusions from the frame are government-owned operations (including the U.S. Postal Service), foreign-owned operations of domestic companies, establishments located in U.S. territories, establishments engaged in agricultural production (agricultural services remain in scope), and private households.
The 2007 final version of the Census Bureau's establishment-based database, the Business Register (BR), was used to develop the 2008 sampling frame. This database contains a record for each physical business entity (the establishment) with payroll located in the United States. Records include company ownership information and current-year administrative data, such as payroll.
In creating the ACE-1 frame, establishment data are consolidated to create company-level records for companies that have more than one establishment. This created a frame of slightly more than 6 million companies. To classify each company's business activity, the employment and payroll data for each establishment in the company are aggregated by the establishment's assigned 2002 six-digit North American Industry Classification System¹ (NAICS) industry. The company is then assigned to the industry sector with the most payroll (e.g., manufacturing, construction), then to the subsector with the most payroll within that sector, then to the industry group within that subsector, and finally to the industry within that industry group. The resulting 2002 NAICS industry is then recoded to an Annual Capital Expenditures Survey (ACES) code.
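The top-down assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input layout, and the use of code prefixes of length 2, 3, 4, and 6 to stand for sector, subsector, industry group, and industry are choices made for this sketch, not details taken from the survey documentation. In particular, some NAICS sectors span several two-digit prefixes (manufacturing is 31-33), which this simplified version does not merge.

```python
from collections import defaultdict

def assign_company_naics(establishments):
    """Choose one six-digit NAICS code for a company by following the
    largest payroll at each level of the hierarchy.

    establishments: list of (naics_code, payroll) pairs, with naics_code
    given as a six-digit string (hypothetical input layout).
    """
    candidates = list(establishments)
    chosen = ""
    # Assumed prefix lengths: 2 = sector, 3 = subsector,
    # 4 = industry group, 6 = industry.
    for width in (2, 3, 4, 6):
        payroll_by_prefix = defaultdict(float)
        for code, payroll in candidates:
            payroll_by_prefix[code[:width]] += payroll
        # Keep the branch with the most payroll, then descend into it.
        chosen = max(payroll_by_prefix, key=payroll_by_prefix.get)
        candidates = [(c, p) for c, p in candidates if c.startswith(chosen)]
    return chosen

# Illustrative company: the single largest establishment is in sector 42,
# but sector 31 has more payroll in total, so the company lands there.
company = [("311111", 100.0), ("311919", 50.0), ("423110", 120.0), ("311211", 30.0)]
print(assign_company_naics(company))
```

The example shows why the order of the steps matters: the company is classified by its dominant sector first, so an establishment with the largest individual payroll does not necessarily determine the company's industry.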
The 2008 ACE-1 sampling frame is partitioned into two portions: the certainty and noncertainty parts. The certainty portion is a group of 17,916 companies that had 500 or more employees in the frame year. These are all placed in the sample. The nearly 6 million remaining companies have between 1 and 499 employees and are stratified into the 135 ACES industry categories. Each ACES industry is subdivided into four substrata based on 2007 payroll, ordered from largest to smallest. The substratum boundaries are chosen to minimize the sample size subject to a desired level of relative reliability. Samples are chosen from each of the 4 substrata within each of these 135 ACES strata. Collectively, 28,511 companies were chosen from this part of the ACE-1 frame.
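As a rough illustration of this design, the sketch below splits a frame into certainty and noncertainty parts and then draws a simple random sample within each noncertainty substratum. The record fields (employees, aces_industry, substratum), the function names, and the per-substratum sample sizes are hypothetical inputs for the sketch; the optimization that sets the substratum boundaries and sample sizes is not reproduced here.

```python
import random

CERTAINTY_EMPLOYMENT = 500  # threshold stated in the text

def partition_frame(frame):
    """Split company records into certainty and noncertainty parts."""
    certainty = [c for c in frame if c["employees"] >= CERTAINTY_EMPLOYMENT]
    noncertainty = [c for c in frame if c["employees"] < CERTAINTY_EMPLOYMENT]
    return certainty, noncertainty

def select_sample(frame, sample_sizes, seed=0):
    """Take every certainty company plus a simple random sample from each
    noncertainty substratum.

    sample_sizes maps (aces_industry, substratum) -> n_h; these sizes are
    assumed to be given, since the allocation method is not shown here.
    """
    rng = random.Random(seed)
    certainty, noncertainty = partition_frame(frame)
    strata = {}
    for company in noncertainty:
        key = (company["aces_industry"], company["substratum"])
        strata.setdefault(key, []).append(company)
    sample = list(certainty)  # certainty companies are always selected
    for key, members in strata.items():
        n_h = min(sample_sizes.get(key, 0), len(members))
        sample.extend(rng.sample(members, n_h))  # sampling without replacement
    return sample

# The two parts of the 2008 ACE-1 sample add up to the total given earlier.
assert 17_916 + 28_511 == 46_427
```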
The ACE-2 sample frame is a composite frame of four categories of small businesses, each treated as an independent stratum. The 2007 BR is the source of the first two of these groups: companies without payroll in the prior year or employment on March 12 of the prior year, but that had paid employees in the past and some IRS activity in the last 5 years; and companies that applied for an employer identification number (EIN) in the last 2 years but still have no payroll or employment. A special 2007 nonemployer database is the source of the other two groups: nonemployer corporations and partnerships, and nonemployer sole proprietorships with receipts of $1,000 or more. Collectively, there were about 29.0 million nonemployers. Simple random samples of different sizes were taken from each group, resulting in a sample of 14,981 selected companies.
¹ Information about NAICS can be found at http://www.census.gov/eos/www/naics/
Sampling Weights and Weight Adjustment for Nonresponse
After each sampled company is given an initial sampling weight, the weight may be further adjusted based on activity and response status. The goal is to have the in-scope responding sample reflect the frame. Each sampled company is classified as a respondent, a nonrespondent, or out of scope; a company is out of scope if it is found to have been out of business prior to the survey year or is a duplicate of another record. Companies that went out of business during the survey year are still in scope, and efforts are made to collect data for the period the company was active.
A company is a respondent if it returns a report and reports nonzero amounts for item 111 (Capital Expenditures) or item 2 (more detailed Capital Expenditures) on the ACE-1 form, or item 1 (Capital Expenditures) of the ACE-2 form. Respondents have their sampling weights adjusted upward to account for the nonrespondents, so that the respondents still represent the entire in-scope population. The adjustment for ACE-1 respondents is based on the payroll accounted for by nonrespondents in each substratum of each ACES industry, while for ACE-2 respondents it is based solely on the percentage of companies not reporting, regardless of size. In addition, companies that are deemed 'extreme outliers' may have their weights further reduced to minimize the mean squared error of the estimates.
ACE-1 segment. The following discussion assumes 675 substrata (substratum designation h = 1, 2, . . ., 675), which are based on the 135 ACES industries, each containing five substrata (four noncertainty substrata and the certainty substratum). The sampling weights (Wh) are adjusted for nonresponse based on payroll:
\[ W_{h(adj)} = W_h \times \frac{P_{hr} + P_{hn}}{P_{hr}}, \qquad W_h = \frac{N_h}{n_h} \]
where
Wh(adj): adjusted substratum weight of the hth substratum
Wh: substratum sampling weight of the hth substratum
Nh: population size of the hth substratum
nh: sample size of the hth substratum
Phr: sum of total company payroll for respondents in substratum h
Phn: sum of total company payroll for nonrespondents in substratum h
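A minimal sketch of the payroll-based adjustment follows. It assumes the adjustment takes the usual payroll-ratio form, Wh(adj) = Wh * (Phr + Phn) / Phr with Wh = Nh / nh, which is consistent with the definitions above; the function name and the example figures are hypothetical.

```python
def ace1_adjusted_weight(pop_size, sample_size, payroll_resp, payroll_nonresp):
    """Payroll-based nonresponse adjustment for one ACE-1 substratum.

    Assumes W_h(adj) = (N_h / n_h) * (P_hr + P_hn) / P_hr.
    """
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    if payroll_resp <= 0:
        raise ValueError("respondent payroll must be positive to adjust weights")
    base_weight = pop_size / sample_size               # W_h = N_h / n_h
    inflation = (payroll_resp + payroll_nonresp) / payroll_resp
    return base_weight * inflation

# Hypothetical substratum: 1,000 companies, 50 sampled, and respondents
# holding 80 percent of the sampled payroll.
print(ace1_adjusted_weight(1000, 50, 8_000_000, 2_000_000))  # 25.0
```

Because the inflation factor is a payroll ratio, a large nonresponding company raises the weights of the respondents more than a small one would.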
ACE-2 segment. The ACE-2 segment initially was stratified into four strata based on the four small business categories mentioned above. The stratum consisting of "companies with no payroll in the prior year and no employees on March 12 in the prior year, but with payroll in previous years" was poststratified into two strata. The stratum "companies which had received an Employer Identification Number (EIN) within the last 2 years, but for which no payroll, employment, or receipts data have yet been received" was also poststratified into two strata. In both instances, the poststratification was based on updated administrative record data. This method resulted in six strata (stratum designation h = 1, 2, . . ., 6). The stratum population sizes, sample sizes, response counts, and stratum weights for the four new strata resulting from the poststratification were modified accordingly, while the other two strata retained the original weights.
The ACE-2 stratum weights (Wh) were also adjusted to compensate for nonresponse, based on the number of respondents:
\[ W_{h(adj)} = W_h \times \frac{n_h}{r_h}, \qquad W_h = \frac{N_h}{n_h} \]
where
Wh(adj): adjusted stratum weight of the hth stratum
Wh: stratum weight of the hth stratum
Nh: population size of the hth stratum
nh: sample size of the hth stratum
rh: number of respondents in the hth stratum
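The count-based adjustment can be sketched the same way. The sketch assumes the adjustment is Wh(adj) = Wh * nh / rh with Wh = Nh / nh, which follows from the definitions above; the function name and figures are hypothetical.

```python
def ace2_adjusted_weight(pop_size, sample_size, n_respondents):
    """Count-based nonresponse adjustment for one ACE-2 stratum.

    Assumes W_h(adj) = (N_h / n_h) * (n_h / r_h), so every respondent in
    the stratum ends up with the same weight N_h / r_h.
    """
    if sample_size <= 0 or n_respondents <= 0:
        raise ValueError("sample size and respondent count must be positive")
    base_weight = pop_size / sample_size       # W_h = N_h / n_h
    return base_weight * sample_size / n_respondents

# Hypothetical stratum: 1,000 companies, 50 sampled, 40 responding.
print(ace2_adjusted_weight(1000, 50, 40))  # 25.0
```

Unlike the ACE-1 adjustment, company size plays no role here: each nonrespondent shifts the same amount of weight to the respondents.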
Publication cell estimates were computed by obtaining a weighted sum of reported values for respondents. These estimates may be biased by the nonresponse adjustment, since it is assumed that nonresponse is a purely random event, which may not be the case. No attempt is made to measure the bias.
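ACE-1 Estimation: The ACE-1 estimates, \(\hat{X}_{(j)}\), are (assuming 675 substrata):
\[ \hat{X}_{(j)} = \sum_{h=1}^{675} W_{h(adj)} \sum_{i} X_{(j),i,h} \]
where the inner sum runs over the responding companies i in substratum h, and
Wh(adj): adjusted weight of the hth substratum
X(j),i,h: value attributed to the ith company of substratum h, where j is the publication cell of interest.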
Note: Although a company is assigned to and sampled from a single ACES industry, it can report capital expenditures in several ACES industries. Reported data for all industries are inflated by the weight of the respondent's sample industry.
ACE-2 segment. The ACE-2 estimates, \(\hat{X}_{(j)}\), are (with k = 6 in 2008):
\[ \hat{X}_{(j)} = \sum_{h=1}^{k} W_{h(adj)} \sum_{i} X_{(j),i,h} \]
where the inner sum runs over the responding companies i in stratum h, and
Wh(adj): adjusted weight of the hth stratum
X(j),i,h: value attributed to the ith company in stratum h, where j is the publication cell of interest.
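The weighted-sum estimator for a publication cell can be sketched as below, assuming each estimate is the sum over respondents of the adjusted stratum weight times the reported value for that cell. The record layout and names are hypothetical. The example also illustrates the note above: a company's reports in every cell are inflated by the weight of the stratum it was sampled from.

```python
def publication_cell_estimate(responses, adjusted_weights, cell):
    """Weighted sum of respondents' reported values for one publication cell.

    responses: list of dicts with the respondent's sample stratum and a
    mapping of publication cell -> reported value (hypothetical layout).
    adjusted_weights: mapping of stratum -> adjusted weight W_h(adj).
    """
    total = 0.0
    for response in responses:
        weight = adjusted_weights[response["stratum"]]
        # A company contributes to any cell it reports in, always using
        # the weight of the stratum it was sampled from.
        total += weight * response["values"].get(cell, 0.0)
    return total

responses = [
    {"stratum": 1, "values": {"cell_A": 10.0, "cell_B": 5.0}},
    {"stratum": 2, "values": {"cell_A": 4.0}},
]
weights = {1: 25.0, 2: 2.0}
print(publication_cell_estimate(responses, weights, "cell_A"))  # 258.0
print(publication_cell_estimate(responses, weights, "cell_B"))  # 125.0
```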
The estimates are derived from sample data, and will differ from results derived from data from other samples or a complete census of the population. Both a sample and a census are subject to nonsampling errors, which often introduce systematic bias into the results. Bias is the difference, averaged over all possible samples of the same design and size, between the estimate and the true value being estimated. These types of errors are not explicitly measured. Only samples have sampling error, the error that arises from observing only a subset of the population. With a probability sample, this type of error can be explicitly measured. For any particular estimate, though, the total error from sampling and nonsampling error may considerably exceed the measured error.
The sample selected is only one of the many possible samples that could have been selected, with each possible sample producing possibly different results. The relative standard error (RSE) measures the variability among the possible estimates from these possible samples, relative to the estimates. These are calculated using a delete-a-group jackknife replicate variance estimator. The RSEs in the tables can be used to derive the standard error (SE), which can then be used to create interval estimates with prescribed levels of confidence.
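To give a sense of how a delete-a-group jackknife works, the following is a deliberately simplified sketch, not the survey's actual variance estimator. It ignores stratification, assigns respondents at random to equal-sized groups, and uses 15 groups; the number of groups, the function name, and the reweighting factor G/(G-1) are all assumptions of this sketch. Each replicate drops one group, reweights the rest, and the spread of the replicate estimates around the full-sample estimate gives the variance.

```python
import random

def dag_jackknife_rse(values, weights, n_groups=15, seed=0):
    """Simplified delete-a-group jackknife for a weighted total.

    Returns (estimate, standard_error, rse_percent). Assumes the number
    of units is a multiple of n_groups so that groups are equal-sized.
    """
    rng = random.Random(seed)
    groups = [i % n_groups for i in range(len(values))]
    rng.shuffle(groups)  # random assignment of units to groups

    full = sum(w * x for w, x in zip(weights, values))
    factor = n_groups / (n_groups - 1)  # reweight the units that remain

    replicates = []
    for dropped in range(n_groups):
        rep = sum(w * x * factor
                  for w, x, g in zip(weights, values, groups)
                  if g != dropped)
        replicates.append(rep)

    variance = (n_groups - 1) / n_groups * sum((rep - full) ** 2
                                               for rep in replicates)
    se = variance ** 0.5
    rse = 100 * se / abs(full) if full else float("nan")
    return full, se, rse

# Hypothetical data: 30 respondents with equal weights.
estimate, se, rse = dag_jackknife_rse(list(range(1, 31)), [2.0] * 30)
print(estimate)  # 930.0
```

A production version would form the groups and replicate weights within strata; that detail is omitted here.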
The SE of the estimate is calculated by multiplying the RSE by its corresponding estimate. Note that the RSE is the measure of variability presented for all estimates in this publication except for the estimates of percent change. RSEs are given as percentages, and need to be divided by 100 before being used to calculate the SE.
In general, intervals defined by 1.6 standard errors above and below the sample estimate will contain the true population value about 90 percent of the time, while intervals defined by 2 standard errors above and below the sample estimate will contain the true population value about 95 percent of the time. These intervals are called confidence intervals. Note that the SE is in the same units as the estimate, while the RSE is unitless.
a. Calculating a confidence interval for an estimate: using estimates from table 4a and RSEs from table 4c, the SE for nondurable manufacturing total capital expenditures would be calculated as follows:
SE = (8.7 / 100) * $108,215 million = $9,415 million.
The 90-percent confidence interval can be constructed by multiplying 1.6 by the SE to create the margin of error (MOE), and adding and subtracting the MOE to the estimate. The 90% confidence interval for the estimate of nondurable manufacturing total capital expenditures is then:
$108,215 million ± [1.6*$9,415 million] = $108,215 ± $15,064 million
This implies that, using the sampling method described, we are 90% confident that the true value of total capital expenditures for this subsector is between $93,151 million ($108,215 - $15,064) and $123,279 million ($108,215 + $15,064). Since this confidence interval does not contain zero (0), the estimate is statistically different from 0 at the 90-percent confidence level; because it is an estimate of a level rather than a change, the interval by itself does not indicate whether capital expenditures increased. This does not consider any additional issues due to nonsampling errors.
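The arithmetic for the nondurable manufacturing estimate can be reproduced with a short Python sketch. The function name is hypothetical; the multiplier 1.6 and the figures are those used in the text.

```python
def confidence_interval(estimate, rse_percent, multiplier=1.6):
    """Return (SE, MOE, lower bound, upper bound) for an estimate.

    rse_percent is the published RSE as a percentage; multiplier is 1.6
    for a 90-percent interval and 2 for a 95-percent interval, as in the text.
    """
    se = (rse_percent / 100) * estimate   # SE is in the units of the estimate
    moe = multiplier * se
    return se, moe, estimate - moe, estimate + moe

# Nondurable manufacturing total capital expenditures, in millions of dollars.
se, moe, lower, upper = confidence_interval(108_215, 8.7)
print(round(se), round(moe), round(lower), round(upper))
# 9415 15064 93151 123279
```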
b. Calculating a confidence interval for a percent change of an estimate between two survey years: using estimates from table 2a and SEs from table 2b, the 90-percent confidence interval can be constructed by multiplying 1.6 by the SE of the percent change to create the MOE, and adding and subtracting the MOE to the estimate. For example, for nondurable manufacturing total capital expenditures, the estimated percent change from 2007 to 2008 is 20.7% (from table 2a), and the standard error of this estimate is 10.6 percentage points (from table 2b):
20.7% ± [1.6 * 10.6%] = 20.7% ± 17.0%
This implies that, using the sampling method described, we are 90% confident that the true value of the percentage change in this sector is between 3.7% (20.7% - 17.0%) and 37.7% (20.7% + 17.0%). Since this confidence interval does not contain zero (0), we also have sufficient evidence to conclude that the estimated percent change was statistically larger than 0, i.e., this sector showed an increase in the amount of capital expenditures. This does not consider any additional issues due to nonsampling errors.
Data for the current year along with revised data for the prior year are presented in this publication. Two numbers of interest for many data users may be the difference between the prior year and the current year, and the percent change from the prior year to the current year.
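The difference is calculated as:
\[ \hat{D} = \hat{X}_{current} - \hat{X}_{prior} \]
and the MOE for a 90-percent confidence interval on this difference is estimated as:
\[ MOE(\hat{D}) = 1.6 \times \sqrt{SE(\hat{X}_{current})^2 + SE(\hat{X}_{prior})^2} \]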
As an example, for nondurable goods manufacturing, the total capital expenditures estimate for 2008 from table 4a is $108,215 million, with an RSE of 8.7 from table 4c. The revised 2007 estimate from table 4b is $89,633 million, with an RSE of 1.2 from table 4d. The difference would be:
[$108,215 million - $89,633 million] = $18,582 million
And the MOE for the 90-percent confidence interval is estimated as follows, including translating the RSEs into SEs:
MOE = 1.6 * √ [ ((8.7/100) * $108,215 million)^2 + ((1.2/100) * $89,633 million)^2 ]
= 1.6 * √ [ (0.087 * $108,215 million)^2 + (0.012 * $89,633 million)^2 ]
= 1.6 * √ [ (88,636,670 + 1,156,907) million^2 ]
= 1.6 * √ [ 89,793,577 million^2 ]
= 1.6 * $9,476 million = $15,162 million
The 90-percent confidence interval for the difference between the two years is $18,582 million ± $15,162 million, or the interval of $3,420 million to $33,744 million. At the 90-percent confidence level, the change is significant. In this instance, however, a 95-percent confidence interval, with its larger margin of error, would have a lower bound below 0, and the change would be interpreted as not significant at that confidence level.
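The same calculation can be sketched in Python, which also makes it easy to compare the 90-percent and 95-percent intervals. The function name is hypothetical. The formula combines the two standard errors as a root sum of squares, which treats the two years' estimates as uncorrelated.

```python
import math

def difference_with_moe(current, rse_current, prior, rse_prior, multiplier=1.6):
    """Return (difference, MOE) for the year-to-year change in an estimate.

    RSEs are percentages; multiplier is 1.6 for 90 percent, 2 for 95 percent.
    """
    se_current = (rse_current / 100) * current
    se_prior = (rse_prior / 100) * prior
    difference = current - prior
    moe = multiplier * math.sqrt(se_current ** 2 + se_prior ** 2)
    return difference, moe

# Nondurable goods manufacturing, in millions of dollars.
diff, moe90 = difference_with_moe(108_215, 8.7, 89_633, 1.2)
print(diff, round(moe90))                             # 18582 15162
print(round(diff - moe90), round(diff + moe90))       # 3420 33744

# With a multiplier of 2 the lower bound falls below zero.
_, moe95 = difference_with_moe(108_215, 8.7, 89_633, 1.2, multiplier=2)
print(diff - moe95 < 0)                               # True
```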
The percent change is calculated as 100 multiplied by the ratio of the difference divided by the prior estimate.
Continuing with the example from above:
percent change = 100 * ($18,582 million / $89,633 million) = 20.7%
This is the number we used above in part b, which we took from table 2a. The MOE for a 90-percent confidence interval on this is estimated as:
MOE = 1.6 * 100 * ($108,215 / $89,633) * √ [ (8.7/100)^2 + (1.2/100)^2 ]
= 1.6 * 100 * (1.21) * √ [ (0.087)^2 + (0.012)^2 ]
= 160 * 1.21 * √ [ 0.0077 ]
= 193 * 0.088
= 17.0%
so the 90-percent confidence interval for the percent change is 20.7% ± 17.0%, or 3.7% to 37.7%. Since this interval does not contain zero (0), we can conclude that the positive percentage change from 2007 to 2008 is statistically significant at the 90-percent confidence level.
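As a check on the percent-change arithmetic, the sketch below computes the percent change and its MOE using the approximation shown in the worked example: the ratio of the two estimates times the root sum of squares of the two RSEs. The function name is hypothetical.

```python
import math

def percent_change_with_moe(current, rse_current, prior, rse_prior, multiplier=1.6):
    """Return (percent change, MOE in percentage points).

    RSEs are percentages; the MOE follows the approximation used in the
    worked example above.
    """
    pct_change = 100 * (current - prior) / prior
    combined_rse = math.sqrt((rse_current / 100) ** 2 + (rse_prior / 100) ** 2)
    moe = multiplier * 100 * (current / prior) * combined_rse
    return pct_change, moe

pct, moe = percent_change_with_moe(108_215, 8.7, 89_633, 1.2)
print(round(pct, 1), round(moe, 1))              # 20.7 17.0
print(round(pct - moe, 1), round(pct + moe, 1))  # 3.8 37.7
```

Carrying full precision gives a lower bound of about 3.8%; the 3.7% in the text comes from subtracting the rounded values 20.7% and 17.0%.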
All surveys and censuses are subject to nonsampling errors. Nonsampling errors can be attributed to many sources, including: inability to obtain information about all companies in the sample; inability or unwillingness on the part of respondents to provide correct information; difficulties in defining concepts; differences in the interpretation of questions; mistakes in recording or coding the data; and other errors of collection, response, coverage, and estimation for nonresponse.
Explicit measures of the effects of these nonsampling errors are not available. However, to minimize total nonsampling error, all reports were reviewed for reasonableness and consistency, and every effort was made to achieve accurate response from all survey participants. Coverage errors, which arise from omitting companies that are in scope of the survey or mistakenly including out-of-scope companies as eligible, may have a significant effect on the accuracy of estimates for this survey. The Business Register, which forms the basis of our survey universe frame, may not contain all in-scope businesses, or may contain incorrect payroll values that affect how companies are sampled and, through their sampling weights, the impact of their responses on the estimates.
A more detailed profile on the quality of the Annual Capital Expenditures Survey is available on request. Please contact the Business Investment Branch of the Company Statistics Division at 301-763-3324.