2022 Annual Wholesale Trade Survey Methodology

In this Section

Sampling Frame
Survey Design and Size
Sample Maintenance
Data Collection
Estimation and Sampling Variance
Linking Samples
Benchmarking to the 2017 Economic Census
Disclosure Avoidance
Archived Methodology Pages

Sampling Frame:

The sampling frame used for the Annual Wholesale Trade Survey (AWTS) has two types of sampling units represented: single-establishment firms and multiple-establishment firms. The information used to create these sampling units was extracted from data collected as part of the 2012 Economic Census and from establishment records contained in the Census Bureau's Business Register (as of October 2015). The next few paragraphs give details about the Business Register and the construction of the sampling units. Though important, they are not essential to understanding the basic sample design, and readers may skip ahead to the Survey Design and Size section.

The Business Register is a multi-relational database that contains a record for each known establishment that is located in the United States or one of its territories and has paid employees. An establishment is a single physical location where business transactions take place and for which payroll and employment records are kept. Groups of one or more establishments under common ownership or control are firms. A single-unit firm owns or operates only one establishment. A multi-unit firm owns or operates two or more establishments. The structure of a firm’s primary identifier on the sampling frame differs depending on whether it is a single-unit firm or a multi-unit firm.

A single-unit firm's primary identifier is its Employer Identification Number (EIN). The Internal Revenue Service (IRS) issues the EIN, and the firm uses it as an identifier to report social security payments for its employees under the Federal Insurance Contributions Act (FICA). The same act requires all employer firms to use EINs. Each employer firm is associated with at least one EIN and only one firm can use a given EIN. Because a single-unit firm has only one establishment, there is a one-to-one relationship between the firm and the EIN. Thus, the firm, the EIN, and the establishment all reference the same physical location and all three terms can be used interchangeably and unambiguously when referring to a single-unit firm.

For multi-unit firms, however, a different structure connects the firm with its establishments via the EIN. Essentially, a multi-unit firm is associated with a cluster of one or more EINs, and each EIN is associated with one or more establishments. A multi-unit firm consists of at least two establishments. Each firm is associated with at least one EIN and only one firm can use a given EIN. However, one multi-unit firm may have several EINs. Similarly, there is a one-to-many relationship between EINs and establishments. Each EIN can be associated with many establishments, but each establishment is associated with only one EIN. Because of the possibility of one-to-many relationships, we must distinguish between the firm, its EINs, and its establishments. The multi-unit firm that owns or controls a particular establishment is identified on the Business Register by way of the establishment's primary identifier.

The primary identifier of a multi-unit firm is a unique alpha number. The Census Bureau assigns the alpha number to the multi-unit firm and assigns a unique establishment identification number to each establishment within a multi-unit firm. All establishments owned or controlled by the same multi-unit firm have the same alpha number. Different multi-unit firms have different alpha numbers, and different establishments within the same multi-unit firm have different establishment identification numbers. The Census Bureau assigns both the alpha number to the multi-unit firm and the establishment identification number to the corresponding establishments based on the results of the quinquennial Economic Census and the annual Company Organization Survey.

To create the sampling frame, we extract the records for all employer establishments located in the United States that are classified in the wholesale trade sector as defined by the 2012 North American Industry Classification System (NAICS). For these establishments we extract sales, end-of-year inventories, payroll, employment, name and address information, wholesale type of operation code (TOC), as well as primary identifiers and associated Employer Identification Numbers (EINs). We use the TOC to distinguish between three different types of wholesale establishments: (1) merchant wholesale establishments, excluding manufacturers’ sales branches and offices (MSBOs); (2) manufacturers’ sales branches and offices; and (3) agents, brokers, and business electronic markets. To create the sampling units, we sum the establishment data for all wholesale establishments associated with the same firm identifier. In some cases, a multi-unit firm has establishments active in more than one wholesale TOC. In these situations, we create firm-level sampling units for each type of operation. No aggregation is necessary to put single-unit establishment information on a firm basis. Thus, the sampling units created for single-unit firms simultaneously represent establishment, EIN, and firm information. The sampling frame is an amalgam of establishments/EINs and firms/alphas.
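The aggregation step described above can be sketched roughly as follows. This is only an illustration: the record layout, field names, TOC labels, and values are hypothetical and are not taken from the Business Register.

```python
from collections import defaultdict

# Hypothetical establishment records; identifiers and values are made up.
establishments = [
    {"firm_id": "A001", "toc": "merchant", "sales": 120.0, "inventories": 30.0},
    {"firm_id": "A001", "toc": "merchant", "sales": 80.0, "inventories": 20.0},
    {"firm_id": "A001", "toc": "msbo", "sales": 50.0, "inventories": 5.0},
    # A single-unit firm: its one establishment is already on a firm basis.
    {"firm_id": "12-3456789", "toc": "agent_broker", "sales": 10.0, "inventories": 0.0},
]


def build_sampling_units(records):
    """Sum establishment data by (firm identifier, type of operation)."""
    units = defaultdict(
        lambda: {"sales": 0.0, "inventories": 0.0, "n_establishments": 0}
    )
    for rec in records:
        key = (rec["firm_id"], rec["toc"])
        units[key]["sales"] += rec["sales"]
        units[key]["inventories"] += rec["inventories"]
        units[key]["n_establishments"] += 1
    return dict(units)


sampling_units = build_sampling_units(establishments)
```

In this sketch the multi-unit firm with establishments in two types of operation yields two sampling units, while the single-unit firm passes through without aggregation.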

Sampling unit:

The sampling units used for the Annual Wholesale Trade Survey are firms. The firms consist of one or more establishments. An establishment is a single physical location where business transactions take place and for which payroll and employment records are kept. We create these sampling units from data collected as a part of the 2012 Economic Census and from establishment records contained in the Census Bureau’s Business Register (as of October 2015). The Business Register is a database that contains records of known establishments in the U.S.

Survey Design and Size:

Target population:

The Annual Wholesale Trade Survey (AWTS) target population consists of all U.S. firms with paid employees that are primarily engaged in merchant wholesale trade, as defined by the 2012 NAICS, including manufacturers’ sales branches and offices and non-merchant wholesalers such as agents, brokers, and electronic markets.

Sample design:

The sample for AWTS consists of three separate samples, one for each wholesale TOC. AWTS uses a stratified, one-stage design with primary strata defined by industry (e.g., Motor Vehicle and Motor Vehicle Parts, Furniture and Home Furnishings, Grocery). There are 83 primary strata: 56 from the merchant wholesale establishments, excluding MSBOs, sample; 19 from the manufacturers’ sales branches and offices sample; and 8 from the agents, brokers, and business electronic markets sample. The primary strata are sub-stratified into 4, 7, 10, or 13 annual sales size strata. The largest sales size stratum within each industry stratum consists of firms that are all selected with certainty (probability equal to one). We determine a substratum boundary that divides the certainty units from the noncertainty units based on a statistical analysis of data from the 2012 Economic Census. Sample sizes are computed to meet multiple coefficient of variation constraints on estimated annual sales totals and end-of-year inventory totals. Constraints are specified at detailed industry levels and at broad industry levels (e.g., durable goods, nondurable goods) up to the total wholesale level. Units are selected independently between strata using simple random sampling without replacement within the annual sales substrata. The selected noncertainty firms are divided into two approximately equal groups. For merchant wholesalers excluding MSBOs, one group is canvassed for both the monthly and the annual survey. The MSBOs are only canvassed in the annual survey. Firms selected in the MSBOs or agent and broker sample are included in the Monthly Wholesale Trade Survey (MWTS) sample if they had activity in scope of the MWTS. Sampling weights for the Annual Wholesale Trade Survey range from 1 to 250.

The sample consists of approximately 8,400 firms: 6,700 merchant wholesalers, excluding MSBOs; 1,000 manufacturers’ sales branches and offices; and 700 agents, brokers, and business electronic markets.
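As a rough illustration of the selection step, the sketch below takes every unit in a certainty stratum and draws a simple random sample without replacement in each noncertainty substratum. The stratum labels, frame contents, and sample sizes are hypothetical; the actual sample sizes come from the coefficient of variation constraints described above, which are not modeled here.

```python
import random


def select_stratified_sample(frame, sample_sizes, certainty_strata, seed=0):
    """Return a list of (unit, stratum, weight) tuples.

    frame: dict mapping stratum -> list of unit identifiers.
    sample_sizes: dict mapping noncertainty stratum -> number of units to draw.
    certainty_strata: set of strata in which every unit is selected.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, units in frame.items():
        if stratum in certainty_strata:
            chosen, weight = list(units), 1.0
        else:
            n = sample_sizes[stratum]
            chosen = rng.sample(units, n)  # simple random sampling without replacement
            weight = len(units) / n  # reciprocal of the selection probability n/N
        sample.extend((unit, stratum, weight) for unit in chosen)
    return sample


# Hypothetical frame: one industry with a certainty stratum and one size substratum.
frame = {
    ("4231", "certainty"): ["f1", "f2"],
    ("4231", "size_1"): [f"s{i}" for i in range(100)],
}
sample = select_stratified_sample(
    frame,
    sample_sizes={("4231", "size_1"): 10},
    certainty_strata={("4231", "certainty")},
)
```

Under this design the weights within a stratum sum to the stratum's frame count, which is why the weighted sample represents the full population of firms.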

Frequency of sample redesign:

Sample revisions are performed approximately every 5 to 7 years.

Sample Maintenance

During the period for which the samples are used, updates are made on a quarterly basis to reflect changes in the business universe. These updates are designed to account for new businesses (births) and businesses that discontinue operations (deaths). The samples are also updated to reflect mergers, acquisitions, divestitures, splits, and other changes to the business universe.

We update the sample on a quarterly basis to represent EINs issued since the initial sample selection. These new EINs, called births, are EINs recently assigned by the Internal Revenue Service (IRS) that have an active payroll filing requirement on the IRS Business Master File (BMF). An active payroll filing requirement indicates that the EIN is required to file payroll for the next quarterly period. The Social Security Administration attempts to assign an industry classification to each new EIN.

EINs with an active payroll filing requirement on the IRS BMF are said to be "BMF active" and EINs with an inactive payroll filing requirement are said to be "BMF inactive."

We sample EIN births on a quarterly basis using a two-phase selection procedure. To be eligible for selection, a birth must either have no industry classification or be classified in an industry within the scope of the Service Annual Survey (SAS), the Annual Wholesale Trade Survey (AWTS), or the Annual Retail Trade Survey (ARTS), and it must meet certain criteria regarding its quarterly payroll. In the first phase, we stratify births by broad industry groups and a measure of size based on quarterly payroll. A relatively large sample is drawn and canvassed to obtain a more reliable measure of size, consisting of revenue in two recent months and a new or more detailed industry classification code. We contact births by telephone if they have not returned their electronic worksheet within 30 days.

Using this more reliable information, in the second phase we subject the selected births from the first phase to probability-proportional-to-size sampling with overall probabilities equivalent to those used in drawing the initial AWTS sample from the Business Register (as of October 2015). Because of the time it takes for a new employer firm to acquire an EIN from the IRS and the time needed to accomplish the two-phase birth-selection procedure, we add births to the sample approximately nine months after they begin operation.

To better represent all EIN births in the reference year, and specifically to account for the time it takes to identify and select new EINs, we traditionally add births that are chosen in the quarterly birth-selection procedure in February, May, and August of the reference year to the AWTS sample in the reference year. We will mail a letter to the February and May births in February and May, respectively, to supplement the initial survey mailing for the reference year. Although the August births are included in the reference year's estimates, we will not mail a letter to the August births for the reference year. (They will not receive a letter asking them to report until the initial mailing for the next reference year, which occurs in January of the following calendar year.) Births that are chosen in the quarterly birth-selection procedure in November of the reference year are added to the AWTS sample the following reference year. The November births are mailed as part of the initial mailing of the AWTS letters for the next reference year (in January of the following calendar year).

If a firm was selected with certainty and had more than one establishment at the time of sampling, any new establishments that the firm acquires, even if under new or different EINs, are included in the sample with certainty.

However, if a firm was selected with certainty and had only one establishment at the time of sampling, only future establishments associated with that firm’s originally-selected EIN are included in the sample with certainty; any new EINs that might later be associated with that firm are subjected to sampling through the quarterly birth-selection procedure.

To be eligible for the sample canvass and tabulation, a single establishment EIN or at least one EIN associated with a firm selected in the noncertainty sampling operations must meet both of the following requirements:

· It must have an active payroll filing requirement on the IRS BMF.
· It must have been selected from the Business Register in either the initial sampling or during the quarterly birth-selection procedure.

We include any new establishments that a firm acquires, even if under new or different EINs, in the sample with the same sampling status as the original firm, i.e., with the same initial sampling weight. For noncertainty firms, additional evaluation may be done in some instances to determine the feasibility of adding the new establishments by evaluating the effect of the new establishments on the industry estimates.

Similarly, each quarter we check against the current Business Register to determine if any EINs on the survey have become BMF inactive. Typically, we do not canvass BMF inactive EINs during the reference year. Likewise, if any EIN on the survey was BMF inactive in a previous reference year or was part of an inactive sampling unit in the survey and is now BMF active on the current Business Register, we again include these EINs in the canvass. In both cases, we only tabulate data for the portion of the reference year during which these EINs reported payroll to the IRS.

Data Collection:

Key data items requested and reference period covered: Data items requested vary by form type, but include annual sales, e-commerce sales, number of establishments covered by the report, value of inventories, total purchases of products, total operating expenses, commissions, sales on own account, gross selling value, and the beginning and ending dates of the reporting period if the data provided are for a period other than the calendar year. Sales tax and detailed operating expense items are requested every 5 years, with the most recent collection in 2022. For key data items and response rates, please see AWTS response rates.

Survey letters, including guidance on electronic reporting, are mailed each year and request data for the previous year. The most current survey worksheets can be found here.

Type of request: Mandatory.

Frequency and mode of contact: AWTS is an annual survey. Firms receive a mailing with instructions to provide responses online via Centurion. Due-date and follow-up mailings are also sent during the collection period. Phone calls are also used to follow up with firms that fail to respond; data may also be obtained in this manner.

Data collection unit: The data collection (reporting) unit for AWTS is the firm, which consists of one or more establishments. The initial sample consisted of approximately 8,400 firms. These firms are further divided by NAICS and type of operation to create reporting units.

Nonresponse:

Nonresponse is defined as the inability to obtain requested data from an eligible survey unit. Two types of nonresponse are often distinguished. Unit nonresponse is the inability to obtain any of the substantive measurements about a unit. In most cases of unit nonresponse, the Census Bureau was unable to obtain any information from the survey unit after several attempts to elicit a response. Item nonresponse occurs when a question is either unanswered or the response is unusable.

Estimation and Sampling Variance:

Estimation:

Total estimates are computed using the Horvitz-Thompson estimator, i.e., the sum of weighted reported or imputed data, for all selected sampling units that meet the sample canvass and tabulation criteria (see the Sample Maintenance section). The weight for a given sampling unit is the reciprocal of its probability of selection into the AWTS sample. These estimates are input to a benchmarking procedure.
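A minimal sketch of the Horvitz-Thompson total follows. The unit values and selection probabilities are invented for illustration, and the eligibility screening and imputation steps are assumed to have already been applied.

```python
def horvitz_thompson_total(units):
    """Sum each unit's value weighted by the reciprocal of its selection probability."""
    return sum(u["value"] / u["selection_probability"] for u in units)


# Hypothetical tabulated units (reported or imputed values).
units = [
    {"value": 500.0, "selection_probability": 1.0},   # certainty unit, weight 1
    {"value": 40.0, "selection_probability": 0.5},    # weight 2
    {"value": 25.0, "selection_probability": 0.25},   # weight 4
]

total = horvitz_thompson_total(units)  # 500 + 80 + 100 = 680.0
```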

Sampling Error:

The sampling error of an estimate based on a sample survey is the difference between the estimate and the result that would be obtained from a complete census conducted under the same survey conditions. This error occurs because characteristics differ among sampling units in the population and only a subset of the population is measured in a sample survey. The particular sample used in this survey is one of a large number of samples of the same size that could have been selected using the same sample design. Because each unit in the sampling frame had a known probability of being selected into the sample, it was possible to estimate the sampling variability of the survey estimates.

Common measures of the variability among these estimates are the sampling variance, the standard error, and the coefficient of variation (CV), which is also referred to as the relative standard error (RSE). The sampling variance is defined as the squared difference, averaged over all possible samples of the same size and design, between the estimator and its average value. The standard error is the square root of the sampling variance. The CV expresses the standard error as a percentage of the estimate to which it refers. For example, an estimate of 200 units that has an estimated standard error of 10 units has an estimated CV of 5 percent. The sampling variance, standard error, and CV of an estimate can be estimated from the selected sample because the sample was selected using probability sampling. Note that measures of sampling variability, such as the standard error and CV, are estimated from the sample and are also subject to sampling variability. It is also important to note that the standard error and CV only measure sampling variability. They do not measure any systematic biases in the estimates.

The Census Bureau recommends that individuals using these estimates incorporate sampling error information into their analyses, as this could affect the conclusions drawn from the estimates.

We estimate variances for published statistics (totals, ratios, and percent changes) using the method of random groups. To implement the random group method of variance estimation, we assign a random group number to each sampling unit at the time of sample selection. Then, for each tabulation level at which estimates are produced, we compute variance estimates using the assigned random group numbers. We use 16 random groups (G=16) to estimate variances for the Annual Wholesale Trade Survey.
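The text does not give the variance formula itself, so the sketch below uses the standard textbook form of the random group estimator for a total: each group's weighted total is scaled up by G to estimate the full total, and the variance is the sum of squared deviations of the group estimates from their mean, divided by G(G-1). The sample data and the tuple layout are hypothetical, and the production implementation may differ.

```python
import math

G = 16  # number of random groups stated in the text


def random_group_variance(sample, n_groups=G):
    """Return (estimated total, estimated variance) via random groups.

    sample: list of (weight, value, random_group), random_group in 0..n_groups-1.
    """
    group_estimates = [0.0] * n_groups
    for weight, value, group in sample:
        # Each group represents about 1/G of the sample, so scale by G.
        group_estimates[group] += n_groups * weight * value
    estimate = sum(group_estimates) / n_groups
    variance = sum((g - estimate) ** 2 for g in group_estimates) / (
        n_groups * (n_groups - 1)
    )
    return estimate, variance


# Hypothetical sample: 64 units with weight 10, assigned to groups in rotation.
sample = [(10.0, float(100 + i), i % G) for i in range(64)]
estimate, variance = random_group_variance(sample)
standard_error = math.sqrt(variance)
cv_percent = 100.0 * standard_error / estimate
```

The mean of the group estimates equals the full-sample weighted total, so the point estimate is unchanged; only the spread among the groups feeds the variance.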

Confidence Interval:

The sample estimate and an estimate of its standard error can be used to construct a confidence interval. A confidence interval is a range about a given estimator that has a specified probability of containing the average of the estimates for the parameter derived from all possible samples of the same size and design. Associated with each interval is a percentage of confidence, which is interpreted as follows. If an estimate of a population parameter and its approximate standard error were obtained for each possible sample, and intervals were constructed using a t-statistic with 15 (= G - 1) degrees of freedom, then:

For approximately 90 percent of the possible samples, the interval from 1.753 standard errors below to 1.753 standard errors above the estimate would include the average of the estimates derived from all possible samples of the same size and design.
For approximately 95 percent of the possible samples, the interval from 2.131 standard errors below to 2.131 standard errors above the estimate would include the average of the estimates derived from all possible samples of the same size and design.

To illustrate the computation of a confidence interval for an estimate of total sales, assume that an estimate of total sales is $10,750 million and the CV for this estimate is 1.8 percent, or 0.018. First obtain the standard error of the estimate by multiplying the total sales estimate by its CV. For this example, multiply $10,750 million by 0.018. This yields a standard error of $193.5 million. The upper and lower bounds of the 90-percent confidence interval are computed as $10,750 million plus or minus 1.753 times $193.5 million. Consequently, the 90 percent confidence interval is $10,411 million to $11,089 million. If corresponding confidence intervals were constructed for all possible samples of the same size and design, approximately 9 out of 10 (90 percent) of these intervals would contain the average of the estimates derived from all possible samples.
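The worked example above can be reproduced with a few lines of arithmetic. The variable names are illustrative; the figures are the ones given in the text.

```python
T_90_DF15 = 1.753  # t-value for a 90-percent interval with 15 degrees of freedom

estimate = 10750.0  # total sales, in millions of dollars
cv = 0.018          # coefficient of variation expressed as a proportion

standard_error = estimate * cv        # about 193.5
margin = T_90_DF15 * standard_error   # about 339.2
lower = estimate - margin
upper = estimate + margin

print(f"90% CI: ${lower:,.0f} million to ${upper:,.0f} million")
```

Substituting 2.131 for 1.753 gives the corresponding 95-percent interval.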

Non-sampling Error:

Non-sampling error encompasses all factors other than sampling error that contribute to the total error associated with an estimate. This error may also be present in censuses and other nonsurvey programs. Non-sampling error arises from many sources: inability to obtain information on all units in the sample; response errors; differences in the interpretation of the questions; mismatches between sampling units and reporting units, requested data and data available or accessible in respondents’ records, or with regard to reference periods; mistakes in coding or keying the data obtained; and other errors of collection, response, coverage, and processing.

Although no direct measurement of non-sampling error was obtained, precautionary steps were taken in all phases of the collection, processing, and tabulation of the data in an effort to minimize its influence. Precise estimation of the magnitude of non-sampling errors would require special research or access to independent data, and, consequently, the magnitudes are often unavailable.

The Census Bureau recommends that individuals using these estimates factor in this information when assessing their analyses of these data, as non-sampling error could affect the conclusions drawn from the estimates.

Economic surveys at the Census Bureau are required to compute two different types of response rates: a unit response rate and weighted item response rates. Read more about AWTS response rates.

Quality Suppressions:

Estimates can be suppressed from publication for quality reasons. An estimate with a coefficient of variation (CV) greater than 30 percent, with a total quantity response rate (TQRR) less than 50 percent, or with other concerns about data quality has been suppressed from publication, unless the estimate has consistently been published for prior years and the CV and TQRR are acceptably close to the thresholds. A suppressed estimate and its corresponding measure of sampling variability have been replaced with an "S" in the published tables.
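The numeric part of this rule can be expressed as a simple check. The sketch covers only the two stated thresholds; the function name is hypothetical, and the judgment-based parts of the policy (other data quality concerns, and the exception for consistently published estimates near the thresholds) are deliberately not modeled.

```python
CV_THRESHOLD_PERCENT = 30.0
TQRR_THRESHOLD_PERCENT = 50.0


def fails_quality_thresholds(cv_percent, tqrr_percent):
    """True if the CV exceeds 30 percent or the TQRR is below 50 percent."""
    return cv_percent > CV_THRESHOLD_PERCENT or tqrr_percent < TQRR_THRESHOLD_PERCENT


# Hypothetical estimates: (CV in percent, TQRR in percent).
candidates = {"A": (31.0, 80.0), "B": (10.0, 45.0), "C": (10.0, 60.0)}
flagged = sorted(k for k, (cv, tqrr) in candidates.items()
                 if fails_quality_thresholds(cv, tqrr))
```

An estimate flagged by this check is a candidate for suppression, not automatically suppressed, because the policy allows exceptions.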

Two exceptions to the previously mentioned policy were made for the 2022 Annual Wholesale Trade Survey data tables released on January 29, 2024. The TQRRs for North American Industry Classification System (NAICS) code 42 e-commerce sales at the "merchant wholesalers" and "merchant wholesalers, except manufacturers' sales branches and offices" levels were 40.4 percent and 37.3 percent, respectively, for the 2022 statistical period. A decision was made to release these two e-commerce sales estimates (and their corresponding measures of sampling variability) instead of suppressing the data (i.e., instead of replacing the estimates with "S"), because these are 2-digit NAICS code estimates. Data users should be aware of the TQRRs for these estimates and use caution when drawing conclusions from these estimates.

For a description of the Census Bureau's standards for Releasing Information Products, see https://www.census.gov/about/policies/quality/standards.html.

Linking Samples

The current sample was introduced with the 2016 Annual Wholesale Trade Survey. This sample is designed to produce estimates based on the 2012 NAICS. All published estimates from the 2015 Annual Wholesale Trade Report were restated from 2007 NAICS definitions to 2012 NAICS definitions. Definitions changed for NAICS 4236 and 4237, and the changes were applied to the estimates for 2008 through 2015.

In order to maintain the time series for each industry, an operation is performed to link estimates from the prior and new samples. For the linking operation to occur, two years of data were collected (2015 and 2016) from units in the new sample.

Sales estimates from the new sample for reference year 2015 and subsequent years are linked to the restated prior sample estimates by multiplying the Horvitz-Thompson estimates from the new sample by a ratio. The ratio is calculated as follows:

The numerator is the 2015 published, census-adjusted (based on the 2012 Economic Census) sales estimate for the industry restated on a 2012 NAICS basis from the prior sample.
The denominator is the 2015 Horvitz-Thompson sales estimate for the industry on a 2012 NAICS basis from the new sample.

The resulting sales estimates (called “modified” sales estimates) are implicitly benchmarked to 2012 Economic Census results via this linking procedure.
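The linking ratio and the resulting modified estimates can be sketched as follows. The function names and the dollar figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def linking_ratio(prior_sample_2015, new_sample_ht_2015):
    """Ratio of the restated prior-sample 2015 estimate to the new-sample 2015 HT estimate."""
    return prior_sample_2015 / new_sample_ht_2015


def modified_estimates(new_sample_ht_by_year, ratio):
    """Multiply each new-sample Horvitz-Thompson estimate by the linking ratio."""
    return {year: value * ratio for year, value in new_sample_ht_by_year.items()}


# Hypothetical industry: prior sample published 1,250 for 2015; new sample HT gives 1,000.
ratio = linking_ratio(1250.0, 1000.0)
modified = modified_estimates({2015: 1000.0, 2016: 1200.0}, ratio)
```

By construction the modified 2015 estimate reproduces the prior-sample 2015 level, and later years carry the same proportional adjustment.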

The following method is used to produce modified estimates for the following items: end-of-year inventories, purchases, gross selling value, sales on own account, and e-commerce. First, the sales ratio described above is multiplied by the Horvitz-Thompson estimate for the given item for 2015 and subsequent years. Then the published estimates for 2007 through 2015 from the prior sample are input into the benchmarking program. Using this program, the estimates for 2008 through 2015 for each detailed industry are revised in a manner that:

Uses the benchmarked estimate for 2007 from the prior sample as a constraint, resulting in no revision to the 2007 estimate.
Uses the modified estimate for 2015 from the new sample as a constraint.
Minimizes the sum of squared differences between the year-to-year changes of the input and revised estimates for 2008 through 2015.
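One way to see what these three conditions imply is the sketch below. It assumes that "year-to-year changes" means additive differences; under that assumption, the least-squares solution with both endpoints fixed spreads the endpoint discrepancy evenly across the years, i.e., the revision grows linearly from the first anchor year to the last. The function name and input series are hypothetical, and the actual benchmarking program may use a different formulation (for example, proportional changes).

```python
def benchmark_between_anchors(inputs, start_value, end_value):
    """Revise a series so its endpoints hit two constraints.

    inputs: estimates for consecutive years; the first and last are anchor years.
    Minimizes the sum of squared differences between the year-to-year changes
    of the input and revised series (additive changes assumed).
    """
    n = len(inputs) - 1
    d_start = start_value - inputs[0]
    d_end = end_value - inputs[-1]
    # Equal increments in the revision minimize the sum of squared increments.
    return [x + d_start + (d_end - d_start) * k / n for k, x in enumerate(inputs)]


# Hypothetical published estimates for 2007 through 2015.
published = [100.0, 104.0, 110.0, 113.0, 120.0, 125.0, 131.0, 138.0, 144.0]
# Constraints: 2007 is left unrevised; 2015 must equal a modified estimate of 152.
revised = benchmark_between_anchors(published, start_value=100.0, end_value=152.0)
```

Here the 2015 discrepancy of 8 is phased in as one additional unit per year, so each revised year-to-year change differs from the published change by the same amount.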

A similar method is used for total operating expenses, only using the benchmarked estimate for 2012 from the prior sample as a constraint instead of 2007.

For agents, brokers, and business electronic markets, to ensure consistency with total sales, the benchmarked gross selling value and the sales on own account are raked to the benchmarked total sales estimate. This is done by calculating the proportion of gross selling value and sales on own account to total sales, then applying each of these proportions to the benchmarked total sales to get the corresponding benchmarked estimates for gross selling value and sales on own account.
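The raking step amounts to a proportional split of the benchmarked total. The sketch below assumes that total sales for these firms is the sum of gross selling value and sales on own account; the function name and figures are hypothetical.

```python
def rake_to_total(gross_selling_value, sales_on_own_account, benchmarked_total_sales):
    """Scale the two components so they sum to the benchmarked total sales."""
    component_sum = gross_selling_value + sales_on_own_account
    raked_gsv = benchmarked_total_sales * gross_selling_value / component_sum
    raked_soa = benchmarked_total_sales * sales_on_own_account / component_sum
    return raked_gsv, raked_soa


# Hypothetical benchmarked components of 600 and 200 against a benchmarked total of 1,000.
raked_gsv, raked_soa = rake_to_total(600.0, 200.0, 1000.0)
```

The components keep their 3-to-1 proportion while their sum now matches the benchmarked total.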

Modified estimates at aggregate industry levels are computed by summing the modified estimates for the appropriate detailed industries comprising the aggregates.

The AWTS estimates are then benchmarked to the 2017 Economic Census results as described below.

Benchmarking to the 2017 Economic Census

Results of the 2017 Economic Census are used to benchmark AWTS estimates. Sales estimates are input to the benchmarking program and are revised in a manner that:

Uses the 2017 Economic Census sales total as a constraint, along with the existing 2012 modified sales estimate, which is already linked to the 2012 Economic Census.
Minimizes the sum of squared differences between the year-to-year changes of the input and revised estimates for 2013 through 2022.

The process is applied separately to merchant wholesalers except MSBOs and to MSBOs. The same process is applied to Wholesale Electronic Markets and Agents and Brokers (NAICS 425) using sales estimates defined as gross selling value plus sales on own account. The estimates output from this operation are referred to as “benchmarked.”

A method similar to the one for adjusting sales is used to adjust estimates for inventories, purchases, operating expenses, gross selling value, sales on own account, and e-commerce. Each of these items is revised in the following manner:

2012 and 2017 modified estimates are multiplied by the ratio of benchmarked sales divided by modified sales for the same year.
Modified estimates for each item are input into the benchmarking program using the two constraints calculated above.
The benchmarking program minimizes the sum of squared differences between the year-to-year changes of the input and revised estimates for 2013 through 2022.
The benchmarked estimates will equal the modified estimates for the years 2012 and earlier.

For industries with updated 2012 (or 2007) Economic Census sales, the 2012 (or 2007) benchmarked-to-modified sales ratio(s) were revised, so an additional constraint(s) was used in the process described above: the modified 2007 estimate for each item (multiplied by the 2007 benchmarked-to-modified sales ratio).

For commissions, 2013-2022 modified commissions estimates are multiplied by the ratio of benchmarked gross selling value divided by the modified gross selling value for the same year.

Benchmarked estimates at aggregate industry levels are computed by summing the benchmarked estimates for the appropriate detailed industries comprising the aggregate, and benchmarked estimates for merchant wholesalers are computed by summing the benchmarked estimates for MSBOs and merchant wholesalers except MSBOs.

Disclosure Avoidance

Disclosure is the release of data that reveals information or permits deduction of information about a particular survey unit through the release of either tables or microdata. Disclosure avoidance is the process used to protect each survey unit’s identity and data from disclosure. Using disclosure avoidance procedures, the Census Bureau modifies or removes the characteristics that put information at risk of disclosure. Although it may appear that a table shows information about a specific survey unit, the Census Bureau has taken steps to disguise or suppress a unit’s data that may be “at risk” of disclosure while making sure the results are still useful.

Annual Wholesale Trade uses cell suppression for disclosure avoidance.

Cell suppression is a disclosure avoidance technique that protects the confidentiality of individual survey units by withholding cell values from release and replacing the cell value with a symbol, usually a “D”. If the suppressed cell value were known, it would allow one to estimate an individual survey unit’s data too closely.

The cells that must be protected are called primary suppressions.

To make sure the cell values of the primary suppressions cannot be closely estimated by using other published cell values, additional cells may also be suppressed. These additional suppressed cells are called complementary suppressions.

The process of suppression does not usually change the higher-level totals. Values for cells that are not suppressed remain unchanged. Before the Census Bureau releases data, computer programs and analysts ensure primary and complementary suppressions have been correctly applied.

The Census Bureau has reviewed this data product to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data (Project No. P-7500133, Disclosure Review Board (DRB) approval number: CBDRB-FY24-0052).

Previous disclosure avoidance approvals include CBDRB-FY23-047, CBDRB-FY22-041, CBDRB-FY21-080, CBDRB-FY2020-ESMD003-003, and CBDRB-FY19-EWD-B00005.

For more information on disclosure avoidance practices, see FCSM Statistical Policy Working Paper 22.