The target population of the Economic Census consists of all establishments (generally single physical locations where business is conducted or where services or industrial operations are performed) that:
Note that establishments located in Puerto Rico, the U.S. Virgin Islands, Guam, the Commonwealth of the Northern Mariana Islands, or American Samoa are in the target population of the Economic Census of Island Areas, which uses a different methodology, documented on the Island Areas methodology webpage.
The 2022 Economic Census covers the following NAICS sectors of the U.S. economy:
More detailed descriptions of these sectors can be found at the Census Bureau NAICS webpage.
The following industries (NAICS) are not covered by the 2022 Economic Census:
The Economic Census selects establishments for its sample from a frame obtained from the U.S. Census Bureau’s Business Register. The Business Register contains information on the physical location of establishments, as well as payroll, employment, receipts (value of shipments), and industry classification data obtained from prior censuses and surveys or obtained from the administrative records of the Internal Revenue Service (IRS) and Social Security Administration (SSA) under special arrangements which safeguard the confidentiality of both tax and census records. Information from the Bureau of Labor Statistics on industry classifications is also used to supplement the classification information from the IRS and SSA.
To be included on the sampling frame, an establishment was required to satisfy the following conditions:
To reduce respondent burden and costs, the Census Bureau did not require all establishments to complete an Economic Census questionnaire. For tabulations of basic data items (receipts, payroll, employment, etc.), administrative data are used for the establishments that were not required to respond, which are generally smaller single-establishment firms. Note that this document uses the term ‘receipts’ as shorthand for the phrase ‘Sales, Value of Shipments, or Revenue,’ which is the standard term used on tables, since the correct term varies by sector. Establishments that were not required to complete a Census questionnaire are referred to as the “non-sampled component” of the 2022 Economic Census. This non-sampled component consists of:
The table below shows the sizes of each of the sample components and the first non-sampled component from the bulleted list, above. The last column (“Total Single-Estab Frame”) shows the total number of single-establishment firms that were on the sampling frame.
Selection procedures differ between multi- and single-establishment firms.
Multi-Establishment Firms
Any firm with more than one active establishment is included in the Economic Census with certainty and is generally expected to report for all its establishments. Each establishment is included with certainty and assigned a sample weight of 1.
Establishment Reporting Units
In most industries, multi-establishment firms are required to complete an industry-specific questionnaire for each of the establishments in their firm.
Alternative Reporting Units (ARU) for Selected Industries
In some industries, firms have difficulty reporting receipts and related data for each of their business locations (establishments). However, they can provide firm-level industry totals with relative ease, and they can report separate payroll and employment information for each business location within the industry. Table 2 shows the industries for which an alternative questionnaire was used and the expected number of affected firms.
If a firm had more than two establishments in one of the industries listed below, the firm received one questionnaire for each of those industries. Each questionnaire requested consolidated, firm-level data for receipts and related measures covering the firm’s nationwide operations. A supplementary questionnaire enumerated the firm’s establishments in the industry and requested payroll and employment information for each of them.
Single-Establishment Firms with 2022 Payroll
The sample design for single-establishment firms began with a study of the potential respondent universe. This study produced a set of industry-specific payroll cutoffs that were used to distinguish large single-establishment firms from small ones within each industry. In general, these cutoffs were chosen so that the sum of the payroll of the multi-establishment firms plus the payroll of the single-establishment firms above the cutoff equaled 75 - 95% of the total payroll in an industry, though there were exceptions. In the hypothetical example below, a payroll cutoff of $229,000 for an industry will result in 80% of total industry payroll (32.6% + 47.4%) being contained within the 6,655 establishments (2,508 + 4,147) selected with certainty.
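The cutoff logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `certainty_cutoff`, its inputs, and the toy payroll figures are hypothetical, and the actual cutoff study is not documented here in enough detail to reproduce.

```python
def certainty_cutoff(multi_payroll, single_payrolls, target_share):
    """Return (cutoff, covered_share) for one industry.

    multi_payroll   -- total payroll of multi-establishment firms (all certainty)
    single_payrolls -- payroll of each single-establishment firm
    target_share    -- desired fraction of industry payroll, e.g. 0.80
    """
    total = multi_payroll + sum(single_payrolls)
    covered = multi_payroll
    cutoff = None  # None means multi-establishment firms alone meet the target
    # Add single-establishment firms from largest to smallest payroll
    # until the certainty component reaches the target share.
    # Ties at the cutoff value are ignored in this sketch.
    for payroll in sorted(single_payrolls, reverse=True):
        if covered >= target_share * total:
            break
        covered += payroll
        cutoff = payroll
    return cutoff, covered / total


# Toy industry: multi-establishment firms hold 326 of 1,000 payroll units.
cutoff, share = certainty_cutoff(326, [200, 150, 130, 100, 60, 34], 0.80)
print(cutoff, share)
```

In this toy industry, single-establishment firms with payroll of at least 130 are taken with certainty, and together with the multi-establishment firms they hold just over 80% of industry payroll.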
The single-establishment firm sample selection had three phases: identifying the “large” single-establishment firms including some firms with special characteristics, selecting a sample of the “small” single-establishment firms, and determining if additional classification information was needed from the non-selected single-establishment firms.
Identifying “Large” Single-Establishment Firms
All single-establishment firms with annualized administrative payroll that equaled or exceeded the certainty payroll cutoff for their industry were included in the sample component of the Economic Census with certainty. Each had a probability of selection of 1, and a sample weight of 1, which applies only for producing industry-specific statistics where data are not available from administrative records. Note that “Large” is relative. In some industries, this payroll cutoff was zero and all establishments were selected into the sample.
In addition, certain single-establishment firms were included with certainty, regardless of size, based on other characteristics. These included firms that were likely cooperatives and firms included in the Annual Survey of Manufactures (ASM).
Sampling “Small” Single-Establishment Firms
The remaining single-establishment firms (those with annualized payroll below the cutoff for their industry) were stratified by industry and state and selected using a stratum-specific probability of selection.
The probabilities of selection for these strata were determined by a study of the potential respondent universe conducted shortly before sample selection operations began. Selected small single-establishment firms were included in the sample as non-certainty cases. Each had a probability of selection that generally fell within the range of 0.05 to 0.8. In industry-by-state strata containing fewer than five establishments, all establishments were included in the sample (this rule did not apply to the mining and construction sectors).
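A minimal sketch of stratified selection with stratum-specific probabilities follows. Two points are assumptions rather than documented facts: units are drawn here by independent Bernoulli trials (the actual selection mechanism is not specified in this document), and the sample weight is taken to be the inverse of the selection probability. The function name and record fields are hypothetical.

```python
import random


def select_small_firms(firms, stratum_prob, min_stratum_size=5, seed=0):
    """Select non-certainty firms within industry-by-state strata.

    firms        -- list of dicts with 'id', 'industry', 'state'
    stratum_prob -- {(industry, state): probability of selection}
    """
    rng = random.Random(seed)
    strata = {}
    for firm in firms:
        strata.setdefault((firm["industry"], firm["state"]), []).append(firm)

    sample = []
    for key, members in strata.items():
        # Strata with fewer than five establishments are taken in full
        # (the mining and construction exception is not modeled here).
        p = 1.0 if len(members) < min_stratum_size else stratum_prob[key]
        for firm in members:
            if rng.random() < p:  # assumed Bernoulli selection
                # Assumed inverse-probability sample weight.
                sample.append({**firm, "prob": p, "weight": 1.0 / p})
    return sample


firms = [{"id": i, "industry": "A", "state": "TX"} for i in range(3)]
firms += [{"id": 100 + i, "industry": "B", "state": "TX"} for i in range(50)]
sample = select_small_firms(firms, {("A", "TX"): 0.1, ("B", "TX"): 0.2})
```

All three establishments in the small stratum are selected with weight 1, while each selected establishment in the larger stratum carries a weight of 5.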
Determining Which Establishments Need Classification Information
All remaining (non-sampled) single-establishment firms with payroll were represented in the Economic Census by data from federal administrative records or through imputation and were not usually required to respond. However, in some cases, the industry classification information on the Business Register, which is used to tabulate the (quantitative) administrative data in the correct industry, is inadequate or outdated. The most common reasons for a deficient classification were that the administrative classification data provided to the Census Bureau lacked sufficient detail to assign an establishment to a publication-level NAICS industry, or that the administrative data sources did not agree on an establishment’s classification.
After the initial sample selection in September 2022, a second sample of single-establishment firms was selected in November 2022 from those establishments on the Business Register with 2022 payroll that were not on the initial sampling frame. Similarly, a third sample of single-establishment firms was selected in March 2023 from those establishments on the Business Register with 2022 payroll that were not included on the previous two sampling frames. Any single-establishment firms that started business so late in 2022 that their administrative data was not available to the Census Bureau in time for the last sampling operation were not included in the data collection but were included in the tabulations using their administrative data or via imputation.
The reference period is the calendar year 2022. Information for businesses selected into the Economic Census can be found at https://www.census.gov/programs-surveys/economic-census/information.html. The 2022 Economic Census questionnaires are available in the 2022 Economic Census Survey Repository.
Respondents are given a choice from a preselected list when asked for their establishment's primary business/activity. For the 2017 Economic Census, a respondent could provide a write-in response if none of the options were appropriate. For the 2022 Economic Census, this write-in response was connected to a machine learning algorithm that allowed the respondent to select the correct primary business/activity from additional options generated by the submitted response. Additionally, for single-establishment firms, the specific set of questions asked in the remainder of the 2022 Economic Census may vary based on the given response for primary business/activity rather than being predetermined.
The 2022 Economic Census was collected entirely online. Paper questionnaires were made available for establishments located in the island areas. See the Island Areas methodology for more details. Respondents were contacted in January of 2023 with an initial survey letter requesting their participation online. Respondents also received a reminder letter prior to the March 15 due date. Up to five “past due” follow-ups were sent via mail and multiple email follow-ups were sent to companies that had started, but not yet completed, the reporting process online. Select companies also received reminder calls via telephone. Initial and follow-up letters can be found under respondent materials.
For all single-establishment firms and most multi-establishment firms, the data collection unit (also referred to as the ‘reporting unit’) is the establishment. As noted in the Sample design section, multi-establishment firms in certain industries have difficulty reporting receipts and related data for each of their establishments. These firms received a special questionnaire that requested consolidated, firm-level data for receipts and related measures. A supplementary questionnaire listed the firm’s establishments in the industry and requested payroll and employment information for each of them.
For the third nonresponse follow-up mailing, a certified letter was mailed to roughly one-half of the single-establishment firm nonrespondents (with the other nonrespondents receiving a regular letter). The nonrespondents to receive the certified letter were selected using a process that identified industry by state combinations where the response was poorest and selecting a larger proportion of nonrespondents from those combinations.
For the fourth nonresponse follow-up mailing, single-establishment firm nonrespondents that did not receive a certified letter in the third follow-up received either a certified letter or a letter by Priority Mail.
Data captured in an Economic Census must be edited to identify and correct reporting errors. The data also must be adjusted to account for missing items and for businesses that do not respond. Data edits detect errors and validate data by considering factors such as proper classification for a given record, historical reporting for the record, and industry/geographic ratios and averages.
The first step of the data editing process is classification. To assign a valid kind-of-business or industry classification code to an establishment, its responses are run through a series of data edit programs. The specific items used for classification depend on the census report forms and include:
If critical information is missing, the record is flagged and fixed by analysts before further processing occurs.
If all critical information is available, the classification code is assigned automatically. After classification codes are assigned, a "verification" operation is performed to validate the industry and geography.
After an establishment has been assigned a valid industry code, the data edits further evaluate the response data for consistency and validity—for example, ensuring that employment data are consistent with payroll or receipts data. Response data is always evaluated by industry; in some cases, type of operation or tax-exempt status is also taken into account. Additional checks compare current year data to data reported in previous censuses, annual surveys, or from administrative sources.
Nonresponse is defined as the inability to obtain requested data from an eligible survey unit. Two types of nonresponse are often distinguished. Unit nonresponse is the inability to obtain any of the substantive measurements about a unit. In most cases of unit nonresponse, the Census Bureau was unable to obtain any information from the survey unit after several attempts to elicit a response. Item nonresponse occurs either when a question is unanswered or the reported data is unusable.
Nonresponse is handled by estimating or imputing missing data. Imputation is defined as the replacement of a missing or incorrectly reported item with another value derived from logical edits or statistical procedures.
The primary methods for imputing missing basic data items (such as receipts/sales, payroll, and employment) are:
Expansion
Some data items on certain published tables use expansion, rather than imputation, to account for nonresponse. For example, missing data are not imputed for the employment by function variables. Consider the following example from table EC2242EMPFUNC (not yet released) for NAICS = 423 and Type of operation = 10:
The estimate in cell E1 is the sum of the estimates in cells E2-E8. This estimate in cell E1 should also equal the number of paid employees for the corresponding NAICS by Type of operation found on the Geographic Area Statistics table. However, without accounting for nonresponse in cells E2-E8, E1 can only equal the corresponding total on the Geographic Area Statistics table when there is perfect response and certainty sampling of all establishments. To account for any nonresponse, an expansion factor is calculated for every NAICS by Type of operation cell. (This expansion factor will also account for any discrepancy between the total employment estimate derived by sample weighting and the full tabulation of total employees.)
Each of the “Number of paid employees” values within a given NAICS by Type of operation cell is then multiplied by this expansion factor, resulting in an E1 value that equals the corresponding value on the Geographic Area Statistics table, and E2 through E8 values which still sum to the total E1.
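The expansion step can be illustrated with a short Python sketch. The function name `expand`, the cell labels, and the employee counts are hypothetical. The only relationship taken from the text is that each component is multiplied by one factor so that the components sum to the corresponding Geographic Area Statistics total.

```python
def expand(cell_values, published_total):
    """Scale component values so that they sum to published_total.

    cell_values     -- {cell label: weighted value from respondents}
    published_total -- corresponding Geographic Area Statistics total
    """
    reported_total = sum(cell_values.values())
    factor = published_total / reported_total
    expanded = {label: value * factor for label, value in cell_values.items()}
    return factor, expanded


# Hypothetical employment-by-function cells for one NAICS by Type of operation.
factor, expanded = expand({"E2": 40, "E3": 30, "E4": 10}, published_total=100)
print(factor)    # 1.25
print(expanded)  # {'E2': 50.0, 'E3': 37.5, 'E4': 12.5}
```

Because every component is scaled by the same factor, the relative distribution across functions is preserved while the total matches the published figure.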
The coverage, a measure of the proportion of the total quantity that is reported, has the following relationship to the expansion factor:
Many other published tables also use an expansion factor to account for nonresponse for one or more data items. Those data items also use different Geographic Area Statistics totals, such as total receipts or total annual payroll, to calculate their respective expansion factors.
North American Product Classification System (NAPCS) imputation
Sampled establishments were asked to provide a detailed breakdown of their receipts by various products or services. For simplicity, these data items are referred to as ‘products’ throughout this section. Establishments that did not report products are assigned products using a hot-deck imputation (HDI) process, as described below. In this process, the set of products and the proportions of each product’s receipts to total receipts from a similar establishment (called the donor) are assigned to the establishment missing the product data (the recipient).
Products are assigned using a set of collection codes based on the NAPCS definitions. More details on these definitions are available on the Economic Census’s guidance page for NAPCS. The NAPCS-based collection codes have two levels: a broad line or general product, and the detail lines within the broad line. Below is an example of one broad line (5001275000) and its thirteen detail lines. Not all broad lines have associated detail lines and reporting of detail lines is usually only required in industries where the product is likely to be sold.
Example of a NAPCS Broad Line and Detail Lines
NAPCS Recipients and Donors
Recipients come from two sources:
Most respondents are classified as:
In certain cases, some responses – even though they contain valid product data – are considered unique so they are not used as donors (though their data are included in the product estimates). These establishments are called ‘non-donors.’ For the purpose of categorization, we consider establishments with zero receipts to be non-donors.
Table 4, below, shows the number of establishments classified into each of the categories.
In the first stage of the HDI process, the missing detail line products of the partial donors are filled in using the distribution of those reported detail line products that passed the edits, for each broad line (from the complete donors and partial donors with detail lines). Once this is finished, all donors are essentially “complete” and are assembled in the donor pool. Depending on the industry, the donors and recipients are then matched within an imputation cell (see “Imputation Cell Definition” table, below) and a donor establishment is chosen for each recipient either randomly or based on similarity of each establishment’s total receipts (nearest neighbor). The donor’s product list and product distribution (percentage of total receipts in each product) are assigned to the recipient and dollar amounts are imputed (and rounded to integers) using this distribution and the recipient’s total receipts. The figure below illustrates this process and uses the term ‘sales’ interchangeably for ‘receipts.’
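The second stage (matching a recipient to a donor and applying the donor's product distribution) can be sketched as below for the nearest-neighbor case. This is an illustrative sketch only: the function name, the product codes "A" and "B", and the receipts figures are hypothetical, all donors are assumed to belong to the recipient's imputation cell, and any reconciliation of rounding differences against total receipts is not shown.

```python
def impute_products(recipient_receipts, donors):
    """Assign a donor's product distribution to a recipient.

    recipient_receipts -- total receipts of the establishment missing products
    donors             -- list of (total_receipts, {product_code: receipts})
                          for donors in the same imputation cell
    """
    # Nearest neighbor: the donor whose total receipts are closest.
    donor_total, donor_products = min(
        donors, key=lambda d: abs(d[0] - recipient_receipts)
    )
    # Apply each product's share of donor receipts to the recipient's total,
    # rounding dollar amounts to integers.
    return {
        code: round(recipient_receipts * amount / donor_total)
        for code, amount in donor_products.items()
    }


donors = [(1000, {"A": 600, "B": 400}), (5000, {"A": 5000})]
print(impute_products(1200, donors))  # {'A': 720, 'B': 480}
```

The recipient with receipts of 1,200 is matched to the donor with receipts of 1,000 and inherits its 60/40 product split. For industries that use random selection, the `min` call would be replaced by a random draw from the donor list.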
If a partial donor has no available donors from which to receive a distribution of detail lines, a fallback category average method is used to impute its detail lines. Category averages are the distributions of detail lines per NAICS by broad line combination. Subject matter experts use the NAPCS structure as the basis to develop these category average parameters.
Table 6, below, summarizes the number of imputation cells at the three levels of detail along with the number of cells with no donors. See Table 5, above, for a summary of imputation cell levels. All references to NAICS in Table 5 are to NAICS at the “8-digit” level, a level more detailed than the 6-digit NAICS in the official NAICS definitions. Any reference in this document to 7- or 8-digit NAICS refers to these classification details beyond the official definitions.
For each recipient establishment, the HDI process chooses a donor establishment from within the recipient’s imputation cell and assigns the donor’s product distribution to the recipient. A study was done to determine whether the HDI process chooses the donor at random or chooses the donor according to a proximity algorithm (which is also known as nearest neighbor HDI) (Bechtel et al, 2015). For each imputation cell, there is no required minimum number of donors, no required donor-to-recipient ratio, and no limit on the number of times a donor was used. In the event that any base imputation cell containing recipients has no donors, subject matter experts developed a fallback parameter file of product distributions, based on the NAPCS structure and industry knowledge, to assign product distributions to such recipients.
NAPCS Product Expansion
For the product estimates, a post-stratification weighting adjustment (using industry by state strata) is subsequently performed to ensure that the weighted product data of the sampled establishments sum to the total receipts of all establishments. This procedure is similar to the expansion used for other items detailed earlier.
After the HDI process, all sampled establishments will have been assigned valid NAPCS products so that for each establishment, the sum of the broad line receipts will equal the establishment’s total receipts, and the sum of the detail line receipts will equal the receipts of the corresponding broad lines.
Though the sample is designed to represent the frame from which it is selected, the weighted receipts of the sample will not exactly match the total receipts of all (sampled and non-sampled/non-mailed) establishments.
Because the product receipts of each individual establishment sum to that establishment’s total receipts, the sum of the weighted broad line receipts will sum to weighted total receipts of the sample establishments. In order to make the product receipts estimates sum to the total receipts of all establishments, it is necessary to make a weighting (expansion) adjustment. The formula for this adjustment is:
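The adjustment formula is not reproduced in this text. As a hedged illustration, the sketch below assumes a simple ratio adjustment within each industry-by-state stratum: the total receipts of all establishments in the stratum divided by the weighted receipts of the sampled establishments in that stratum. The function name, field names, and figures are hypothetical, and whether certainty units are adjusted in the same way is not stated here.

```python
from collections import defaultdict


def poststratify(sampled, frame_totals):
    """Attach an adjusted weight to each sampled establishment.

    sampled      -- list of dicts with 'stratum', 'weight', 'receipts'
    frame_totals -- {stratum: total receipts of all establishments}
    """
    weighted = defaultdict(float)
    for est in sampled:
        weighted[est["stratum"]] += est["weight"] * est["receipts"]

    adjusted = []
    for est in sampled:
        # Assumed ratio adjustment: stratum total / weighted sample total.
        factor = frame_totals[est["stratum"]] / weighted[est["stratum"]]
        adjusted.append({**est, "adj_weight": est["weight"] * factor})
    return adjusted


stratum = ("4231", "TX")
sampled = [
    {"stratum": stratum, "weight": 1.0, "receipts": 500.0},
    {"stratum": stratum, "weight": 4.0, "receipts": 100.0},
]
adjusted = poststratify(sampled, {stratum: 990.0})
```

In this example the weighted sample receipts are 900 against a stratum total of 990, so each weight is scaled by 1.1 and the adjusted weighted receipts sum to 990.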
Economic Census tabulations for basic statistics (receipts, payroll, employment, etc.) are simple summations of data from all in-scope establishments using reported data collected from the Economic Census, plus administrative records data or imputed data for nonrespondents and single-establishment firms that were not selected into the Economic Census sample. The most common source of imputed data is administrative data from the IRS. For multi-establishment firms in alternative reporting industries (see the Sample design section above), the consolidated firm level receipts data is first allocated to the individual establishments of the firm in the industry.
Estimates involving ARU industries are treated differently in two ways. In the 2017 Economic Census, any estimates of receipts below the national level for any of the ARU industries were suppressed, appearing on tables with a ‘Q’ flag, which has the note “Revenue not collected at this level of detail for multiestablishment firms.” Estimates for aggregate-level industries that included an ARU industry appeared with an ‘N’ flag, which has the note “Not available or not comparable.” Despite those suppressions, estimates were calculated for lower levels of geography by proportionally allocating the firm-level responses for receipts to individual establishments. Starting with the 2022 Economic Census, these estimates are no longer suppressed after continued review of those allocated values of receipts indicated the allocated values were comparable in quality to other methods of imputation. 2017 estimates on comparative tables retain the ‘Q’ and ‘N’ flags.
Suppressions using ‘Q’ and ‘N’ flags remain in place in the 2022 Economic Census for estimates of number of establishments with activity in specific products defined by NAPCS-based collection codes. Firm-level responses make it impossible to estimate the number of establishments with specific product activity.
Economic Census estimates for industry-specific statistics, such as product statistics and other industry-specific special items, are derived by summing weighted data, where each certainty establishment (establishments of multi-establishment firms and “large” single-establishment firms) has a weight of 1, and each non-certainty establishment has the sample weight assigned during the sample selection process (see above). These initial weighted estimates usually go through an expansion procedure, detailed in the Nonresponse adjustment and imputation section, to ensure that they properly sum to a corresponding basic statistic.
List of weighted estimates by table
EC2223BASIC:
Dollar values are published in current dollars. In tables that compare the current Economic Census to prior Economic Census statistics, no adjustment has been made to the estimates to account for inflation during the intervening period.
The sampling error of an estimate based on a sample survey is the difference between the estimate and the result that would be obtained from a complete census conducted under the same survey conditions. This error occurs because characteristics differ among sampling units in the population and only a subset of the population is measured in a sample survey. The particular sample used in this survey is one of a large number of samples of the same size that could have been selected using the same sample design. Because each unit in the sampling frame had a known probability of being selected into the sample, it was possible to estimate the sampling variability of the survey estimates.
Common measures of the variability among these estimates are the sampling variance, the standard error, and the coefficient of variation (CV), which is also referred to as the relative standard error (RSE). The sampling variance is defined as the squared difference, averaged over all possible samples of the same size and design, between the estimator and its average value. The standard error is the square root of the sampling variance. The CV expresses the standard error as a percentage of the estimate to which it refers. For example, an estimate of 200 units that has an estimated standard error of 10 units has an estimated CV of 5 percent. The sampling variance, standard error, and CV of an estimate can be estimated from the selected sample because the sample was selected using probability sampling. Note that measures of sampling variability, such as the standard error and CV, are estimated from the sample and are also subject to sampling variability. It is also important to note that the standard error and CV only measure sampling variability. They do not measure any systematic biases in the estimates.
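The worked example above can be written out as a single equation. The symbols are introduced here for illustration only: the estimate is denoted by theta-hat and its estimated standard error by SE.

```latex
\[
\mathrm{CV}(\hat{\theta})
  = 100 \times \frac{\mathrm{SE}(\hat{\theta})}{\lvert \hat{\theta} \rvert}
  = 100 \times \frac{10}{200}
  = 5\ \text{percent}
\]
```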
The Census Bureau recommends that individuals using these estimates incorporate sampling error information into their analyses, as this could affect the conclusions drawn from the estimates.
Estimates of basic data items (such as receipts, payroll, employment, inventories, etc.) included in the 2022 Economic Census publications are computed from all known in-scope establishments in the country and therefore are not subject to sampling error. For those establishments that were not sampled or did not respond, missing data items were either imputed or filled in with administrative data from other government agencies.
Many estimates that are subject to sampling error are provided with an estimate of that error, given in terms of a CV or a standard error. In general, estimates with units (dollars, number of establishments, number of employees, etc.) have sampling errors given with a CV, and estimates of ratios or percentages have sampling errors given with a standard error. Below is a list of sample-based estimates that do not have associated estimates of sampling error provided.
Sample-based estimates with no associated measurement of sampling error (by table):
EC2223BASIC:
NAPCS Variance Estimation
NAPCS variance estimation was accomplished by (see “NAPCS Variance Estimation Process” figure, below):
(For more information about the development of this NAPCS variance estimation process, see Knutson et al, 2017.)
NAPCS Variance Estimation Step 1: Finite Population Bayesian Bootstrap (FPBB)
The first step of NAPCS Variance Estimation is to create five synthetic populations from the original Economic Census sample using the Finite Population Bayesian Bootstrap (FPBB) method.
The Finite Population Bayesian Bootstrap (Zhou et al, 2012) is a non-parametric multiple imputation method that accounts for complex sampling procedures and post-stratification. With the FPBB, the idea is to expand the sample of size 𝑛 into several FPBB synthetic populations, each of size 𝑁, where 𝑁 is the original Economic Census population size. These FPBB synthetic populations are created by drawing units from stratum ℎ from the original sample with probability for the 𝑘th selection,
where w_i is the post-stratified sampling weight of unit 𝑖, l_{i,k−1} is the number of times unit 𝑖 has been selected up to the (𝑘−1)th selection, and 𝑘 is the number of selections that have been made.
The (N_h − n_h) resampled units are added to the original sample to complete the FPBB synthetic population. As described by Zhou et al (2012), this is an application of a Pólya sample designed to “restore the existing complex survey sample back to some SRS-type/self-weighting data structure.” This process, which Zhou refers to as “uncomplexing” the sample, is repeated several times to create five synthetic populations.
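The resampling step can be sketched in Python for a single stratum. The selection probability is not reproduced in this text, so the sketch assumes the weighted Pólya form commonly given in the FPBB literature: the numerator for unit i at draw k is w_i − 1 + l_{i,k−1}(N − n)/n and the denominator is (N − n) + (k − 1)(N − n)/n. Treat that form, along with the function names and the toy weights, as assumptions rather than the documented procedure.

```python
import random


def selection_probs(weights, counts, k):
    """Assumed weighted Polya probabilities for the k-th draw (k = 1, 2, ...)."""
    n = len(weights)
    N = round(sum(weights))  # post-stratified weights must sum to an integer
    ratio = (N - n) / n
    denom = (N - n) + (k - 1) * ratio
    return [(w - 1 + l * ratio) / denom for w, l in zip(weights, counts)]


def fpbb_population(sample, weights, seed=0):
    """Expand a sample of size n into one synthetic population of size N."""
    rng = random.Random(seed)
    n = len(sample)
    N = round(sum(weights))
    counts = [0] * n  # l_{i,k-1}: times unit i has been drawn so far
    for k in range(1, N - n + 1):  # draw N - n additional units
        probs = selection_probs(weights, counts, k)
        i = rng.choices(range(n), weights=probs)[0]
        counts[i] += 1
    # Original sample plus the resampled copies.
    population = list(sample)
    for unit, extra in zip(sample, counts):
        population.extend([unit] * extra)
    return population


# Sample of n = 6 units whose weights sum to N = 11.
population = fpbb_population(list("abcdef"), [1, 1, 2, 2, 2, 3])
print(len(population))  # 11
```

Under the assumed form, a unit with weight 1 represents only itself and is never resampled, while heavily weighted units become more likely to be drawn again after each selection. Calling the function with five different seeds would give five synthetic populations.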
The figure “Creating three synthetic-populations from a sample by FPBB”, below, gives an example of the FPBB process creating three synthetic populations for an unequal probability sample of size n = 6, sampled from a population of size N = 11. A value of “?” indicates a nonrespondent in the sample that is likewise included as a nonrespondent in the expanded synthetic population.
Note that a post-stratified sampling weight is used instead of the design weight in the Pólya sampling procedure, and thus the expanded population sizes may differ from the original sampling frame population sizes. The sampling weight is adjusted so that the post-stratified sample weights sum to an integer, which is required for the FPBB process.
NAPCS Variance Estimation Step 2: Approximate Bayesian Bootstrap (ABB)
The next step is to incorporate product nonresponse and estimate the nonresponse variance. To do this, we employ the Approximate Bayesian Bootstrap (ABB) within each FPBB synthetic population. The ABB is a straightforward way to implement multiple imputation for the HDI methodology. Rubin and Schenker (1986) and Rubin (1987) propose the ABB as a tool for introducing appropriate variability into a multiple imputation procedure. ABB is a non-Bayesian method that approximates a Bayesian procedure and adjusts for the uncertainty in the distribution parameters resulting in a proper imputation procedure. The figure below illustrates how the ABB draws a simple random sample (SRS) of respondents with replacement. Note that for a given FPBB population the resampled ABB populations have the same set of nonrespondents (recipients) but different sets of respondents (donors).
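The resampling at the heart of the ABB can be sketched as follows. This is a minimal illustration, not the production procedure: the function name and the establishment labels are hypothetical, the default of 20 replicates reflects the number of ABB replicates this document reports per synthetic population, and the subsequent hot-deck imputation from each resampled donor pool is not shown.

```python
import random


def abb_replicates(respondents, n_replicates=20, seed=0):
    """Draw ABB donor pools from the respondents of one synthetic population.

    Each replicate is a simple random sample with replacement, of the same
    size as the respondent set. Nonrespondents are unchanged across replicates.
    """
    rng = random.Random(seed)
    return [
        rng.choices(respondents, k=len(respondents))
        for _ in range(n_replicates)
    ]


respondents = ["est1", "est2", "est3", "est4"]
nonrespondents = ["est5", "est6"]  # same recipients in every replicate
pools = abb_replicates(respondents)
print(len(pools), len(pools[0]))  # 20 4
```

Because sampling is with replacement, some respondents appear more than once in a given donor pool and others not at all, which is what introduces the additional between-replicate variability.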
NAPCS Variance Estimation Step 3: HDI to impute NAPCS
At this stage, there are 100 different samples from the ABB process, each with a different set of donors and recipients. We also retained the donors from the 5 FPBB populations for a total of 105 different populations. The next step is to send each population through the entire NAPCS HDI process, imputing values for missing data, using the sample of respondents drawn in the previous step as the ABB replicate donors (see figure “Multiple Imputed ABB Replicates for Population 1”, below). The first stage imputes detail product NAPCS from industry averages for population establishments with valid broad products (donors and non-donors) but missing or invalid detail products. The second stage uses the updated donors to impute broad and detail product NAPCS to population recipients.
Each round of the ABB procedure results in one complete dataset. This procedure is then repeated 20 times to obtain multiple imputed datasets. Ultimately, each of the five FPBB synthetic populations will have 20 ABB replicates.
After HDI is run, all records with NAPCS receipts equal to zero are removed. These records occur when an imputed value is rounded down to zero. Removing them ensures that these establishments are not counted as having the associated product.
NAPCS Variance Estimation Step 4: Computing the NAPCS Estimates
The next step is to use these 105 (now imputed) samples to compute 100 sets of NAPCS total receipt estimates. In addition, 100 estimates of the number of records in the cell (establishments) are computed. These are mentioned here to note they are processed in the identical fashion to NAPCS receipts.
Totals are computed for each “NAICS x Tax Exempt Status/Type of Wholesale Operation x State x NAPCS code” cell. (NAICNEW x TWTAX x STATE x NAPCS in the figure “Example of Combining Variance…” below). The column “Variance ID” indicates which population the record is from.
For the 100 runs from the ABB populations (20 samples x 5 FPBB), donor records are dropped before tabulation. While these were run earlier through HDI to determine non-donor and recipient variance, a better estimate of donor variance comes from the FPBB populations.
NAPCS receipts estimates are combined to produce a new set of estimates. A donor estimate (underlined in the figure “Example of Combining Variance…”, below) is added to each associated recipient estimate. The 105 sets of estimates thus become 100 sets of estimates. The figure below shows an example of how this works for a single cell.
Note: It is possible that there may be cells among the donors that do not exist in one or more sets of recipient estimates (and vice versa).
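The combining step can be sketched as follows. This is an illustration under stated assumptions, not the production code: the function name, the dictionary layout, the cell labels, and the dollar values are all hypothetical. The sketch also assumes that a recipient estimate is "associated" with the donor estimate from the same FPBB population, and that a cell missing on one side contributes zero to the sum.

```python
def combine_estimates(donor_est, recipient_est):
    """Add each FPBB donor total to the recipient totals of its ABB replicates.

    donor_est:     {fpbb_id: {cell: total}}            (5 sets)
    recipient_est: {(fpbb_id, abb_id): {cell: total}}  (100 sets)
    """
    combined = {}
    for (fpbb_id, abb_id), rec_cells in recipient_est.items():
        don_cells = donor_est[fpbb_id]
        # A cell may exist among donors only or among recipients only.
        cells = set(don_cells) | set(rec_cells)
        combined[(fpbb_id, abb_id)] = {
            cell: don_cells.get(cell, 0.0) + rec_cells.get(cell, 0.0)
            for cell in cells
        }
    return combined

# Hypothetical totals: 5 FPBB donor sets and 5 x 20 recipient sets.
donor_est = {f: {"cellA": 100.0 + f} for f in range(1, 6)}
donor_est[1]["cellB"] = 7.0  # cell present among donors only
recipient_est = {
    (f, a): {"cellA": float(a)} for f in range(1, 6) for a in range(1, 21)
}
recipient_est[(2, 3)]["cellC"] = 4.0  # cell present among recipients only

combined = combine_estimates(donor_est, recipient_est)
```

The 105 input sets (5 donor plus 100 recipient) yield 100 combined sets, one per FPBB and ABB pair.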
These NAPCS receipts estimates are augmented with aggregates for the US, other NAICS code levels, and Type of Wholesale/Tax Exempt Status total as follows:
Once completed, every NAPCS product has 100 sets of estimates. Each set contains estimates at the state and US levels, for all NAICNEW levels, 8-digit through all-sectors, and for each value of Type of Wholesale Operation/Tax Exempt Status (including the aggregate) as illustrated below in the table “Example after Aggregating”.
NAPCS Variance Estimation Step 5: Computing NAPCS Variance
The variance for each NAICS x Type of Wholesale Operation/Tax Exempt Status x STATE x NAPCS is computed using the following formula:
In addition to computing the variance, the number of times a particular estimation cell appears among the 100 replicates is counted. If no estimate exists for one or more values of VARIANCE ID within an estimation cell, the missing estimates are treated as zeros for purposes of computing averages and differences in the formula above.
The coefficient of variation (CV), referred to in the 2017 Economic Census as the relative standard error or RSE, is computed for NAPCS receipts estimate j using the formula below by dividing the square root of the variance (NAPCSVARj) by the absolute value of the overall average NAPCSDOL estimate (the one computed from all 100 replicates).
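A minimal sketch of the CV calculation described above is given here. The function name and the sample values are hypothetical, and the variance is passed in as a given number because the sketch does not reproduce the variance formula. Replicates with no estimate for the cell are counted as zeros by dividing the sum by the full replicate count.

```python
import math

def napcs_cv(replicate_estimates, variance, n_replicates=100):
    """CV = sqrt(variance) / |overall average of the replicate estimates|."""
    # Missing replicates count as zero, so divide by n_replicates rather
    # than by the number of estimates actually present.
    overall_average = sum(replicate_estimates) / n_replicates
    if overall_average == 0:
        return None  # CV is undefined when the average estimate is zero
    return math.sqrt(variance) / abs(overall_average)

# Hypothetical cell present in all 100 replicates
cv_full = napcs_cv([50.0] * 100, variance=25.0)

# Hypothetical cell present in only 80 of the 100 replicates
cv_partial = napcs_cv([50.0] * 80, variance=25.0)
```

In the second case the overall average drops from 50 to 40 because the 20 missing replicates are treated as zeros, so the CV is larger for the same variance.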
Nonsampling error encompasses all factors other than sampling error that contribute to the total error associated with an estimate. This error may also be present in censuses and other nonsurvey programs. Nonsampling error arises from many sources, such as:
It is important to have metrics to measure, monitor, and manage data collection and the level of response achieved by the data collection methods so that the amount of nonresponse is minimized to the extent possible. One type of response metric is the check-in rate. The check-in rate is calculated as the ratio of the number of reporting units returning a questionnaire to the number of reporting units mailed a request to complete a questionnaire. The check-in rate (expressed as a percentage) for the 2022 Economic Census was over 73%.
A returned questionnaire includes receipt of an electronic submission authorized by the respondent, receipt of an acceptable response during telephone follow-up, or, under special circumstances, respondent-authorized submission by some other means.
Although no other direct measurement of nonsampling error was obtained, precautionary steps were taken in all phases of the collection, processing, and tabulation of the data in an effort to minimize its influence. Precise estimation of the magnitude of the nonsampling errors would require special experiments or access to independent data and, consequently, the magnitudes are often unavailable.
The Census Bureau recommends that individuals using these estimates factor in this information when assessing their analyses of these data, as nonsampling error could affect the conclusions drawn from the estimates.
For the 2022 Economic Census, the Census Bureau produced response metrics in accordance with Census Bureau standard response rate calculations, in order to monitor data collection and to provide additional indicators of data quality. These are the Unit Response Rate (URR), the Total Quantity Response Rate (TQRR), the Quantity Response Rate (QRR), the Administrative Data Rate (ADR), and the Imputation Rate (IR). For definitions, see the Census Bureau Statistical Quality Standards, Appendix D3-B: Requirements for Calculating and Reporting Response Rates: Economic Surveys and Censuses, at https://www.census.gov/about/policies/quality/standards/appendixd3b.html.
To produce these rates, the Census Bureau implemented a detailed method for documenting the sources of data used for correcting estimated or inconsistent data. These correction sources align with those used in the Census Bureau’s annual economic surveys, and are defined as follows:
When calculating the standard response metrics, the first four types of corrections are treated in the same manner as “reported” data. For the fifth type of correction, the data are treated as imputed. Imputation rates are indicated with tabulated 2022 Economic Census data using a coding scheme as follows:
0: Imputation rate is less than 10%
1: Imputation rate is greater than or equal to 10% but less than 20%
2: Imputation rate is greater than or equal to 20% but less than 30%
3: Imputation rate is greater than or equal to 30% but less than 40%
4: Imputation rate is greater than or equal to 40% but less than 50%
5: Imputation rate is greater than or equal to 50% but less than 60%
6: Imputation rate is greater than or equal to 60% but less than 70%
7: Imputation rate is greater than or equal to 70% but less than 80%
8: Imputation rate is greater than or equal to 80% but less than 90%
9: Imputation rate is greater than or equal to 90%
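The coding scheme above amounts to taking the tens digit of the imputation rate, capped at 9. A short sketch follows; the function name is hypothetical, and it assumes the rate is expressed as a percentage between 0 and 100.

```python
def imputation_flag(rate_percent):
    """Map an imputation rate (in percent) to its single-digit code."""
    if not 0 <= rate_percent <= 100:
        raise ValueError("imputation rate must be between 0 and 100 percent")
    # Each code covers a 10-point band; rates of 90% and above share code 9.
    return min(int(rate_percent // 10), 9)

flags = [imputation_flag(r) for r in (0, 9.99, 10, 59.9, 90, 100)]
```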
The URR for the 2022 Economic Census was just over 60%.
Qualitative research, utilizing techniques such as cognitive interviews or usability testing methods, may be undertaken to assess the performance of new or substantially changed survey questions or data collection instruments, and results are usually used to aid design decisions in order to reduce measurement error and response burden. Post-collection debriefing interviews may be conducted with respondents in order to evaluate the performance of questions/instruments, to identify error sources, and to recommend modifications for future collections. If available, paradata may also be examined to identify problematic questions or instrument designs for further improvement. Reports of findings and recommendations are prepared from these studies and provided to survey managers and sponsors and may be publicly available pursuant to confidentiality and disclosure requirements.
Disclosure is the release of data that reveals information or permits deduction of information about a particular establishment or company through the release of either tables or microdata. Disclosure avoidance is the process used to protect each survey unit’s identity and data from disclosure. Using disclosure avoidance procedures, the Census Bureau modifies or removes the characteristics that put information at risk of disclosure.
Cell suppression is a disclosure avoidance technique that protects the confidentiality of individual survey units by withholding the values of certain cells within a table from release and replacing the cell value with a symbol, usually a “D”. If the suppressed cell value were known, it would allow one to estimate an individual survey unit’s data too closely.
The cells that must be protected are called primary suppressions. To make sure the cell values of the primary suppressions cannot be closely estimated by using other published cell values, additional cells may also be suppressed. These additional suppressed cells are called complementary suppressions.
The process of suppression does not change the higher-level totals. Values for cells that are not suppressed remain unchanged. Before the Census Bureau releases data, computer programs and analysts ensure primary and complementary suppressions have been correctly applied.
In addition to cell suppression, data rows with fewer than three contributing firms or three contributing establishments are not presented.
Rounding, either from the way data is collected or published, may affect whether a small number is considered a primary suppression. Ranges are sometimes used in place of “D”s to suppress sensitive data, but still provide some meaningful information.
Background on cell suppression, cell sensitivity, and the protection of statistical data can be obtained from the Federal Committee on Statistical Methodology's Working Paper 22 (Harris-Kojetin et al., 2005).
The Census Bureau has reviewed the 2022 Economic Census data products to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data (Project No. 7504609, Disclosure Review Board (DRB) approval number: CBDRB-FY23-099).
For more information on the history of the Economic Census, see the following page from the Census Bureau’s history site: https://www.census.gov/about/history/historical-censuses-and-surveys/census-programs-surveys/economic-census.html
For more information on new content for the 2022 Economic Census and changes from the 2017 Economic Census, see the following page: https://www.census.gov/programs-surveys/economic-census/year/2022/news-updates/whats-new.html.
Bechtel, Steeg Morris, and Thompson. 2015. “Using Classification Trees to Recommend HDI Methods: A Case Study”. Proceedings of the FCSM Research Conference.
Knutson, Thompson, and Thompson. 2017. “Developing Variance Estimates for Products in the Economic Census”. Proceedings of the Government Statistics Section, American Statistical Association.
Harris-Kojetin, B.A., Alvey, W.L., Carson, L., Cohen, S.B. and others. (2005). Report on Statistical Disclosure Limitation Methodology. Statistical Policy Working Paper no. 22, Federal Committee on Statistical Methodology. https://www.fcsm.gov/assets/files/docs/spwp22WithFrontNote.pdf
Xie, X. and Meng, X.L. 2017. “Dissecting Multiple Imputation from a Multi-Phase Inference Perspective: What Happens When God’s, Imputer’s and Analyst’s Models Are Uncongenial?”, Statistica Sinica 27(4): 1485-1544. doi:10.5705/ss.2014.067
Zhou, H., Raghunathan, T., and Elliot, M. 2012. “A Semi-Parametric Approach to Account for Complex Designs in Multiple Imputation”. Proceedings of the FCSM Research Conference.