The Census Bureau provides detailed information about the methods used to collect and produce statistics, including sampling, eligibility, questions, data collection and processing, data quality, review, weighting, estimation, and coding operations.
The U.S. Census Bureau is working toward full electronic data collection. To further this effort, the 2012 Survey of Business Owners (SBO) was selected to implement a letter-only initial mailing. This mailing consisted of a single-page letter asking respondents to report to the survey electronically.
Letters were mailed to a random sample of businesses selected from a list of all firms operating during 2012 with receipts of $1,000 or more except those classified in the following NAICS industries:
The list of all firms (or universe) was compiled from a combination of business tax returns and data collected on other economic census reports. The Census Bureau obtained electronic files from the Internal Revenue Service (IRS) for all companies reporting any business activity on any one of the following 2012 IRS tax forms:
The IRS provided certain identification, classification, and measurement data for businesses filing those forms.
For most firms with paid employees, the Census Bureau also collected employment, payroll, receipts, and kind of business for each plant, store, or physical location during the 2012 Economic Census.
The use of direct data substitution from existing sources, such as the American Community Survey (ACS) and the Decennial Census, reduced the 2012 SBO sample size to 1.75 million firms from 2.3 million in 2007. Substituting data from these existing sources allowed the smaller sample to yield estimates of sufficient quality while reducing overall respondent burden and further reducing printing, mailing, and processing costs.
Reporting instruments were tested through cognitive interviews. Approximately half of the businesses selected for the 2012 SBO were asked to respond electronically to the new 2012 SBO-2 form, which asked 39 fewer questions than the 2012 SBO-1 form. The shorter SBO-2 form, covering the key topics of gender, ethnicity, race, and veteran status, was possible because business and owner characteristics are published only at the national level, so those additional questions did not need to be asked of the full sample.
Two follow-up mailings occurred at six-week intervals. "Letter-only" follow-ups were sent to nonrespondents with a prior history of having reported electronically to either the 2007 SBO or the 2012 Economic Census. English- or Spanish-language paper forms were sent to nonrespondents upon request. The remaining nonrespondents were randomly assigned follow-up mailing packages, with one-third of each mail follow-up group receiving paper forms instead of "letter-only" mailings.
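The assignment rules for follow-up packages can be sketched as a small function. This is an illustrative sketch only: the record keys (`prior_electronic`, `requested_paper`, `group`), the function name `assign_followup`, and the choice to let a paper-form request take precedence over prior electronic reporting are all assumptions, not the Census Bureau's actual processing code.

```python
import random
from collections import defaultdict


def assign_followup(nonrespondents, rng):
    """Hypothetical sketch: choose a follow-up package for each nonrespondent.

    Each record is a dict with assumed keys 'id', 'prior_electronic' (bool),
    'requested_paper' (bool), and 'group' (the mail follow-up group).
    Returns a dict mapping firm id -> "paper" or "letter-only".
    """
    packages = {}
    remaining = []
    for firm in nonrespondents:
        if firm["requested_paper"]:
            # Assumption: an explicit request for paper forms is honored first.
            packages[firm["id"]] = "paper"
        elif firm["prior_electronic"]:
            # Prior electronic reporters (2007 SBO or 2012 Economic Census).
            packages[firm["id"]] = "letter-only"
        else:
            remaining.append(firm)

    # Randomly assign one-third of each follow-up group to paper forms.
    by_group = defaultdict(list)
    for firm in remaining:
        by_group[firm["group"]].append(firm["id"])
    for ids in by_group.values():
        rng.shuffle(ids)
        n_paper = round(len(ids) / 3)
        for position, firm_id in enumerate(ids):
            packages[firm_id] = "paper" if position < n_paper else "letter-only"
    return packages
```

Passing a seeded `random.Random` makes the random assignment reproducible, which is useful when auditing which firms received which package.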
An example of the initial mailout letter for electronic reporting and informational copies of the paper forms are available on the Questionnaire page.
All forms were geographically coded, data-keyed when needed, and edited. The interactive editing process applied standard procedures to correct detectable errors.
The data were then tabulated by 2012 NAICS, subjected to further data analysis, and the resulting corrections were applied to individual data records. Tabulations were then produced for the final published results, available through American FactFinder (AFF), the Census Bureau's online, self-service data access tool. Later, the data were transitioned from AFF to data.census.gov, the Census Bureau's more centralized data platform.
A more detailed examination of census methodology is presented in the History of the Economic Census.
A firm is a business organization or entity consisting of one domestic establishment (location) or more under common ownership or control. All establishments are included as part of the owning or controlling firm. For the economic census, the terms "firm" and "company" are synonymous.
The industry classifications for all firms are based on the 2012 North American Industry Classification System (NAICS). Changes between 2007 and 2012 are published online at:
Firms with more than one domestic establishment are counted in each industry and geographic area in which they operate, but only once in the total for all sectors and the totals at the national and state levels. The method of assigning classifications and the level of detail at which single- or multi-unit firms were classified depends on whether an economic census report form was obtained at the establishment level.
The SBO covers both firms with paid employees and firms with no paid employees. Although firms with no paid employees are included in this survey, they are omitted from many other economic surveys. Therefore, caution should be exercised in comparing SBO data with published or unpublished data from other Census Bureau economic survey results.
All survey and census results contain measurement errors and may contain sampling errors. Information about these potential errors is provided or referenced with the data or the source of the data. The Census Bureau recommends that data users incorporate this information into their analyses as these errors could impact inferences. Researchers analyzing data to create their own estimates are responsible for the validity of those estimates and should not cite the Census Bureau as the source of the estimates but only as the source of the core data.
Please contact the Census Bureau for more detailed information and interpretation of the sampling and nonsampling errors.
The economic census is conducted on an establishment basis. A company operating at more than one location is required to file a separate report for each store, factory, shop, or other location. Each establishment is assigned a separate industry classification based on its primary activity and not that of its parent company. For selected industries, only payroll, employment, and classification are collected for individual establishments, while other data are collected on a consolidated basis.
The SBO is conducted on a company or firm basis rather than an establishment basis. A company or firm is a business consisting of one or more domestic establishments under its ownership or control at the end of 2012.
Sampling. To design the 2012 SBO sample, the Census Bureau used the following sources of information to estimate the probability that a business was minority- or women-owned:
These probabilities were then used to place each firm in the SBO universe in one of nine frames for sampling:
The SBO universe was stratified by state, industry, frame, and whether the company had paid employees in 2012. The Census Bureau selected large companies, including those operating in more than one state, with certainty, based on volume of sales, payroll, or number of paid employees. Certainty cases represented only themselves; that is, each had a selection probability of one and a sampling weight of one. The certainty cutoffs varied by sampling stratum, and strata were sampled at varying rates, depending on the number of firms in a particular industry in a particular state. The remaining universe was subjected to stratified systematic random sampling.
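The selection logic described above (certainty cases first, then systematic sampling within each stratum) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `draw_sample`, the record keys, the use of receipts alone as the certainty criterion, and sorting by receipts before systematic selection are hypothetical choices, and the actual SBO cutoffs and rates are not given here.

```python
import random
from collections import defaultdict


def draw_sample(firms, certainty_cutoff, rates, rng):
    """Hypothetical sketch of certainty selection plus stratified systematic sampling.

    firms: list of dicts with assumed keys 'id', 'stratum', 'receipts'.
    certainty_cutoff: dict stratum -> receipts threshold (illustrative values).
    rates: dict stratum -> sampling rate (0, 1] for noncertainty firms.
    Returns dict firm id -> sampling weight (reciprocal of selection probability).
    """
    weights = {}
    noncertainty = defaultdict(list)
    for firm in firms:
        stratum = firm["stratum"]
        if firm["receipts"] >= certainty_cutoff[stratum]:
            weights[firm["id"]] = 1.0  # certainty case: probability 1, weight 1
        else:
            noncertainty[stratum].append(firm)

    for stratum, members in noncertainty.items():
        interval = 1.0 / rates[stratum]  # sampling interval; also the weight
        # Assumption: ordering by receipts spreads the sample across firm sizes.
        members.sort(key=lambda f: f["receipts"])
        position = rng.uniform(0, interval)  # random start within first interval
        while position < len(members):
            weights[members[int(position)]["id"]] = interval
            position += interval
    return weights
```

With a random start in the first interval, each noncertainty firm in a stratum is selected with probability equal to that stratum's rate, so its weight is the sampling interval.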
Each firm selected into the sample was asked the percentage of ownership, gender, ethnicity, race, and veteran status for up to four persons owning the largest percentages in the business. Approximately half of these firms were asked additional characteristic questions (e.g., age, education level).
Tabulation. Business ownership is defined as having 51 percent or more of the stock or equity in the business and is categorized by:
Businesses could be tabulated in more than one racial group. This can result because:
The detail may not add to the total or subgroup total because a Hispanic or Latino firm may be of any race, and because a firm could be tabulated in more than one racial group. For example, if a firm responded as both Chinese and Black majority owned, the firm would be included in the detailed Asian and Black estimates, but would only be counted once toward the higher level all firms' estimates.
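The counting rule in the example above (a firm counted in every racial group it qualifies for, but only once in the all-firms total) can be sketched as an unweighted tally. The input layout and the function name `tabulate_by_race` are hypothetical, and sampling weights are omitted for clarity.

```python
from collections import Counter


def tabulate_by_race(firms):
    """Hypothetical sketch: unweighted firm counts by race group and in total.

    firms: list of (firm_id, set_of_race_groups) pairs, where the set holds every
    group in which the firm is tabulated.
    """
    by_group = Counter()
    for _firm_id, groups in firms:
        for group in groups:
            by_group[group] += 1  # counted once in each qualifying group
    total = len({firm_id for firm_id, _ in firms})  # counted once overall
    return by_group, total
```

For example, `tabulate_by_race([(1, {"Asian", "Black"}), (2, {"White"}), (3, {"Asian"})])` yields group counts that sum to 4 while the total is 3, which is why the detail need not add to the total.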
The sum of the detailed Hispanic origin may not add to the total because no one Hispanic subgroup (i.e., Mexican, Puerto Rican, Cuban, or Other Hispanic, Latino, or Spanish origin) owned a majority of the firm, but a combination of these subgroups did own a majority. In this case, the firm was included in the Hispanic estimate, but was not included in any of the subgroup estimates. For example, if a firm had two owners each with equal ownership, one responding Puerto Rican and the other responding Cuban, there is no one subgroup with a majority ownership, but the firm is Hispanic-owned. This firm would be tabulated in the Hispanic estimate, but would not appear in any of the subgroup estimates.
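The subgroup rule can be sketched by summing ownership shares per subgroup and applying the 51 percent majority threshold both to the Hispanic total and to each subgroup. The function name and input layout are hypothetical; the subgroup labels follow the text above.

```python
from collections import defaultdict

HISPANIC_SUBGROUPS = {"Mexican", "Puerto Rican", "Cuban", "Other Hispanic"}
MAJORITY = 51  # percent of stock or equity needed for majority ownership


def classify_hispanic(owners):
    """Hypothetical sketch: return (is_hispanic_owned, set_of_majority_subgroups).

    owners: list of (ownership_percent, subgroup) pairs; subgroup is None for
    owners who are not Hispanic.
    """
    share_by_subgroup = defaultdict(float)
    for percent, subgroup in owners:
        if subgroup in HISPANIC_SUBGROUPS:
            share_by_subgroup[subgroup] += percent
    is_hispanic = sum(share_by_subgroup.values()) >= MAJORITY
    subgroups = {s for s, share in share_by_subgroup.items() if share >= MAJORITY}
    return is_hispanic, subgroups
```

With two equal owners, one Puerto Rican and one Cuban, `classify_hispanic([(50, "Puerto Rican"), (50, "Cuban")])` returns `(True, set())`: the firm enters the Hispanic estimate but none of the subgroup estimates.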
Also, the subgroup detail for both Asians and Native Hawaiians and Other Pacific Islanders may not add to the total for similar reasons as explained above.
For the tabulations by gender, ethnicity, race, and veteran status, the data for each firm in the SBO sample were weighted by the reciprocal of the firm's probability of selection.
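Weighting by the reciprocal of the selection probability can be sketched in a few lines. The function name `weighted_estimate` is hypothetical; the same function estimates a firm count (value 1 per firm) or a total such as sales (value equal to the firm's sales).

```python
def weighted_estimate(sample):
    """Hypothetical sketch: sum each sampled value divided by its selection probability.

    sample: iterable of (value, selection_probability) pairs.
    """
    total = 0.0
    for value, probability in sample:
        if not 0 < probability <= 1:
            raise ValueError("selection probability must be in (0, 1]")
        total += value / probability  # weight = 1 / probability
    return total
```

A certainty firm (probability 1) contributes only itself, while a firm sampled at a rate of 0.1 stands in for 10 firms; for instance, `weighted_estimate([(1, 1.0), (1, 0.1), (1, 0.1)])` estimates 21 firms.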
The figures shown in these datasets are, in part, estimated from a sample and will differ from the figures that would have been obtained from a complete census. Two types of possible errors are associated with estimates based on data from sample surveys: sampling errors and nonsampling errors. The accuracy of a survey result depends not only on the sampling errors and nonsampling errors measured, but also on the nonsampling errors not explicitly measured. For particular estimates, the total error may considerably exceed the measured error. The following is a description of the sampling and nonsampling errors associated with this tabulation.
Sampling Variability. The particular sample used for this survey is one of a large number of all possible samples of the same size that could have been selected using the same sample design. Estimates derived from the different samples would differ from each other. The relative standard error and standard error are measures of the variability among the estimates from all possible samples. The estimated relative standard errors and estimated standard errors presented in the tables estimate the sampling variability, and thus measure the precision with which an estimate from the particular sample selected for this survey approximates the average result of all possible samples. Relative standard errors and standard errors are applicable only to those published cells in which sample cases are tabulated. A relative standard error is an expression of the standard error as a percent of the quantity being estimated.
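The idea of variability "among the estimates from all possible samples" can be illustrated with a small simulation: draw many samples, estimate a total from each, and measure the spread. This sketch assumes simple random sampling from a made-up population for simplicity; the SBO itself uses the stratified design described earlier, and the published standard errors are estimated from the single selected sample, not by simulation.

```python
import random
import statistics


def simulate_rse(population, n, reps, rng):
    """Illustrative only: Monte Carlo view of sampling variability.

    Draws `reps` simple random samples of size n, estimates the population total
    from each, and returns (mean estimate, standard error, relative standard
    error in percent).
    """
    N = len(population)
    estimates = []
    for _ in range(reps):
        sample = rng.sample(population, n)
        estimates.append(N / n * sum(sample))  # expansion estimate of the total
    mean_estimate = statistics.fmean(estimates)
    standard_error = statistics.pstdev(estimates)  # spread across samples
    rse = 100 * standard_error / mean_estimate  # standard error as a percent
    return mean_estimate, standard_error, rse
```

The average of the simulated estimates sits close to the true total, while the standard error and relative standard error describe how far a single sample's estimate typically falls from that average.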
The sample estimate and an estimate of its relative standard error can be used to estimate the standard error and then construct interval estimates with a prescribed level of confidence that the interval includes the average results of all samples. To illustrate, if all possible samples were surveyed under essentially the same condition, and estimates calculated from each sample, then:
Thus, for a particular sample, one can say with specified confidence that the average of all possible samples is included in the constructed interval.
Example of a confidence interval. Suppose the estimate is 51,707 and the estimated relative standard error is 2 percent. The standard error is then 2 percent of 51,707 or 1,034. An approximate 90-percent confidence interval is found by first multiplying the standard error by 1.6 and then adding and subtracting that result from the estimate to obtain the upper and lower bounds. Since 1.6 x 1,034 = 1,654, the confidence interval in this example is 51,707 + or - 1,654 or the range 50,053 to 53,361.
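The arithmetic in this example can be reproduced with two small helpers. The function names are hypothetical; the multiplier 1.6 is the approximation used in the text (the standard normal value for a two-sided 90-percent interval is about 1.645), and the optional rounding mirrors the rounded intermediate values in the example.

```python
def se_from_rse(estimate, rse_percent):
    """Standard error implied by a relative standard error given in percent."""
    return estimate * rse_percent / 100


def confidence_interval(estimate, se, z=1.6, ndigits=None):
    """Approximate 90-percent interval: estimate plus or minus z times the standard error."""
    half_width = z * se
    if ndigits is not None:
        half_width = round(half_width, ndigits)  # match rounded published examples
    return estimate - half_width, estimate + half_width
```

For the example above, `round(se_from_rse(51707, 2))` gives 1034, and `confidence_interval(51707, 1034, ndigits=0)` gives the bounds 50,053 and 53,361.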
For the Characteristics of Businesses and Characteristics of Business Owners datasets, some data are expressed as percentages with standard errors rather than relative standard errors as indicated above. Construction of the confidence interval is illustrated by the following example:
Example of a confidence interval for percentage data. Suppose the estimate is 76.9 percent and the estimated standard error is 0.4 percentage points. An approximate 90-percent confidence interval is found by first multiplying the standard error by 1.6 and then adding and subtracting that result from the estimate to obtain the upper and lower bounds. Since 1.6 x 0.4 = 0.64, the confidence interval in this example is 76.9 + or - 0.64 or the range 76.26 to 77.54.
Nonsampling Errors. All surveys and censuses are subject to nonsampling errors. Nonsampling errors arise from various sources, including the inability to obtain information for all cases in the universe, imputation for missing data, data errors and biases, mistakes in recording or keying data, errors in collection or processing, and coverage problems.
While explicit measures of the effects of these nonsampling errors are not available, adjustments are made to the published relative standard errors to account for error associated with imputation of missing data. It is believed that most of the important operational and data errors were detected and corrected through an automated data edit designed to review the data for reasonableness and consistency. Quality control techniques were used to verify that operating procedures were carried out as specified.
Unpublished Estimates. Some unpublished estimates can be derived directly from datasets by subtracting published estimates from their respective totals. However, the estimates obtained by such subtraction would be subject to poor response, high sampling variability, or other factors that may make them potentially misleading. Individuals who use estimates in datasets to create new estimates should cite the Census Bureau as the source of only the original estimates.
The following changes were made to the survey methodology for 2012 which affect comparability with past reports:
Approximately 66.2 percent of the 1.75 million businesses in the SBO sample responded to the survey, compared to 62 percent for the 2007 survey. For the 2012 survey, 69.4 percent of the companies in the SBO sample returned a questionnaire or submitted an online response, but 4.6 percent of submissions did not contain enough information to be considered a response for the estimates by gender, ethnicity, race, and veteran status.
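The two response figures are consistent if the 4.6 percent is read as a share of submissions rather than of the whole sample (simply subtracting 4.6 from 69.4 would give 64.8). A minimal check of that reading, with a hypothetical function name:

```python
def usable_response_rate(returned_pct, insufficient_pct):
    """Percent of the sample with usable responses.

    returned_pct: percent of the sample that returned a form or responded online.
    insufficient_pct: percent of those submissions lacking enough information.
    """
    return returned_pct * (1 - insufficient_pct / 100)


print(round(usable_response_rate(69.4, 4.6), 1))  # 66.2
```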
Of the 2012 SBO nonrespondents, about 6.6 percent had responded to the 2007 SBO; for these firms, data from the 2007 survey were used in place of the missing 2012 SBO responses. Administrative data were used where available for all other nonrespondents. For nonrespondents still lacking data after that step, gender, ethnicity, race, and veteran status were imputed from donor respondents in the same sampling frame with similar characteristics (state, industry, employment status, and size). Because the assignment of businesses to sampling frames relies heavily on administrative data, and sampling frame assignment agrees closely with tabulated race or ethnicity for responding firms, the donor imputations are considered reliable. Estimates of sampling variability are adjusted to account for nonresponse. Estimates with high error (for example, a relative standard error of 50 percent or more for sales or receipts) are suppressed.
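The order of substitution for nonrespondents (2007 SBO data, then administrative data, then a donor respondent) can be sketched as a simple cascade with a hot-deck donor step. This is a simplified illustration: the function name `fill_missing`, the record keys, treating each source as supplying the full set of characteristics, and matching donors on an exact cell of frame, state, industry, employer status, and size class are all assumptions; the actual procedure matches on "similar" characteristics and is not specified here.

```python
import random
from collections import defaultdict


def fill_missing(respondents, nonrespondents, rng):
    """Hypothetical sketch: fill owner characteristics for nonrespondents.

    Records are dicts with assumed keys 'id', 'frame', 'state', 'industry',
    'employer' (bool), and 'size_class'. Respondents also carry
    'characteristics'; nonrespondents may carry 'sbo2007' or 'admin'.
    Returns dict firm id -> characteristics (None if no source or donor found).
    """
    def cell(record):
        return (record["frame"], record["state"], record["industry"],
                record["employer"], record["size_class"])

    donors = defaultdict(list)
    for record in respondents:
        donors[cell(record)].append(record)

    filled = {}
    for record in nonrespondents:
        if record.get("sbo2007") is not None:
            filled[record["id"]] = record["sbo2007"]  # prior-survey substitution
        elif record.get("admin") is not None:
            filled[record["id"]] = record["admin"]  # administrative data
        else:
            pool = donors.get(cell(record))
            # Hot-deck step: copy a randomly chosen donor's characteristics.
            filled[record["id"]] = rng.choice(pool)["characteristics"] if pool else None
    return filled
```

A production system would typically collapse cells when no donor matches exactly; the sketch simply returns None in that case.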
Overall, imputed data accounted for approximately 24 percent of the firm count estimates by gender, ethnicity, race, and veteran status and approximately 21 percent of the estimates of sales.
The firm size categories, both by receipts and employment, are based on the total nationwide receipts and/or employment of the firm.
The receipts and employment of a multi-unit firm are the sums of the receipts and employment, respectively, of all its associated establishments, and these sums determine the firm's receipts size and employment size. The employment size group "0" includes firms for which no associated establishment reported paid employees in the mid-March pay period but which had paid employees at some time during the year.
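Rolling establishment data up to the firm level can be sketched as follows. The function name and record keys are hypothetical, and using positive annual payroll as the signal that a firm had paid employees at some time during the year is an assumption (consistent with employer firms being defined by payroll at any time during 2012).

```python
from collections import defaultdict


def firm_totals(establishments):
    """Hypothetical sketch: sum establishment data to firm totals.

    establishments: iterable of dicts with assumed keys 'firm_id', 'receipts',
    'march_employment', and 'annual_payroll'.
    """
    totals = defaultdict(lambda: {"receipts": 0, "employment": 0, "payroll": 0})
    for est in establishments:
        firm = totals[est["firm_id"]]
        firm["receipts"] += est["receipts"]
        firm["employment"] += est["march_employment"]
        firm["payroll"] += est["annual_payroll"]
    for firm in totals.values():
        # Employment size group "0": no mid-March employees anywhere in the
        # firm, but payroll (assumed proxy for paid employees) during the year.
        firm["employment_size_zero"] = firm["employment"] == 0 and firm["payroll"] > 0
    return dict(totals)
```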
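Receipts size and employment size are determined for the entire company, not for the part of the company operating in a given industry. Hence, counterintuitive results are possible. For example, a particular industry may show only 100 employees in the category of firms with 500 employees or more, because most employees of those large firms work in establishments classified in other industries.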
Data by receipts size of firm are presented by the following receipts size categories:
Data by employment size of firm are presented by the following employment size categories:
Employer firms include firms with payroll at any time during 2012. Employment reflects the number of paid employees during the pay period including March 12.
Confidentiality. In accordance with federal law governing census reports (Title 13 of the United States Code), no data are published that would disclose the operations of an individual establishment or business. However, the number of firms is not considered a disclosure. Therefore, the number of firms may be released even though other information is withheld. Techniques employed to limit disclosure are discussed at the Census Business Help Site.
The information and data obtained from the Internal Revenue Service, the Social Security Administration, and other sources are also treated as confidential and can be seen only by Census Bureau employees sworn to protect the data from disclosure.
Disclosure Avoidance. Disclosure is the release of data that have been deemed confidential. It generally reveals information about a specific individual or firm or permits deduction of sensitive information about a particular individual or establishment. Disclosure avoidance is the process used to protect the confidentiality of the survey data provided by an individual or firm. Using disclosure avoidance procedures, the Census Bureau modifies or removes the characteristics that put confidential information at risk of disclosure. Although it may appear that a table shows information about a specific individual or business, the Census Bureau has taken steps to disguise or suppress the original data while making sure the results are still useful. The techniques used by the Census Bureau to protect confidentiality in tabulations vary, depending on the type of data.
Noise Infusion. The SBO uses noise infusion as the primary method of disclosure avoidance. Noise infusion is a method of disclosure avoidance in which values are perturbed prior to tabulation by applying a random noise multiplier to the magnitude data, such as the sales and receipts for all firms. Disclosure protection is accomplished in a manner that causes the vast majority of cell values to be perturbed by, at most, a few percentage points. For sample-based tabulations, such as SBO, the estimated relative standard error for a published cell includes both the estimated sampling error and the amount of perturbation in the estimated cell value due to noise.
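Multiplicative noise infusion can be sketched as drawing a random multiplier slightly above or below 1 for each firm and applying it to the firm's magnitude data before tabulation. The multiplier distribution here (1 plus or minus a value uniform between 2 and 5 percent) and the function names are hypothetical; the Census Bureau's actual noise parameters are not given in this document.

```python
import random


def noise_multiplier(rng, low=0.02, high=0.05):
    """Hypothetical multiplier 1 + d or 1 - d, with d uniform on [low, high]."""
    d = rng.uniform(low, high)
    return 1 + d if rng.random() < 0.5 else 1 - d


def infuse_noise(firm_values, rng):
    """Perturb each firm's magnitude value (e.g., sales) before tabulation.

    firm_values: dict firm id -> value. Each firm gets one multiplier.
    """
    return {firm_id: value * noise_multiplier(rng)
            for firm_id, value in firm_values.items()}
```

Because multipliers above and below 1 are equally likely in this sketch, perturbations partly cancel when many firms are summed into a cell, so most cell totals move by much less than any single firm's value.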
In certain circumstances, some individual cells may be suppressed for additional disclosure avoidance and the data replaced by one of the following characters:
To provide meaningful information for cells that have suppression of sensitive employment data, these characters are used to indicate the employment range for a firm:
For a complete list of all economic programs symbols, see the Economic Census Data Dictionary.