
The Census Bureau provides detailed information about the methods used to collect and produce statistics, including sampling, eligibility, questions, data collection and processing, data quality, review, weighting, estimation, coding operations, etc.


Sources of the Data

The U.S. Census Bureau is working towards full electronic data collection. To further this effort, the 2012 Survey of Business Owners (SBO) was selected to implement a letter-only initial mailing. This mailing consisted of a single-page letter asking respondents to report electronically to this survey.

Letters were mailed to a random sample of businesses selected from a list of all firms operating during 2012 with receipts of $1,000 or more except those classified in the following NAICS industries:

  • Crop and Animal Production (NAICS 111 and 112)
  • Rail Transportation (NAICS 482)
  • Postal Service (NAICS 491)
  • Monetary Authorities-Central Bank (NAICS 521)
  • Funds, Trusts, and Other Financial Vehicles (NAICS 525)
  • Religious, Grantmaking, Civic, Professional, and Similar Organizations (NAICS 813)
  • Private Households (NAICS 814)
  • Public Administration (NAICS 92)

The list of all firms (or universe) was compiled from a combination of business tax returns and data collected on other economic census reports. The Census Bureau obtained electronic files from the Internal Revenue Service (IRS) for all companies reporting any business activity on any one of the following 2012 IRS tax forms:

  • 1040 Schedule C, "Profit or Loss from Business" (Sole Proprietorship)
  • 1065, "U.S. Return of Partnership Income"
  • any one of the 1120 corporation tax forms
  • 941, "Employer's Quarterly Federal Tax Return"
  • 944, "Employer's Annual Federal Tax Return"

The IRS provided certain identification, classification, and measurement data for businesses filing those forms.

For most firms with paid employees, the Census Bureau also collected employment, payroll, receipts, and kind of business for each plant, store, or physical location during the 2012 Economic Census.

The use of direct data substitution from existing sources, such as the American Community Survey (ACS) and the Decennial Census, reduced the 2012 SBO sample size to 1.75 million firms from 2.3 million in 2007. Because existing data sources were substituted for direct collection, the smaller sample still yielded sufficient quality while reducing overall respondent burden and further reducing printing, mailing, and processing costs.

Reporting instruments were tested by conducting cognitive interviews. Approximately half of the businesses selected for the 2012 SBO were asked to respond electronically to the new 2012 SBO-2 form that asked 39 fewer questions than the 2012 SBO-1 form. The SBO-2 form, covering the key topics of gender, ethnicity, race, and veteran status, was made possible because business and owner characteristics questions are only published at the national level.

Two follow-up mailings occurred at six-week intervals. "Letter-only" follow-ups were sent to nonrespondents with a prior history of having reported electronically to either the 2007 SBO or the 2012 Economic Census. English- or Spanish-language paper forms were sent to nonrespondents upon request. The remaining nonrespondents were randomly assigned follow-up mailing packages, with one-third of each mail follow-up group receiving paper forms instead of "letter-only" mailings.

An example of the initial mailout letter for electronic reporting and informational copies of the paper forms are available on the Questionnaire page.

All forms were geographically coded, data-keyed when needed, and edited. The editing process interactively performed corrections by using standard procedures to fix detectable errors.

The data were then tabulated by the 2012 NAICS and subjected to further data analysis, and the resulting corrections were applied to individual data records. Tabulations were then produced for the final published results available through American FactFinder (AFF), the Census Bureau's online, self-service data access tool. Later, the data were transitioned from AFF to the more centralized Census data platform, data.census.gov.

A more detailed examination of census methodology is presented in the History of the Economic Census.

Industry Classification of Firms

A firm is a business organization or entity consisting of one domestic establishment (location) or more under common ownership or control. All establishments are included as part of the owning or controlling firm. For the economic census, the terms "firm" and "company" are synonymous.

The industry classifications for all firms are based on the 2012 North American Industry Classification System (NAICS). Changes between 2007 and 2012 are published online at:

Firms with more than one domestic establishment are counted in each industry and geographic area in which they operate, but only once in the total for all sectors and the totals at the national and state levels. The method of assigning classifications and the level of detail at which single- or multi-unit firms were classified depend on whether an economic census report form was obtained at the establishment level.
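The counting rule above can be illustrated with a minimal sketch. The firm identifiers, industry codes, and the function name firm_counts are hypothetical and are not part of any Census Bureau system; the sketch only shows why industry-level firm counts can sum to more than the all-sector total.

```python
from collections import defaultdict

# Hypothetical records: one (firm_id, industry_code) pair per establishment.
establishments = [
    ("F1", "44"),
    ("F1", "72"),
    ("F1", "72"),  # second establishment of F1 in the same industry
    ("F2", "44"),
]

def firm_counts(records):
    """Count distinct firms per industry and in the all-sector total."""
    firms_by_industry = defaultdict(set)
    for firm_id, industry in records:
        firms_by_industry[industry].add(firm_id)
    per_industry = {ind: len(ids) for ind, ids in firms_by_industry.items()}
    # Each firm is counted only once in the total, however many
    # industries it operates in.
    total = len({firm_id for firm_id, _ in records})
    return per_industry, total

per_industry, total = firm_counts(establishments)
print(per_industry)  # firm F1 appears under both industries
print(total)         # but only once in the total
```

In this sketch the industry counts add to 3 while the total is 2, because firm F1 operates in two industries.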

  1. In-scope establishments that returned an economic census report form were classified based on:
    1. Their self-designated kind of business
    2. Product sales or shipments
    3. Responses to other industry-specific inquiries
  2. In-scope establishments without an economic census report form:
    1. Small employers not sent a form were classified on the basis of the most current kind-of-business or industry classification available from one of the Census Bureau's current sample surveys or the 2012 Economic Census. Otherwise, the classification was obtained from administrative records of other federal agencies. If the census or administrative record classifications proved inadequate (none corresponded to a 2012 Economic Census classification in the detail required for employers), the firm was sent a brief inquiry requesting information necessary to assign a kind-of-business or industry code.
    2. Nonemployers were classified on the basis of information obtained from administrative records of other federal agencies.

Precautions in Analyzing and Interpreting Data

The SBO covers both firms with paid employees and firms with no paid employees. Although firms with no paid employees are included in this survey, they are omitted from many other economic surveys. Therefore, caution should be exercised in comparing SBO data with published or unpublished data from other Census Bureau economic survey results.

All survey and census results contain measurement errors and may contain sampling errors. Information about these potential errors is provided or referenced with the data or the source of the data. The Census Bureau recommends that data users incorporate this information into their analyses as these errors could impact inferences. Researchers analyzing data to create their own estimates are responsible for the validity of those estimates and should not cite the Census Bureau as the source of the estimates but only as the source of the core data.

Please contact the Census Bureau for more detailed information and interpretation of the sampling and nonsampling errors.

Basis of Reporting

The economic census is conducted on an establishment basis. A company operating at more than one location is required to file a separate report for each store, factory, shop, or other location. Each establishment is assigned a separate industry classification based on its primary activity and not that of its parent company. For selected industries, only payroll, employment, and classification are collected for individual establishments, while other data are collected on a consolidated basis.

The SBO is conducted on a company or firm basis rather than an establishment basis. A company or firm is a business consisting of one or more domestic establishments under its ownership or control at the end of 2012.

Sampling and Estimation Methodologies

Sampling. To design the 2012 SBO sample, the Census Bureau used the following sources of information to estimate the probability that a business was minority- or women-owned:

  • Administrative data from the Social Security Administration
  • Lists of minority- and women-owned businesses published in syndicated magazines, located on the Internet, or disseminated by trade or special interest groups
  • Word strings in the company name indicating possible minority ownership (derived from 2007 survey responses)
  • Racial distributions for various state-industry classes (derived from 2007 survey responses) and racial distributions for various ZIP Codes
  • Gender, ethnicity, race, and veteran status responses of a single-owner business to a previous SBO or to the 2010 Decennial Census

These probabilities were then used to place each firm in the SBO universe in one of nine frames for sampling:

  • American Indian
  • Asian
  • Black or African American
  • Hispanic
  • Non-Hispanic white men
  • Native Hawaiian and Other Pacific Islander
  • Other (a different race was supplied as a write-in to another source)
  • Publicly owned
  • Women

The SBO universe was stratified by state, industry, frame, and whether the company had paid employees in 2012. The Census Bureau selected large companies, including those operating in more than one state, with certainty. These companies were selected based on volume of sales, payroll, or number of paid employees. All certainty cases were sure to be selected and represented only themselves (i.e., had a selection probability of one and a sampling weight of one). The certainty cutoffs varied by sampling stratum, and each stratum was sampled at varying rates, depending on the number of firms in a particular industry in a particular state. The remaining universe was subjected to stratified systematic random sampling.
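As a rough illustration of the selection step within a single stratum, the sketch below takes firms at or above a cutoff with certainty (weight of one) and draws a systematic random sample from the remainder. The cutoff, the sampling rate, the use of receipts as the size measure, and all function names are assumptions made for this example; the actual cutoffs and rates varied by stratum and are not given here.

```python
import random

def systematic_sample(units, rate, rng):
    """Systematic sample: random start, then a fixed skip interval of 1/rate."""
    interval = 1.0 / rate
    position = rng.uniform(0, interval)
    picks = []
    while position < len(units):
        picks.append(units[int(position)])
        position += interval
    return picks

def select_sample(firms, certainty_cutoff, rate, seed=0):
    """Return (firm, sampling_weight) pairs for one stratum.

    firms: list of dicts with an 'id' and a 'receipts' value
    (receipts stand in for the size measure; an assumption).
    """
    rng = random.Random(seed)
    certainty = [f for f in firms if f["receipts"] >= certainty_cutoff]
    remainder = [f for f in firms if f["receipts"] < certainty_cutoff]
    # Certainty cases represent only themselves: probability 1, weight 1.
    sample = [(f, 1.0) for f in certainty]
    # Noncertainty cases carry the reciprocal of the sampling rate.
    sample += [(f, 1.0 / rate) for f in systematic_sample(remainder, rate, rng)]
    return sample

firms = [{"id": i, "receipts": 1_000 + i} for i in range(20)]
firms += [{"id": 100, "receipts": 5_000_000}, {"id": 101, "receipts": 9_000_000}]
sample = select_sample(firms, certainty_cutoff=1_000_000, rate=0.25, seed=42)
print(sum(weight for _, weight in sample))  # weights add back to the 22 firms
```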

Each firm selected into the sample was asked the percentage of ownership, gender, ethnicity, race, and veteran status for up to four persons owning the largest percentages in the business. Approximately half of these firms were asked additional characteristic questions (e.g., age, education level).

Tabulation. Business ownership is defined as having 51 percent or more of the stock or equity in the business and is categorized by:

  • All firms classifiable by gender, ethnicity, race, and veteran status
    • Gender
      • Female-owned
      • Male-owned
      • Equally male-/female-owned
    • Ethnicity
      • Hispanic
        • Mexican, Mexican American, Chicano
        • Puerto Rican
        • Cuban
        • Other Hispanic, Latino, or Spanish origin
      • Equally Hispanic/non-Hispanic
      • Non-Hispanic
    • Race
      • White
      • Black or African American
      • American Indian and Alaska Native
      • Asian
        • Asian Indian
        • Chinese
        • Filipino
        • Japanese
        • Korean
        • Other Asian
      • Native Hawaiian and Other Pacific Islander
        • Native Hawaiian
        • Samoan
        • Guamanian or Chamorro
        • Other Pacific Islander
      • Some other race
      • Minority
      • Equally minority/nonminority
      • Nonminority
    • Veteran status
      • Veteran-owned
      • Equally veteran-/nonveteran-owned
      • Nonveteran-owned
  • Publicly held and other firms not classifiable by gender, ethnicity, race, and veteran status

Businesses could be tabulated in more than one racial group. This can occur because:

  1. The sole owner was reported to be of more than one race.
  2. The majority owner was reported to be of more than one race.
  3. A majority combination of owners was reported to be of more than one race.

The detail may not add to the total or subgroup total because a Hispanic or Latino firm may be of any race, and because a firm could be tabulated in more than one racial group. For example, if a firm responded as both Chinese and Black majority-owned, the firm would be included in the detailed Asian and Black estimates, but would be counted only once toward the higher-level all-firms estimates.

The sum of the detailed Hispanic origin may not add to the total because no one Hispanic subgroup (i.e., Mexican, Puerto Rican, Cuban, or Other Hispanic, Latino, or Spanish origin) owned a majority of the firm, but a combination of these subgroups did own a majority. In this case, the firm was included in the Hispanic estimate, but was not included in any of the subgroup estimates. For example, if a firm had two owners each with equal ownership, one responding Puerto Rican and the other responding Cuban, there is no one subgroup with a majority ownership, but the firm is Hispanic-owned. This firm would be tabulated in the Hispanic estimate, but would not appear in any of the subgroup estimates.

Also, the subgroup detail for both Asians and Native Hawaiians and Other Pacific Islanders may not add to the total for reasons similar to those explained above.
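The Hispanic subgroup rule can be sketched as follows, using the 51 percent definition of ownership given under Tabulation. The function name, the input layout, and the choice to ignore the "equally owned" categories are simplifications made for this example only.

```python
MAJORITY = 51  # "51 percent or more of the stock or equity"

def classify_hispanic(owners):
    """owners: list of (ownership_percent, subgroup) pairs, where subgroup
    is a Hispanic subgroup name or None for a non-Hispanic owner.

    Returns (is_hispanic_owned, subgroups_holding_a_majority).
    Equally owned cases are not modeled in this sketch.
    """
    hispanic_share = 0
    subgroup_shares = {}
    for percent, subgroup in owners:
        if subgroup is not None:
            hispanic_share += percent
            subgroup_shares[subgroup] = subgroup_shares.get(subgroup, 0) + percent
    is_hispanic = hispanic_share >= MAJORITY
    majority_subgroups = [
        name for name, share in subgroup_shares.items()
        if is_hispanic and share >= MAJORITY
    ]
    return is_hispanic, majority_subgroups

# The example from the text: two equal owners, Puerto Rican and Cuban.
print(classify_hispanic([(50, "Puerto Rican"), (50, "Cuban")]))
```

The firm in the example is Hispanic-owned but falls in no subgroup, which is why subgroup detail may not add to the Hispanic total.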

For the tabulations by gender, ethnicity, race, and veteran status, the data for each firm in the SBO sample were weighted by the reciprocal of the firm's probability of selection.
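A minimal sketch of this weighting follows. The selection probabilities and receipts are made-up values and the function name is hypothetical; each sampled firm simply contributes the reciprocal of its selection probability to the firm count, and its receipts multiplied by that weight to the receipts total.

```python
def weighted_estimates(sample):
    """sample: list of (selection_probability, receipts) pairs.

    Returns (estimated_firm_count, estimated_receipts), with each firm
    weighted by the reciprocal of its probability of selection.
    """
    firm_count = 0.0
    receipts_total = 0.0
    for probability, receipts in sample:
        weight = 1.0 / probability
        firm_count += weight
        receipts_total += weight * receipts
    return firm_count, receipts_total

# Illustrative values only: one certainty firm and three sampled firms.
sample = [(1.0, 5_000_000), (0.25, 200_000), (0.25, 120_000), (0.1, 40_000)]
firm_count, receipts_total = weighted_estimates(sample)
print(firm_count, receipts_total)
```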

Reliability of Estimates

The figures shown in these datasets are, in part, estimated from a sample and will differ from the figures that would have been obtained from a complete census. Two types of possible errors are associated with estimates based on data from sample surveys: sampling errors and nonsampling errors. The accuracy of a survey result depends not only on the sampling errors and nonsampling errors measured, but also on the nonsampling errors not explicitly measured. For particular estimates, the total error may considerably exceed the measured error. The following is a description of the sampling and nonsampling errors associated with this tabulation.

Sampling Variability. The particular sample used for this survey is one of a large number of all possible samples of the same size that could have been selected using the same sample design. Estimates derived from the different samples would differ from each other. The relative standard error and standard error are measures of the variability among the estimates from all possible samples. The estimated relative standard errors and estimated standard errors presented in the tables estimate the sampling variability, and thus measure the precision with which an estimate from the particular sample selected for this survey approximates the average result of all possible samples. Relative standard errors and standard errors are applicable only to those published cells in which sample cases are tabulated. A relative standard error is an expression of the standard error as a percent of the quantity being estimated.

The sample estimate and an estimate of its relative standard error can be used to estimate the standard error and then construct interval estimates with a prescribed level of confidence that the interval includes the average results of all samples. To illustrate, if all possible samples were surveyed under essentially the same conditions, and estimates calculated from each sample, then:

  1. Approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average value of all possible samples.
  2. Approximately 90 percent of the intervals from 1.6 standard errors below the estimate to 1.6 standard errors above the estimate would include the average value of all possible samples.

Thus, for a particular sample, one can say with specified confidence that the average of all possible samples is included in the constructed interval.

Example of a confidence interval. Suppose the estimate is 51,707 and the estimated relative standard error is 2 percent. The standard error is then 2 percent of 51,707 or 1,034. An approximate 90-percent confidence interval is found by first multiplying the standard error by 1.6 and then adding and subtracting that result from the estimate to obtain the upper and lower bounds. Since 1.6 x 1,034 = 1,654, the confidence interval in this example is 51,707 + or - 1,654 or the range 50,053 to 53,361.

For the Characteristics of Businesses and Characteristics of Business Owners datasets, some data are expressed as percentages with standard errors rather than relative standard errors as indicated above. Construction of the confidence interval is illustrated by the following example:

Example of a confidence interval for percentage data. Suppose the estimate is 76.9 percent and the estimated standard error is 0.4 percentage points. An approximate 90-percent confidence interval is found by first multiplying the standard error by 1.6 and then adding and subtracting that result from the estimate to obtain the upper and lower bounds. Since 1.6 x 0.4 = 0.64, the confidence interval in this example is 76.9 + or - 0.64 or the range 76.26 to 77.54.
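Both worked examples can be reproduced with a short sketch. The function names are hypothetical. The first function rounds the standard error and the margin to whole numbers at each step, as the first example does; carrying full precision instead would shift the bounds by about one unit.

```python
Z_90 = 1.6  # multiplier used in the text for an approximate 90-percent interval

def interval_from_rse(estimate, rse_percent, z=Z_90):
    """Interval from an estimate and its relative standard error (percent)."""
    standard_error = round(estimate * rse_percent / 100)
    margin = round(z * standard_error)
    return estimate - margin, estimate + margin

def interval_from_se(estimate_pct, se_points, z=Z_90):
    """Interval for percentage data published with a standard error."""
    margin = z * se_points
    return round(estimate_pct - margin, 2), round(estimate_pct + margin, 2)

print(interval_from_rse(51707, 2))  # (50053, 53361)
print(interval_from_se(76.9, 0.4))  # (76.26, 77.54)
```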

Nonsampling Errors. All surveys and censuses are subject to nonsampling errors. Nonsampling errors are attributable to various sources, including the inability to obtain information for all cases in the universe, imputation for missing data, data errors and biases, mistakes in recording or keying data, errors in collection or processing, and coverage problems.

While explicit measures of the effects of these nonsampling errors are not available, adjustments are made to the published relative standard errors to account for error associated with imputation of missing data. It is believed that most of the important operational and data errors were detected and corrected through an automated data edit designed to review the data for reasonableness and consistency. Quality control techniques were used to verify that operating procedures were carried out as specified.

Unpublished Estimates. Some unpublished estimates can be derived directly from datasets by subtracting published estimates from their respective totals. However, the estimates obtained by such subtraction would be subject to poor response, high sampling variability, or other factors that may make them potentially misleading. Individuals who use estimates in datasets to create new estimates should cite the Census Bureau as the source of only the original estimates.

Comparability of the 2012 and 2007 SBO Data

The following changes were made to the survey methodology for 2012 which affect comparability with past reports:

  1. The Census Bureau expanded the use of direct data substitution from existing sources, such as the American Community Survey (ACS) and the 2010 Decennial Census, to reduce the SBO sample size, mailing and processing costs, and respondent burden.
  2. Due to this reduction in the number of businesses receiving the full list of questions, the 2012 standard errors of the estimates in the Characteristics of Businesses and Characteristics of Business Owners can be larger than in 2007. As such, there are a greater number of differences in the characteristic estimates between 2007 and 2012 than there would have been if every sampled business in 2012 received a full questionnaire.
  3. For the 2012 SBO, nonrespondents could request a Spanish-language version of the paper forms; whereas for 2007, there was no translation of the report form into Spanish.
  4. The first eight questions on the 2007 SBO-1 form were reorganized into three questions on the 2012 SBO-1 and SBO-2 forms to improve navigation and to better identify publicly held and other firms not classifiable by gender, ethnicity, race, and veteran status. Examples of such unclassifiable firms include: business subsidiaries, employee stock ownership plans (ESOPs), cooperatives or clubs, estates, trusts, tribally owned firms, nonprofit organizations, and businesses with no individual owning 10 percent or more of the rights, claims, interests, or stock.
  5. The 2012 SBO omitted the 2007 SBO questions that asked if a franchiser owned 50 percent of the business and if the business made purchases online.
  6. To eliminate confusion for business owners born to American citizens overseas, the foreign-born question that asked if the owner was born in the United States on the 2007 SBO-1 form was replaced by a new 2012 SBO question that asked if the owner was born a citizen of the United States.
  7. For the 2012 SBO, the veteran question was revised and expanded to collect information on whether the veteran was service-disabled, served on active duty or as a reservist during the survey year, served on active duty at any time, and served active duty after September 11, 2001. The revised and expanded wording for the veteran categories and the collection of the additional service characteristics reflect input received during consultations with many leaders in the veteran community. Input was received from, among others, the Department of Defense, the Veterans Administration, the Bureau of Labor Statistics, the U.S. House of Representatives Committee on Veterans’ Affairs, the Senate Committee on Veterans’ Affairs, the Small Business Administration, the American Legion, the Veterans Entrepreneurship Task Force (VET-Force), and the American Veterans (AMVETS).
  8. For the 2012 SBO, interest from researchers on the possible correlation between intellectual property rights and business success led to the addition of a question asking whether the business owned a copyright, trademark, granted patent, or a pending patent.
  9. For the 2012 SBO, if a respondent entered a Hispanic or Latino ethnicity in one of the race write-in boxes, the case was categorized as "Some other race"; in the same situation for the 2007 SBO, the case was categorized as "White." This change was incorporated to be consistent with how these write-ins were treated in the 2010 Decennial Census.
  10. For the 2012 SBO, the use of administrative data for direct substitution may have affected the equally owned estimates. Direct substitution results in only a single owner being assigned to a firm. As such, it is not possible for these firms to be classified as equally owned. Analysis by the Census Bureau indicates a reduced proportion of equally owned firms relative to non-equally owned firms in line with expectations due to the methodology.

Treatment of Nonresponse

Approximately 66.2 percent of the 1.75 million businesses in the SBO sample responded to the survey, compared to 62 percent for the 2007 survey. For the 2012 survey, 69.4 percent of the companies in the SBO sample returned a questionnaire or submitted an online response, but 4.6 percent of submissions did not contain enough information to be considered a response for the estimates by gender, ethnicity, race and veteran status.
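The two 2012 rates above are consistent if the 4.6 percent is read as a share of the submissions rather than of the full sample. The arithmetic below checks that reading; the variable names are chosen for this example only.

```python
submitted_pct = 69.4                    # share of the sample that submitted anything
insufficient_pct_of_submissions = 4.6   # share of those submissions that were unusable

usable_pct = submitted_pct * (1 - insufficient_pct_of_submissions / 100)
print(round(usable_pct, 1))  # 66.2
```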

Of the 2012 SBO nonrespondents, about 6.6 percent responded to the 2007 SBO. For these firms, data from the 2007 survey were used in place of the missing 2012 SBO responses. Administrative data were used where available for all other nonrespondents. After the use of administrative data, for the remaining nonrespondents, gender, ethnicity, race and veteran status were imputed from donor respondents in the same sampling frame with similar characteristics (state, industry, employment status, and size). Because the assignment of businesses to sampling frames relies heavily on administrative data, and there is a high level of agreement between sampling frame assignment and tabulated race or ethnicity for responding firms, the donor imputations are considered to be reliable. Estimates of sampling variability are adjusted to account for nonresponse. Estimates with high error (for example, relative standard error for sales or receipts of 50 percent or more) are suppressed.
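The final imputation step can be sketched as a simple donor (hot-deck) procedure. This is an illustration under assumptions, not the Census Bureau's production method: the field names, the exact-match rule on the cell keys, and the random choice among donors are all hypothetical.

```python
import random

def impute_from_donors(records, keys, target, seed=0):
    """Fill missing `target` values from a responding donor in the same cell.

    records: list of dicts; a missing response is represented by None.
    keys: fields defining the donor cell (for example frame, state, industry).
    Returns new records; the input list is left unchanged.
    """
    rng = random.Random(seed)
    donors = {}
    for record in records:
        if record[target] is not None:
            cell = tuple(record[k] for k in keys)
            donors.setdefault(cell, []).append(record[target])
    result = []
    for record in records:
        filled = dict(record, imputed=False)
        if filled[target] is None:
            pool = donors.get(tuple(filled[k] for k in keys))
            if pool:  # leave the value missing when the cell has no donor
                filled[target] = rng.choice(pool)
                filled["imputed"] = True
        result.append(filled)
    return result

records = [
    {"frame": "Women", "state": "TX", "industry": "44", "gender": "Female-owned"},
    {"frame": "Women", "state": "TX", "industry": "44", "gender": None},
    {"frame": "Asian", "state": "CA", "industry": "54", "gender": None},
]
result = impute_from_donors(records, keys=("frame", "state", "industry"), target="gender")
print(result[1]["gender"], result[1]["imputed"])
```

Restricting donors to the same sampling frame mirrors the reasoning in the text: frame assignment agrees closely with tabulated race or ethnicity for responding firms.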

Overall, imputed data accounted for approximately 24 percent of the firm count estimates by gender, ethnicity, race, and veteran status and approximately 21 percent of the estimates of sales.

Firm Size Categories

The firm size categories, both by receipts and employment, are based on the total nationwide receipts and/or employment of the firm.

The receipts size and employment size of a multi-unit firm are determined by summing the receipts and employment, respectively, of all associated establishments. The employment size group "0" includes firms for which no associated establishments reported paid employees in the mid-March pay period but which had paid employees at some time during the year.

Receipts size and employment size are determined for the entire company. Hence, counterintuitive results are possible; for example, a category of firms with 500 employees or more may show only 100 employees in a particular industry.

Data by receipts size of firm are presented by the following receipts size categories:

  • All firms
  • Firms with sales/receipts of less than $5,000
  • Firms with sales/receipts of $5,000 to $9,999
  • Firms with sales/receipts of $10,000 to $24,999
  • Firms with sales/receipts of $25,000 to $49,999
  • Firms with sales/receipts of $50,000 to $99,999
  • Firms with sales/receipts of $100,000 to $249,999
  • Firms with sales/receipts of $250,000 to $499,999
  • Firms with sales/receipts of $500,000 to $999,999
  • Firms with sales/receipts of $1,000,000 or more

Data by employment size of firm are presented by the following employment size categories:

  • All firms
  • Firms with no employees
  • Firms with 1 to 4 employees
  • Firms with 5 to 9 employees
  • Firms with 10 to 19 employees
  • Firms with 20 to 49 employees
  • Firms with 50 to 99 employees
  • Firms with 100 to 499 employees
  • Firms with 500 to 999 employees
  • Firms with 1,000 or more employees

Employer firms include firms with payroll at any time during 2012. Employment reflects the number of paid employees during the March 12 pay period.
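The assignment of an employment size category from establishment-level data can be sketched as below. The labels follow the list above; the function name and the establishment figures are hypothetical.

```python
import bisect

# Lower bounds of each employment size category after "no employees".
EMP_LOWER_BOUNDS = [1, 5, 10, 20, 50, 100, 500, 1000]
EMP_LABELS = [
    "no employees", "1 to 4", "5 to 9", "10 to 19", "20 to 49",
    "50 to 99", "100 to 499", "500 to 999", "1,000 or more",
]

def employment_size_class(establishment_employment):
    """Sum employment over all of a firm's establishments, then bin the total."""
    total = sum(establishment_employment)
    return EMP_LABELS[bisect.bisect_right(EMP_LOWER_BOUNDS, total)]

# A firm with 100 employees in one industry and 450 in another is a
# "500 to 999" firm, even though it has only 100 employees in the first industry.
print(employment_size_class([100, 450]))
```

Because the category reflects the whole company, the firm's 100 employees in the first industry are tabulated under the "500 to 999" size class for that industry.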


Confidentiality. In accordance with federal law governing census reports (Title 13 of the United States Code), no data are published that would disclose the operations of an individual establishment or business. However, the number of firms is not considered a disclosure. Therefore, the number of firms may be released even though other information is withheld. Techniques employed to limit disclosure are discussed at the Census Business Help Site.

The information and data obtained from the Internal Revenue Service, the Social Security Administration, and other sources are also treated as confidential and can be seen only by Census Bureau employees sworn to protect the data from disclosure.

Disclosure Avoidance. Disclosure is the release of data that have been deemed confidential. It generally reveals information about a specific individual or firm or permits deduction of sensitive information about a particular individual or establishment. Disclosure avoidance is the process used to protect the confidentiality of the survey data provided by an individual or firm. Using disclosure avoidance procedures, the Census Bureau modifies or removes the characteristics that put confidential information at risk of disclosure. Although it may appear that a table shows information about a specific individual or business, the Census Bureau has taken steps to disguise or suppress the original data while making sure the results are still useful. The techniques used by the Census Bureau to protect confidentiality in tabulations vary, depending on the type of data.

Noise Infusion. The SBO uses noise infusion as the primary method of disclosure avoidance. Noise infusion is a method of disclosure avoidance in which values are perturbed prior to tabulation by applying a random noise multiplier to the magnitude data, such as the sales and receipts for all firms. Disclosure protection is accomplished in a manner that causes the vast majority of cell values to be perturbed by, at most, a few percentage points. For sample-based tabulations, such as SBO, the estimated relative standard error for a published cell includes both the estimated sampling error and the amount of perturbation in the estimated cell value due to noise.
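The general idea of a random noise multiplier can be sketched as follows. The multiplier range of 1 to 3 percent, the equal chance of an upward or downward perturbation, and the function names are assumptions made purely for illustration; the actual noise distribution used for the SBO is not described here.

```python
import random

def noise_multiplier(rng, low=0.01, high=0.03):
    """Hypothetical multiplier: shift a value up or down by `low` to `high`."""
    magnitude = rng.uniform(low, high)
    direction = rng.choice([-1, 1])
    return 1 + direction * magnitude

def infuse_noise(values, seed=0):
    """Perturb each firm-level magnitude before any tabulation is done."""
    rng = random.Random(seed)
    return [value * noise_multiplier(rng) for value in values]

receipts = [1_000.0, 250_000.0, 42.0]
noisy = infuse_noise(receipts)
# A cell total is then computed from the perturbed values.
print(sum(noisy))
```

Because each firm's value moves by only a few percent, a published total over many firms typically changes by a small amount, while a cell dominated by one firm no longer reveals that firm's exact value.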

In certain circumstances, some individual cells may be suppressed for additional disclosure avoidance and the data replaced by one of the following characters:

  • N - Not available or not comparable
  • S - Withheld because estimates did not meet publication standards, such as when the relative standard error of the sales and receipts is 50 percent or more
  • X - Not applicable
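The "S" rule for high relative standard errors can be expressed as a simple check. The function name and the treatment of the threshold as a parameter are choices made for this sketch; other publication standards that may also trigger an "S" are not modeled.

```python
def publish_cell(estimate, rse_percent, threshold=50.0):
    """Return the estimate, or "S" when its relative standard error
    (in percent) is at or above the threshold."""
    return "S" if rse_percent >= threshold else estimate

print(publish_cell(51707, 2.0))
print(publish_cell(1200, 63.5))
```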

To provide meaningful information for cells that have suppression of sensitive employment data, these characters are used to indicate the employment range for a firm:

  • a - 0 to 19 employees
  • b - 20 to 99 employees
  • c - 100 to 249 employees
  • e - 250 to 499 employees
  • f - 500 to 999 employees
  • g - 1,000 to 2,499 employees
  • h - 2,500 to 4,999 employees
  • i - 5,000 to 9,999 employees
  • j - 10,000 to 24,999 employees
  • k - 25,000 to 49,999 employees
  • l - 50,000 to 99,999 employees
  • m - 100,000 employees or more

For a complete list of all economic programs symbols, see the Economic Census Data Dictionary.

Page Last Revised - October 8, 2021