Advance Report on
Characteristics of Employer Business Owners: 2002

Introductory Text


This report, Advance Report on Characteristics of Employer Business Owners: 2002, provides economic and demographic characteristics for the owners of businesses with paid employees operating in the United States. The preliminary estimates in this report are based on responses to the 2002 Survey of Business Owners (SBO). The SBO was conducted as part of the 2002 Economic Census. This is the first survey requesting information about business owners since the 1992 Characteristics of Business Owners (CBO) survey.

Separate reports for minority- and women-owned businesses will be issued over the next 18 months and will include number of firms, sales and receipts, number of paid employees, and annual payroll. Data will also be presented by geographic area, industry, and size of business. The characteristics data presented in this report will be updated in subsequent reports.


The results of the survey show that out of 7.7 million owners of employer businesses, 71 percent are male and 26 percent are female (3 percent did not report gender). Of those responding to the survey, 88 percent identified themselves as White; 6 percent as Asian; 2 percent as Black; 0.5 percent as American Indian or Alaska Native; and 0.1 percent as Native Hawaiian or Other Pacific Islander. Owners had the option of selecting more than one race and are included in each race they selected. Four percent identified themselves as Hispanic; Hispanic owners can be of any race.

Nearly half of all owners of American businesses own 51 percent or more of the interest or equity in the business; of these owners, 75 percent are male, 21 percent are female, and 4 percent did not report their gender. About 52 percent of business owners own 50 percent or less of the interest or equity in the business; of these owners, 67 percent are male, 31 percent are female, and 2 percent did not report their gender.

This survey represents all firms operating during 2002 except those classified as agricultural production; domestically scheduled airlines; railroads; U.S. Postal Service; mutual funds (except real estate investment trusts); religious grant operations; private households and religious organizations; public administration; and government. The list of all firms (the universe) is compiled from a combination of business tax returns and data collected on other economic census reports. The Census Bureau obtains electronic files from the Internal Revenue Service (IRS) for all companies filing IRS Form 1040, Schedule C (individual proprietorship or self-employed person); 1065 (partnership); any one of the 1120 corporation tax forms; and 941 (Employer's Quarterly Federal Tax Return). The IRS provides certain identification, classification, and measurement data for businesses filing those forms.

For most firms with paid employees, the Census Bureau also collected employment, payroll, receipts, and kind of business for each plant, store, or physical location during the 2002 Economic Census.

To design the 2002 SBO sample, the Census Bureau used administrative data from the Social Security Administration and several other sources of information to estimate the probability that a business was minority- or women-owned:

These probabilities were then used to place each firm in the SBO universe in one of nine frames for sampling:

The SBO universe was stratified by state, industry, the inferred race code (based on the estimated probabilities of ownership by race), and whether the company had paid employees in 2002. The Census Bureau selected large companies (based on volume of sales, payroll, or number of paid employees) "with certainty." All certainty cases were sure to be selected and represented only themselves (i.e., had a selection probability of one and a sampling weight of one). The certainty cutoffs varied by sampling stratum, and each stratum was sampled at varying rates, depending on the number of firms in a particular industry in a particular state. The sampling rate was lowest in states and industries with the greatest number of firms. A similar methodology was used to select a sample from the remaining universe. The purpose of this was to estimate the number of firms owned by persons of minority ancestry when no indication of minority ownership was found from any of the sources listed above.
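The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Census Bureau's procedure: the field names (`id`, `stratum`, `size`), the function name `select_sample`, and the use of independent random draws for each non-certainty firm are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict

def select_sample(firms, certainty_cutoff, sampling_rate, seed=0):
    """Return (firm, weight) pairs for a stratified sample with certainty cases.

    firms: list of dicts with hypothetical keys 'id', 'stratum', 'size'.
    certainty_cutoff: dict mapping stratum -> size at or above which a firm
        is selected with certainty.
    sampling_rate: dict mapping stratum -> selection probability for the
        remaining (non-certainty) firms in that stratum.
    """
    rng = random.Random(seed)

    # Group firms by sampling stratum.
    by_stratum = defaultdict(list)
    for firm in firms:
        by_stratum[firm["stratum"]].append(firm)

    sample = []
    for stratum, members in by_stratum.items():
        rate = sampling_rate[stratum]
        for firm in members:
            if firm["size"] >= certainty_cutoff[stratum]:
                # Certainty case: selection probability 1, sampling weight 1.
                sample.append((firm, 1.0))
            elif rng.random() < rate:
                # Sampled case: weight is the inverse of the selection probability.
                sample.append((firm, 1.0 / rate))
    return sample
```

In this sketch a stratum with many firms would be given a smaller `sampling_rate`, so each sampled firm there carries a larger weight.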

A firm selected into the sample was mailed one of two questionnaires. The Census Bureau sent the SBO-1 questionnaire to partnerships and corporations. These businesses were asked to report the percentage of ownership, gender, race, ethnicity, and several other characteristics (e.g., age, education level) for each of the three largest owners. The SBO-2 questionnaire was used for sole proprietors and self-employed individuals. These businesses were asked for essentially the same information as on the SBO-1, but for at most two owners.

Tabulation. For this tabulation, only records of companies with paid employees (i.e., employer firms) were used. For each responding company record, a record was created for each owner, where the company return provided a non-zero percentage of ownership. The other characteristics of the owner were also carried to this record. Returned SBO-1 forms could create up to three owner records, while returned SBO-2 forms generated at most two. If the percentage of ownership was not reported or set to "0," no owner record was created for that owner. If the company did not report percentage of ownership for any owner, no owner records were created for that company.
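The record-creation rules in this paragraph can be illustrated with a short Python sketch. The dictionary layout (`owners`, `pct_owned`, `company_id`) and the function name `build_owner_records` are hypothetical and chosen only for the example; the report does not describe the actual file format.

```python
def build_owner_records(company, form_type):
    """Create one record per owner with a reported, non-zero ownership share.

    company: dict with hypothetical keys 'id' and 'owners', where 'owners' is
        a list of dicts that may contain 'pct_owned' plus other characteristics.
    form_type: 'SBO-1' (up to three owners) or 'SBO-2' (up to two owners).
    """
    max_owners = 3 if form_type == "SBO-1" else 2
    records = []
    for owner in company["owners"][:max_owners]:
        pct = owner.get("pct_owned")
        # Skip owners whose percentage of ownership is missing or zero.
        if pct is None or pct == 0:
            continue
        record = dict(owner)                  # carry the other characteristics along
        record["company_id"] = company["id"]  # link back to the company record
        records.append(record)
    return records
```

A company that reports no ownership percentage for any owner yields an empty list, which matches the rule that no owner records are created for it.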

Multiple race reporting was allowed, so it was possible for an owner to be classified in more than one race group (e.g., White and Black). Records could also be tabulated in multiple rows. For example, an Asian Hispanic male veteran owner would have his information tabulated on the Asian, Hispanic, Male, and Veteran lines of each table. However, such a record was counted only once in the total or All Owners line of the publication. Data are tabulated by owner. Each owner record is inflated using the sampling weight assigned to its corresponding company record. Sampling weights are the inverse of the probability of selection. In addition, these weights are inflated by an adjustment factor to compensate for nonresponding records. Adjustment factors are calculated at the sampling stratum level.
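As a rough illustration of the weighting and tabulation described above, the following Python sketch computes a nonresponse adjustment factor for each stratum and then accumulates weighted owner counts by table line. The report does not give the formula for the adjustment factor; the ratio used here (weighted sampled firms over weighted responding firms within a stratum) is a common weighting-class adjustment and is an assumption, as are all field and function names.

```python
from collections import defaultdict

def nonresponse_factors(sampled):
    """Per-stratum factor: weighted count of sampled firms / weighted count of respondents.

    sampled: list of dicts with hypothetical keys 'stratum', 'weight', 'responded'.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for firm in sampled:
        total[firm["stratum"]] += firm["weight"]
        if firm["responded"]:
            responding[firm["stratum"]] += firm["weight"]
    # Strata with no respondents are omitted to avoid division by zero.
    return {s: total[s] / responding[s] for s in responding}

def tabulate(owner_records, factors):
    """Weighted owner counts per table line; each owner counts once in 'All owners'.

    owner_records: list of dicts with hypothetical keys 'stratum', 'weight'
        (the company's sampling weight) and 'lines' (table lines the owner
        belongs to, e.g. ['Asian', 'Hispanic', 'Male', 'Veteran']).
    """
    table = defaultdict(float)
    for rec in owner_records:
        adjusted = rec["weight"] * factors[rec["stratum"]]
        table["All owners"] += adjusted   # counted exactly once in the total
        for line in set(rec["lines"]):    # counted on every line that applies
            table[line] += adjusted
    return dict(table)
```

Because an owner can appear on several lines, the line totals in such a table can add up to more than the All Owners total.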


The figures shown in this report are, in part, estimated from a sample and will differ from the figures which would have been obtained from a complete census. Two types of possible errors are associated with estimates based on data from sample surveys: sampling errors and nonsampling errors. The accuracy of a survey result depends not only on the sampling errors and nonsampling errors measured, but also on the nonsampling errors not explicitly measured. For particular estimates, the total error may considerably exceed the measured errors. The following is a description of the sampling and nonsampling errors associated with this tabulation.

Sampling variability. The particular sample used for this survey is one of a large number of all possible samples of the same size that could have been selected using the same sample design. Estimates derived from the different samples would differ from each other. The relative standard error is a measure of the variability among the estimates from all possible samples. The estimated relative standard errors presented in the tables estimate the sampling variability, and thus measure the precision with which an estimate from the particular sample selected for this survey approximates the average result of all possible samples. Relative standard errors are applicable only to those published cells in which sample cases are tabulated. A relative standard error is an expression of the standard error as a percent of the quantity being estimated.

The sample estimate and an estimate of its relative standard error can be used to estimate the standard error and then construct interval estimates with a prescribed level of confidence that the interval includes the average result of all possible samples. To illustrate, if all possible samples were surveyed under essentially the same conditions, and estimates calculated from each sample, then:

  1. Approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average value of all possible samples.
  2. Approximately 90 percent of the intervals from 1.6 standard errors below the estimate to 1.6 standard errors above the estimate would include the average value of all possible samples.

Thus, for a particular sample, one can say with specified confidence that the average of all possible samples is included in the constructed interval.
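The coverage figures quoted above can be checked numerically if one assumes that the estimates from all possible samples are approximately normally distributed. That assumption is not stated in the report and is made here only for illustration; the function name `normal_coverage` is likewise hypothetical.

```python
import math

def normal_coverage(k):
    """Probability that a normally distributed estimate falls within
    k standard errors of the average of all possible samples."""
    return math.erf(k / math.sqrt(2))

coverage_one_se = normal_coverage(1.0)   # roughly 0.68
coverage_1_6_se = normal_coverage(1.6)   # roughly 0.89
```

Under this assumption, a multiplier of 1.6 gives coverage slightly below 90 percent, which is consistent with the report describing the figure as approximate.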

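Example of a confidence interval. Suppose the estimate of American Indian owners with a high school education or less is 29 percent and the estimated relative standard error is 1.3 percent. Because the relative standard error is expressed as a percent of the estimate, the estimated standard error is 29 x 0.013, or about 0.38 percentage points. An approximate 90 percent confidence interval is 29 ± (1.6 x 0.38), or 28.4 to 29.6 percent.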
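The interval calculation can be written as a small Python function. This sketch follows the definition given earlier in this report, in which the relative standard error is the standard error expressed as a percent of the estimate; the function name and parameter names are hypothetical.

```python
def confidence_interval(estimate, rse_percent, multiplier=1.6):
    """Approximate confidence interval from an estimate and its relative standard error.

    estimate: the published estimate (for example, 29 for 29 percent).
    rse_percent: relative standard error, as a percent of the estimate.
    multiplier: number of standard errors on each side (1.6 for about 90 percent).
    """
    # Convert the relative standard error back to a standard error.
    standard_error = estimate * rse_percent / 100.0
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width

low, high = confidence_interval(29, 1.3)
```

When 1.3 percent is read as a relative standard error in this sense, an estimate of 29 percent gives a standard error of about 0.38 percentage points and an interval of roughly 28.4 to 29.6 percent.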

Nonsampling errors. All surveys and censuses are subject to nonsampling errors. Nonsampling errors are attributable to many sources, including the inability to obtain information for all cases in the universe, adjustments to the weights of respondents to compensate for nonrespondents, imputation for missing data, data errors and biases, mistakes in recording or keying data, errors in collection or processing, and coverage problems.

Explicit measures of the effects of these nonsampling errors are not available. However, it is believed that most of the important operational and data errors were detected and corrected through an automated data edit designed to review the data for reasonableness and consistency. Quality control techniques were used to verify that operating procedures were carried out as specified.

This preliminary tabulation includes about 1 million employer forms. Analysis of the response data is preliminary and still ongoing. This tabulation was also done prior to any imputation for missing responses to the gender, ethnic, or race items. Because the response database was not final at the time of tabulation, similar tabulations on the final “official” response database may yield slightly different results.