
Business Formation Statistics


Methodology

SPECIAL NOTE ABOUT THE WEEKLY BFS METHODOLOGY: Weekly BFS series are created using the same methodology as the Monthly BFS estimates described below. The weekly data provide timely and granular information on the state of the economy, but caution is required in interpreting fluctuations, since high-frequency weekly data are subject to seasonal factors, including holidays and beginning- and end-of-calendar-year effects.

The weekly BFS data are not seasonally adjusted. For a limited time, weekly BFS data will be provided at the national, regional, and state levels. No business formation series are created on a weekly basis. Because the weekly series are rounded, they may exhibit less variability than the actual series, especially for states with a small number of business applications. The rounded weekly BFS series will not add up to the Monthly BFS estimates.


Introduction

The Business Formation Statistics (BFS) are a product of the U.S. Census Bureau developed in research collaboration with economists affiliated with the Board of Governors of the Federal Reserve System, the Federal Reserve Bank of Atlanta, the University of Maryland, and the University of Notre Dame.

The Business Formation Statistics (BFS) provide timely and high frequency data on business applications and employer business formations. The BFS measure business initiation activity (Business Application Series) as indicated by applications for an Employer Identification Number (EIN) on the IRS Form SS-4. The BFS also provide information on actual and projected employer business formations (Business Formation Series) that originate from these applications, based on the record of first payroll tax liability for an EIN. In addition, the BFS contain measures of delay in business starts as indicated by the average duration between the application for an EIN and the transition to an employer business.

The BFS currently cover the period from July 2004 (2004 JUL) onward at a monthly frequency. The data are available at the national, regional, and state levels. Data are also available at the national level by 2-digit NAICS sector. For a limited time, BFS data will be provided weekly at the national, regional, and state levels.


Data Sources

The data for the BFS come from three main sources. The data on business applications are based on applications for an Employer Identification Number (EIN) through filings of IRS Form SS-4. Employer business formations originating from these business applications are identified using the Census Bureau's Business Register (BR) and the Longitudinal Business Database (LBD), which together provide information on the timing of first payroll tax filing for a business based on tax records. The BR is the Census Bureau's main sampling frame for the universe of U.S. businesses and contains quarterly payroll and employment information for employer businesses. The LBD is constructed by linking annual snapshot files from the BR to provide a longitudinal history for each business establishment (Jarmin and Miranda (2002)). Through these linkages, the LBD is able to provide information on the first-ever appearance of a business in the BR as a business with payroll or employment.


Concepts and Methodology

The Business Application Series describe business applications for tax IDs, as indicated by applications for an Employer Identification Number (EIN) through filings of IRS Form SS-4. Business applications are presented in four different series reflecting different subsets of the applications for an EIN. All business applications series cover the period from 2004 JUL onward.

  • Business Applications (BA): The core business applications series, which corresponds to a subset of all applications for an EIN. Includes all applications for an EIN, except for applications for tax liens, estates, trusts, or certain financial filings; applications outside the 50 states and DC or with no state-county geocodes; applications with certain NAICS codes in sector 11 (agriculture, forestry, fishing and hunting) or 92 (public administration) that have low transition rates; and applications in certain industries (e.g., private households, civic and social organizations).
  • High-Propensity Business Applications (HBA): Business Applications (BA) that have a high propensity of turning into businesses with payroll. The identification of high-propensity applications is based on the characteristics of applications revealed on the IRS Form SS-4 that are associated with a high rate of business formation. High-propensity applications include applications: (a) for a corporate entity; (b) that indicate they are hiring employees, purchasing a business, or changing organizational type; (c) that provide a first wages-paid date (planned wages); or (d) that have a NAICS industry code in manufacturing (31-33), a portion of retail (44), health care (62), or accommodation and food services (72).
  • Business Applications with Planned Wages (WBA): High-Propensity Business Applications (HBA) that indicate a first wages-paid date on the IRS Form SS-4. The indication of a wages-paid date is associated with a high likelihood of transitioning into a business with a payroll.
  • Business Applications from Corporations (CBA): High-Propensity Business Applications (HBA) from a corporation or personal service corporation, based on the legal form of organization stated in the IRS Form SS-4. Similar to the WBA series, this series is important primarily because it consists of a set of applications that have a high rate of transitioning into businesses with payroll.
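
The nesting of the four series can be illustrated with a short sketch. It is not the Census Bureau's implementation: the field names are hypothetical stand-ins for information on Form SS-4, the BA exclusions are collapsed into a single flag, and all of NAICS 44 is treated as high-propensity even though only a portion of retail qualifies.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Simplification: the source says only "a portion of retail (44)" qualifies.
HIGH_PROPENSITY_SECTORS = {"31", "32", "33", "44", "62", "72"}


@dataclass
class Application:
    """Hypothetical record; field names are illustrative, not SS-4 field names."""
    in_ba_universe: bool             # passes the BA exclusions (liens, trusts, geography, ...)
    is_corporation: bool             # corporation or personal service corporation
    hiring_purchase_or_change: bool  # hiring employees, purchasing a business, or changing type
    first_wages_date: Optional[str]  # planned first wages-paid date, if provided
    naics2: Optional[str]            # 2-digit NAICS sector, if assigned


def classify(app: Application) -> Set[str]:
    """Return the set of application series (BA, HBA, WBA, CBA) an application counts toward."""
    if not app.in_ba_universe:
        return set()
    series = {"BA"}
    has_planned_wages = app.first_wages_date is not None
    in_sector = app.naics2 in HIGH_PROPENSITY_SECTORS
    if (app.is_corporation or app.hiring_purchase_or_change
            or has_planned_wages or in_sector):
        series.add("HBA")
        if has_planned_wages:
            series.add("WBA")  # WBA is a subset of HBA
        if app.is_corporation:
            series.add("CBA")  # CBA is a subset of HBA
    return series
```

Under these assumptions, an application from a corporation with a planned wages date counts toward all four series, while an application excluded from BA counts toward none.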

Figure (not reproduced here): The Relationship Between Different Business Applications Series. A Venn diagram of the relationship between the four business applications series (BA, HBA, WBA, CBA) and EIN applications.


Business Formation Series

These series describe employer business formations as indicated by the first instance of payroll tax liabilities for the corresponding business applications. The business formation series are forward-looking in the sense that they measure new business formations from the month of business application in any given quarter. Two series are provided: the first describes transitions within the next four quarters (12 months), and the second within the next eight quarters (24 months). Payroll information is available only on a quarterly basis, so it is only possible to look ahead in terms of quarters. All business formation series start in 2004 JUL, the earliest month for which the data on business applications are available.
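
As a rough illustration of the quarterly look-ahead, the sketch below counts, for each application month, the applications whose first payroll quarter falls within k quarters of the quarter containing the application month. The input layout and function names are hypothetical, and the window is assumed to be inclusive of the application quarter and of quarter t + k.

```python
from collections import Counter


def quarter_index(year, month):
    """Map a calendar month to a running quarter number."""
    return year * 4 + (month - 1) // 3


def formations_within(apps, k):
    """Count formations by application month.

    apps: iterable of ((app_year, app_month), first_payroll), where
    first_payroll is (year, quarter 1-4) or None if no payroll was observed.
    Assumption: a formation counts if 0 <= quarters elapsed <= k.
    """
    counts = Counter()
    for (app_year, app_month), first_payroll in apps:
        if first_payroll is None:
            continue
        pay_year, pay_quarter = first_payroll
        elapsed = (pay_year * 4 + pay_quarter - 1) - quarter_index(app_year, app_month)
        if 0 <= elapsed <= k:
            counts[(app_year, app_month)] += 1
    return counts


apps = [
    ((2019, 2), (2019, 3)),  # 2 quarters after the application quarter
    ((2019, 2), (2020, 1)),  # 4 quarters after
    ((2019, 2), (2020, 2)),  # 5 quarters after
    ((2019, 2), None),       # never observed with payroll
]
bf4q = formations_within(apps, 4)
bf8q = formations_within(apps, 8)
```

In this toy input, two of the four February 2019 applications count toward the four-quarter window and three toward the eight-quarter window.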

  • Business Formations within 4 Quarters (BF4Q): This series provides the number of employer businesses that originate from Business Applications (BA) within four quarters from the month of application. By definition, the end-point of this series is determined by the most recent quarter for which the administrative data identify employer business startup activity based on first payroll observation.
  • Projected Business Formations within 4 Quarters (PBF4Q): The projected number of employer businesses that originate from Business Applications (BA) within four quarters from the month of application. The projections are based on an econometric model that generates estimates of the likelihood that a business application turns into an employer business. For the details of the model, see Bayard, Dinlersoz, Dunne, Haltiwanger, Miranda, and Stevens (2018). The projected business formation series cover the period for which the actual number of business formations within 4 quarters is not yet available. Combining the projected series with the actual business formations (the BF4Q series) results in an up-to-date, forward-looking business formation series. This series is rounded to integer values.
  • Spliced Business Formations within 4 Quarters (SBF4Q): This series combines (splices) BF4Q and PBF4Q to provide the entire time series for the actual and projected business formations within 4 quarters. This series is rounded to integer values.
  • Business Formations within 8 Quarters (BF8Q): The number of employer businesses that originate from Business Applications (BA) within eight quarters from the month of application, similar to the BF4Q series. Again, the end-point of this series is determined by the most recent quarter for which the administrative data identify employer business startup activity based on first payroll observation.
  • Projected Business Formations within 8 Quarters (PBF8Q): The projected number of employer businesses that originate from Business Applications (BA) within eight quarters from the month of application, similar to the PBF4Q series. The projected business formation series cover the period for which the actual business formations within 8 quarters are not yet available. This series is rounded to integer values.
  • Spliced Business Formations within 8 Quarters (SBF8Q): This series combines (splices) BF8Q and PBF8Q to provide the entire time series for the actual and projected business formations within 8 quarters. This series is rounded to integer values.
  • Average Duration (in Quarters) from Business Application to Formation within 4 Quarters (DUR4Q): A measure of delay between business application and formation, measured as the average duration (in quarters) between the quarter in which the month of business application falls and the quarter of business formation, conditional on business formation within four quarters. This series spans the same period as BF4Q and is rounded to two decimal places.
  • Average Duration (in Quarters) from Business Application to Formation within 8 Quarters (DUR8Q): A measure of delay between business application and formation, similar to the DUR4Q series. The difference is that the window for business formation is restricted to eight quarters, rather than four. This series spans the same period as BF8Q and is rounded to two decimal places.
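
The splicing and duration definitions in the list above can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up values: it assumes the actual series is used wherever it exists and that the projected series fills the remaining months.

```python
def splice(actual, projected):
    """Combine actual and projected formations into one series.

    actual, projected: dicts keyed by "YYYY-MM" month strings.
    Actual values take precedence; values are rounded to integers.
    """
    spliced = {}
    for month in sorted(set(actual) | set(projected)):
        value = actual[month] if month in actual else projected[month]
        spliced[month] = round(value)
    return spliced


def average_duration(durations_in_quarters):
    """Mean quarters from application to formation, conditional on formation
    within the window, rounded to two decimal places."""
    if not durations_in_quarters:
        return None
    return round(sum(durations_in_quarters) / len(durations_in_quarters), 2)


# Made-up numbers for illustration only.
bf4q = {"2019-11": 100, "2019-12": 90}
pbf4q = {"2020-01": 95.4, "2020-02": 101.6}
sbf4q = splice(bf4q, pbf4q)
dur4q = average_duration([0, 1, 1, 2, 4])
```

Only applications that became employer businesses within the window enter the duration average, which is why the input to `average_duration` contains no missing values.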

Modelling Projected Business Formation Series

The information submitted by applicants on IRS Form SS-4 for an EIN application is used to model projected employer business formations for the U.S. economy as a whole and for individual states. Let $N_{gt}$ be the number of new applications in a geographic region $g$ (e.g., a state or the entire U.S.) in quarter $t$. The total number of business formations that occur between quarters $t$ and $t+k$ from these applications is then given by

$$S_{g,t+k} = \sum_{i=1}^{N_{gt}} I_{ig,t+k},$$

where $I_{ig,t+k}$ is a realization of a Bernoulli random variable that governs whether application $i$ turns into an employer business by the end of quarter $t+k$. The probability distribution function for $I_{ig,t+k}$ is given by

$$\Pr(I_{ig,t+k} = 1) = P_{ig,t+k}, \qquad \Pr(I_{ig,t+k} = 0) = 1 - P_{ig,t+k},$$

where $P_{ig,t+k}$ is the probability that application $igt$ turns into an employer business between quarters $t$ and $t+k$. Then, the expected number of business formations can be written as

$$E[S_{g,t+k}] = \sum_{i=1}^{N_{gt}} P_{ig,t+k}.$$

To estimate $E[S_{g,t+k}]$, an estimate of $P_{ig,t+k}$ is needed. Toward that goal, one can model $I_{ig,t+k}$ as a function of application-level variables, $Z_{igt}$, provided as part of an EIN application on IRS Form SS-4, and a set of unknown parameters, $\beta_{gt}$. Using a Linear Probability Model (LPM), the probability of an application transitioning to an employer business can then be estimated as

$$\hat{P}_{ig,t+k} = F(Z_{igt}; \hat{\beta}_{gt}),$$

where $F$ is a linear function, and $\hat{\beta}_{gt}$ is an estimate of the unknown parameters, $\beta_{gt}$, based on the LPM. The predicted application-level probabilities, $\hat{P}_{ig,t+k}$, can be used to construct an estimate of the expected number of business formations, $E[S_{g,t+k}]$, as

$$\hat{E}[S_{g,t+k}] = \sum_{i=1}^{N_{gt}} \hat{P}_{ig,t+k}.$$
This approach amounts to reweighting each application by the predicted probability that the application becomes an employer business between quarters t and t + k. In the analysis, k is set to either four or eight, corresponding to four and eight quarters, respectively. The four- and eight-quarter windows were chosen to allow enough time for an application to become an employer business and to cover a majority of transitions to employer business. These choices limit the loss of information due to right censoring, which arises because some applications transition after the four- or eight-quarter window. The estimated expected numbers of business formations are used to generate the series Projected Business Formations within 4 Quarters (PBF4Q) and Projected Business Formations within 8 Quarters (PBF8Q). For further details on the estimation methodology, see Bayard, Dinlersoz, Dunne, Haltiwanger, Miranda, and Stevens (2018).
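
The reweighting idea can be sketched with a toy linear probability model fitted by ordinary least squares. Everything here is illustrative: the two binary regressors (corporation, planned wages), the training data, and the function names are assumptions, and the actual specification is documented in Bayard et al. (2018), not here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_lpm(z, y):
    """OLS of a 0/1 outcome y on rows of z (each row starts with 1 for the intercept)."""
    p = len(z[0])
    xtx = [[sum(row[i] * row[j] for row in z) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(z, y)) for i in range(p)]
    return solve(xtx, xty)


def projected_formations(z_new, beta):
    """Sum predicted probabilities over new applications (estimate of E[S])."""
    return sum(sum(b * x for b, x in zip(beta, row)) for row in z_new)


def cell(corp, wages, successes, n=10):
    """n past applications with the same traits, of which `successes` formed."""
    return [([1.0, corp, wages], 1.0 if i < successes else 0.0) for i in range(n)]


# Toy history: rows are [intercept, corporation, planned wages].
history = cell(0, 0, 1) + cell(1, 0, 3) + cell(0, 1, 5) + cell(1, 1, 7)
beta_hat = fit_lpm([z for z, _ in history], [y for _, y in history])

# New applications whose outcomes are not yet observed.
new_apps = [[1.0, 0, 0]] * 3 + [[1.0, 1, 1]] * 2 + [[1.0, 0, 1]]
expected = projected_formations(new_apps, beta_hat)
```

Because an LPM is linear, a predicted probability can fall outside the unit interval for some applications; the sketch does not correct for this. The published projected series are rounded to integer values, which would correspond to rounding `expected` at the end.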


NAICS Improvement

The Census Bureau classifies BFS data by industry using the North American Industry Classification System (NAICS). NAICS codes are assigned using a variety of sources. A Census Bureau-developed automated industry-coding program first attempts to assign NAICS codes to all new EIN applications received from the IRS. The automated industry-coding program is based on established patterns in the business name and descriptions provided on the EIN applications. This auto-coding process assigned NAICS codes to over 80% of all incoming EIN applications in 2020. For applications that did not receive a NAICS code during the auto-coding process, BFS staff use a Census-developed machine learning algorithm to assign NAICS codes where possible. NAICS codes are revised each year as part of the BFS annual update process when more accurate and detailed NAICS codes may be available from the Social Security Administration, the Bureau of Labor Statistics, and the Census Bureau's Business Register. There are a small number of EIN applications where there is not enough information available to assign a NAICS code through any source.

Further documentation is available for both the Census Bureau automated industry-coding program and the machine learning algorithm.


Comparability with Other Data

The Business Dynamics Statistics (BDS) program of the Census Bureau also provides information on new employer businesses at annual frequency. However, there are some key differences in how the BDS and BFS account for new business formation. First, the BDS use employment rather than payroll to identify new businesses. Employment in the BDS is a point-in-time measure. The BDS capture employment as of the payroll week covered by March 12 of the year. The BFS, by contrast, use the presence of payroll as a measure of business formation activity. In addition, the BFS are based on a quarterly measure of payroll within each year. The quarterly frequency leads to timing differences with respect to the BDS in the identification of business startups that hired their first employee after the payroll week of March 12. Second, because of left censoring in the business applications, the BFS do not account for employer business formations that originate from EIN applications dated before 2004 JUL. This effect, however, dissipates toward the end of the sample period, as nearly all business formations eventually tend to arise from business applications made since 2004 JUL. For these reasons, the BDS annual count of new employer businesses does not exactly match the corresponding count in the BFS, but the two track each other closely.


Reliability of the Data

Because the BFS are constructed using a combination of administrative data, rather than a probability sample, sampling error does not apply to the BFS. Non-sampling error, however, still exists. Non-sampling errors can occur for many reasons, such as employers submitting corrected payroll or employment data after the end of the year, or filing late. Other sources of error include typographical errors made by businesses when providing information on the survey or administrative forms. Such errors, however, are likely to be distributed randomly throughout the dataset.

There is also projection error in the projected number of business formations based on the econometric models. It is possible to provide measures of error and confidence bands for the projected number of business formations, and such measures will be considered for future versions of the BFS.

Changes in administrative data can sometimes create complications in identifying business startups with payroll. The Longitudinal Business Database (LBD) addresses these issues in detail in order to avoid overstating business openings (Jarmin and Miranda (2002)). The BFS are subject to periodic changes based on corrections to the LBD due to updates coming from the new BR files. Such changes will be reflected in the actual and projected business formation series on an annual basis once the BFS are revised based on the updated LBD-based firm birth information. There are also some changes in the content of the IRS Form SS-4 over time, and new information in the form is incorporated into the analysis as it becomes available.


Seasonal Adjustment

Seasonal adjustment is the process of estimating and removing seasonal effects from a time series to better reveal certain nonseasonal features. Examples of seasonal effects include a July drop in automobile production as factories retool for new models and increases in heating oil production during September in anticipation of the winter heating season. When applicable, we also estimate and remove trading day effects and moving holiday effects (e.g., Easter, Labor Day) during the seasonal adjustment process. Trading day effects are recurring effects related to the weekday composition of the month.

Because of strong seasonality detected in most of the business application and formation series, all series are provided with and without seasonal adjustment. In the case of the duration series (DUR4Q and DUR8Q), seasonality is not significant in general. Therefore, no seasonally adjusted duration series are provided. All data, with the exception of the industry data and the weekly data, are seasonally adjusted at the state level and summed to create seasonally adjusted United States total and regional data. Industry data, except for Utilities (22), are seasonally adjusted at the national level to create adjusted United States totals. During the seasonal adjustment process, industry-level estimates are raked to the United States total in order to ensure consistency with total estimates at the national level. In some cases, this raking process forces HBA to be greater than BA.

Seasonal adjustment is performed concurrently using the X-13ARIMA-SEATS seasonal adjustment program of the U.S. Census Bureau. Concurrent seasonal factors result from re-estimating the seasonal adjustment each month or quarter when the new time series values become available. For more information on X-13ARIMA-SEATS, see the reference manuals posted on the Census Bureau's website.

An assumption underlying the seasonal adjustment process is that the original series can be separated into a seasonal component, a trend-cycle component, and an irregular component, and possibly a trading day component and/or moving holiday component. The seasonally adjusted series consists of the trend-cycle and irregular components taken together. The trend-cycle component includes the long-term trend and the business cycle. The irregular component is made up of residual variations, such as the sudden impact of political events and the effects of strikes, unusual weather conditions, and reporting and sampling errors. Users can implement their own seasonal adjustment methods using the unadjusted data. Users' results may differ from those published due to rounding. The Census Bureau rounds in the final step after running seasonal adjustment.
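
The raking step, and the way it can push HBA above BA for an industry, can be illustrated with a small sketch. The source does not specify the raking method, so simple proportional scaling to a single national total is assumed here, and all numbers and industry labels are made up.

```python
def rake(industry_values, national_total):
    """Scale industry estimates proportionally so they sum to the national total.

    Assumption: single-margin proportional raking; the published method may differ.
    """
    factor = national_total / sum(industry_values.values())
    return {industry: value * factor for industry, value in industry_values.items()}


# Seasonally adjusted industry estimates before raking (hypothetical).
ba_by_industry = {"X": 100.0, "Y": 900.0}
hba_by_industry = {"X": 98.0, "Y": 300.0}

# National seasonally adjusted totals, e.g. summed from state-level adjustment (hypothetical).
ba_total = 990.0
hba_total = 410.0

raked_ba = rake(ba_by_industry, ba_total)     # factor 0.99
raked_hba = rake(hba_by_industry, hba_total)  # factor of about 1.03

# Industry X had HBA < BA before raking, but the two series are scaled by
# different factors, so the order can flip afterwards.
flipped = raked_hba["X"] > raked_ba["X"]
```

In this example BA and HBA are raked independently, which is why an industry whose HBA count is already close to its BA count can end up with HBA above BA.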

Disclosure Avoidance

Disclosure is the release of data that reveals information or permits deduction of information about a particular survey unit through the release of either tables or microdata. Disclosure avoidance is the process used to protect each unit's identity and data from disclosure. Using disclosure avoidance procedures, the Census Bureau modifies or removes the characteristics that put information at risk of disclosure. Although it may appear that a table shows information about a specific unit, the Census Bureau has taken steps to disguise or suppress a unit's data that may be "at risk" of disclosure while making sure the results are still useful.

The Census Bureau has reviewed the monthly data product for unauthorized disclosure of confidential information and has approved the disclosure avoidance practices applied (Approval ID: CBDRB-FY21-094). The Census Bureau has reviewed the weekly data product for unauthorized disclosure of confidential information and has approved the disclosure avoidance practices applied (Approval ID: CBDRB-FY20-214). The Census Bureau has reviewed the quarterly data product for unauthorized disclosure of confidential information and has approved the disclosure avoidance practices applied (Approval ID: CBDRB-FY20-115).

For the annual counts of state-by-county business applications, the Census Bureau injected differentially private geometric noise into the counts for all counties. For theory and development in differential privacy, refer to Haney et al. (2017) and references therein. In addition to injecting noise, the Census Bureau rounded negative values up to zero and held the number of BAs within each state invariant. Since previously released state totals are available and post-processing the outputs does not degrade privacy, we calibrate the county BA counts to the published state totals. There is no sampling weight, and the global sensitivity for the count data is one; we use a privacy budget of 0.25. The Census Bureau has reviewed the annual county-level data product for unauthorized disclosure of confidential information and has approved the disclosure avoidance practices applied (Approval ID: CBDRB-FY20-417).
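
The steps described above (geometric noise, truncation at zero, calibration to state totals) can be sketched as follows. This is an illustration under stated assumptions, not the Census Bureau's procedure: the two-sided geometric noise is drawn as the difference of two geometric variables, and the calibration rule (proportional scaling with largest-remainder rounding) is hypothetical, since the source does not describe how calibration is done.

```python
import math
import random


def two_sided_geometric(epsilon, sensitivity=1, rng=random):
    """Draw integer noise with P(X = k) proportional to exp(-epsilon * |k| / sensitivity)."""
    alpha = math.exp(-epsilon / sensitivity)

    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1]
        return int(math.floor(math.log(u) / math.log(alpha)))

    # The difference of two i.i.d. geometric draws is two-sided geometric.
    return geometric() - geometric()


def calibrate(noisy, state_total):
    """Hypothetical calibration: scale to the state total, then round so the
    integer county counts sum exactly to it (largest-remainder method)."""
    total = sum(noisy.values())
    if total == 0:
        shares = {c: state_total / len(noisy) for c in noisy}
    else:
        shares = {c: v * state_total / total for c, v in noisy.items()}
    result = {c: math.floor(s) for c, s in shares.items()}
    leftover = max(0, state_total - sum(result.values()))
    by_remainder = sorted(shares, key=lambda c: shares[c] - result[c], reverse=True)
    for county in by_remainder[:leftover]:
        result[county] += 1
    return result


def protect_county_counts(counts, state_total, epsilon=0.25, rng=random):
    """Add noise, round negative values up to zero, and calibrate to the state total."""
    noisy = {c: max(0, n + two_sided_geometric(epsilon, rng=rng))
             for c, n in counts.items()}
    return calibrate(noisy, state_total)


# Made-up county counts for one state whose published total is 1000.
rng = random.Random(0)
county_counts = {"A": 120, "B": 3, "C": 0, "D": 877}
released = protect_county_counts(county_counts, 1000, epsilon=0.25, rng=rng)
```

Truncation and calibration operate only on the noisy counts, so they are post-processing steps and do not consume additional privacy budget.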



References

Jarmin, Ron and Javier Miranda (2002). The Longitudinal Business Database. July 16, 2002.

Bayard, Kim, Emin Dinlersoz, Timothy Dunne, John Haltiwanger, Javier Miranda, and John Stevens (2018). Early-Stage Business Formation: An Analysis of Applications for Employer Identification Numbers. February 14, 2018.

Dinlersoz, Emin, Timothy Dunne, John Haltiwanger, and Veronika Penciakova (2021). Business Formation: A Tale of Two Recessions. January 2021.


Source: U.S. Census Bureau | Business Formation Statistics | 301-763-2000 |   Last Revised: February 22, 2021