Methodology
SPECIAL NOTE ABOUT THE WEEKLY BFS METHODOLOGY: Weekly BFS series are created using the same methodology as the Monthly BFS
estimates described below. The weekly data provide timely and granular information on the state of the economy, but caution
is required in interpreting week-to-week movements, since high-frequency weekly data are subject to seasonal factors, including holidays and
beginning- and end-of-calendar-year effects.
The weekly BFS data are not seasonally adjusted. For a limited time, BFS data will be provided at the national, regional, and state levels. No business
formation series are created on a weekly basis. Because the weekly series are rounded, they may exhibit less variability than the actual series, especially for states
with a small number of business applications. The rounded weekly BFS series will not add up to the Monthly BFS estimates.
Introduction
Business Formation Statistics (BFS) are a product of the U.S. Census Bureau developed in research collaboration with economists affiliated with the Board of Governors of the
Federal Reserve System, the Federal Reserve Bank of Atlanta, the University of Maryland, and the University of Notre Dame.
Business Formation Statistics (BFS) provide timely and high frequency data on business applications and employer business formations. BFS measure business initiation
activity (Business Application Series) as indicated by applications for an Employer Identification Number (EIN) on the IRS Form SS-4. BFS also provide information on actual
and projected employer business formations (Business Formation Series) that originate from these applications, based on the record of first payroll tax liability for an EIN. In
addition, BFS contain measures of delay in business starts as indicated by the average duration between the application for an EIN and the transition to an employer business.
BFS currently cover the period from July 2004 onwards at a monthly frequency. The data are available at the national, regional, state, and county levels. Data are also available at the national level by 2-digit NAICS sector.
For a limited time, BFS data will be provided weekly at the national, regional, and state levels.
Data Sources
The data for BFS come from three main sources. The data on business applications are based on applications for an Employer Identification Number (EIN) through filings of IRS
Form SS-4. Employer business formations originating from these business applications are identified using the Census Bureau's Business Register (BR) and the Longitudinal Business Database (LBD), which together provide information
on the timing of first payroll tax filing for a business based on tax records. The BR is the Census Bureau's main sampling frame for the universe of U.S. businesses and contains quarterly
payroll and employment information for employer businesses. LBD is constructed by linking annual snapshot files from the BR to provide a longitudinal history for each business
establishment (Jarmin and Miranda (2002)). Through these linkages, LBD is able to provide information on the first-ever appearance of a business in the BR as a business with payroll
or employment.
Concepts and Methodology
The Business Application Series describe business applications for tax IDs, as indicated by applications for an Employer Identification Number (EIN) through filings of IRS Form SS-4. Business applications are
presented in four different series reflecting different subsets of the applications for an EIN. All business applications series cover the period from July 2004 onwards.
- Business Applications (BA): The core business applications series, which corresponds to a subset of all applications for an EIN. Includes all applications for an EIN, except
for: applications for tax liens, estates, trusts, or certain financial filings; applications from outside the 50 states and DC or with no state-county geocodes; applications with certain NAICS codes in sector 11 (agriculture, forestry, fishing and hunting) or 92 (public administration) that have low transition rates; and applications in certain industries (e.g., private households, civic and social organizations).
- High-Propensity Business Applications (HBA): Business Applications (BA) that have a high propensity of turning into businesses with payroll. The identification of
high-propensity applications is based on the characteristics of applications revealed on the IRS Form SS-4 that are associated with a high rate of business formation. High-propensity
applications include applications: (a) for a corporate entity, (b) that indicate they are hiring employees, (c) that provide a first wages-paid date (planned wages);
or (d) that have a NAICS industry code in accommodation and food services (72) or in portions of construction (237, 238), manufacturing (312, 321, 322, 332), retail (44, 452),
professional, scientific, and technical services (5411, 5413), educational services (6111), and health care (621, 623).
- Business Applications with Planned Wages (WBA): High-Propensity Business Applications (HBA) that indicate a first wages-paid date on the IRS Form SS-4. The indication
of a wages-paid date is associated with a high likelihood of transitioning into a business with a payroll.
- Business Applications from Corporations (CBA): High-Propensity Business Applications (HBA) from a corporation or personal service corporation, based on the legal form
of organization stated in the IRS Form SS-4. Similar to the WBA series, this series is important primarily because it consists of a set of applications that have a high rate of
transitioning into businesses with payroll.
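[Figure: Venn diagram, "The Relationship Between Different Business Applications Series", showing the relationship between the four business applications series (BA, HBA, WBA, CBA) and EIN applications. Per the definitions above, BA is a subset of all EIN applications, HBA is a subset of BA, and WBA and CBA are each subsets of HBA.]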
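The nesting of the application series can be illustrated with a minimal Python sketch. The `Application` record layout and its field names are hypothetical, since the source does not describe a file format; only the four criteria and the NAICS prefixes are taken from the HBA definition above. The sketch assumes each record has already passed the BA exclusions.

```python
from dataclasses import dataclass
from typing import Optional

# NAICS prefixes listed under criterion (d) of the HBA definition above.
HBA_NAICS_PREFIXES = (
    "72",                        # accommodation and food services
    "237", "238",                # portions of construction
    "312", "321", "322", "332",  # portions of manufacturing
    "44", "452",                 # retail
    "5411", "5413",              # professional, scientific, and technical services
    "6111",                      # educational services
    "621", "623",                # health care
)


@dataclass
class Application:
    """Hypothetical record for one Business Application (BA)."""
    is_corporation: bool             # legal form stated on the form
    hiring_employees: bool           # applicant indicates hiring employees
    first_wages_date: Optional[str]  # e.g. "2021-03", or None if left blank
    naics: Optional[str]             # NAICS code as a string, or None


def is_wba(app: Application) -> bool:
    """Planned wages: a first wages-paid date is present."""
    return app.first_wages_date is not None


def is_cba(app: Application) -> bool:
    """Application from a corporation."""
    return app.is_corporation


def is_hba(app: Application) -> bool:
    """High propensity: any one of criteria (a) through (d) holds."""
    in_listed_industry = (
        app.naics is not None and app.naics.startswith(HBA_NAICS_PREFIXES)
    )
    return (
        app.is_corporation
        or app.hiring_employees
        or is_wba(app)
        or in_listed_industry
    )
```

Because criteria (a) and (c) are themselves sufficient for HBA, every record flagged by `is_cba` or `is_wba` is also flagged by `is_hba`, which matches the subset relationship described above.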
Business Formation Series
These series describe employer business formations as indicated by the first instance of payroll tax liabilities for the corresponding business applications. The business formation series
are forward-looking in the sense that they measure new business formations from the month of business application in any given quarter. Two series are provided: the first describes transitions
within the next four quarters (12 months), and the second within the next eight quarters (24 months). Payroll information is available only on a quarterly basis, so it is only possible to look ahead in terms of quarters. All business formation series start in July 2004, the earliest month for which the data on business applications
are available.
- Business Formations within 4 Quarters (BF4Q): This series provides the number of employer businesses that originate from Business Applications (BA) within four quarters
from the month of application. By definition, the end-point of this series is determined by the most recent quarter for which the administrative data identify employer business startup activity based on first payroll observation.
- Projected Business Formations within 4 Quarters (PBF4Q): The projected number of employer businesses that originate from Business Applications (BA) within four quarters
from the month of application. The projections are based on an econometric model that generates estimates of the likelihood that a business application turns into an employer business.
For the details of the model, see the working paper. The projected business formation series cover the period for which the actual number of business formations within 4 quarters is not yet
available. Combining the projected series with the actual business formations (the BF4Q series) results in an up-to-date, forward-looking business formation series. This series is rounded
to integer values.
- Spliced Business Formations within 4 Quarters (SBF4Q): This series combines (splices) BF4Q and PBF4Q to provide the entire time series for the actual and projected
business formations within 4 quarters. This series is rounded to integer values.
- Business Formations within 8 Quarters (BF8Q): The number of employer businesses that originate from Business Applications (BA) within eight quarters from the month of
application, similar to the BF4Q series. Again, the end-point of this series is determined by the most recent quarter for which the administrative data identify employer business startup activity based on first payroll observation.
- Projected Business Formations within 8 Quarters (PBF8Q): The projected number of employer businesses that originate from Business Applications (BA) within eight
quarters from the month of application, similar to the PBF4Q series. The projected business formation series cover the period for which the actual business formations within 8 quarters are not yet
available. This series is rounded to integer values.
- Spliced Business Formations within 8 Quarters (SBF8Q): This series combines (splices) BF8Q and PBF8Q to provide the entire time series for the actual and projected
business formations within 8 quarters. This series is rounded to integer values.
- Average Duration (in Quarters) from Business Application to Formation within 4 Quarters (DUR4Q): A measure of delay between business application and formation,
measured as the average duration (in quarters) between the quarter in which the month of business application falls and the quarter of business formation, conditional on business formation within four quarters.
This series spans the same period as BF4Q and is rounded to two decimal places.
- Average Duration (in Quarters) from Business Application to Formation within 8 Quarters (DUR8Q): A measure of delay between business application and formation,
similar to the DUR4Q series. The difference is that the window for business formation is restricted to eight quarters, rather than four. This series spans the same period as BF8Q and is
rounded to two decimal places.
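The definitions in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Census Bureau's implementation: the input layout is hypothetical, and the window is assumed to include quarters t through t + k, following the "between quarters t and t + k" wording used in the modelling section.

```python
from collections import defaultdict


def quarter_index(year, month):
    """Map a calendar month to a running quarter number."""
    return year * 4 + (month - 1) // 3


def formations_and_duration(apps, k=4):
    """Count formations and average duration by month of application.

    apps: iterable of ((app_year, app_month), formation), where formation is
    (year, quarter 1-4) of first payroll, or None if no payroll was observed.
    Returns {(year, month): (formation count, average duration or None)}.
    """
    durations = defaultdict(list)
    months = set()
    for (year, month), formation in apps:
        months.add((year, month))
        if formation is None:
            continue
        f_year, f_quarter = formation
        d = (f_year * 4 + f_quarter - 1) - quarter_index(year, month)
        if 0 <= d <= k:  # assumed inclusive window: quarters t .. t+k
            durations[(year, month)].append(d)
    result = {}
    for ym in sorted(months):
        ds = durations.get(ym, [])
        avg = round(sum(ds) / len(ds), 2) if ds else None  # two decimals
        result[ym] = (len(ds), avg)
    return result


def splice(actual, projected):
    """Spliced series: actual values where available, rounded projections elsewhere."""
    spliced = {month: round(value) for month, value in projected.items()}
    spliced.update(actual)  # actual counts take precedence
    return dict(sorted(spliced.items()))


# Hypothetical records: (month of application, quarter of first payroll).
apps = [
    ((2019, 1), (2019, 1)),  # same quarter, duration 0
    ((2019, 1), (2019, 3)),  # duration 2
    ((2019, 1), (2020, 3)),  # duration 6, outside the 4-quarter window
    ((2019, 1), None),       # never became an employer
    ((2019, 2), (2020, 1)),  # duration 4
]
print(formations_and_duration(apps, k=4))
print(splice({(2019, 1): 2}, {(2019, 1): 2.4, (2019, 2): 1.6}))
```

Calling `formations_and_duration(apps, k=8)` on the same records would also count the application with duration 6, which is the only difference between the 4-quarter and 8-quarter variants in this sketch.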
Modelling Projected Business Formation Series
The information submitted by applicants in the IRS Form SS-4 for an EIN application is used to model projected employer business formation for the U.S. economy as a whole and for individual states.
Let $N_{gt}$ be the number of new applications in a geographic region $g$ (e.g., a state or the entire U.S.) in quarter $t$. The total number of business formations that occur between quarters $t$ and $t + k$ from these applications is then given by

$$S_{gt+k} = \sum_{i=1}^{N_{gt}} I_{igt+k},$$

where $I_{igt+k}$ is a realization of a Bernoulli random variable that governs whether application $i$ turns into an employer business by the end of quarter $t + k$. The probability distribution function for $I_{igt+k}$ is given by

$$\Pr(I_{igt+k} = 1) = P_{igt+k}, \qquad \Pr(I_{igt+k} = 0) = 1 - P_{igt+k},$$

where $P_{igt+k}$ is the probability that application $igt$ turns into an employer business between quarters $t$ and $t + k$. Then, the expected number of business formations can be written as

$$E[S_{gt+k}] = \sum_{i=1}^{N_{gt}} P_{igt+k}.$$

To estimate $E[S_{gt+k}]$, an estimate of $P_{igt+k}$ is needed. Towards that goal, one can model $I_{igt+k}$ as a function of application-level variables, $Z_{igt}$, provided as part of an EIN application in the IRS Form SS-4 and a set of unknown parameters, $\beta_{gt}$. Using a Linear Probability Model (LPM), the probability of an application transitioning to an employer can then be estimated as

$$\hat{P}_{igt+k} = F(Z_{igt}; \hat{\beta}_{gt}),$$

where $F$ is a linear function, and $\hat{\beta}_{gt}$ is an estimate of the unknown parameters, $\beta_{gt}$, based on the LPM. The predicted application-level probabilities, $\hat{P}_{igt+k}$, can be used to construct an estimate of the expected number of business formations, $E[S_{gt+k}]$, as

$$\hat{E}[S_{gt+k}] = \sum_{i=1}^{N_{gt}} \hat{P}_{igt+k}.$$

This approach amounts to reweighting each application by the predicted probability that the application becomes an employer business between quarters t and t + k. In the
analysis, k is set to either four or eight, corresponding to four and eight quarters, respectively. The four and eight quarter windows were chosen to allow a long enough time for an
application to become an employer business and cover a majority of transitions to employer business. These choices prevent a significant loss of information due to right censoring, which arises because some
applications transition beyond the four- or eight-quarter window. The estimated expected number of business formations is used to generate the series Projected Business Formations within
4 Quarters (PBF4Q) and Projected Business Formations within 8 Quarters (PBF8Q). For further details on the estimation methodology,
see Bayard, Dinlersoz, Dunne, Haltiwanger, Miranda, and Stevens (2018).
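The reweighting idea can be illustrated with a deliberately simplified Python sketch. It assumes a saturated linear probability model, that is, one indicator variable per combination of application characteristics and no intercept, in which case the OLS coefficients reduce to within-cell transition rates. The cell labels and data are hypothetical, and the actual specification in the working paper uses a richer set of regressors.

```python
from collections import defaultdict


def fit_saturated_lpm(history):
    """Estimate transition probabilities from past applications.

    history: iterable of (cell, formed) pairs, where `cell` is a hashable
    combination of application characteristics and `formed` is 0 or 1.
    With one indicator per cell and no intercept, the OLS coefficients of a
    linear probability model equal the within-cell means of `formed`.
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [formations, applications]
    for cell, formed in history:
        totals[cell][0] += formed
        totals[cell][1] += 1
    return {cell: s / n for cell, (s, n) in totals.items()}


def projected_formations(new_cells, beta_hat):
    """Sum the predicted probabilities over new applications.

    Raises KeyError for a cell that was never seen in `history`.
    """
    return sum(beta_hat[cell] for cell in new_cells)


# Hypothetical history: 2 of 4 corporate applications formed, 1 of 4 others.
history = (
    [("corp", 1), ("corp", 1), ("corp", 0), ("corp", 0)]
    + [("other", 1), ("other", 0), ("other", 0), ("other", 0)]
)
beta_hat = fit_saturated_lpm(history)

# Four new applications whose outcomes are not yet observed.
print(projected_formations(["corp", "corp", "other", "other"], beta_hat))  # 1.5
```

Each new application contributes its predicted probability rather than a 0 or 1, so the projection is generally not an integer before rounding.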
NAICS Improvement
The Census Bureau classifies BFS data by industry using the North American Industry Classification System (NAICS). NAICS codes are assigned using a variety of sources. A Census Bureau-developed automated
industry-coding program first attempts to assign NAICS codes to all new EIN applications received from the IRS. The automated industry-coding program is based on established patterns in the business name
and descriptions provided on the EIN applications. This auto-coding process assigned NAICS codes to over 80% of all incoming EIN applications in 2020. For applications that did not receive a NAICS code during
the auto-coding process, BFS staff use a Census-developed machine learning algorithm to assign NAICS codes where possible. NAICS codes are revised each year for the previous five years as part of the BFS annual
update process when more accurate and detailed NAICS codes may be available from the Social Security Administration, the Bureau of Labor Statistics, and the Census Bureau's Business Register. There are a small
number of EIN applications where there is not enough information available to assign a NAICS code through any source.
Please read this paper for more information about the Census Bureau automated industry-coding program. This presentation provides more information on the machine learning algorithm.
Comparability with Other Data
The Business Dynamics Statistics (BDS) program of the Census Bureau also provides information on new employer businesses at an annual frequency. However, there are some key differences in how
BDS and BFS account for new business formation. First, BDS use employment rather than payroll to identify new businesses. Employment in BDS is a point-in-time measure. BDS
capture employment as of the payroll week covered by March 12 of the year. BFS, by contrast, use the presence of payroll as a measure of business formation activity. In addition, BFS
are based on a quarterly measure of payroll within each year. The quarterly frequency leads to timing differences with respect to BDS in the identification of business startups that hired
their first employee after the payroll week of March 12. Second, because of left censoring in the business applications, BFS do not account for employer business formations that originate
from EIN applications dated before July 2004. This effect, however, dissipates toward the end of the sample period, as nearly all business formations eventually tend to arise from business
applications made since July 2004. For these reasons, the BDS annual counts of new employer businesses do not exactly match the corresponding counts in BFS, but the two track each other closely.
Reliability of the Data
Because BFS are constructed using a combination of administrative data, rather than a probability sample, sampling error does not apply to the BFS. Non-sampling error, however, still
exists. Non-sampling errors can occur for many reasons, such as employers submitting corrected payroll or employment data after the end of the year, or filing late. Other sources
of error include typographical errors made by businesses when providing information on the survey or administrative forms. Such errors, however, are likely to be distributed randomly
throughout the dataset.
There is also projection error in the projected number of business formations based on the econometric models. It is possible to provide measures of error and confidence bands
for the projected number of business formations, and such measures will be considered for future versions of BFS.
Changes in administrative data sometimes can also create complications in identifying business startups with payroll. The Longitudinal Business Database (LBD) addresses these issues in
detail in order to avoid overstating business openings (Chow et al. (2021)). BFS formation series are revised annually based on corrections to LBD due to updates coming from the new BR files.
The updated formations data are typically released by the publication of December monthly data. There are also some changes in the content of the IRS Form SS-4 over time, and new information in
the form is incorporated into the analysis as it becomes available.
The BFS program periodically evaluates the characteristics associated with high-propensity applications and their likelihood of turning into business formations. The evaluation may result in updates to the
definition of high-propensity applications. In November 2021, the definition for high-propensity business applications was updated and applied to data from 2012 to the present. This update was made possible
in part by the NAICS improvement methodology discussed above.
new HBA: High-Propensity Business Applications - Business Applications (BA) that have a high propensity of turning into businesses with payroll. The identification of high-propensity
applications is based on the characteristics of applications revealed on the IRS Form SS-4 that are associated with a high rate of business formation. High-propensity applications include applications: (a) from a
corporate entity, (b) that indicate they are hiring employees, (c) that provide a first wages-paid date (planned wages); or (d) that have a NAICS industry code in accommodation and food services (72) or in portions
of construction (237, 238), manufacturing (312, 321, 322, 332), retail (44, 452), professional, scientific, and technical services (5411, 5413), educational services (6111), and health care (621, 623).
The characteristics associated with high-propensity business applications were first determined during initial BFS program research. The original definition for high-propensity business applications is more
representative of the older data and is applied to the data from 2004-2011.
original HBA: High-Propensity Business Applications - Business Applications (BA) that have a high propensity of turning into businesses with payroll. The identification of high-propensity
applications is based on the characteristics of applications revealed on the IRS Form SS-4 that are associated with a high rate of business formation. High-propensity applications include applications: (a) from a
corporate entity, (b) that indicate they are hiring employees, purchasing a business or changing organizational type, (c) that provide a first wages-paid date (planned wages); or (d) that have a NAICS industry code
in manufacturing (31-33), a portion of retail (44), health care (62), or accommodation and food services (72).
In order to keep the data comparable across the two definitions, an additional linking methodology is applied to the series based on the original definition of high-propensity applications. The linking maintains the growth rate of the
time series but raises or lowers the level of the original-definition series, increasing or decreasing its weight in the aggregation hierarchy. This method is known as retrapolation; see the
Handbook on Backcasting for more information.
An adjustment factor was calculated using a simple ratio of the new HBA definition over the original HBA definition, for each month in an overlap year, 2012.
The mean adjustment factor was then calculated across all months of 2012.
The mean adjustment factor was then applied, via multiplication, to the data with the original HBA definition (2004-2011). For the years 2004-2011, the HBA data in our publications have retrapolation applied.
The adjustment factor is computed at the state-level for the NSA HBA data. The region and national NSA HBA totals are tabulated from this state-level data. For the industry NSA HBA data, separate adjustment
factors for each industry are computed. Seasonal adjustment methodology is applied to the updated spliced series.
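The adjustment steps above can be sketched as follows for a single state or industry series. The function names and the numbers are hypothetical; only the three steps (monthly ratio in 2012, mean across months, multiplication of the 2004-2011 data) come from the description above.

```python
def mean_adjustment_factor(new_2012, original_2012):
    """Mean over the months of 2012 of (new-definition HBA / original-definition HBA).

    Both arguments map month number (1-12) to the HBA count for that month.
    """
    ratios = [new_2012[m] / original_2012[m] for m in sorted(new_2012)]
    return sum(ratios) / len(ratios)


def retrapolate(original_series, factor):
    """Scale the original-definition series (2004-2011) by the mean factor.

    A constant multiplier changes the level but leaves growth rates unchanged.
    """
    return {month: value * factor for month, value in original_series.items()}


# Hypothetical overlap-year data for one state.
original_2012 = {m: 100 for m in range(1, 13)}
new_2012 = {m: (75 if m <= 6 else 50) for m in range(1, 13)}

factor = mean_adjustment_factor(new_2012, original_2012)
print(factor)  # 0.625

# Hypothetical original-definition values for two months of 2011.
linked = retrapolate({(2011, 11): 160, (2011, 12): 200}, factor)
print(linked)  # {(2011, 11): 100.0, (2011, 12): 125.0}
```

In this example the month-over-month growth from November to December 2011 is 25 percent both before and after linking, which is the property the method is meant to preserve.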
Seasonal Adjustment
Seasonal adjustment is the process of estimating and removing seasonal effects from a time series to better reveal certain nonseasonal features. Examples of seasonal effects include a July drop in automobile
production as factories retool for new models and increases in heating oil production during September in anticipation of the winter heating season. When applicable, we also estimate and remove trading day effects
and moving holiday effects (e.g., Easter, Labor Day, etc.) during the seasonal adjustment process. Trading day effects are recurring effects related to the weekday composition of the month. Because of strong
seasonality detected in most of the business application and formation series, all series are provided with and without seasonal adjustment. Each month, the seasonally adjusted application and formation series are revised for
the prior two months, as well as the current and previous month in the prior year. For example, with the release of September 2021 data, the following months would be revised: August 2021, July 2021, September 2020,
and August 2020. Factors for seasonal adjustments are recomputed and the seasonally adjusted applications and formations series are revised annually.
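The monthly revision window described above can be written as a small helper. This is a sketch of the stated rule only (the prior two months, plus the same month and the previous month one year earlier); the function names are illustrative.

```python
def shift_month(year, month, delta):
    """Return the (year, month) that lies `delta` months away."""
    idx = year * 12 + (month - 1) + delta
    return idx // 12, idx % 12 + 1


def revised_months(year, month):
    """Months revised when data for (year, month) are released."""
    return [
        shift_month(year, month, -1),   # previous month
        shift_month(year, month, -2),   # two months back
        shift_month(year, month, -12),  # same month, prior year
        shift_month(year, month, -13),  # previous month, prior year
    ]


print(revised_months(2021, 9))
# [(2021, 8), (2021, 7), (2020, 9), (2020, 8)]
```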
In the case of the duration series (DUR4Q and DUR8Q), seasonality is not significant in general. Therefore, no seasonally adjusted duration series are provided. All data, with the exception of the industry data
and the weekly data, are seasonally adjusted at the state level and summed to create seasonally adjusted United States total and regional data. Industry data, except for Utilities (22), are seasonally adjusted at the
national level to create adjusted United States totals.
During the seasonal adjustment process, industry-level estimates are raked to the United States total, in order to ensure consistency with total estimates at the national level. This raking process may introduce a rounding
error to data early in the time series, and in some cases, forces HBA to be greater than BA.
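The source does not give the raking formula. A minimal sketch, assuming simple proportional raking to a single national control total, is shown below; the industry labels and values are hypothetical.

```python
def rake_to_total(industry_estimates, national_total):
    """Scale industry estimates proportionally so they sum to the national total."""
    current = sum(industry_estimates.values())
    factor = national_total / current
    return {industry: value * factor for industry, value in industry_estimates.items()}


# Hypothetical seasonally adjusted industry estimates and national total.
raked = rake_to_total({"44-45": 300.0, "72": 100.0}, national_total=500.0)
print(raked)  # {'44-45': 375.0, '72': 125.0}
```

If BA and HBA are raked and rounded separately, the two scaled values for a small industry can end up in the opposite order from the unadjusted data, which is one way the inconsistency noted above could arise.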
Seasonal adjustment is performed concurrently using the X-13ARIMA-SEATS seasonal adjustment program of the U.S. Census Bureau. Concurrent seasonal factors result from re-estimating the seasonal adjustment each month
or quarter when the new time series values become available. For more information on X-13ARIMA-SEATS, see the reference manuals
posted on the Census Bureau's website. An assumption underlying the seasonal adjustment process is that the original series can be separated into a seasonal component, a trend-cycle component, an irregular component,
and possibly a trading day component and/or moving holiday component. The seasonally adjusted series consists of the trend-cycle and irregular components taken together. The trend-cycle component includes the long-term
trend and the business cycle. The irregular component is made up of residual variations, such as the sudden impact of political events and the effects of strikes, unusual weather conditions, reporting and sampling errors, etc.
Users can implement their own seasonal adjustment methods using the unadjusted data. Users' results may differ from those published due to rounding. The Census Bureau rounds in the final step after running seasonal adjustment.
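The decomposition described above can be restated compactly. The sketch below assumes a multiplicative form; the source does not state whether a multiplicative or additive decomposition is used for BFS, and X-13ARIMA-SEATS supports both. The symbols are notation introduced here for illustration: $Y_t$ is the original series, $TC_t$ the trend-cycle, $S_t$ the seasonal component, $I_t$ the irregular component, $TD_t$ the trading day component, $H_t$ the moving holiday component, and $A_t$ the seasonally adjusted series.

$$Y_t = TC_t \times S_t \times I_t \times TD_t \times H_t, \qquad A_t = \frac{Y_t}{S_t \times TD_t \times H_t} = TC_t \times I_t$$

Under an additive form, the products become sums and the division becomes a subtraction.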
Disclosure Avoidance
Disclosure is the release of data that reveals information or permits deduction of information about a particular survey unit through the release of either tables or microdata.
Disclosure avoidance is the process used to protect each unit's identity and data from disclosure. Using disclosure avoidance procedures, the Census Bureau modifies or removes the characteristics
that put information at risk of disclosure. Although it may appear that a table shows information about a specific unit, the Census Bureau has taken steps to disguise or suppress a unit's data that
may be "at risk" of disclosure while making sure the results are still useful.
For the annual counts of state by county business applications, the Census Bureau injected differentially private geometric noise into the counts for all counties. For theory and development in differential privacy,
refer to Haney et al. (2017) and references therein. In addition to injecting noise, the Census Bureau rounded
negative values up to zero and made invariant the number of BAs within each state, where the privacy budget allows. In order to stay within privacy budget constraints, some years of county data will not sum to the
previously published totals. There is no sampling weight and the global sensitivity for the count data is one. We use a privacy budget of 0.5 for years 2005 through 2018, and 0.75 for years 2019 and beyond.
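As an illustration of the noise-injection step, the following Python sketch draws two-sided geometric noise as the difference of two geometric variables and clamps negative results to zero. It is a textbook construction under stated assumptions, not the Census Bureau's production code: the function names and county labels are hypothetical, and the step that holds state totals invariant is not implemented.

```python
import math
import random


def two_sided_geometric(epsilon, sensitivity=1, rng=random):
    """Draw integer noise with P(k) proportional to exp(-epsilon * |k| / sensitivity)."""
    alpha = math.exp(-epsilon / sensitivity)

    def geometric():
        # Inverse-CDF draw on {0, 1, 2, ...} with P(G >= j) = alpha ** j.
        u = 1.0 - rng.random()  # uniform on (0, 1]
        return math.floor(math.log(u) / math.log(alpha))

    # The difference of two i.i.d. geometric draws is two-sided geometric.
    return geometric() - geometric()


def noisy_county_counts(counts, epsilon, rng=random):
    """Add noise to each county count, then round negative values up to zero."""
    return {
        county: max(0, n + two_sided_geometric(epsilon, rng=rng))
        for county, n in counts.items()
    }


# Hypothetical county counts; privacy budget 0.5 as stated for 2005 through 2018.
rng = random.Random(0)
print(noisy_county_counts({"County A": 3, "County B": 120}, epsilon=0.5, rng=rng))
```

A smaller privacy budget produces wider noise, so the relative distortion is largest for counties with few applications.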
The Census Bureau has reviewed the monthly data product for unauthorized disclosure of confidential information and has approved the disclosure avoidance practices applied. (Approval ID: CBDRB-FY22-102)
The Census Bureau has reviewed the weekly data product for unauthorized disclosure of confidential information and has approved the disclosure avoidance practices applied. (Approval ID: CBDRB-FY22-103)
The Census Bureau has reviewed the quarterly data product for unauthorized disclosure of confidential information and has approved the disclosure avoidance practices applied. (Approval ID: CBDRB-FY20-115)
The Census Bureau has reviewed the weekly data by NAICS code product for unauthorized disclosure of confidential information and has approved the disclosure avoidance practices applied. (Approval ID: CBDRB-FY22-104)
The Census Bureau has reviewed the annual county level data product for unauthorized disclosure of confidential information and has approved the disclosure avoidance practices applied. (Approval ID: CBDRB-FY22-105)
References
Jarmin, Ron and Javier Miranda (2002). The Longitudinal Business Database. July 16, 2002.
Bayard, Kim, Dinlersoz, Emin, Dunne, Timothy, Haltiwanger, John, Miranda, Javier and John Stevens (2018). Early-Stage Business Formation: An Analysis of Applications for Employer Identification Numbers. February 14, 2018.
Dinlersoz, Emin, Dunne, Timothy, Haltiwanger, John and Veronika Penciakova (2021). Business Formation: A Tale of Two Recessions. January 2021.
Chow, Melissa, Fort, Teresa C., Goetz, Christopher, Goldschlag, Nathan, Lawrence, James, Perlman, Elisabeth Ruth, Stinson, Martha, and T. Kirk White (2021). Redesigning the Longitudinal Business Database. May 2021.
Asturias, Jose, Dinlersoz, Emin, Haltiwanger, John, and Rebecca Hutchinson (2021). Business Applications as Economic Indicators. May 2021.