The Business Formation Statistics (BFS) are a product of the U.S. Census Bureau developed in research collaboration with economists affiliated with the Board of Governors of the Federal Reserve System, the Federal Reserve Bank of Atlanta, the University of Maryland, and the University of Notre Dame. The current BFS is released as a research product in “beta” form. A final version is in the research and development phase.
The Business Formation Statistics (BFS) provide timely and high frequency data on business applications and employer business formations. The BFS measure business initiation activity (Business Application Series) as indicated by applications for an Employer Identification Number (EIN) on the IRS Form SS-4. The BFS also provide information on actual and projected employer business formations (Business Formation Series) that originate from these applications, based on the record of first payroll tax liability for an EIN. In addition, the BFS contain measures of delay in business starts as indicated by the average duration between the application for an EIN and the transition to an employer business.
The BFS currently cover the period starting from the third quarter of 2004 (2004q3) onwards at a quarterly frequency. The data are available nationally and by individual states.
Understanding how business formation is related to current national and local economic conditions is a challenging task that requires accurate, timely and comprehensive high frequency data. The BFS respond to this challenge by providing comprehensive data on business application and formation in a timely manner. The BFS can help economists, policymakers, regional planners and businesses assess the current state of early entrepreneurship at the national and state levels. The BFS uncover the trends in business applications and formations at previously unavailable levels of frequency, coverage, and timeliness. The data can be used to study a variety of issues in entrepreneurship, including, but not limited to, the high-frequency dynamics of entrepreneurial activity, the effects of business cycles on entrepreneurship, the effects of regional economic development policies on new business formation, the impact of state tax policies and regulations on business initiation, and the formation of new industrial clusters and agglomerations. A key benefit of these data is their timeliness and high frequency, which allow policymakers, analysts and researchers to better monitor the state of entrepreneurial activity in the United States.
The data for the BFS come from three main sources. The data on business applications are based on applications for an Employer Identification Number (EIN) through filings of IRS Form SS-4. Employer business formations originating from these business applications are identified using the Census Bureau’s Business Register (BR) and the Longitudinal Business Database (LBD), which together provide information on the timing of first payroll tax filing for a business based on tax records. The BR is the Census Bureau’s main sampling frame for the universe of U.S. businesses and contains quarterly payroll and employment information for employer businesses. The LBD is constructed by linking annual snapshot files from the BR to provide a longitudinal history for each business establishment (see Jarmin and Miranda (2002)). Through these linkages, the LBD is able to provide information on the first-ever appearance of a business in the BR as a business with payroll or employment.
The BFS contain EIN applications made in the United States, including those associated with starting a new employer business. EINs are IDs used by business entities for tax purposes. All employer businesses in the United States must have an EIN to file payroll taxes (see the IRS guidance on who needs an EIN; sole proprietors with no employees can use Social Security Numbers (SSN) instead of an EIN for tax filings). Applications for new EINs are filed through IRS Form SS-4. Applicants submit information such as the name and address of the applicant and the intended business, the reason for application, the type of business entity, information on the principal activity of the business, plans to hire employees and date of initial wage payments, and the business start date. The Census Bureau uses the EIN applications to support its Business Register (BR). The BR is the enumeration list for the Economic Census and is the sampling frame for other business surveys conducted by the Census Bureau. It serves as the central storage for administrative business data at the Census Bureau and is the source of statistical products including the County Business Patterns (CBP) and the Business Dynamics Statistics (BDS). EIN applications are used to keep the BR and the associated sampling frames current.
The BFS currently cover the entire set of EIN applications transmitted to the Census Bureau starting with the period 2004q3. The data are presented at quarterly frequency.
A number of restrictions are placed on the set of applications that are used by the BFS to generate business applications series and to model business formation. Four broad types of applications are omitted from the analysis based on type of entity, industry, geography, and the observed concentration of applications from a specific source. With regard to type of entity, three groups are removed from the data: applications associated with tax liens, trusts, and estates. These applications are generally not associated with business formation and their presence in the data varies over time. Applications from a set of detailed industries within the agricultural, financial services and private household sectors are also excluded. Applications from these specific industries have very low transition rates to employer businesses. Applications by public entities (e.g., state or local governments) are also not included. The analysis also omits applications with missing state information (a small fraction of applications) and applications made from outside the 50 states or the District of Columbia, such as Puerto Rico or the Virgin Islands. Finally, applications are also excluded if they are part of concentrated filing spikes. A concentrated filing spike is defined as a group of EIN applications that appear in the same weekly application cycle, come from the same zip code, and share the same industry code. These filings are mainly related to some type of financial filing and do not represent an intent to form a business.
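The exclusion rules described above can be illustrated with a minimal sketch. The record layout, the field names, the list of excluded industry codes, and in particular the cutoff at which a group of filings counts as a concentrated spike are assumptions made for illustration; the text does not specify them, nor the order in which the rules are applied.

```python
from collections import Counter

# Entity types named in the text as not associated with business formation.
EXCLUDED_ENTITY_TYPES = {"tax_lien", "trust", "estate"}

# The 50 states plus the District of Columbia.
COVERED_STATES = set(
    "AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS "
    "MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY".split()
)


def filter_applications(apps, excluded_industries, spike_threshold):
    """Return the applications that survive the four types of exclusion.

    apps: list of dicts with the hypothetical keys
          entity_type, public_entity, industry, state, zip, week.
    excluded_industries: set of industry codes to drop (hypothetical input).
    spike_threshold: group size at which same-week, same-zip, same-industry
          filings are treated as a spike (assumed; not given in the text).
    """
    kept = [
        a for a in apps
        if a["entity_type"] not in EXCLUDED_ENTITY_TYPES
        and not a["public_entity"]
        and a["industry"] not in excluded_industries
        and a["state"] in COVERED_STATES  # also drops a missing (None) state
    ]
    # A spike is a group sharing weekly cycle, zip code, and industry code.
    group_sizes = Counter((a["week"], a["zip"], a["industry"]) for a in kept)
    return [
        a for a in kept
        if group_sizes[(a["week"], a["zip"], a["industry"])] < spike_threshold
    ]
```

In this sketch the spike check runs on the records that remain after the other exclusions; running it on the raw filings instead would be an equally plausible reading of the description.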
The resulting business applications are matched to the set of businesses in the BR that are identified as new employer businesses in the Longitudinal Business Database. The match to the BR reveals which applications become employer businesses and the quarter in which they begin to pay employees. Currently, the applications are matched to the employer business universe only, though many applications for new businesses may end up as non-employer businesses. In addition, the BFS emphasize new business formations, so EIN applications from existing business entities due to changes in legal form, reorganizations or expansions are not included in the match.
It is important to note that, while comprehensive, EIN applications may leave out some business initiations, namely sole proprietorships with no employees. These initiations include many independent contractors, who rely on the entrepreneur's Social Security Number (SSN) rather than an EIN for tax purposes.
For a detailed description of various business application and formation series, see Business Application Series and Business Formation Series.
The information submitted by applicants in the IRS Form SS-4 for an EIN application is used to model employer business formation for the U.S. economy as a whole and for individual states. Let $N_{gt}$ be the number of new applications in a geographic region $g$ (e.g., a state or the entire U.S.) in quarter $t$. The total number of business formations that occur between quarters $t$ and $t + k$ from these applications is then given by
$$S_{gt+k} = \sum_{i=1}^{N_{gt}} I_{igt+k},$$
where $I_{igt+k}$ is a realization of a Bernoulli random variable that governs whether application $i$ turns into an employer business by the end of quarter $t + k$. The probability distribution function for $I_{igt+k}$ is given by
$$\Pr(I_{igt+k} = 1) = P_{igt+k}, \qquad \Pr(I_{igt+k} = 0) = 1 - P_{igt+k},$$
where $P_{igt+k}$ is the probability that application $i$ in region $g$ and quarter $t$ turns into an employer business between quarters $t$ and $t + k$. Then, the expected number of business formations can be written as
$$E[S_{gt+k}] = \sum_{i=1}^{N_{gt}} P_{igt+k}.$$
To estimate $E[S_{gt+k}]$, an estimate of $P_{igt+k}$ is needed. Towards that goal, one can model $I_{igt+k}$ as a function of application-level variables, $Z_{igt}$, provided as part of an EIN application in the IRS Form SS-4 and a set of unknown parameters, $\beta_{gt}$. Using a Linear Probability Model (LPM), the probability of an application transitioning to an employer business can then be estimated as
$$\hat{P}_{igt+k} = F(Z_{igt}; \hat{\beta}_{gt}),$$
where $F$ is a linear function and $\hat{\beta}_{gt}$ is an estimate of the unknown parameters, $\beta_{gt}$, based on the LPM. The predicted application-level probabilities, $\hat{P}_{igt+k}$, can be used to construct an estimate of the expected number of business formations, $E[S_{gt+k}]$, as
$$\hat{E}[S_{gt+k}] = \sum_{i=1}^{N_{gt}} \hat{P}_{igt+k}.$$
This approach amounts to reweighting each application by the predicted probability that the application becomes an employer business between quarters t and t + k. In the analysis, k is set to either four or eight, corresponding to four and eight quarters, respectively. The four and eight quarter windows were chosen to allow a long enough time for an application to become an employer business and to cover a majority of transitions to employer business. These choices prevent a significant loss of information due to right censoring, which arises because some applications transition beyond the four or eight quarter window. The estimated expected numbers of business formations are used to generate the series Projected Business Formations within 4 Quarters (PBF4Q) and Projected Business Formations within 8 Quarters (PBF8Q), as described in the Business Formation Series section below. For further details on the estimation methodology, see Bayard, Dinlersoz, Dunne, Haltiwanger, Miranda, and Stevens (2018).
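The reweighting idea can be made concrete with a small sketch: fit a linear probability model by ordinary least squares on applications whose outcome is already known, then sum the predicted probabilities over a new set of applications. This is an illustration under stated assumptions, not the estimation code behind the BFS. The function names are hypothetical, the single regressor in the usage note stands in for whatever Form SS-4 variables enter the actual model, and fitted values are not restricted to the interval from zero to one.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_lpm(z_rows, outcomes):
    """Least-squares fit of a 0/1 outcome on Z plus an intercept.

    z_rows: one list of application-level variables per application.
    outcomes: 1 if the application became an employer business within
              the window, 0 otherwise.
    Returns the coefficient estimates, intercept first.
    """
    x = [[1.0] + list(z) for z in z_rows]
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(x, outcomes)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predicted_probability(beta_hat, z):
    """Linear prediction for one application (may fall outside [0, 1])."""
    return beta_hat[0] + sum(b * v for b, v in zip(beta_hat[1:], z))


def projected_formations(beta_hat, z_rows):
    """Sum of predicted probabilities over a set of applications."""
    return sum(predicted_probability(beta_hat, z) for z in z_rows)
```

For example, with a single hypothetical indicator (say, whether the applicant reported a planned date of initial wage payments), calling `fit_lpm` on past applications and then `projected_formations` on the current quarter's applications yields the projected count for that quarter.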
The Business Dynamics Statistics (BDS) program of the Census Bureau also provides information on new employer businesses at annual frequency. However, there are some key differences in how the BDS and BFS account for new business formation. First, the BDS use employment rather than payroll to identify new businesses. Employment in the BDS is a point-in-time measure: the BDS capture employment as of the payroll week that includes March 12 of the year. The BFS, by contrast, use the presence of payroll as a measure of business formation activity. In addition, the BFS are based on a quarterly measure of payroll within each year. The quarterly frequency leads to timing differences with respect to the BDS in the identification of business startups that hired their first employee after the payroll week of March 12. Second, because of left censoring in the business applications, the BFS do not account for employer business formations that originate from EIN applications dated before 2004q3. This effect, however, dissipates toward the end of the sample period, as nearly all business formations eventually tend to arise from business applications made since 2004q3. For these reasons, the BDS annual count of new employer businesses does not exactly match the corresponding count in the BFS, but the two counts track each other closely.
Because the BFS are constructed using a combination of administrative data, rather than a probability sample, sampling error does not apply to the BFS. Non-sampling error, however, still exists. Non-sampling errors can occur for many reasons, such as employers submitting corrected payroll or employment data after the end of the year, or filing late. Other sources of error include typographical errors made by businesses when providing information on survey or administrative forms. Such errors, however, are likely to be distributed randomly throughout the dataset.
There is also projection error in the projected number of business formations based on the econometric models. The models perform well in terms of prediction error within the estimation sample and in out-of-sample projection exercises (see Bayard, Dinlersoz, Dunne, Haltiwanger, Miranda and Stevens (2018)). It is possible to provide measures of error and confidence bands for the projected number of business formations, and such measures will be considered for future versions of the BFS.
Changes in administrative data can also sometimes create complications in identifying business startups with payroll. The Longitudinal Business Database (LBD) addresses these issues in detail in order to avoid overstating business openings (Jarmin and Miranda (2002)). The BFS are subject to periodic changes based on corrections to the LBD due to updates coming from the new BR files. Such changes will be reflected in the actual and projected business formation series on an annual basis once the BFS are revised based on the updated LBD-based firm birth information. There are also some changes in the content of the IRS Form SS-4 over time, and new information in the form is incorporated into the analysis as it becomes available.
These series describe the business applications for tax IDs as indicated by applications for an Employer Identification Number (EIN) through filings of IRS Form SS-4. Business applications are presented in four different series reflecting different subsets of the applications for an EIN. All business applications series cover the period from 2004q3 onwards.
The following is a Venn diagram of the relationship between the four business applications series (BA, HBA, WBA, CBA) and EIN applications.
These series describe employer business formations as indicated by the first instance of payroll tax liabilities for the corresponding business applications. The business formation series are forward-looking in the sense that they measure new business formations from the time of business application in any given quarter. Two series are provided: the first describes transitions within the next four quarters, and the second within the next eight quarters. All business formation series start in 2004q3, the earliest quarter for which the data on business applications are available.
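As an illustration of how a formation count and an average delay could be derived from matched records, the following sketch counts applications whose first payroll quarter falls within k quarters of the application quarter and averages the lag among them. The tuple layout, the function names, the inclusive treatment of quarter t + k, and the measurement of duration in whole quarters are assumptions made for this example.

```python
def quarter_index(label):
    """Convert a label such as '2004q3' into a running quarter count."""
    year, quarter = label.lower().split("q")
    return int(year) * 4 + int(quarter) - 1


def formations_within(records, k):
    """Count formations within k quarters and their average lag.

    records: iterable of (application_quarter, first_payroll_quarter)
             pairs; first_payroll_quarter is None when no employer
             business has been observed for the application.
    Returns (count, average_lag_in_quarters); the average is None
    when no application transitions within the window.
    """
    lags = []
    for application_quarter, first_payroll_quarter in records:
        if first_payroll_quarter is None:
            continue  # never matched to an employer business
        lag = quarter_index(first_payroll_quarter) - quarter_index(application_quarter)
        if 0 <= lag <= k:  # formed by the end of quarter t + k
            lags.append(lag)
    average = sum(lags) / len(lags) if lags else None
    return len(lags), average
```

Calling the function with k set to four and then to eight on the applications of one quarter gives the two windows used by the formation series. Applications that transition after the window, or never, contribute to neither the count nor the average.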
The following is a graphical representation of the relationship between business application and formation series.
Because of strong seasonality detected in most of the business application and formation series, all series are provided with and without seasonal adjustment. In the case of the duration series (DUR4Q and DUR8Q), seasonality is not significant in general. Therefore, no seasonally adjusted duration series are provided. Seasonal adjustment is performed using the X-13ARIMA-SEATS seasonal adjustment program of the U.S. Census Bureau. Users can implement their own seasonal adjustment methods using the unadjusted data.
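For users who want to experiment with their own adjustment of the unadjusted data, the sketch below applies a deliberately simple additive method: it estimates each calendar quarter's average deviation from the overall mean and subtracts it. This is not X-13ARIMA-SEATS and does not reproduce the published adjusted series; it ignores trend and calendar effects, and the function name and arguments are hypothetical.

```python
def seasonally_adjust(values, first_quarter):
    """Remove a fixed additive quarterly pattern from a series.

    values: unadjusted quarterly observations in time order.
    first_quarter: calendar quarter (1 to 4) of the first observation,
                   for example 3 for a series starting in 2004q3.
    Returns the adjusted series as a list of floats.
    """
    quarters = [(first_quarter - 1 + i) % 4 for i in range(len(values))]
    overall_mean = sum(values) / len(values)
    seasonal_effect = {}
    for q in set(quarters):
        observations = [v for v, vq in zip(values, quarters) if vq == q]
        # Average deviation of this calendar quarter from the overall mean.
        seasonal_effect[q] = sum(observations) / len(observations) - overall_mean
    return [v - seasonal_effect[q] for v, q in zip(values, quarters)]
```

The estimate is cleanest when the series covers a whole number of years, so that every calendar quarter is observed equally often. A series with a pronounced trend would call for detrending first, or for a dedicated tool such as X-13ARIMA-SEATS itself.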
Jarmin, Ron S. and Javier Miranda (2002). The Longitudinal Business Database. July 16, 2002.
Bayard, Kim, Emin Dinlersoz, Timothy Dunne, John Haltiwanger, Javier Miranda, and John Stevens (2018). Early-Stage Business Formation: An Analysis of Applications for Employer Identification Numbers. February 14, 2018.