Characteristics of New Housing

How the Data are Collected

PURPOSE

The purpose of the Survey of Construction (SOC) is to provide national and regional statistics on starts and completions of new single-family and multifamily housing units and statistics on sales of new single-family houses in the United States. The United States Code, Title 13, authorizes this survey and provides for voluntary responses. The Department of Housing and Urban Development partially funds this survey. The SOC also provides statistics on characteristics of new privately-owned residential structures in the United States. Data included are various characteristics of new single-family houses completed, new multifamily housing completed, new single-family houses sold, and new contractor-built houses started.



SOURCE OF DATA AND SURVEY QUESTIONNAIRES

The Survey of Construction includes two parts: the Survey of Use of Permits (SUP), which estimates the amount of new construction in areas that require a building permit, and the Non-Permit Survey (NP), which estimates the amount of new construction in areas that do not require a building permit. Less than 2 percent of all new construction takes place in non-permit areas. Data from both parts of SOC are collected by Census field representatives. For SUP, they visit a sample of permit offices and select a sample of permits issued for new housing. The units authorized by these permits are then followed to determine when they are started and completed and, for single-family units built for sale, when they are sold. Each project is also surveyed to collect information on characteristics of the structure. For NP, roads in sampled non-permit land areas are driven at least once every 3 months to see if there is any new construction. Once new residential construction is found, it is followed up in the same way as in SUP.

The Census field representatives use interviewing software on laptop computers to collect the data. Facsimiles of the computer-based questionnaires are provided to respondents to familiarize them with the survey. These facsimiles show the questions that are asked for housing units in single-family buildings on Form SOC-QI/SF.1 and in multifamily buildings on Form SOC-QI/MF.1. In addition, the Census field representatives provide an introductory letter explaining the survey. Field representatives also use Form SOC-QBPO.1 to collect information regarding procedures for handling building permits in building permit offices (BPOs) sampled for the SUP.



GEOGRAPHIC COVERAGE

Statistics from the SOC are tabulated only for the United States and four Census Regions. The SOC does not have a large enough sample size to make state or local area estimates.

Public-use microdata files are available for download. The data begin in 1999. The files allow for tabulation of selected estimates by the nine Census Divisions. The annual microdata files contain information on all sampled single-family houses started, completed, and/or sold during the year. Houses authorized by building permits but not started at the end of the year, under construction at the end of the year, or for sale at the end of the year are also included. The files contain data on more than 60 physical and financial characteristics. Please go to the microdata documentation for more information.

The only series on new residential construction that is available at a smaller geographic area is the housing units authorized by building permits. Building permits data are collected from individual permit offices, most of which are municipalities; the remainder are counties, townships, or New England and Middle Atlantic-type towns. Because building permits are public records, local area data can be published without any confidentiality concerns. From local area data, estimates are tabulated for counties, states, Metropolitan Areas, Census Divisions, Census Regions, and the United States. Please go to the Building Permits Survey for more information.

For more geographic information, please refer to the definitions of Census Regions and Divisions and metropolitan areas.



SAMPLE DESIGN

The design of the monthly sample that has been used since January 2005 is as follows:

The Survey of Construction sample design consists of three stages: (1) a subsample of the 2004 Current Population Survey (CPS) primary sampling units (PSUs), which are land areas (groups of counties, towns, or townships within a state) that represent the entire United States; (2) selection of permit/non-permit areas; and (3) selection of permits.

In the first stage, the 820 CPS primary sampling units were classified as self-representing or non-self-representing. If a PSU had a large population age 16 and over or high permit activity, it was classified as self-representing; otherwise it was classified as non-self-representing. There were 48 self-representing PSUs. The 772 non-self-representing PSUs were grouped into 121 strata by Census Division, permit activity, metropolitan status, and population (non-institutional population age 16 and over based on the 2000 Census). One non-self-representing PSU was selected per stratum using a procedure that maximized the overlap between the old and new sample of PSUs.

Within the 169 selected PSUs, the second stage of sampling was performed separately for permit-issuing places and areas that do not require permits for new residential construction, referred to as non-permit. The permit-issuing places were stratified by permit activity. Approximately 900 permit-issuing places were selected. The selection of the non-permit areas was based on the 2000 Census geography tract and block definitions. Within each state, the land area was divided into blocks, which are components of tracts. Blocks were classified as permit and non-permit. The non-permit blocks were combined at the tract level, and these areas were stratified by Census 2000 housing unit population. Higher population areas had greater probabilities of selection. Approximately 80 land areas (groups of blocks) within the selected PSUs were selected.

The third stage of selection is performed monthly in the approximately 900 permit-issuing places and 80 non-permit areas. In permit-issuing places, field representatives list all permits for new residential construction and select a sample of those permits. Permits for buildings with 1 to 4 units are sampled at an overall rate of 1 in 50 units. All permits for buildings with five units or more are included in the sample. In non-permit areas, field representatives canvass the areas looking for housing units started, all of which are included in the sample.

Possible Impact on New Home Prices and Characteristics of the 2005 Survey of Construction Sample Change

Effective with the January 2005 data release, the Survey of Construction implemented a new sample of building permit offices from which sampled houses are selected. The sample of permit offices was redesigned to reflect the location of building permit activity in the previous few years, replacing the sample selected in 1985. The selection of the approximately 900 permit offices in the new sample was designed to optimize the precision of the housing starts estimates. No attempt was made to select offices representing geographic areas with housing prices and characteristics similar to those in the old sample. As a result, data users should use caution when analyzing year-over-year changes in housing prices and characteristics between 2004 and 2005. For example, many jurisdictions in the 1985 sample may be more built up now, with remaining land more expensive and remaining lots smaller. These jurisdictions may have been replaced in the new sample with more outlying areas that are now actively issuing building permits. In these locations, land may be more abundant, lot sizes might be larger, and sales prices possibly lower. It is important to note that estimates from the old and new samples are both statistically correct. The actual values fall within intervals around the estimate, which can be calculated from the sample data with known probabilities. For example, if a published estimate has a relative standard error (RSE) of 5%, there is a 90% chance that the true value is within 8% (statistically defined as 1.6 times the RSE) of the estimate.



COMPILATION OF DATA

Methodology Used for Compilation of Estimates of Housing Units Started, Authorized but Not Started, Under Construction, and Completed

The compilation of the housing starts series is a multistage process. First, a monthly estimate of the number of housing units for which building permits have been issued in all permit-issuing places is obtained from the Census Bureau's Building Permits Survey.

Second, for each permit selected from the 900 permit-issuing places, an inquiry is made of the owner or the builder to determine in which month and year the unit(s) covered by the permit was (were) started. In case the units authorized by permits in a particular month are not started by the end of that month, follow-ups are made in successive months to find out when the units were actually started.

Ratios are calculated (by type of structure) of the number of units authorized by permits, based on the Building Permits Survey, to the number of units authorized by permits based on estimates generated from the 900 SOC permit offices; separate ratios are calculated for the current month of permit issuance and the prior 11 months, producing 12 ratios. Continuing back, months 13 through 18 are summed to produce a single ratio, and months 19 through 60 are summed to produce another ratio. This yields a set of 14 ratios that are based on 5 years' worth of data. These ratios are then applied to the appropriate estimate of the number of units started, based on the 900 SOC permit offices, in the corresponding months or groups of months to provide ratio-adjusted estimates of the number of units started for each month or group of months.
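As a rough illustration of the ratio adjustment described above, the Python sketch below groups permit-issuance months into the 14 periods and applies each ratio to the matching SOC starts estimate. The function and argument names are hypothetical, the inputs are made-up dictionaries keyed by month of permit issuance (1 = current month, 60 = oldest), and the treatment of a period with no SOC-estimated permits is an assumption, not documented Census Bureau practice.

```python
# Hypothetical sketch of the 14-ratio adjustment; not the Census Bureau's code.

# Months 1-12 each form their own period; months 13-18 and 19-60 are pooled.
PERIODS = [(m, m) for m in range(1, 13)] + [(13, 18), (19, 60)]


def ratio_adjusted_starts(bps_authorized, soc_authorized, soc_starts):
    """Return one ratio-adjusted starts estimate per period (14 values).

    Each argument maps month of permit issuance (1..60) to a number of units:
      bps_authorized - units authorized per the Building Permits Survey
      soc_authorized - units authorized as estimated from SOC permit offices
      soc_starts     - units started as estimated from SOC permit offices
    """
    adjusted = []
    for first, last in PERIODS:
        months = range(first, last + 1)
        bps = sum(bps_authorized[m] for m in months)
        soc = sum(soc_authorized[m] for m in months)
        started = sum(soc_starts[m] for m in months)
        # Assumption: with no SOC permits in the period, leave starts unadjusted.
        ratio = bps / soc if soc else 1.0
        adjusted.append(ratio * started)
    return adjusted
```

Summing the 14 values gives a single starts figure for one type of structure in one region under these assumptions; the sketch does not include the later adjustments for pre-permit starts and late reports.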

The procedure described above is carried out separately by type of structure. A total of eight different sets of authorization ratios that change from month to month are used to calculate the number of housing units started by type of structure in permit places. The ratios are calculated for single-family structures for each of the four Census Regions and for structures with two units or more for each of the four Regions.

Adjustments are made to account for those units started prior to permit authorization and for late reports. These adjustments are based on historical patterns of pre-permit starts and late data. No adjustment is made for units in permit areas built without a permit.

The estimates for housing units started in permit-issuing places result from the procedures outlined above.

Third, units identified as started in the monthly canvass of non-permit areas are weighted appropriately to provide an estimate of total housing starts in areas not covered by building permit systems.

Adding this estimate of starts in non-permit areas to the estimate of starts in permit-issuing places results in an estimate of total private housing units started.

This same methodology is used for the estimates of housing units authorized but not started, under construction, and completed.

The same basic methodology is used for estimates of houses sold and for sale. For those estimates, four sets of authorization ratios are used: one for each of the four Census Regions. Adjustments are made to account for those houses sold prior to permit authorization and for late reports. These adjustments are applied by stage of construction based on historical patterns of pre-permit sales and late reports. No adjustment is made for units in permit areas built without a permit.

Comparison with Previous Compilation Methodology

The current methodology for compilation of estimates was implemented in April 2001, with revisions back to January 1999. A comparison to the previous compilation methodology is summarized in the explanation of the April 2001 revisions to New Residential Construction. Also, please go to the SOC compilation methodology through 1998.

Adjustments for Non-Reporting of Characteristics

Information on selected characteristics is not reported for every case in the sample. In estimates of characteristics of new housing, cases for which a characteristic is not reported have been distributed proportionally to those for which the characteristic was reported.
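The proportional distribution of not-reported cases can be sketched in Python as follows. This is a minimal illustration with hypothetical names and made-up weighted counts; the level of detail at which the Census Bureau actually applies the allocation is not stated here and is not assumed.

```python
# Hypothetical sketch: spread not-reported cases across reported categories
# in proportion to each category's share of the reported cases.

def distribute_not_reported(reported, not_reported):
    """reported: dict of category -> weighted count of reported cases.
    not_reported: weighted count of cases with the characteristic missing.
    Returns a dict of category -> count after proportional allocation."""
    total_reported = sum(reported.values())
    if total_reported == 0:
        raise ValueError("no reported cases to distribute against")
    return {
        category: count + not_reported * count / total_reported
        for category, count in reported.items()
    }
```

With made-up counts of 60 and 40 reported cases in two categories and 10 not-reported cases, the allocation yields 66 and 44, so category shares are unchanged while the total rises to 110.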



RELEASE AND REVISION SCHEDULE

Preliminary estimates from SOC, for the United States and Census Regions, are available each month in the New Residential Construction (NRC) press release according to the NRC release schedule. Estimates from SOC are for the number of housing units authorized but not started, started, under construction, and completed. Estimates from the Building Permits Survey of the number of housing units authorized are also included in the NRC press release.

For estimates of housing units authorized but not started, started, under construction, and completed, two months of data are revised along with the release of each month's preliminary estimates. An analysis of the NRC revisions for estimates of starts and completions is updated with the release of each year's preliminary January and July data. With the release of April data on New Residential Construction, seasonally adjusted annual rates for the previous 27 months are also revised to reflect updated seasonal factors. However, not seasonally adjusted monthly estimates of housing units authorized but not started, started, under construction, and completed are not revised again after the second revision of each estimate.

Annual estimates of New Residential Construction are finalized with the release of data for February of the following year. At that time, annual data are also released on the length of time from authorization of construction to start and from start of construction to completion.

Quarterly estimates of housing units started and completed by purpose and design are released along with the release of New Residential Construction data for the month after the end of each quarter. For example, data for the first quarter are released with the April estimates of New Residential Construction. The previous quarter is revised with the release of preliminary data for the most recent quarter.

Preliminary estimates of single-family homes sold and for sale are available each month in the New Residential Sales (NRS) press release according to the NRS Release Schedule. Three months of data are revised along with the release of each month's preliminary estimates. An analysis of the NRS revisions is updated with the release of each year's preliminary January and July data. With the release of April data on New Residential Sales, seasonally adjusted annual rates for the previous 27 months are also revised to reflect updated seasonal factors. However, not seasonally adjusted monthly estimates of houses sold and for sale are not revised again after the third revision of each estimate.

Quarterly estimates of sales by price and financing are released along with the release of New Residential Sales data for the last month of each quarter. For example, data for the first quarter are released with the March estimates of New Residential Sales. The previous quarter is revised with the release of preliminary data for the most recent quarter.

Monthly price indexes for houses under construction are released each month with the release of New Residential Sales data. Two months of data are revised along with the release of each month's preliminary estimates. Quarterly price indexes for houses sold are released along with the release of New Residential Sales data for the last month of each quarter. For example, data for the first quarter are released with the March estimates of New Residential Sales. The previous quarter is revised with the release of preliminary data for the most recent quarter. Please go to the Construction Price Index website for more information.

Detailed annual data on characteristics of new housing for the previous year are released each year on the first working day in June.



RELIABILITY OF DATA

These estimates are based on sample surveys and may differ from statistics which would have been obtained from a complete census using the same schedules and procedures. An estimate based on a sample survey is subject to both sampling error and nonsampling error. The accuracy of a survey result is determined by the joint effects of these errors.

Sampling Error

Sampling error reflects the fact that only a particular sample was surveyed rather than the entire population. Each sample selected for the SOC is one of a large number of similar probability samples that, by chance, might have been selected under the same specifications. Estimates derived from the different samples would differ from each other. The standard error (SE), or sampling error, of a survey estimate is a measure of the variation among the estimates from all possible samples and thus, is a measure of the precision with which an estimate from a particular sample approximates the average from all possible samples.

Estimates of the standard errors have been computed from the sample data for selected statistics. They are presented in the tables in the form of average relative standard errors (RSEs). The relative standard error equals the standard error divided by the estimated value to which it refers.

The sample estimate and an estimate of its standard error allow us to construct interval estimates with prescribed confidence that the interval includes the average result of all possible samples with the same size and design. To illustrate, if all possible samples were surveyed under essentially the same conditions, and estimates calculated from each sample, then:

  1. Approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average value of all possible samples.
  2. Approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average value of all possible samples.

Thus, for a particular sample, one can say with specified confidence that the average of all possible samples is included in the constructed interval. For example, suppose that an estimated 100,000 housing units were started in a particular month and that the average relative standard error of this estimate is 4 percent. Multiplying 100,000 by .04, we obtain 4,000 as the standard error. This means that we are confident, with 68% chance of being correct, that the average estimate from all possible samples of housing units started during the particular month is between 96,000 and 104,000 homes. To increase the probability to a 90% chance that the interval contains the average value over all possible samples (this is called a 90-percent confidence interval), multiply 4,000 by 1.645, yielding limits of 93,420 and 106,580 (100,000 units plus or minus 6,580 units). The average estimate of housing units started during the specified month may or may not be contained in any one of these computed intervals; but for a particular sample, one can say that the average estimate from all possible samples is included in the constructed interval with a specified confidence of 90 percent. It is important to note that the standard error and the relative standard error only measure sampling error. They do not measure any systematic nonsampling errors in the estimates.
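The interval arithmetic in this example can be reproduced with a short Python sketch. The function name is hypothetical; the inputs are the figures from the example above (an estimate of 100,000 units, a relative standard error of 4 percent, and multipliers of 1 and 1.645).

```python
# Minimal sketch of the confidence-interval arithmetic from the example above.

def confidence_interval(estimate, rse, multiplier):
    """Return (lower, upper) limits around a survey estimate.

    estimate   - the published estimate (e.g. 100000 housing units)
    rse        - relative standard error as a proportion (e.g. 0.04)
    multiplier - 1.0 for roughly 68 percent, 1.645 for roughly 90 percent
    """
    standard_error = estimate * rse          # 100000 * 0.04 = 4000
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


low_68, high_68 = confidence_interval(100_000, 0.04, 1.0)    # about 96,000 to 104,000
low_90, high_90 = confidence_interval(100_000, 0.04, 1.645)  # about 93,420 to 106,580
```

As noted above, these limits reflect sampling error only; they say nothing about nonsampling error.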

Nonsampling Error

Nonsampling error encompasses all factors, other than sampling error, that contribute to the total error of a sample survey estimate and may also occur in censuses. It is often helpful to think of nonsampling error as arising from deficiencies or mistakes in the survey process. Nonsampling errors are usually attributed to many possible sources: (1) coverage error - failure to accurately represent all population units in the sample, (2) inability to obtain information about all sample cases (nonresponse), (3) response errors, possibly caused by definitional difficulties or misreporting, (4) mistakes in recording or coding the data obtained, and (5) other errors of coverage, collection, processing, or imputation for missing items or inconsistent data. Although nonsampling error is not measured directly, the Census Bureau employs quality control procedures throughout the process to minimize this type of error.

A potential source of bias prior to 1999 was the upward adjustment of 3.3 percent made to account for single-family structures started in permit-issuing areas without permit authorization, as described in the SOC compilation methodology through 1998.



NONRESPONSE

Data for the SOC are collected by Census field representatives (FRs). When FRs cannot obtain information for the survey's key data items (start date, completion date, sales category, and sale date) from the builder or owner of a new residential building, they are instructed to obtain the information by observation (via a site visit) or from another source, such as the building permit office or a realtor handling the sale of the unit. While the survey is designed to collect information for every sampled unit, under-coverage can occur if a building in a permit-issuing area is started, completed, or sold before the permit is issued. In addition, late reports for key data items can occur if the respondent provides incomplete or inaccurate information, the information obtained by observation is incorrect or unavailable, or the FR does not follow procedures properly. However, each month, key dates are collected for approximately 94% of buildings requiring an interview.

When a building is started, the respondent is asked to estimate the expected completion date, and data on the completion date are not requested until the month when completion was expected. This reduces survey costs and respondent burden; however, it can cause completion dates to be reported late if construction was completed ahead of schedule. The final estimates of housing units started are adjusted about 4 percent for pre-permit starts and late reports. The final estimates of housing units completed are adjusted about 5 percent for completions prior to the expected completion date and other late reports.

For this survey, the sale date is defined as the date when a deposit was made toward the purchase of the house or a sales agreement was signed. Because this often occurs before construction begins, preliminary estimates of houses sold include large adjustments for sales before the permit is issued, as well as adjustments for late reports. The final estimates of houses sold are adjusted about 12 percent for pre-permit sales and late reports.

Information on selected characteristics that are not among those considered to be the survey's key data items is not reported for every case in the sample. In estimates of characteristics of new housing, cases for which a characteristic is not reported have been distributed proportionally to those for which the characteristic was reported.

The weighted item response rate for a characteristic is defined as the weighted number of cases for which the characteristic was provided divided by the weighted number of cases for which the characteristic was attempted to be collected. The weighted item response rates for most characteristics of new single-family units completed or sold in 2013 were at least 95 percent, and those for new multifamily units completed were at least 82 percent. The weighted item response rate for single-family sales price was 90 percent, and the weighted item response rates for single-family contract price, type of financing, and lot size were between 77 and 82 percent.
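The definition of the weighted item response rate translates directly into a short calculation. The Python sketch below uses a hypothetical record layout (weight, whether collection was attempted, whether a value was provided) and made-up data; it illustrates the definition and is not the Census Bureau's processing code.

```python
# Hypothetical sketch of a weighted item response rate for one characteristic.

def weighted_item_response_rate(cases):
    """cases: iterable of (weight, attempted, provided) tuples.

    weight    - sampling weight of the case
    attempted - True if the characteristic was attempted to be collected
    provided  - True if the characteristic was actually provided
    Returns provided weight divided by attempted weight.
    """
    attempted_weight = 0.0
    provided_weight = 0.0
    for weight, attempted, provided in cases:
        if attempted:
            attempted_weight += weight
            if provided:
                provided_weight += weight
    if attempted_weight == 0:
        raise ValueError("characteristic was not attempted for any case")
    return provided_weight / attempted_weight
```

Cases for which collection was never attempted are excluded from both the numerator and the denominator, matching the definition above.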



SEASONAL ADJUSTMENT

Seasonal adjustment is the process of estimating and removing seasonal effects from a time series to better reveal certain nonseasonal features such as underlying trends and business cycles. Seasonal adjustment procedures estimate effects that occur in the same calendar month with similar magnitude and direction from year to year. In series whose seasonal effects come primarily from weather, the seasonal factors are estimates of average weather effects for each month. Seasonal adjustment does not account for abnormal weather conditions or for year-to-year changes in weather. Seasonal factors are estimates based on present and past experience. Future data may show a different pattern.

The mechanics of seasonal adjustment involve breaking down a time series into a trend-cycle, a seasonal component, and an irregular component.

The trend-cycle is the long-term tendency of a series to grow or decline.

The seasonal component consists of seasonal effects that are reasonably stable in terms of timing, direction, and magnitude. Possible causes include natural factors (the weather), administrative measures, and social/cultural/religious traditions.

Monthly time series that are totals of daily activities can be influenced by each calendar month's weekday composition. This influence is revealed when monthly values consistently depend on which days of the week occur five times in the month. For example, building permit offices are usually closed on Saturday and Sunday. Thus, the number of building permits issued in a given month is likely to be higher if the month contains a surplus of weekdays and lower if the month contains a surplus of weekend days. Recurring effects associated with individual days of the week are called trading-day effects.
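To make the weekday-composition idea concrete, the Python sketch below counts how often each day of the week occurs in a given month using the standard library's `calendar` module. The function names are hypothetical, and the sketch only describes a month's composition; it does not estimate trading-day effects.

```python
# Sketch: weekday composition of a calendar month (Monday = 0 ... Sunday = 6).
import calendar


def weekday_counts(year, month):
    """Return a 7-element list with the number of Mondays..Sundays in the month."""
    first_weekday, days_in_month = calendar.monthrange(year, month)
    counts = [0] * 7
    for offset in range(days_in_month):
        counts[(first_weekday + offset) % 7] += 1
    return counts


def business_days(year, month):
    """Number of Monday-Friday days in the month (holidays are ignored)."""
    return sum(weekday_counts(year, month)[:5])
```

For example, January 2021 began on a Friday, so it contained five Fridays, Saturdays, and Sundays and 21 Monday-to-Friday days, while February 2021 contained exactly four of each day and 20 Monday-to-Friday days.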

Trading-day effects can make it difficult to compare time series values or to compare movements in one series with movements in another. For this reason, when estimates of trading-day effects are statistically significant, they are adjusted out of the series. The removal of such estimates is referred to as trading-day adjustment.

Series may have moving-holiday effects. Economic effects from holidays such as Easter, Labor Day, and Thanksgiving may affect more than one month, so their timing is not strictly seasonal, but they are predictable calendar events. When these moving-holiday effects are statistically significant, they are adjusted out of the series. The removal of such estimates is referred to as moving-holiday adjustment.

The irregular component is anything not included in the trend-cycle or the seasonal effects (which include trading-day effects and moving-holiday effects). Its values are unpredictable with respect to timing, impact, and duration. It can arise from sampling error, nonsampling error, unseasonable weather, natural disasters, strikes, etc.

Most of the seasonally adjusted series are shown as seasonally adjusted annual rates (SAAR). The seasonally adjusted annual rate is the seasonally adjusted monthly value multiplied by 12. The benefit of the annual rate is that not only can one monthly estimate be compared with another, monthly data can also be compared to an annual total. The seasonally adjusted annual rate is neither a forecast nor a projection; rather it is a description of the rate of housing starts, completions, or sales, etc., in the particular month for which they are calculated.

The seasonal adjustment factors for these data are indexes, that is, they are the factor times 100. They were developed using X-13ARIMA-SEATS software. The X-13ARIMA-SEATS software improves upon the X-12-ARIMA seasonal adjustment software by providing enhanced diagnostics as well as incorporating an enhanced version of the Bank of Spain's SEATS (Signal Extraction in ARIMA Time Series) software, which uses an ARIMA model-based procedure instead of the X-11 filter-based approach to estimate seasonal factors. The X-13ARIMA-SEATS and X-12-ARIMA software produce identical results when using the X-11 filter-based adjustment methodology. The X-13ARIMA-SEATS software will be available from the Census Bureau's Internet site in the coming months. Note that SOC estimates continue to be adjusted using the X-11 filter-based adjustment procedure. For more information on X-12-ARIMA please refer to the Census Bureau's X-12 website.
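The relationship between a seasonal index, a seasonally adjusted monthly value, and a seasonally adjusted annual rate can be sketched in Python as follows. The function names and figures are hypothetical, and the sketch assumes a multiplicative adjustment (the unadjusted value is divided by the factor), which is an assumption made for illustration; it does not reproduce what X-13ARIMA-SEATS does to estimate the factors.

```python
# Hypothetical sketch: from seasonal index to seasonally adjusted annual rate.

def seasonally_adjust(unadjusted, seasonal_index):
    """Divide out the seasonal factor (index / 100).

    Assumes a multiplicative adjustment; this is an illustrative assumption."""
    return unadjusted * 100.0 / seasonal_index


def annual_rate(adjusted_monthly):
    """Seasonally adjusted annual rate: the adjusted monthly value times 12."""
    return adjusted_monthly * 12


# Made-up example: 80,000 units in a month whose seasonal index is 80.
adjusted = seasonally_adjust(80_000, 80.0)   # 100,000 units
saar = annual_rate(adjusted)                 # 1,200,000 units at an annual rate
```

An index below 100 marks a seasonally slow month, so the adjusted value is higher than the unadjusted one.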

Seasonally adjusted annual rates are developed each month for housing units started, under construction, and completed by Census Region and type of structure. Each month (separately for units started, under construction, and completed), five series are run through the X-13ARIMA-SEATS program. These series are the four regional series for single-family structures and the U.S. total for structures with two units or more. The seasonally adjusted U.S. total is the sum of the five seasonally adjusted components. The seasonally adjusted U.S. total for structures with five units or more is the product of the seasonally adjusted U.S. total for structures with two units or more and the ratio of the unadjusted U.S. total for structures with five units or more to the unadjusted U.S. total for structures with two units or more. Each seasonally adjusted regional total is the sum of the seasonally adjusted single-family regional estimate and the seasonally adjusted regional estimate for structures with two units or more (adjusted using the seasonal factor for the U.S. total for structures with two units or more). The seasonally adjusted regional totals add up to the seasonally adjusted U.S. total without any modifications.
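The way the five adjusted series combine into national and regional totals can be sketched in Python. The names and figures below are hypothetical, the four Census Regions are Northeast, Midwest, South, and West, and the sketch assumes that the U.S. seasonal factor for structures with two units or more is applied by division (a multiplicative adjustment) and that the unadjusted U.S. figure for those structures is the sum of the four regional figures.

```python
# Hypothetical sketch of combining seasonally adjusted components.
REGIONS = ("Northeast", "Midwest", "South", "West")


def combine(sf_adjusted, multi_unadjusted, multi_factor, five_plus_unadjusted_us):
    """sf_adjusted             - region -> adjusted single-family estimate
    multi_unadjusted        - region -> unadjusted estimate, 2+ unit structures
    multi_factor            - U.S. seasonal factor for 2+ unit structures
    five_plus_unadjusted_us - unadjusted U.S. estimate, 5+ unit structures
    Returns (U.S. total, U.S. 5+ total, region -> total), all adjusted."""
    multi_unadjusted_us = sum(multi_unadjusted[r] for r in REGIONS)
    multi_adjusted_us = multi_unadjusted_us / multi_factor

    # U.S. total: four regional single-family series plus the U.S. 2+ series.
    us_total = sum(sf_adjusted[r] for r in REGIONS) + multi_adjusted_us

    # U.S. 5+: adjusted 2+ total scaled by the unadjusted 5+ share of 2+.
    five_plus_adjusted_us = multi_adjusted_us * (
        five_plus_unadjusted_us / multi_unadjusted_us
    )

    # Regional total: regional single-family plus regional 2+ adjusted
    # with the U.S. factor for 2+ unit structures.
    regional_totals = {
        r: sf_adjusted[r] + multi_unadjusted[r] / multi_factor for r in REGIONS
    }
    return us_total, five_plus_adjusted_us, regional_totals
```

Because every region's 2+ estimate is divided by the same U.S. factor, the regional totals sum to the U.S. total without further modification, which is the property noted above.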

Also see our summary of average percent changes and related measures for New Residential Construction.

For new single-family houses sold, the seasonally adjusted annual rates are developed each month by Census Region. Each month, the four regional series are run through the X-13ARIMA-SEATS program. The seasonally adjusted U.S. total is the sum of the four seasonally adjusted components. The seasonally adjusted number of new single-family houses for sale is developed only at the national level by running the national series through X-13ARIMA-SEATS. The methodology for the seasonally adjusted months' supply of new houses for sale was changed beginning with data for January 2007. The seasonal index is now implicitly derived as the ratio of the seasonal index for new houses for sale to the implicit seasonal index for U.S. houses sold. Prior to January 2007, the seasonal index was explicitly derived at the national level.

Also see our summary of average percent changes and related measures for New Residential Sales.

For further information on time series and seasonal adjustment, please refer to the Seasonal Adjustment Frequently Asked Questions.


Source: U.S. Census Bureau | Characteristics of New Housing | (301) 763-5160 |  Last Revised: June 28, 2012