Description of Price Index for Sales Price of New One-Family Houses Sold
Price Index Computation
The basic form of a Laspeyres type price index is:
$$\frac{\sum_i (p_{ti} \times q_{0i})}{\sum_i (p_{0i} \times q_{0i})}$$
Where the p0i's and pti's are the prices in the base and current period, respectively, and the q0i's are the quantities in the base period. This represents the ratio of the current cost of the quantity of goods purchased in the base year to the cost in base year prices of the same quantity of goods.
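The ratio described above can be illustrated with a minimal sketch. The function name `laspeyres_index` and the two-good basket are hypothetical and chosen only for illustration; they do not come from the survey.

```python
def laspeyres_index(base_prices, current_prices, base_quantities):
    """Current cost of the base-period basket divided by its base-period cost."""
    current_cost = sum(p * q for p, q in zip(current_prices, base_quantities))
    base_cost = sum(p * q for p, q in zip(base_prices, base_quantities))
    return current_cost / base_cost


# Hypothetical two-good basket (illustrative numbers only).
p0 = [10.0, 20.0]   # base-period prices
pt = [11.0, 24.0]   # current-period prices
q0 = [3.0, 1.0]     # base-period quantities

index = 100 * laspeyres_index(p0, pt, q0)  # scaled so the base period equals 100
```

Here the basket costs 50 at base prices and 57 at current prices, so the index is about 114 on a base of 100.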
It is necessary to obtain the prices and quantities for various commodities to compute this index. In computing a new one-family house price index, we estimate from survey data the q's for commodities which we refer to as house characteristics. Our survey does not collect prices of the characteristics so we estimate prices for them from a regression model. For this reason the sums in the above equation can be thought of as regressed values taken from a regression model.
Experience has shown that regression estimation of the price in the following multiplicative model is superior to estimation for the above additive model:
$$\frac{e^{\sum_i (d_{ti} \times q_{0i})}}{e^{\sum_i (d_{0i} \times q_{0i})}}$$
Where the d0i's and dti's are coefficients estimated from a semi-log regression model and the q0i's are the quantities in the base period.
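As a rough sketch of the multiplicative form, the ratio of the two exponentials can be computed as the exponential of the difference of the two sums. The function name `log_linear_index`, the coefficient values, and the treatment of the first quantity as an intercept term are assumptions made for illustration, not estimates from the survey.

```python
import math


def log_linear_index(base_coefs, current_coefs, base_quantities):
    """exp(sum(d_t * q_0)) / exp(sum(d_0 * q_0)), computed as exp of the difference."""
    current_log = sum(d * q for d, q in zip(current_coefs, base_quantities))
    base_log = sum(d * q for d, q in zip(base_coefs, base_quantities))
    return math.exp(current_log - base_log)


# Hypothetical coefficients: an intercept plus one characteristic.
d0 = [11.00, 0.10]   # base-period coefficients
dt = [11.05, 0.12]   # current-period coefficients
q0 = [1.0, 0.5]      # base-period quantities (1.0 multiplies the intercept)

index = 100 * log_linear_index(d0, dt, q0)
```

Working with the difference of the sums avoids forming two large exponentials before dividing, which gives the same ratio with less rounding error.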
Forming strata also improves the regression estimates. We use five strata: four defined as the detached units in the four Census regions and the fifth as all attached units. The Laspeyres price index at the national level is then:
$$\frac{e^{\sum_h w_h \times \sum_i (d_{tih} \times q_{0ih})}}{e^{\sum_h w_h \times \sum_i (d_{0ih} \times q_{0ih})}}$$
Where the first sum in each term is over the five strata, wh is the proportion of the units sold (activity) in the stratum, and the other terms are computed by stratum.
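The stratified form can be sketched as follows. The stratum weights are the United States percentages given later in this appendix; the function name `national_index` and the per-stratum log values are hypothetical and serve only as an illustration.

```python
import math

# Stratum weights in percent, from the United States weights table in this
# appendix: detached Northeast, Midwest, South, West, then all attached units.
US_WEIGHTS_PERCENT = [6.2, 15.9, 40.3, 27.1, 10.5]


def national_index(base_log_values, current_log_values,
                   weights_percent=US_WEIGHTS_PERCENT):
    """exp(sum_h w_h * S_th) / exp(sum_h w_h * S_0h), where S is the
    per-stratum sum of coefficient times base quantity."""
    total = sum(weights_percent)
    weights = [w / total for w in weights_percent]  # convert to proportions
    log_ratio = sum(
        w * (s_t - s_0)
        for w, s_t, s_0 in zip(weights, current_log_values, base_log_values)
    )
    return math.exp(log_ratio)


# Hypothetical per-stratum values of sum_i(d_ih * q_0ih); illustrative only.
base_logs = [12.10, 11.80, 11.70, 12.00, 11.60]
current_logs = [12.16, 11.84, 11.75, 12.07, 11.63]

us_index = 100 * national_index(base_logs, current_logs)
```

Because the weights enter the exponent, this ratio is algebraically a weighted geometric mean of the five stratum-level indexes.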
The regression models used to estimate the price index are described in more detail below.
Indexes for the four Census regions are shown annually. These are computed as follows:
$$I_{ri} = \sqrt[w_i]{(I_{di})^{w_{di}} \times (I_a)^{w_{ai}}}$$
Where Iri is the annual index for region i, Idi is the annual index for the appropriate stratum of detached units, Ia is the annual index for the attached units, wdi is the activity in the detached stratum, wai is the number of attached units from the given region that were included in the attached stratum, and wi is the sum of wdi and wai.
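A minimal sketch of the regional computation follows. The weights are the Northeast percentages from the regional weights table in this appendix; the function name `regional_index` and the two index values are hypothetical.

```python
import math


def regional_index(detached_index, attached_index,
                   detached_weight, attached_weight):
    """w_i-th root of (I_d ** w_d) * (I_a ** w_a), evaluated in logarithms."""
    total_weight = detached_weight + attached_weight
    log_index = (
        detached_weight * math.log(detached_index)
        + attached_weight * math.log(attached_index)
    ) / total_weight
    return math.exp(log_index)


# Northeast weights (percent) from the regional weights table; the index
# values 105.0 and 110.0 are made up for illustration.
northeast = regional_index(105.0, 110.0, 68.1, 31.9)
```

The result lies between the two input indexes and closer to the detached index, which carries the larger weight. Evaluating the root in logarithms gives the same value as the formula while avoiding very large intermediate powers when the weights are raw unit counts.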
There are five separate regression models used to calculate the price indexes: one model for detached units in each census region and one model for all attached units. Each of these models is designed to measure the contributions of important physical and geographic characteristics to the prices of new houses sold.
The characteristics used in each model are described later in this appendix. All characteristics except for floor area are divided into categories as shown in tables A-1 (detached houses) and A-2 (attached houses). For example, each house is classified by whether it has fewer than three bedrooms, three bedrooms, or more than three bedrooms; whether it has no garage or a carport, a one- or two-car garage, or a garage for three or more cars; etc. Each category is treated qualitatively in that a value of "1" indicates that the house has that characteristic and "0" indicates that the house does not have it. One category from each of the qualitative characteristics must be omitted to avoid an overdetermined system. The price and floor area are treated quantitatively: the logarithms of the actual values are used directly in the model.
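The coding scheme described above might look like the following sketch for the two characteristics named in the example. The function name `encode_house`, the choice of which category to omit, and the sample house are assumptions; the actual categories and omitted levels are those in tables A-1 and A-2.

```python
import math


def encode_house(floor_area, bedrooms, garage_cars):
    """Return one regression row:
    [intercept, log(floor area), bedrooms < 3, bedrooms > 3,
     no garage or carport, garage for 3+ cars].

    Assumed omitted (reference) categories: exactly three bedrooms and a
    one- or two-car garage. garage_cars == 0 stands for no garage or a carport.
    """
    return [
        1.0,
        math.log(floor_area),              # quantitative: enters as a logarithm
        1.0 if bedrooms < 3 else 0.0,      # qualitative: 1 if present, else 0
        1.0 if bedrooms > 3 else 0.0,
        1.0 if garage_cars == 0 else 0.0,
        1.0 if garage_cars >= 3 else 0.0,
    ]


# Hypothetical house: floor area 2000 (units as reported), 4 bedrooms, 2-car garage.
row = encode_house(2000.0, 4, 2)
log_price = math.log(250000.0)  # the dependent variable is the logarithm of price
```

A house in both reference categories has zeros in all four indicator columns, so its characteristics are absorbed by the intercept. That is why one category per characteristic is left out.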
Since the regression does not include all of the characteristics which explain price variability and because the characteristics are interdependent, the estimated regression coefficients should not be regarded as estimates of the true proportionality factors.
The base year for the index is 1996. The base weights (quantities) for the individual price models are given in tables A-1 and A-2. The weights (levels of activity in 1996) used to combine these five indexes to form the United States index and the weights used to form the regional indexes are given in the following two tables.
Weights Used in Calculating the United States Index
[in percent]
| Detached houses: Northeast | Detached houses: Midwest | Detached houses: South | Detached houses: West | Attached houses |
| 6.2 | 15.9 | 40.3 | 27.1 | 10.5 |
Weights Used in Calculating the Regional Indexes
[in percent]
| Region | Detached | Attached |
| Northeast | 68.1 | 31.9 |
| Midwest | 82.6 | 17.4 |
| South | 89.8 | 10.2 |
| West | 94.0 | 6.0 |
Limitations of the Data
The reported data for each house in the sample are edited before being used in the index computation. If the sales price or any characteristic is not reported, that sample case is rejected. A resistant regression procedure is used which incorporates Tukey's biweight. Resistant regression significantly reduces the influence on the model of houses with unusual characteristics, price, or location by reducing the sample weight of each such case. In this way a case with an extreme value resulting from incorrect reporting or processing has a reduced impact upon the index.
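To illustrate how a biweight reduces the influence of unusual cases, the sketch below computes Tukey biweight weights from a set of regression residuals. The function name `biweight_weights`, the tuning constant 4.685, and the scale estimate based on the median absolute deviation are common textbook choices assumed here; the source does not state which settings the survey uses.

```python
import statistics


def biweight_weights(residuals, c=4.685):
    """Tukey biweight: weight (1 - u^2)^2 when |u| < 1, else 0,
    with u = residual / (c * robust scale)."""
    # Robust scale: median absolute deviation, rescaled by 0.6745 so that it
    # is comparable to a standard deviation for normally distributed data.
    center = statistics.median(residuals)
    mad = statistics.median(abs(r - center) for r in residuals)
    scale = mad / 0.6745
    if scale == 0:
        return [1.0] * len(residuals)  # no spread: leave all weights unchanged
    weights = []
    for r in residuals:
        u = r / (c * scale)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return weights


# Hypothetical log-price residuals; the last case is an extreme value.
residuals = [-0.10, 0.05, 0.0, 0.08, -0.04, 2.0]
weights = biweight_weights(residuals)
```

In a resistant regression these factors would multiply each case's sample weight, and the fit would be repeated until the weights settle. In this example the extreme case receives a weight of zero while the typical cases keep weights near one.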
The prices we estimate for our index computations may be influenced by characteristics of workmanship, materials, and mechanical equipment which are not measured. Hence, it should be kept in mind that the price indexes in this report only account for such quality characteristics insofar as they may be correlated with the characteristics actually used. These characteristics account for 60 to 80 percent of the variation in the logarithm of the sales prices.
Since the price index applies to the total sales price, it covers not only the cost of labor and materials but also land cost, direct and indirect selling expenses, and the seller's profits. The index is thus conceptually broader in coverage than a cost index. Reflecting the sales price, the price index is affected by all factors which influence movement of house prices: both supply factors such as wage rates, material costs, and productivity, and demand factors such as demographic changes, income, and availability of mortgage money.
Our price index is computed from actual transaction prices of houses sold, which include the value of the developed lot. Not included are any amenities the builder provides to the buyer that are not included in the initial sales price. For example, the price of a two-car garage may not be included in the initial sales price. Excluded from the index are houses built to be rented and houses built for the exclusive use of the land owner, who either hires a general contractor to build the house or acts as his own general contractor.
A house is defined as sold when a sales contract is signed or a deposit is accepted regardless of the stage of construction. The month of sale refers to the contract or deposit date.
Sampling error reflects the fact that only a particular sample was surveyed rather than the entire population. The price index in a given period is calculated from a particular sample of houses sold. If a separate index number were calculated from each of all possible samples of identical size that could have been selected, using the particular procedure for calculating the index that is used for single-family houses, these numbers would differ from one another. The standard error, or sampling error, of a survey estimate is a measure of the variation among the estimates from all possible samples and, thus, is a measure of the precision with which an estimate from a particular sample approximates the average from all possible samples. The relative standard error equals the standard error divided by the estimated value to which it refers.
The relative standard error of the annual index for the United States is 0.5 percent. The relative standard errors for the quarterly index and for the annual indexes of the Midwest, South, and West regions are about 1.0 percent. The Northeast annual index has a relative standard error of about 2.0 percent.
The sample estimate and an estimate of its relative standard error allow us to construct interval estimates with prescribed confidence that the interval includes the average result of all possible samples with the same size and design. A 90% confidence interval is defined to be from 1.6 standard errors below the estimate to 1.6 standard errors above the estimate. If all possible samples were selected and surveyed under essentially the same conditions and all the respective 90% confidence intervals were generated, then approximately one-tenth would not include this average estimate. For example, table 1 of this report shows the 1993 annual price index to be 91.1. Multiplying 91.1 by the relative standard error of 0.5%, we obtain approximately 0.5 as the standard error. To obtain a 90% confidence interval, multiply 0.5 by 1.6 to get 0.8, yielding limits of 90.3 and 91.9 (91.1 plus or minus 0.8). The average estimate of this annual price index may or may not be contained in this computed interval; but in 9 out of 10 samples, the interval calculated in this manner will contain the average estimate from all possible samples.
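The arithmetic in this example can be reproduced with a short sketch. The function name `confidence_interval_90` and its `se_decimals` parameter are hypothetical. The parameter is there only because the worked example rounds the standard error to one decimal place before multiplying by 1.6.

```python
def confidence_interval_90(estimate, relative_standard_error,
                           multiplier=1.6, se_decimals=None):
    """Return (lower, upper) limits: estimate -/+ multiplier * standard error."""
    standard_error = estimate * relative_standard_error
    if se_decimals is not None:
        # Optionally round the standard error first, as the worked example does.
        standard_error = round(standard_error, se_decimals)
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


# 1993 annual index of 91.1 with a relative standard error of 0.5 percent.
low, high = confidence_interval_90(91.1, 0.005, se_decimals=1)

# Without the intermediate rounding the interval is slightly narrower.
low_exact, high_exact = confidence_interval_90(91.1, 0.005)
```

With the standard error rounded to 0.5, the limits match the 90.3 and 91.9 quoted above. Without the intermediate rounding, the standard error is about 0.46 and the interval runs from roughly 90.4 to 91.8.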
As calculated for this report, the estimated relative standard error measures certain nonsampling errors, but does not measure any systematic biases in the data. Bias is the difference, averaged over all possible samples with the same size and design, between the estimates and the true value being estimated. Nonsampling errors for the Survey of Construction can be attributed to many sources: inability to obtain information about all cases in the sample, definitional difficulties, differences in interpretation of questions, inability or unwillingness of respondents to provide correct information, and errors made in processing the data. Nonsampling errors for the price index can result from excluding important characteristics like the quality of building materials from the regression, high correlation among regression characteristics, and use of an improper regression model. These nonsampling errors also occur in complete censuses. We believe that most of the important response and operational errors are controlled in the course of reviewing the data for reasonableness and consistency. The regression model was chosen to minimize the amount of nonsampling error associated with the price index.