Documentation for Puerto Rico Population Estimates
Source: U.S. Bureau of the Census
Internet Release date: April 30, 1997
The 1996 total population estimate for the Commonwealth of Puerto Rico and the estimates
for its municipios were produced using two different methods. The total estimate for the
Commonwealth was produced using a cohort component method developed by the Rural Urban
Projections office (RUP) of Population Division, while the municipio estimates were
calculated using a ratio correlation method developed by the Population Estimates Branch
(PEB).
The Cohort Component Method
The cohort component method developed by RUP uses components of population change (births,
deaths, and migration) to produce Puerto Rico population estimates by single year of age
and sex.
To make estimates using the cohort component method, a base population is required.
The base year for the 1996 estimates is 1990. Since the Census Bureau makes midyear
to midyear estimates, the census population was moved to midyear using the intercensal
growth rate.
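The midyear adjustment described above can be sketched as follows. This is only an illustration under stated assumptions: the growth is taken to be exponential, the census reference date is taken to be April 1, and the population counts are made-up round numbers rather than census figures. The function names are hypothetical and are not taken from RUP's procedures.

```python
import math
from datetime import date


def intercensal_growth_rate(pop_start, pop_end, years):
    """Average annual exponential growth rate between two census counts."""
    return math.log(pop_end / pop_start) / years


def move_to_midyear(census_pop, rate, census_day, midyear_day):
    """Carry a census count forward to midyear at a constant exponential rate."""
    elapsed_years = (midyear_day - census_day).days / 365.25
    return census_pop * math.exp(rate * elapsed_years)


# Illustrative counts only (assumption): 1,000,000 at one census,
# 1,100,000 at the next census ten years later.
r = intercensal_growth_rate(1_000_000, 1_100_000, 10.0)

# Assumed census date of April 1, moved to July 1 of the same year.
midyear = move_to_midyear(1_100_000, r, date(1990, 4, 1), date(1990, 7, 1))
print(round(midyear))
```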
Base mortality for 1990 for Puerto Rico was derived by calculating a life table using
an average of registered age-specific deaths for 1989-1991 and the aforementioned midyear
1990 population estimate. Registered deaths by age and sex for 1990-1993 were used to
project the population from midyear 1990. For 1994, only total deaths and infant deaths
were available.
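As a rough sketch of the first step of such a life table, the age-specific death rates can be computed by averaging registered deaths over the three years and dividing by the midyear population. The conversion from death rate to probability of dying shown here assumes single-year age groups with deaths spread evenly over the year; the source does not say which conversion was actually used. All names and numbers are hypothetical.

```python
def central_death_rates(deaths_by_year, midyear_pop):
    """Average annual deaths at each age divided by the midyear population."""
    n_years = len(deaths_by_year)
    rates = {}
    for age, pop in midyear_pop.items():
        avg_deaths = sum(year[age] for year in deaths_by_year) / n_years
        rates[age] = avg_deaths / pop
    return rates


def death_probability(m_x):
    """Probability of dying within a one-year age interval.

    Assumption: deaths are spread evenly over the year, which gives
    q_x = m_x / (1 + 0.5 * m_x).
    """
    return m_x / (1.0 + 0.5 * m_x)


# Made-up deaths for three years (standing in for 1989-1991), ages 0 and 1.
deaths = [{0: 10, 1: 2}, {0: 12, 1: 4}, {0: 14, 1: 3}]
population = {0: 1000, 1: 1000}  # made-up midyear population by age

rates = central_death_rates(deaths, population)
probabilities = {age: death_probability(m) for age, m in rates.items()}
```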
In the case of fertility, the population was estimated using total reported births by age
of mother, by sex, for 1990-1993. For 1994, only total registered births were available.
The estimate for migration was derived by using the difference between two independently
calculated populations for Puerto Rico. One population estimate used intercensal migration
data, while the second used housing unit data.
Methodology for Puerto Rico Municipio Population Estimates
In the ratio-correlation model used for Puerto Rico, a multiple-regression equation is used
to relate changes in the distribution of births, deaths, and housing units to changes in
the distribution of population among municipios. For both the development of the regression
equation and the computation of the population estimates, the variables are transformed by
calculating ratios of percentage shares in the later year to the corresponding percentage
shares in the earlier year. These
transformations cause the resulting coefficients in the predicting equation to add to
approximately 1.0. The regression equation is given by:
Y predicted = 0.03 + 0.13 (births) + 0.09 (deaths) + 0.75 (housing units)
The R-squared is 0.81.
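The equation can be applied to one municipio as in the sketch below. The coefficients are the ones quoted above; the input ratios are invented for illustration, and the function and key names are hypothetical.

```python
# Coefficients quoted in the regression equation above.
INTERCEPT = 0.03
COEFFICIENTS = {"births": 0.13, "deaths": 0.09, "housing_units": 0.75}


def predicted_share_ratio(ratios):
    """Predicted ratio of a municipio's later population share to its earlier share.

    `ratios` maps each indicator to the ratio of the municipio's share of that
    indicator in the later year to its share in the earlier year.
    """
    return INTERCEPT + sum(COEFFICIENTS[name] * ratios[name] for name in COEFFICIENTS)


# Invented example: the housing unit share grew by 10 percent while the
# shares of births and deaths were unchanged.
example = predicted_share_ratio({"births": 1.0, "deaths": 1.0, "housing_units": 1.10})
print(round(example, 3))
```

Because the intercept and coefficients add to 1.0, a municipio whose indicator shares are all unchanged gets a predicted ratio of about 1.0, that is, an unchanged population share.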
Below is an extract of a paper by Michael J. Batutis that provides a general-purpose
explanation of the regression method used at various times in the population estimates
program. For a hardcopy of the paper call (301) 457-2380.
Subnational Population Estimates Methods of the U.S. Bureau of the Census
Prepared By
Michael J. Batutis
Chief, Population Estimates Branch
Population Division
Bureau of the Census
October 1991
(From page 9 of the Batutis paper)
B. Regression Method
Regression in a variety of forms has a long history in population estimates at nearly
all levels of geography. In the usual applications, regression is a stock model
whereby the population at time t is used as a base and a regression equation is used
to estimate, or predict, the population at time t+n. There is no attempt in this
method to deal with the demographic dynamics of population change.
The application of regression that is used by the Census Bureau traces its roots to
a 1954 article by Schmitt and Crosetti in which they tested the accuracy of several
methods of estimating population (Schmitt and Crosetti, 1954). One of the methods was
a so-called ratio-correlation model, although the reasons for this name are obscure.
The model is a least-squares, linear regression model in which the independent
variables are ratios of county proportions of selected symptomatic indicators of
population change in the estimate interval to the corresponding proportions in the
base interval. The dependent variable in the model is the change in a county's share
of the state population between the base point and the estimate date.
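A least-squares fit of this kind can be sketched in plain Python by solving the normal equations. This is a generic illustration, not the Census Bureau's program: the function names are hypothetical, and the data are synthetic values generated from known coefficients so the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular.
    """
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ratio_regression(x_rows, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + ...; returns [b0, b1, ...]."""
    design = [[1.0] + list(row) for row in x_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic share ratios for five areas and two indicators (not real data).
x_rows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7)]
y = [0.1 + 0.3 * x1 + 0.6 * x2 for x1, x2 in x_rows]

coefficients = fit_ratio_regression(x_rows, y)
```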
Schmitt and Crosetti presumably called this method the ratio-correlation method
because they chose the symptomatic indicators of population change for the model by
examining a zero-order correlation matrix and selecting independent variables that
were highly correlated with population. Although this procedure may be desirable,
it is not intrinsic to the model and a better name would be ratio-regression, or
simply regression. For purposes of this handbook, the term ratio-regression is used.
In equation form, the generalized ratio-regression model is:
Editor's note: formulas did not translate well to the text file, phone author for hardcopy
(6) $Y_{t} = X_{1} + B X_{2,t} + B X_{3,t} + \cdots + B X_{i,t} + U_{t}$
where
$Y_{t}$ = the estimated value as of the most recent census.
$X_{1}$ = a constant.
$B$ = regression coefficient.
$X_{2}, \ldots, X_{i}$ = independent variables.
$U_{t}$ = a term for random error.
In a ratio-regression model, the dependent variable $Y_{t}$ takes the form:
(7) $Y_{t,k} = \frac{P_{t,k} / \sum_{k=1}^{m} P_{t,k}}{P_{t-10,k} / \sum_{k=1}^{m} P_{t-10,k}}$
where
P = total population.
t = the year of the most recent census.
k = an index for geographic areas.
m = the number of geographic areas.
Therefore in the ratio-regression model, the dependent variable is the ratio of the
kth geographic area's share of the population across all m areas at the most
recent census to the kth geographic area's share of population at the census
ten years prior. If the additional step is taken of subtracting 1 from $Y_{t,k}$,
as in
(8) $Y^{*}_{t,k} = Y_{t,k} - 1.0$
then the interpretation of $Y^{*}_{t,k}$ is the change in population share in the kth
geographic area from one census to the next.
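The share ratio in equation (7) and the change in share in equation (8) can be computed as in this sketch. The same calculation would apply to any symptomatic indicator in place of population. The function names are hypothetical and the counts are invented.

```python
def shares(values):
    """Each area's share of the total across all areas."""
    total = sum(values.values())
    return {area: value / total for area, value in values.items()}


def share_ratio(later, earlier):
    """Equation (7): later share divided by earlier share, for each area."""
    later_shares = shares(later)
    earlier_shares = shares(earlier)
    return {area: later_shares[area] / earlier_shares[area] for area in later}


def share_change(ratios):
    """Equation (8): subtract 1.0 from each ratio."""
    return {area: ratio - 1.0 for area, ratio in ratios.items()}


# Invented populations for two areas at two censuses.
earlier = {"A": 100, "B": 100}
later = {"A": 300, "B": 100}

ratios = share_ratio(later, earlier)   # A: 0.75 / 0.50, B: 0.25 / 0.50
changes = share_change(ratios)
```

Note that an area whose population grows at the same rate as the total has a ratio of 1.0, however fast that growth is.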
The independent variables are defined in a similar way in the ratio-regression model,
except that each independent variable represents a different symptomatic indicator
variable, like school enrollment or births. In equation form, the independent
variables are:
(9) $X_{i,t,k} = \frac{X_{i,t,k} / \sum_{k=1}^{m} X_{i,t,k}}{X_{i,t-10,k} / \sum_{k=1}^{m} X_{i,t-10,k}}$
where
X = the value of the symptomatic indicator variable.
i = an index for the symptomatic indicators.
t = the most recent decennial census year.
k = an index for geographic areas.
m = the number of geographic areas.
All of these equations really express the rather simple idea that the change in a
geographic area's (say, a county's) share of a number of symptomatic indicators over all
counties in a state is related to the change in that county's share of state
population. The choice of symptomatic indicator variables is guided by a demonstrated
or presumed relationship to population coupled with the availability of the
symptomatic indicator data in the post-censal period. This last point is
particularly important since the application of the ratio-regression model to a
population estimate requires the substitution of current data into the independent
variables. Thus, equation (9) becomes
(10) $X_{i,t+n,k} = \frac{X_{i,t+n,k} / \sum_{k=1}^{m} X_{i,t+n,k}}{X_{i,t,k} / \sum_{k=1}^{m} X_{i,t,k}}$
where
n = the number of years between the most recent census and the
estimate date.
The result of the model, $Y$, becomes
(11) $Y_{t+n,k} = \frac{P_{t+n,k} / \sum_{k=1}^{m} P_{t+n,k}}{P_{t,k} / \sum_{k=1}^{m} P_{t,k}}$
and
(12) $Y^{*}_{t+n,k} = Y_{t+n,k} - 1.0$
The ratio-regression model is used to estimate the change in a geographic
sub-area's share of a larger geographic area's population since the last
decennial census. The estimated change is used to compute a new share at the
estimate date. Applying the new share to an independent population total for
the parent geographic area yields the estimated population at time t+n for
the kth geographic sub-area.
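The final step can be sketched as follows. The rescaling of the new shares so that they add to 1.0 is an assumption added here so that the sub-area estimates sum to the parent total; the text above does not describe that step. The function name and all figures are hypothetical.

```python
def estimate_populations(base_pops, predicted_ratios, parent_total):
    """Turn predicted share ratios into sub-area population estimates.

    base_pops: population of each sub-area at the last census.
    predicted_ratios: model result for each sub-area (new share / old share).
    parent_total: independent estimate for the parent area at time t+n.
    """
    base_total = sum(base_pops.values())
    new_shares = {
        area: (base_pops[area] / base_total) * predicted_ratios[area]
        for area in base_pops
    }
    # Assumption (not stated in the source): rescale the shares so they
    # add to 1.0 and the estimates sum to the parent total.
    share_sum = sum(new_shares.values())
    return {area: parent_total * share / share_sum for area, share in new_shares.items()}


# Invented figures: two sub-areas, parent total of 1,100 at the estimate date.
estimates = estimate_populations(
    {"A": 600, "B": 400}, {"A": 1.1, "B": 0.9}, parent_total=1100
)
```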
The effectiveness of the ratio-regression model for preparing population
estimates depends on two major factors. First, the method presumes that an
accurate population estimate for some higher level of geography is available.
If the higher level estimate has a large degree of bias, the ratio-regression
method may distort local variations in population. Second, and more seriously,
the model assumes stability among the relationships of the independent and
dependent variables. The regression coefficients are calculated on the basis
of the experience of the prior decade and are held constant through the
post-censal estimating interval until the next succeeding census. If the
assumed relationships change differentially over time for various types of
sub-areas, the method will produce a seriously distorted distribution of
population. Another factor that affects the accuracy of the ratio-regression
model is multicollinearity among the independent variables. The choice of
independent variables is based on the strength of their correlation with
population. Therefore, the independent variables are also likely to exhibit
high correlations among themselves. If the pattern of intercorrelation changes
over time, the accuracy of the regression coefficients is reduced.
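One simple way to look for the intercorrelation described above is to compute the pairwise correlations among the independent variables. The sketch below uses the ordinary Pearson correlation; the function names are hypothetical and the indicator ratios are invented.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(indicators):
    """Correlation for every pair of indicator series, keyed by the pair of names."""
    return {
        (a, b): pearson(indicators[a], indicators[b])
        for a, b in combinations(indicators, 2)
    }


# Invented share ratios for four areas.
indicators = {
    "births": [1.00, 1.05, 0.95, 1.10],
    "deaths": [0.98, 1.02, 0.97, 1.04],
    "housing_units": [1.01, 1.08, 0.93, 1.12],
}
correlations = pairwise_correlations(indicators)
```

Pairs with correlations close to 1.0 or -1.0 carry largely the same information, which is the situation the paragraph above warns about.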
The strengths of the ratio-regression method are ease of application, provided
that symptomatic indicators are available at the appropriate levels of geography,
and the flexibility and stability that are inherent in the use of symptomatic
indicator data. The method has been widely applied in numerous estimating
environments and variations are suggested frequently. The model has generated
a voluminous literature and has generally tested well against decennial census
results.