Author: Michel Isnard
Presenter: Jean Dumais
Question(s): What are the criteria for defining metropolitan areas in France? How many
metropolitan areas of one million or more population are there in France?
Response: An Urban Area captures the economic influence of an urban core. It comprises an
urban core offering at least 5000 jobs and all the surrounding communities for which at least
40% of the population is active in the area (1990 Census definition).
There were 368 Urban Areas in 1990.
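The two criteria in the 1990 definition can be sketched in code. This is a minimal, single-pass illustration only: the `Commune` record and its field names are hypothetical, and it reads the 40% criterion as a share of employed residents, which is an assumption. Only the two thresholds come from the response.

```python
from dataclasses import dataclass
from typing import List

MIN_CORE_JOBS = 5000        # threshold quoted in the response
MIN_COMMUTER_SHARE = 0.40   # threshold quoted in the response


@dataclass
class Commune:
    """Hypothetical record for one community; field names are assumptions."""
    name: str
    jobs: int                # jobs located in the community
    employed_residents: int  # residents who hold a job anywhere
    working_in_area: int     # residents whose job is inside the candidate area


def is_urban_core(core: Commune) -> bool:
    """A core qualifies if it offers at least 5000 jobs."""
    return core.jobs >= MIN_CORE_JOBS


def belongs_to_area(commune: Commune) -> bool:
    """A surrounding community joins if at least 40% of its employed residents work in the area."""
    if commune.employed_residents == 0:
        return False
    share = commune.working_in_area / commune.employed_residents
    return share >= MIN_COMMUTER_SHARE


def urban_area(core: Commune, neighbours: List[Commune]) -> List[str]:
    """Return the names of the communities forming the Urban Area (empty if the core fails)."""
    if not is_urban_core(core):
        return []
    return [core.name] + [c.name for c in neighbours if belongs_to_area(c)]
```

A community with 450 of 1000 employed residents working in the area would be included; one with 399 of 1000 would not.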
Urban Areas of at least 350,000 people in 1990 are listed below:
Urban Area Household Population (x 1000)
PARIS - 10,293,096
LYON - 1,508,572
MARSEILLE-AIX - 1,345,521
LILLE - 1,079,390
BORDEAUX - 831,327
TOULOUSE - 797,313
NANTES - 609,231
NICE - 540,234
STRASBOURG - 519,203
GRENOBLE - 477,180
ROUEN - 459,177
TOULON - 456,186
RENNES - 430,160
NANCY - 393,152
MONTPELLIER - 378,150
VALENCIENNES - 370,128
Population Estimates for Small areas in the UK: Performance and Promise
Presenter: Stephen Simpson
Question(s): What kinds of specific difficulties are encountered in estimating the population of
city areas?
Response: The result that city areas are more difficult to estimate is an empirical one: population
estimates for small areas in United Kingdom city cores have higher inaccuracy than would be
expected given their population size, change over the past decade, and presence of special
student or armed forces populations. This is not a problem for city suburbs outside city cores.
Our understanding is that the administrative records, which are involved to some extent in all
estimates, are deficient in these city core areas more than in others and that this is the cause of the
difficulty of estimation. They are deficient because of higher mobility of residents, and higher
numbers of people who are hard to capture in administrative records - healthy young adults,
adults who wish to avoid representation in official records, and no-income, no-residence adults
who are not in the target population of some official records.
This result replicates many other studies' findings from many other countries that urban areas are
difficult to enumerate in censuses and to interview in surveys. However, in Australia population
estimates have been found (Andrew Howe, ABS) to perform better on average in city areas, owing
to incompleteness of administrative registers in some rural areas. The specific weaknesses of
specific estimation data have always to be borne in mind.
Population and Housing Estimates for Census Blocks: The San Diego Experience
Presenter: Jeff Tayman
Question(s): How does your methodology account for boundary changes?
Response: We make adjustments to the geographic boundary files and apply them to the 1990
census base and the last year of the current estimates. If necessary, we partition the population
and housing data into the new pieces, but often it just requires a change to the geographic code.
In this way, we can compare to the census and have a base file for developing the next estimate.
The downside is that data for any years between the census and the last estimate point are not
based on the most current boundaries.
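The partitioning step can be illustrated with a short sketch. It assumes a hypothetical set of allocation shares (for example, the fraction of a block's housing units that falls in each new piece); the response does not say how the split is actually made, so the function and its inputs are illustrative only.

```python
from typing import Dict


def partition_block(count: int, shares: Dict[str, float]) -> Dict[str, int]:
    """Split a block's count across new pieces.

    Largest-remainder rounding keeps the pieces summing to the original count.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    raw = {piece: count * share for piece, share in shares.items()}
    result = {piece: int(value) for piece, value in raw.items()}
    leftover = count - sum(result.values())
    # Hand the remaining units to the pieces with the largest fractional parts.
    by_fraction = sorted(raw, key=lambda p: raw[p] - result[p], reverse=True)
    for piece in by_fraction[:leftover]:
        result[piece] += 1
    return result
```

When a boundary change only moves a whole block to a new area, no partition is needed and a single share of 1.0 reproduces the "change to the geographic code" case.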
Boundary changes are a problem. We realize that our current solution is not the best solution,
but it is workable, albeit with a lot of effort when many boundary changes are involved.
Fortunately, there have been relatively few boundary changes in San Diego County this decade.
We are working on less labor-intensive solutions at present.
Population and Housing Estimates for Census Blocks: The San Diego Experience
Presenter: Jeff Tayman
Question(s): How often are the GQ list and long-term residents updated? Are there any problems
with rapid change? Do you think that changes in occupancy rates can be modeled using changes
in economic conditions (assuming that occupancy is the equilibrating factor when demand for
labor declines), as a way of updating small area occupancy rates?
Response: We update the group quarters list as needed. We get good information on military,
college, and prison group quarters. The military population, especially the shipboard population, can change
significantly from year to year, but we can pick these changes up. I think we probably miss
smaller changes in other kinds of group quarters, say under 25, but large changes are usually
reported by local agencies.
I think it is very possible to develop supply-demand models for vacancy rates for counties and
higher levels of geography. In fact, our region wide forecasting model includes such a model.
Whether these relationships hold for subcounty areas is anyone's guess. A major problem in
developing and maintaining such a model is the lack of subcounty economic information,
especially current estimates. Even so, I think it is bad practice to hold vacancy rates constant at
their last-census values during major shifts in the economy, when those values no longer reflect
current building trends. A start would be at least to adjust county-level rates and use share or
trend methods to estimate subcounty rates from the county-level change.
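One simple reading of this suggestion is a ratio (trend) adjustment: scale each subcounty area's census vacancy rate by the county's relative change since the census. The sketch below makes that assumption, with made-up rates; it is not the region-wide forecasting model mentioned in the response.

```python
def update_subcounty_vacancy(census_rates, county_census_rate, county_current_rate):
    """Scale census vacancy rates by the county-level change, capped to [0, 1].

    census_rates: {area_id: vacancy rate at the last census}
    """
    if county_census_rate <= 0:
        raise ValueError("county census vacancy rate must be positive")
    ratio = county_current_rate / county_census_rate
    return {
        area: min(1.0, max(0.0, rate * ratio))
        for area, rate in census_rates.items()
    }


# Hypothetical example: the county rate rose from 5% to 7.5%.
updated = update_subcounty_vacancy({"tract_1": 0.04, "tract_2": 0.10}, 0.05, 0.075)
```

The ratio form preserves each area's position relative to the county, which is the main design choice; an additive shift would be an equally simple alternative.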
Use of Property Tax Records and Household Composition Matrices to Improve the Housing Unit Method for Small Area Population Estimates
Presenter: Warren Brown
Question(s): I assume that your method calculates a projected PPH matrix by PPH * T, where T
is the transition matrix. From where do you estimate the transition matrix? Or, which margin in
the matrix gets updated over time?
Response: I am proposing the use of Household Composition matrices to improve small area
estimates, not projections. They can be used in projections as well, but my focus in this project is
on estimates.
I am working on two items:
1. For small areas, such as tracts, block groups, and block group parts, is there a limited number of patterns of household composition that can be identified using a data reduction technique such as cluster analysis?
2. Are there typical paths by which these small areas change their household composition? That
is, "college town" neighborhoods that don't change, as one type; and "Levittown" neighborhoods
that change dramatically as they age in place, as another type.
The transition matrices are calculated from decennial census data: the change in
household composition that took place between 1980 and 1990 is determined, and the transition
matrix is derived from it.
Use past paths as guides to expected changes in household composition, thereby affecting the
appropriate PPH multiplier in the Housing Unit method for estimates. Investigate the use of the
ACS as a source for monitoring changes in household composition for small areas, by type of
neighborhood.
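For reference, the Housing Unit Method estimates population as occupied housing units times persons per household (PPH), plus any group quarters population. The sketch below derives PPH from a household-size distribution, used here as a simple stand-in for a household composition matrix; the size categories and all numbers are hypothetical.

```python
def pph_from_composition(size_shares):
    """Average household size from {household_size: share_of_households}."""
    total = sum(size_shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    return sum(size * share for size, share in size_shares.items()) / total


def housing_unit_estimate(housing_units, occupancy_rate, pph, group_quarters=0):
    """Population = housing units x occupancy rate x PPH + group quarters."""
    return housing_units * occupancy_rate * pph + group_quarters


# Hypothetical neighborhood: 25% one-person, 25% two-person, 50% four-person households.
pph = pph_from_composition({1: 0.25, 2: 0.25, 4: 0.5})
population = housing_unit_estimate(1000, 0.9, pph, group_quarters=50)
```

If a neighborhood type is expected to shift toward smaller households, updating the shares changes the PPH multiplier and therefore the estimate, which is the effect the response describes.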
Design Alternatives for Building Block Estimates
Presenter: Ron Prevost
Question(s): Do you think that, by 2010, you will be able to use your matched administrative
records for a nearly direct count? If not, what is the best use of these matched databases for
estimates purposes - what methods do you favor: synthetic, sample, ratio-correlation, or some
other use of your records?
Response: We are creating a long-range research strategy to test administrative records as a
design alternative for Census 2010. This strategy includes experimentation as part of Census 2000
for a census simulation experiment and the potential to enhance or provide expanded input data
to support the Intercensal Population Estimates Program. This strategy will be presented at the
fall conference of the Federal Committee for Statistical Methodology.
Administrative records have been used in estimation operations for the majority of this century. It
is difficult to say precisely which method will provide the best approach. The variety of potential
applications could include the Housing Unit Method, Component Methods, Shift Share
Regression Models or Micro-simulation.
My personal favorite is the Housing Unit Method because of its simple elegance and
verifiability. The majority of customers employing and reviewing current estimates are not
statisticians. Most individuals understand the concepts of housing units, vacancy, and household
size. This type of method provides an avenue to develop lasting partnerships between the Census
Bureau/FSCPE, and local governments and data users, to understand, use, and improve final
products.
Spatially Arrayed Growth Forces and Small Area Population Estimates Methodology
Presenter: Roger Hammer
Question(s): Are there any substantive assumptions or economic model behind your calculation
of FG?
Response: The underlying assumption of our approach is that municipality-level population
changes are influenced not only by the past behavior and contemporaneous indicators for the
municipality of interest but also by the characteristics of neighboring areas. This constitutes
something of a contagion model of population growth. The calculation of an adjusted set of
estimates utilizing the characteristics of neighboring municipalities tests this assumption.
The formula for the "Force of Growth" incorporates several assumptions, although they are less substantive. First, the population density and the population growth rate are the relevant characteristics of neighboring municipalities with regard to population growth. Second, the specifications of density and growth are ratios comparing the neighboring area with the municipality of interest. Third, the "Force of Growth" is a multiplicative function of these ratios. Finally, density and growth are equally weighted.
In summary, I think that we are not making any strong assumptions and the assumptions that we
are making can be empirically tested within the model in the preparation and comparison of a set
of adjusted estimates.
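The four assumptions listed in this response suggest a form like the one sketched below. This is only an illustrative reading: the response does not give the actual formula, so the function name, the direction of the ratios (neighbors over own municipality), the use of growth factors rather than growth rates, and the equal exponents are all assumptions.

```python
def force_of_growth(neighbor_density, own_density,
                    neighbor_growth, own_growth,
                    density_weight=1.0, growth_weight=1.0):
    """Hypothetical 'Force of Growth': product of two neighbor-to-own ratios.

    Growth is expressed as a growth factor (end population / start population)
    so that the ratio stays positive; this is an assumption of the sketch.
    Equal default exponents reflect the 'equally weighted' statement.
    """
    if own_density <= 0 or own_growth <= 0:
        raise ValueError("own density and growth factor must be positive")
    density_ratio = neighbor_density / own_density
    growth_ratio = neighbor_growth / own_growth
    return (density_ratio ** density_weight) * (growth_ratio ** growth_weight)
```

Under this reading, a value of 1.0 means the neighbors look just like the municipality itself, and values above 1.0 indicate denser or faster-growing surroundings.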
Development of a National Accounting of Address and Housing Inventory: A Baseline Information for Post-Censal Population Estimates
Presenter: Ching-Li Wang
Question(s): What are your recommendations for updating/geocoding remote areas, non-city-style
addresses, etc.? Latitude/Longitude coordinates? Are there other forms of updating?
Response: Geocoding and updating remote, non-city-style addresses has been very challenging.
The Bureau uses an address listing procedure and assigns an ID number to a spot on the map.
Following the same procedure, we can do the following:
(1) A map needs to be provided for any new non-city-style address;
(2) The assessor's offices provide the parcel code and other information associated with the parcel (such as coordinates, directions);
(3) Develop a locational system based on coordinates.
I think the coordinates will be a more precise locational reference.
Once this information is available, the Census Bureau can assign a temporary ID before the
address is incorporated into the MAF and TIGER update.
Applying Data from the American Community Survey (ACS) and the Master Address File (MAF) to the Intercensal Population Estimates Program
Presenter: Gregg Diffendal
Question(s): You talk about controlling ACS to ARSH estimates - which implies that ARSH
estimates must be more "correct." Yet, a well-done survey should be good in and of
itself - especially in capturing recent trends in migration. Shouldn't the updating be done in the
other direction as well, letting ACS correct ARSH estimates? What are your thoughts on this?
Response: There are a variety of reasons that surveys use population controls in their weighting.
I will attempt to describe a few of them.
Sampling theory tells us that if we know the total population (or any total for the universe) then
the estimates will have smaller variances if we use the control in the weighting.
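As a concrete illustration of weighting to a control, the sketch below applies a simple ratio adjustment within cells so that weighted sample totals match independent population controls. The cell labels, weights, and control totals are hypothetical, and actual ACS weighting involves more steps than this.

```python
from collections import defaultdict


def apply_population_controls(records, controls):
    """Ratio-adjust survey weights to population controls.

    records:  list of (cell, weight) pairs, one per sample case
    controls: {cell: independent population total for that cell}
    Returns a list of (cell, adjusted_weight) in the same order.
    """
    weighted_totals = defaultdict(float)
    for cell, weight in records:
        weighted_totals[cell] += weight
    factors = {cell: controls[cell] / total for cell, total in weighted_totals.items()}
    return [(cell, weight * factors[cell]) for cell, weight in records]


# Hypothetical sample: two cases in cell "A", one in cell "B".
sample = [("A", 100.0), ("A", 150.0), ("B", 200.0)]
adjusted = apply_population_controls(sample, {"A": 300.0, "B": 180.0})
```

After adjustment, the weights in each cell sum to that cell's control, so any survey estimate of the controlled total agrees with the official figure by construction.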
The ACS does not wish to be in conflict with the official estimates produced by the Census
Bureau. It is best for us to be consistent, even if we are consistently wrong.
Part of our talk was to show how the ACS estimates could impact the population controls. In the
future, the results we saw for Rockland County for blacks would lead us to revise the estimates
for blacks in future years.
When you start splitting the data by age, race, sex, and Hispanic origin, you may have very few
sample cases in specific cells. If you are measuring the difference between migrant and
non-migrant and looking at the residual, you probably are dealing with numbers that would not
differ significantly from zero when you account for sampling error.
The ACS does ask about movers, and I think this could impact the estimates of migration from the
population estimates.
Surveys have to make a variety of assumptions that may not always be exactly correct. For
example there probably is undercoverage of new housing units, noninterviews and missing data
for specific questions, etc. Using population controls helps minimize the uncertainty and bias
from these types of errors.
Presenters: Ron Prevost, Patty Becker, Ching-Li Wang, Gregg Diffendal, Warren Brown, Roger
Hammer, Jean Dumais, Greg Williams
Question(s): It is well known and documented that a problem with LUCA updating was
"conceptual differences across databases" - that is, the object being measured/described in one
database is not exactly the object being measured in the other, causing significant matching
difficulties. Now, this problem occurs in lots of databases - do you have any thoughts on how to
attack this problem in general?
Responses:
Ron Prevost:
Some of the conceptual differences can only be resolved through an understanding
of the data. An example might be converting a county assessor's file with plat numbers to
housing unit numbers. We don't have a good way to perform this task without a file that could
give us a "cross-walk". Another example might be that we are processing files containing
businesses, residences, or a combination of addresses. Our standard approach, discussed below,
appears to handle that task, as well as a variety of address types and street aliases.
In general ARRS's approach to address matching has been to run both the input and the target
databases through an address sanitizer/standardizer that is CASS compliant. In our case we use
Group 1/Code 1 software. In our latest national test approximately 5% of all addresses did not
match to this software's databases. We've started testing a process to enhance matching through
the use of probabilistic matching software and a second run through Group 1/Code 1.
From early tests it appears that with this additional process we can accurately process about 50%
of the addresses not handled in the first pass. We are analyzing output from these procedures and
determining if these statistics are consistent across the nation. Furthermore, we plan to research
what sort of biases might be inherent in the final 2% of all addresses.
Once the basic street address processing has been completed we will review unit identifiers.
Folks have completed work for the MAF Quality Improvement Program (national survey) and
have tested probabilistic matching processes to convert the wide range of unit identifiers to a
common identifier to improve matching techniques. For example, "Upstairs" and "Downstairs", or
"Front" and "Back", might be converted to 1 and 2.
Note: The ability to process addresses through CASS compliant software does not imply an
evaluation of matching records to the MAF. Presently we do not have a method to convert Post
Office Boxes and General Delivery Addresses to City-Style street addresses or Physical Location
Addresses on the MAF. ARRS will be exploring techniques to accomplish this task in our 2000
experiments program (AREX2000).
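The unit-identifier conversion mentioned in this response might look like the deterministic lookup sketched below. The response describes probabilistic matching, which this sketch does not attempt; only the two quoted pairs come from the response, and the prefix list is a hypothetical addition.

```python
import re

# The two pairs quoted in the response; anything else passes through unchanged.
UNIT_ALIASES = {"UPSTAIRS": "1", "DOWNSTAIRS": "2", "FRONT": "1", "BACK": "2"}

# Hypothetical designator prefixes to strip before lookup (an assumption).
UNIT_PREFIXES = ("APARTMENT", "APT", "UNIT")


def normalize_unit(raw):
    """Convert a free-text unit identifier to a common form."""
    token = re.sub(r"[^A-Z0-9]", "", raw.upper())
    for prefix in UNIT_PREFIXES:
        if token.startswith(prefix) and len(token) > len(prefix):
            token = token[len(prefix):]
            break
    return UNIT_ALIASES.get(token, token)
```

Applying the same function to both the input file and the target file is what makes the identifiers comparable; the particular codes chosen matter less than using them consistently.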
Ching-Li Wang:
The basic problems in dealing with addresses are related to the address naming
and numbering system, and to how the addresses are reported or keyed into the database. In addition,
addresses and street names change over time depending on local authority decisions.
As a result, an address that could be matched in the past can no longer be matched. Therefore,
we need a data collection network through a Federal-State-Local Cooperative Program for
Address Gathering. That is also why I would like to see EAGLE (Enhance Address Gathering for
Local Estimates) fly.
With the EAGLE, any information about changes or additions in addresses can be quickly
transmitted to the Bureau through the network on a regular basis - like the "Daily Entry" in the
accounting practice. That is why I say it is important to have a National Accounting of
Addresses and Housing Inventory.
Through the EAGLE and the data collection network, the Bureau will be in a better position to
have constant local input. At the same time, the Bureau will have the opportunity to set a
nationally standardized address data system. With the standardization of the addressing system, the
address level database can become comparable.
For the time being, it is necessary to analyze the addressing system in each database and develop
various matching rules. But, eventually, we need a new data input system to update
MAF/TIGER. Struggling with different databases will not solve the addressing problems. I still
feel we need a National Accounting System of Address and Housing Inventory.
Patty Becker:
Matching on street name can be very difficult when you don't know the area. It has
taken us many years to develop a standard for the spelling of Detroit
streets, and there are only about 2000 of them.
In the LUCA connection, it is necessary to standardize both the census MAF and the local MAF
on a common basis in order to do the match. The easiest standard to use is CASS, the Post
Office standard, although I often don't agree with what it does around here. In Detroit we have a
routine to standardize other files to our standard street names.
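A rough sketch of standardizing two address lists to a common basis before matching follows. The suffix table and sample addresses are made up for illustration; this is neither CASS nor the Detroit routine, only a toy normalizer that also returns the unmatched records so they can be examined rather than silently dropped.

```python
import re

# Hypothetical suffix table; a real standard would be far larger.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD"}


def standardize(address):
    """Uppercase, drop punctuation, and abbreviate known street suffixes."""
    tokens = re.sub(r"[^A-Z0-9 ]", "", address.upper()).split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)


def match_lists(local_addresses, census_addresses):
    """Match local addresses to census addresses on their standardized form.

    Returns (matched, unmatched) lists of the original local addresses.
    """
    census_keys = {standardize(a) for a in census_addresses}
    matched, unmatched = [], []
    for address in local_addresses:
        target = matched if standardize(address) in census_keys else unmatched
        target.append(address)
    return matched, unmatched
```

Keeping the unmatched list makes it possible to check whether the failures cluster on particular streets, instead of simply accepting the loss.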
In general, this is just one of the many pitfalls for LUCA which was not sufficiently anticipated
ahead of time. Geography Division staff were surprised at how difficult it was to match, and they
never gave local governments any guidance at all.
I should also note that commercial geocoding programs have ways of trying to get around this
which may or may not result in an accurate geocode, and I've never been happy with them. For
Detroit, of course we have our own. The other thing that often happens in using the commercial
software is that there is a high fallout rate, and users don't know why it's there. So either they
live with it (biasing their data in favor of the streets that don't cause problems) or else they just
give up.