Areas covered in this section include:
- a discussion of the 5-digit ZIP Code and its relationship to geographic areas;
- the post office's ZIP+4 assignment process;
- the post office's ZIP+4-to-county cross reference file;
- the editing of the file; and
- the development of a ZIP/sector-to-county coding file.
A. Relationship of 5-digit ZIP Code to County
One specific goal of our research efforts is to be able to accurately code to county quickly and efficiently so that the state and county population estimates can be produced in an integrated process. Given the current production schedule and methods of processing, that leaves 1 to 2 weeks for the county coding and migration production process. This production schedule precludes the use of address-based coding systems for county coding. However, coding to county by using ZIP+4-to-county cross reference files is a promising avenue.
Coding to county by using the ZIP Code in the mailing address does not assume that the mailing address is the same as the residence address, but it does implicitly assume that they are in the same county.
B. ZIP Code Assignment
ZIP Codes are designed to deliver mail. The ZIP Codes and their areas of responsibility are assigned to handle the mail as efficiently as possible and (mostly) without regard to geographic boundaries. In a technical sense, a ZIP Code is not an area but a collection of delivery points. However, the delivery points of a ZIP Code can usually be assembled into an area with boundaries. A ZIP Code can also be assigned to a unique delivery point such as a university, government building, business, or a group of post office boxes.
At the state level, most ZIP Codes deliver wholly within the state, but a few do deliver to out-of-state areas. At the county level, some ZIP Codes cross county boundaries, but most deliver wholly within the county. The ZIP Codes that are split by state or county, however, pose problems for coding by ZIP Code.
In selected parts of the country, there are also postal delivery processes that pose special problems. In Alaska, for example, some post offices serve as intermediate drop-off points that hold mail in pouches for later delivery to a remote area such as a logging camp or fishery. These are now being changed to post office boxes, with a three-character alphabetic code as part of the box number, but they still pose problems for geographic coding by ZIP Code. Also, there are areas that have no house-by-house delivery, where individuals have to pick up their mail from the post office. Such individuals may also have a choice of post offices. In such cases, direct coding by ZIP Code may be problematic.
One option is to create a ZIP to county cross-reference file by collapsing the 1980 primary coding guide to ZIP/state/county, using only one possible county. This incorporates the 1980 mailing to residence adjustment. However, it is an old adjustment, and only the 5-digit ZIP Code is available. Based on our experience with the 1980 coding guide, we estimate we could code to the county level using 5-digit ZIP Code (only) with about 96 percent accuracy overall. However, quality of coding will vary dramatically by county. For many of the large counties, the coding will be good, but for most of the small counties, the coding will be very poor. Some counties will not be coded at all. Additionally, independent cities, such as Baltimore city, MD or Manassas Park City, VA and the surrounding counties will have substantial problems in the coding.
For many large cities (excluding the independent cities), most of the ZIP Codes are wholly contained within the city. Geographic coding to large cities using the 5-digit ZIP Code (only) may be feasible. For small places and the more sparsely populated areas, the ZIP Codes tend to cover several subcounty areas. Geographic coding to such subcounty areas using 5-digit ZIP Codes would be poor.
C. ZIP + 4 Code
A few years ago, the post office assigned an additional 4 digits to the existing 5-digit ZIP Code to make mail handling and delivery more efficient. The +4 code is actually two codes in one -- the first 2 digits are the sector and the last 2 digits are the segment within the sector. The following description of the ZIP+4 assignment process was prepared by Suzanne Shepherd 3/. But first, a cautionary note: these are guidelines established by the post office, and individual postmasters have flexibility in implementing them.
"The U.S.P.S. perceives ZIP+4 codes in city-style address areas as essentially geographic in nature. A city-style address typically is an address in structure number-street name form, such as "4320 Huntingtown Road." The first two digits of the +4 add-on, which is referred to as the "sector" component, typically represents a block group (but is not coincident with Census Bureau-defined block groups). The last two digits of the +4 add-on, which is referred to as the "segment" component, typically represents a block side, a company, a unit within a company, a building, or a floor within a building.
To establish ZIP+4 Codes, the U.S.P.S. plots a 5-digit ZIP Code boundary on a street map and uses main thoroughfares to cut the 5-digit ZIP Code area into preliminary sectors. The U.S.P.S. then counts the number of block sides and the number of companies that receive 10 or more mailing pieces. If these two numbers total more than 50 in a primarily commercial area, the preliminary sector usually is further divided. If these two numbers total more than 70 in a primarily residential area, the preliminary sector usually is further divided. These thresholds are merely guidelines that change somewhat due to a preliminary sector's growth potential. For example, if a preliminary sector contains a lot of open area, the U.S.P.S. will lower the number, but if a preliminary sector is already quite congested, the U.S.P.S. will raise the number." 3/
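The division guideline quoted above can be sketched as a small check. This is only an illustration of the stated thresholds (50 for primarily commercial areas, 70 for primarily residential areas); the function and parameter names are hypothetical, and the growth-potential adjustment is left to the caller through an optional override because the source gives no figures for it.

```python
from typing import Optional


def should_divide(block_sides: int,
                  companies_10_plus: int,
                  primarily_commercial: bool,
                  threshold_override: Optional[int] = None) -> bool:
    """Return True if the quoted guideline suggests dividing a preliminary sector.

    block_sides        -- number of block sides in the preliminary sector
    companies_10_plus  -- number of companies receiving 10 or more mailing pieces
    threshold_override -- hypothetical hook for the growth-potential adjustment,
                          which the source describes only qualitatively
    """
    if threshold_override is not None:
        threshold = threshold_override
    else:
        # Guideline values from the quoted description.
        threshold = 50 if primarily_commercial else 70
    # The guideline says "more than", so the comparison is strict.
    return block_sides + companies_10_plus > threshold
```

Because postmasters have latitude in applying the guideline, a result from this check indicates only what the guideline would usually suggest, not what a given post office did.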
The map on page 8 shows the City of Cambridge, Ohio and a small portion of the surrounding area. The city and the surrounding area are covered by a single 5-digit ZIP Code. The sectors for the city style deliveries have been overlaid on the map. These boundaries have been derived from an examination of the ZIP+4 Codes on residential address lists. For exposition purposes, the boundaries have been expanded to the nearest physical feature (river, interstate highway, etc.), to include uninhabited area (such as city parks, cemeteries, etc.). Also, some sectors that have only business deliveries may not be shown on the map.
From the map, we can see that the sectors are formed by adjacent blocks and block faces, and can be bounded by a polygon. The polygons are mutually exclusive and encompass the entire city style delivery area. We can also see from the map that a sector includes deliveries on both sides of a street at a sector boundary. Other Post Offices may choose to have the sector boundary in the middle of the street, with even numbered addresses in one sector and odd numbered addresses in another sector.
The shaded areas to the north and to the southeast of the delineated sectors show areas inside the city limits that do not have city style deliveries. These areas are covered by the rural route style deliveries, even though the addresses are of the house number/street name format.
"When segment numbers are depleted within a particular sector area, which we may also refer to as a ZIP+2 area, the U.S.P.S. inserts another sector area within the original sector area. This additional sector area may split the original sector area, creating two discontiguous sector areas with the same sector number. Segment numbers are unique for a sector number. The number assigned to the new, inserted sector area is previously unused within the particular 5-digit ZIP Code area. Residential-to-commercial rezoning typically causes segment number depletion." 3/
Text Chart A -- City Style Sectors in the Cambridge, Ohio Post Office
"In areas that have rural-style addresses, the U.S.P.S. assigns +4 add-ons according to a letter carrier's line of travel. Therefore, ZIP+4 Codes in these areas do not refer to geographic areas.
In areas that have rural-style addresses, a street segment receives a +4 add-on only if it is part of a letter carrier's route. The U.S.P.S. differentiates between block sides only if a carrier stops on both sides of the street to deliver mail. The first rural route for a 5-digit ZIP Code usually has a sector number of "97", the second rural route has a sector number of "96", and so forth. The +4 add-ons for a rural route typically go from "9701" to "97nn", with "9701" being the first street segment on which the carrier delivers mail and "97nn" being the last." 3/
The map on page 10 shows the delivery path of two of the 9 rural route sectors from the Cambridge, Ohio Post Office. The dotted line is sector 94 and the dashed line is sector 97. It is obvious from the map that these sectors are not geographically based. They deliver to a few addresses in the city limits, and to addresses in several townships outside the city limits. In short, these sectors wind all over the countryside. They do not, however, cross into another county.
There are two other interesting facets of the rural route deliveries for the Cambridge, Ohio Post Office. Most of the area has been converted to house number/street name format and is covered by sectors 90 to 97. Sectors 91 to 97 cover most of the area in a linear fashion. Sector 90 consists of scattered street segments not covered by sectors 92 to 97. Also, the few areas that have not been converted to house number/street name format are all lumped together in sector 98.
"If a rural route crosses a county boundary, the sector number changes, typically to another number in the nineties, and the U.S.P.S. numbers the segments in sequence beginning with "01". If the rural route crosses back into the original county, the +4 numbering resumes where the original +4 numbering left off. For example, if "9718" was the last +4 number assigned before the rural route crossed into another county, then "9719" is the first +4 assigned when the rural route crosses back into the original county.
When a group of rural mail boxes receive mail from different letter carriers, their sector numbers are different and there may be no pattern to the +4 add-ons. For example, the +4 add-ons for a group of rural mail boxes may be "9601", "9622", "9705", and "9601" again, because the mail boxes are not only on different rural routes, but on routes coming out of different 5-digit ZIP Codes. If a structure receives mail via a rural route, its mail box does not need to be anywhere near the structure." 3/
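The county-crossing rule quoted above (segments restart at "01" under a different sector in the other county, and the original numbering resumes on return) can be illustrated with a short sketch. The function name, the county labels, and the mapping from county to sector number are hypothetical; the source says only that the new sector is typically another number in the nineties.

```python
from typing import Dict, List, Optional


def number_rural_route(segment_counties: List[str],
                       home_sector: str = "97",
                       other_sectors: Optional[Dict[str, str]] = None) -> List[str]:
    """Assign +4 add-ons along one carrier's line of travel.

    segment_counties -- county of each street segment, in delivery order;
                        the first entry is treated as the original county
    home_sector      -- sector used in the original county
    other_sectors    -- hypothetical mapping: other county -> its sector number
    """
    if not segment_counties:
        return []
    other_sectors = other_sectors or {}
    home_county = segment_counties[0]
    counters: Dict[str, int] = {}  # last segment number used, per sector
    add_ons = []
    for county in segment_counties:
        sector = home_sector if county == home_county else other_sectors[county]
        # Each sector keeps its own counter, so numbering resumes where it
        # left off when the route crosses back into a county.
        counters[sector] = counters.get(sector, 0) + 1
        add_ons.append(f"{sector}{counters[sector]:02d}")
    return add_ons
```

For a route that delivers to two segments in county "A", two in county "B" (sector "95" in this made-up case), and then one more in "A", the sketch yields 9701, 9702, 9501, 9502, 9703. It does not handle a sector running past segment 99, which is the depletion case the quoted text treats separately.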
Text Chart B -- Two of the 9 Rural Route Sectors in the Cambridge, Ohio Post Office
"If a jurisdiction establishes city-style addresses and the U.S.P.S. adopts them for mail delivery, the U.S.P.S. reassigns the +4 numbers." 3/
Additionally, sectors 00 through 09 are usually reserved for P.O. boxes. Sectors 98 and 99 are usually reserved for the postmaster and for business reply mail.
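As a rough illustration of the sector conventions described in this section, the sketch below splits a +4 add-on into sector and segment and labels the sector by its usual use. The function names and category labels are hypothetical, the treatment of sectors 90 to 97 as rural route sectors is an assumption drawn from the "nineties" convention, and individual post offices may deviate (Cambridge, Ohio, for example, uses sector 98 for unconverted rural addresses).

```python
from typing import Tuple


def split_plus4(plus4: str) -> Tuple[str, str]:
    """Split a 4-digit add-on into (sector, segment), both 2-character strings."""
    if len(plus4) != 4 or not (plus4.isascii() and plus4.isdigit()):
        raise ValueError(f"expected 4 digits, got {plus4!r}")
    return plus4[:2], plus4[2:]


def usual_sector_use(sector: str) -> str:
    """Label a sector by the usual convention; a hint, not a guarantee."""
    n = int(sector)
    if n <= 9:
        return "po_box"                        # sectors 00-09
    if n >= 98:
        return "postmaster_or_business_reply"  # sectors 98-99
    if n >= 90:
        return "rural_route"                   # assumption: sectors 90-97
    return "other"                             # typically city-style delivery
```

Keeping the sector as a zero-padded string rather than an integer preserves leading zeros, which matters for the P.O. box sectors.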
The +4 codes are used by the IRS in the mailing address. For the 1988 IRS 1-percent sample file, 94 percent of all addresses had the +4 codes. Also, 98 percent of the house number/street name type addresses had a +4 code, 91 percent of the rural route type addresses had a +4 code, and 98 percent of the P.O. box addresses had a +4 code.
D. ZIP+4 to County Cross Reference File
The post office has created a ZIP+4-to-county cross reference file which could serve as the basis for the county coding process. The file is a quarterly product and is updated to reflect changes occurring since the prior release. That is, new ZIP Codes are added, discontinued ZIP Codes are deleted, and changes to ZIP Codes or +4 codes are incorporated.
The ZIP+4 to county cross reference file contains a record for each unique ZIP+4 Code, or about 24 million records. Two exceptions to this are as follows: (a) If a business (or government agency) has more than one +4 code assigned to it, the file will have only one record with the data on the record showing the range of +4 codes assigned; (b) the same may be true for post office boxes.
The file contains the following data items:
- ZIP Code;
- sector/segment for lowest of the sector/segment range;
- sector/segment for highest of the sector/segment range;
- a 2-character state abbreviation;
- county code; and
- county name.
Note that the 2-character state abbreviation is the state in which the post office is located, and the county is the county in which the mail is delivered. That is, in a few cases, the county may be in a different state than the one identified by the state abbreviation. The file contains no street name or address range information.
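The record layout listed above, including the low/high +4 range used for businesses and post office boxes, might be represented and searched as follows. This is a minimal sketch: the class name, field names, and linear search are illustrative assumptions, not the actual file format or the production lookup.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class XrefRecord:
    zip5: str         # 5-digit ZIP Code
    low_plus4: str    # lowest sector/segment in the range, e.g. "0001"
    high_plus4: str   # highest sector/segment in the range
    state_abbr: str   # state of the post office, not necessarily of the county
    county_code: str  # county in which the mail is delivered
    county_name: str


def find_county(records: Iterable[XrefRecord],
                zip5: str, plus4: str) -> Optional[XrefRecord]:
    """Return the first record whose +4 range covers the given add-on."""
    for rec in records:
        # Zero-padded 4-digit strings compare correctly as text.
        if rec.zip5 == zip5 and rec.low_plus4 <= plus4 <= rec.high_plus4:
            return rec
    return None
```

For a file of about 24 million records, a linear scan would be far too slow in practice; an index keyed on the ZIP Code, or the collapsed coding guide described in section H, would be the natural alternative.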
The file should cover all ZIP Codes in the U.S., all ZIP Codes for U.S. possessions (Puerto Rico, Virgin Islands, etc.), and all APO/FPO ZIP Codes. All counties and county equivalents in the U.S. and U.S. possessions are represented in the file with the exception of Yellowstone National Park, MT (30-113), and, for the 1991 file, Denali Borough, AK (02-068).
The county should represent the county in which the mail is delivered. For post office boxes, it is the county in which the boxes are located. The APO/FPO ZIP Codes are assigned to the county the mail is delivered from, with the exception of APO/FPO ZIP Codes for military bases in Alaska and Hawaii. These are assigned appropriate county codes in Alaska or Hawaii.
I was not able to determine exactly how the ZIP+4-to-county cross reference file was prepared, but my understanding of the process is as follows. The data file was manually prepared at the local post office level, under general guidelines provided by USPS "headquarters". The posted work sheets were data keyed and the file compiled by the regional or national information centers. Thus, it is reasonable to expect errors in the posting and in the data keying of the county codes. Also, it is important to note that the local post offices are relatively autonomous. They usually try to adhere to the guidelines provided by "headquarters", but one should expect variations to occur. Further, there will be no documentation of such variation. In short, the ZIP+4-to-county cross reference file needs to be thoroughly edited. Sections E, F, and G describe the edits we performed on the cross reference file.
E. Coverage Edit
The ZIP+4-to-county cross reference file may not include all ZIP Codes found in administrative records. Some omissions are post office errors. Some are ZIP Codes actually used by local areas that are not known by the office assembling the file. Some may be discontinued ZIP Codes: because of lags in implementing ZIP Code changes, administrative record systems are likely to include outdated ZIP Codes, and some people continue to use the old ZIP Code even though it has been changed.
The first step was to compare the ZIP Codes in the file with those actually used in the IRS file and with those listed in recent ZIP Code directories. Where needed, additional ZIP Codes were incorporated into the file. Also, when the tax year 1990-1991 and 1991-1992 1-percent test files were processed, we examined all ZIP codes that had at least 5 uncoded returns. We assigned a county code to the ZIP codes and incorporated them into the coding files.
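The two coverage checks described above amount to a set comparison and a frequency count. The following sketch shows one way they could be expressed; the function names are hypothetical, and the inputs are assumed to be plain collections of 5-digit ZIP Code strings.

```python
from collections import Counter
from typing import Iterable, List, Set


def missing_zips(xref_zips: Iterable[str],
                 irs_zips: Iterable[str],
                 directory_zips: Iterable[str]) -> Set[str]:
    """ZIP Codes used in the IRS file or listed in a directory but absent
    from the cross reference file."""
    return (set(irs_zips) | set(directory_zips)) - set(xref_zips)


def zips_needing_review(uncoded_return_zips: Iterable[str],
                        min_returns: int = 5) -> List[str]:
    """ZIP Codes with at least `min_returns` uncoded returns, sorted.

    uncoded_return_zips -- one ZIP Code per uncoded return (repeats expected)
    """
    counts = Counter(uncoded_return_zips)
    return sorted(z for z, n in counts.items() if n >= min_returns)
```

The threshold of 5 is the one stated in the text; assigning a county to each ZIP Code that the checks surface remains a manual step.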
F. APO/FPO County Code Update
Post offices for the U.S. military overseas (APO, FPO) are handled out of 4 cities in the U.S. -- New York, Miami, San Francisco and Seattle. The county codes assigned to the APO/FPO ZIP Codes reflected these cities. First, these state/county codes needed to be changed to a separate category denoting APO/FPO, with one exception: the APO/FPO ZIP Codes for military bases in Alaska and Hawaii are assigned the appropriate county in Alaska or Hawaii. Second, the complete list of APO/FPO ZIP Codes was reviewed to make sure that all appropriate ZIP Codes were included. Additions were made where necessary.
The state/county codes for the Trust Territories were also reviewed and modified, as necessary, to reflect the FIPS state and county equivalent codes.
G. Illegal County Code Edit
The ZIP+4-to-county cross reference file contains some illegal county codes. A county code of 999 was occasionally used and there were other non-existent county codes. All records in a ZIP Code that contained an illegal county code were examined and a correct county code determined.
The 999s were cases where the ZIP Code crossed into another state and the person assembling the data did not know what county to code. This occurred most often in North Dakota and South Dakota. These were recoded to a contiguous county in an adjoining state where it seemed reasonable to do so (by looking at the ZIP Code map, the ZIP Code directory, and an atlas).
Most of the other illegal county codes were obvious typographic errors from posting and data keying (such as digit transposition). However, some arose because the state code is that of the post office's state while the county is in another state, so the state/county combination does not exist. These were reviewed and the state code changed.
A few of the illegal county codes were cases where the person preparing the county codes simply made up a new code to represent some special case in their area. It was not possible to tell what these were. For these, and the remainder of the illegal county codes, a county code was assigned (frequently the dominant county code for the sector). Thus, all illegal state/county codes were changed to legal state/county codes.
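The fallback of assigning the dominant county code for the sector can be sketched as below. This covers only that mechanical fallback, not the manual review of maps and directories described above; the function name, the tuple layout, and the example codes are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import List, Set, Tuple

Record = Tuple[str, str, str, str]  # (zip5, plus4, state_code, county_code)


def fix_illegal_counties(records: List[Record],
                         legal: Set[Tuple[str, str]]):
    """Replace illegal state/county codes with the dominant legal code
    found in the same ZIP/sector. Returns (fixed_records, unresolved)."""
    # Tally legal codes per ZIP/sector (sector = first 2 digits of the +4).
    votes = defaultdict(Counter)
    for zip5, plus4, state, county in records:
        if (state, county) in legal:
            votes[(zip5, plus4[:2])][(state, county)] += 1

    fixed, unresolved = [], []
    for rec in records:
        zip5, plus4, state, county = rec
        if (state, county) in legal:
            fixed.append(rec)
            continue
        tally = votes.get((zip5, plus4[:2]))
        if tally:
            (dom_state, dom_county), _ = tally.most_common(1)[0]
            fixed.append((zip5, plus4, dom_state, dom_county))
        else:
            # No legal code in this sector: leave as-is for manual review.
            fixed.append(rec)
            unresolved.append(rec)
    return fixed, unresolved
```

Records with no legal neighbor in their sector are returned separately rather than guessed at, since the source indicates those cases were resolved by judgment.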
Also, in ZIP Codes that had more than one county listed, there were some that contained at least one county that was not contiguous to the other(s).
- A few of these were plausible (e.g. where counties are very close but not contiguous) and were not changed.
- Some of these were actually for a contiguous county across the state line (the state code was repaired).
- Some were typographic errors not caught in previous edits (and were fixed).
- A significant number were inexplicable. These were replaced with the dominant code for the sector.
These reviews and corrections are based primarily on educated guesses and "most likely" corrections. We simply did not have resources to do a thorough review/correction to obtain exact information (for example, by calling the local post office). Still, a substantial amount of effort was expended to clean up the file. It is reasonable to expect that there are still some errors in the file that were not caught by the edits, and some errors introduced by the review/correction process.
The above discussion focused on "bad" codes within ZIP/Sectors but did not give a feel for how many there were. There were 1,494 ZIP Codes with a change, and 3,438 (out of 857,400) ZIP/Sector records with a change. There were 17,539 ZIP+4 records (out of about 24,000,000) with a change.
As mentioned earlier, there are a few ZIP Codes that deliver across state lines, and there are a few ZIP/sectors that cross county lines. There are 153 ZIP Codes in more than one state. There are 9,000 ZIP Codes in more than one county. There were 11,331 (out of the total 857,400) ZIP/sectors that were split by county. All states had some split sectors, with Virginia, Michigan and Ohio having especially large numbers. The rural route sectors, as expected, contained a disproportionate share of the split sectors. Most of the other cases are in the lower sector range (reserved for post office boxes) and in sector 99 (reserved for the postmaster and business reply mail). There must be some non-standard county code assignment occurring for these selected cases. We will have to further investigate these at a later date.
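Identifying ZIP/sectors that are split by county is a simple grouping exercise. The sketch below is one possible way to list them from ZIP+4 records; the function name and the three-field tuple layout are assumptions.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def split_sectors(records: Iterable[Tuple[str, str, str]]) -> Set[Tuple[str, str]]:
    """Return the (zip5, sector) pairs whose records name more than one county.

    records -- (zip5, plus4, county_code) tuples, one per ZIP+4 record
    """
    counties = defaultdict(set)
    for zip5, plus4, county in records:
        counties[(zip5, plus4[:2])].add(county)
    return {key for key, seen in counties.items() if len(seen) > 1}
```

Applied to the edited file, a count of the returned pairs would correspond to the split ZIP/sector figure reported above, about 1.3 percent of all ZIP/sectors (11,331 out of 857,400).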
H. ZIP/Sector to County Coding Guide
Most ZIP Codes are entirely within one county. For those that are split by counties, most of the ZIP/sectors are entirely within one county. Therefore, the file could be collapsed down without loss of information. The collapsed version would provide for a fast and efficient method of coding. We collapsed the file down to a file containing ZIP Code and sector range for the strings of sectors in the same county. This formed the basis for the CCRS coding file. For 77 percent of the ZIP Codes, the sector range will be 00 to 99 (as the ZIP Code delivers within one county). Split sectors were assigned the dominant county. Where a ZIP Code was split by county, an auxiliary coding guide was created which contains the dominant county code in the ZIP Code.
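The collapse described above can be sketched in two steps: pick the dominant county for each ZIP/sector, then merge runs of consecutive sectors that share a county. This is an illustrative reconstruction, not the actual CCRS procedure. The function name and tuple layout are hypothetical, and widening a single-county ZIP Code to the full 00 to 99 range is an assumption based on the statement about such ZIP Codes.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


def collapse_to_sector_ranges(
        records: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str, str, str]]:
    """Collapse (zip5, plus4, county) records into
    (zip5, low_sector, high_sector, county) ranges."""
    # Step 1: dominant county per ZIP/sector (resolves split sectors).
    votes = defaultdict(Counter)
    for zip5, plus4, county in records:
        votes[(zip5, plus4[:2])][county] += 1
    dominant = {key: tally.most_common(1)[0][0] for key, tally in votes.items()}

    # Step 2: merge consecutive observed sectors with the same county.
    runs = defaultdict(list)  # zip5 -> list of [low, high, county]
    for zip5, sector in sorted(dominant):
        county = dominant[(zip5, sector)]
        zip_runs = runs[zip5]
        if zip_runs and zip_runs[-1][2] == county:
            zip_runs[-1][1] = sector          # extend the current run
        else:
            zip_runs.append([sector, sector, county])

    guide = []
    for zip5, zip_runs in runs.items():
        if len(zip_runs) == 1:
            # Assumption: a one-county ZIP Code covers all sectors.
            zip_runs[0][0], zip_runs[0][1] = "00", "99"
        guide.extend((zip5, low, high, county) for low, high, county in zip_runs)
    return guide
```

Coding a return then needs only the ZIP Code and the first two digits of the +4 add-on, which is what makes the collapsed guide fast. Sectors that fall between two runs are left uncovered in this sketch; such cases would fall back to the auxiliary guide's dominant county for the ZIP Code.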