Before embarking on the development of new methods of geographically coding mailing addresses to the subcounty level, an assessment needs to be done of the numbers and types of mailing addresses that are actually used on the income tax returns. This section outlines the post office's addressing guidelines, shows the address types contained in the 1990 decennial census address control file, and discusses the various address types and the quality of the address information in the individual income tax returns.
A. Post Office Addressing Guidelines
Local planning authorities are responsible for deciding on addressing conventions and for assigning individual addresses. These authorities can be any of the local groups responsible for assigning city-type addresses, such as local planning boards, developers, municipalities, or utility companies.
The post office maintains ongoing relationships with the local address planning authority, primarily in a consulting or advisory capacity. The post office has no actual authority over the address conventions or the addresses assigned by the local planning authority. It does, however, have the right to refuse to adopt new city-style addresses prepared by the local planning authority if it determines that it will be unable to provide efficient, cost-effective service. For all other units requiring individual delivery, the post office has assigned rural routes or uses general delivery (by customer's name). There are still some areas that do not have delivery service, and individuals there must pick up their mail at the post office.
The post office also actively encourages local address planning authorities to assign city-style addresses in areas that currently lack them, as do local emergency services and utilities.
The post office has defined the standard parts of a city style address. This is the usual house number/street name addressing convention. It can be composed of many parts in the primary and secondary addresses, but the minimum content is the primary number, primary street name and suffix, such as "124 Maple St". Content can include:
- primary address number
- directional prefix
- primary street name
- suffix (e.g. RD, ST, LA, AVE, etc.)
- directional suffix
- secondary address (APT, SUITE, etc.)
- secondary number (Note that "number" may be alphabetic, numeric or mixed such as APT 401, SUITE G or SUITE 401 G)
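The components listed above can be pictured as a simple record. The following is a minimal sketch, not the post office's own parser: the class name, the function name, and the small lookup sets are assumptions introduced only for illustration, and real suffix and secondary-unit lists are much longer.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subsets only (assumptions); real lists are far longer.
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
SUFFIXES = {"RD", "ST", "LA", "AVE"}
SECONDARY_UNITS = {"APT", "SUITE"}


@dataclass
class CityStyleAddress:
    primary_number: str
    street_name: str
    directional_prefix: Optional[str] = None
    suffix: Optional[str] = None
    directional_suffix: Optional[str] = None
    secondary_unit: Optional[str] = None
    secondary_number: Optional[str] = None


def parse_city_style(address: str) -> Optional[CityStyleAddress]:
    """Split a well-formed city style address into its parts (hypothetical helper)."""
    tokens = address.upper().replace(".", "").split()
    if len(tokens) < 2 or not tokens[0].isdigit():
        return None
    number, rest = tokens[0], tokens[1:]

    # Everything from the secondary unit designator onward is the secondary address.
    sec_unit = sec_num = None
    for i, tok in enumerate(rest):
        if tok in SECONDARY_UNITS:
            sec_unit = tok
            sec_num = " ".join(rest[i + 1:]) or None
            rest = rest[:i]
            break
    if not rest:
        return None

    # Peel optional parts off the ends, always leaving at least one token as the name.
    prefix = rest.pop(0) if len(rest) > 1 and rest[0] in DIRECTIONALS else None
    dir_suffix = rest.pop() if len(rest) > 1 and rest[-1] in DIRECTIONALS else None
    suffix = rest.pop() if len(rest) > 1 and rest[-1] in SUFFIXES else None
    return CityStyleAddress(number, " ".join(rest), prefix, suffix,
                            dir_suffix, sec_unit, sec_num)
```

For example, `parse_city_style("124 Maple St")` yields the minimum content (number, name, suffix), while an address with a secondary part such as "SUITE 401 G" keeps the mixed alphabetic and numeric "number" intact.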
Further, the post office recommends the following conventions:
- Each noncontinuous street should have one correct name and should be uniquely identifiable without directionals or suffixes. For example, avoid Palm Court, Palm Avenue and Palm Street as names for separate streets.
- Names that are also suffixes or directionals, such as Court Street, Southeast Boulevard or East West Hwy, should not be used.
- Sound-alike names such as Beech and Beach, Main and Maine should be avoided.
- Street names should not be longer than 15 characters.
- Special characters (hyphens, apostrophes, periods, etc.), as in St. Lawrence St or O'Connor Boulevard, should be avoided.
- The use of nonspecific addresses (such as corner of 5th and Main Streets) should be avoided. Use a specific address (such as 501 Main Street).
- The house number assignment should proceed from a logical point of origin and be in proper numerical sequence in relation to other lots with frontage on the same street. Odd numbers should be assigned to one side, even numbers assigned to the other. Also, numeric assignment should be sufficiently flexible to accommodate maximum density permitted by zoning regulations (have room for growth).
- Street numbers should be no longer than 6 digits.
- Do not use fractionals (such as "101-1/2 Main St"), alphas (such as "A101 Main St" or "101G Main St"), or hyphenated numbers (such as 10-101 Main St) in the house number field.
- Use individually addressed primary numbers rather than secondaries wherever possible (such as "101 Main" and "103 Main" vs. "101 Main Apt. A" and "101 Main Apt. B").
- Maintain addressing continuity through the municipality (and municipality to municipality where possible).
- The rural box numbers should be sequential along the carrier's line of travel, with sufficient flexibility to accommodate growth, and should avoid hyphenation or alphabetics (e.g. Box 124A, Box 124B, etc.).
- Rural route name conventions are "RR" or "HC" and do not include "Rural", "Route", "#", "No", "Number", "RD", "RFD", "Star Route", etc.
- Do not use both a rural route and a street name, and do not use a secondary address with a rural route (such as "RR 1 Box 124 Boden RD" or "RR1 Box 124 Apt A").
- Post office box format should be "PO Box" followed by the box number. Excluded are other names such as "caller", "drawer", "lockbox", "PB", etc. Also, the box numbers should not include alphabetics or hyphens.
Again, note that these are guidelines and may or may not be followed by the local planning authority. Also, there may be addresses previously defined that do not conform to the current guidelines. In fact, we have seen examples of violations of all of these guidelines in the addresses on the individual income tax returns.
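Several of the guidelines above are mechanical enough to check automatically. The sketch below covers only four of them (numeric-only house numbers, the 6-digit limit, the 15-character street name limit, and special characters). The function name and the wording of the messages are assumptions made for illustration.

```python
import re
from typing import List


def guideline_violations(house_number: str, street_name: str) -> List[str]:
    """Return the guideline problems found in one address (hypothetical helper)."""
    problems = []
    if not house_number.isdigit():
        # Covers fractionals ("101-1/2"), alphas ("A101", "101G") and hyphens ("10-101").
        problems.append("house number is not purely numeric")
    elif len(house_number) > 6:
        problems.append("house number is longer than 6 digits")
    if len(street_name) > 15:
        problems.append("street name is longer than 15 characters")
    if re.search(r"[-'.]", street_name):
        problems.append("street name contains special characters")
    return problems
```

Because these are guidelines rather than rules, a non-empty result flags an address as nonconforming, not as invalid.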
B. Address Types in the 1990 Census Address Control File (ACF)
The attached Table 1 shows the number of addresses contained in the ACF 3/. The data are shown by type of enumeration area (TEA), address type and state. Tape address register (TAR) areas are the 384 urban areas in Metropolitan Statistical Areas for which there was city type delivery and for which commercially available mailing lists could be geographically coded by the TIGER system. (These are essentially areas that were covered by the 1980 GBF/DIME files.) Prelist addresses are those addresses outside TAR areas that were manually compiled by Census Bureau personnel prior to the 1990 census. List/enumerate addresses are those listed at the time of the actual enumeration.
Overall, 83.1 percent of the addresses in the ACF were city types. The percent varies considerably by state. There are 12 states with a city type usage rate of 90 percent or more, and nine states with the usage rate in the 80 percent range. There were 14 states in the 70 percent range, eight states in the 60 percent range, and five states in the 50 percent range. Finally, there were two states (Maine and West Virginia) in the 40 percent range and one state (Vermont) in the 30 percent range.
Overall, 8.2 percent of the addresses in the ACF were rural routes. Sixteen states had rural route usage rates of 10 to 19 percent, and nine states had usage rates of 20 percent or more. These were Alabama, Arizona, Maine, Mississippi, North Carolina, North Dakota, South Dakota, Vermont and West Virginia.
Post office boxes were 3.7 percent of the addresses in the ACF, with nine states having a usage rate in the 10 to 19 percent range. These were Idaho, Maine, Montana, New Mexico, North Dakota, South Dakota, Vermont, West Virginia and Wyoming. Then there was Alaska, weighing in at 22.9 percent.
The usage rate for general delivery was 0.2 percent, with Alaska at 12.5 percent and West Virginia at 2.5 percent.
The percent of all other addresses was 4.7 percent for the U.S. with the largest "other" address prevalence occurring in the following states:
- Arizona: 11.9
- Oklahoma: 12.0
- Missouri: 10.1
- West Virginia: 14.7
- North Dakota: 10.2
- Maine: 16.0
- Vermont: 21.7
- New Hampshire: 14.3
- Kentucky: 10.0
- Tennessee: 12.1
- Montana: 11.7
C. Address Formats Actually Used on Income Tax Returns
It is important to note the very real difference in address types and quality of address information between a controlled environment, such as the decennial address list, and an uncontrolled environment, such as taxpayer-supplied addresses. The tabulation of address types contained in the ACF does not account for many of the nuances in address formats, and numerous mailing address formats in use are not residential. We know that there are a host of nonstandard postal delivery methods and associated address types (such as house number/street name, rural route, post office box, etc.). There are also different address descriptions, formats, and abbreviations used by the taxpayer. The address format may not be standard (e.g. "Box 17B Route 7" vs. "Route 7 Box 17B"), and abbreviations may not be standard (RT vs. RR vs. Rural Route). The address could also be any of a host of other locations, such as accountants, financial institutions, place of work or business, parents' address, post office box, etc.
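To make the ordering and abbreviation problem concrete, the sketch below rewrites a rural route address into a single "RR n BOX m" form, whichever order the taxpayer used. It is a simplified illustration under stated assumptions: the function name is hypothetical, only four route spellings are recognized, and a string such as "RT 40" may in practice be a highway number rather than a postal route.

```python
import re

# Illustrative subset of route spellings (assumption); longest alternative first.
_ROUTE = re.compile(r"\b(?:RURAL ROUTE|ROUTE|RR|RT)\s*(\d+)\b")
_BOX = re.compile(r"\bBOX\s*(\w+)\b")


def normalize_rural_route(address: str) -> str:
    """Return 'RR <route> BOX <box>' when both parts are found, else the cleaned input."""
    text = " ".join(address.upper().split())
    route = _ROUTE.search(text)
    box = _BOX.search(text)
    if route and box:
        return f"RR {route.group(1)} BOX {box.group(1)}"
    return text
```

With this normalization, "Box 17B Route 7" and "Route 7 Box 17B" both become "RR 7 BOX 17B", so they can be compared or matched as the same address.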
D. Address Classification Process
The first step was to pore through addresses in selected parts of the country to find various nonstandard addresses. The second step was to develop a simple classification system, apply it to the sample test file and tabulate the results. Addresses not falling into any of the classifications were reviewed, refinements made to the classifications, and the process repeated. After this work was done, we found some more nuances that did not show up in the test. The following describes the various nonstandard address formats that we found.
Military -- There is a standard on-base mail delivery system, akin to P.O. boxes. The format is "PSC #", such as PSC 125. Also, the returns may be addressed by the military unit. The following character strings also were used to recode such military addresses if they occurred at the beginning of the address: HQ, HHB, HHC, HHD, HHT, MSSG, MACS, VF, WPNS, MWSS, VMF, VMFA, VFMAT, HMH, MALS, USS, QTRS, QUARTERS, BEQ, CO A, CO B, ... CO Z, A CO, B CO, ... Z CO, ACO, BCO, ... ZCO, MAG, NTTC, A BTRY, B BTRY,... Z BTRY, ALPHA, BRAVO, CHARLIE, DELTA, ECHO, FOX, GULF, HOTEL, and NAS. Also, another recode was assigned if the ZIP or the ZIP+4 was for a military installation, or the post office name was for military bases.
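A minimal sketch of the string-based part of the military recode follows. The token list is copied from the paragraph above. The function name is hypothetical, and treating each string as a whole leading word (rather than a raw character prefix) is an assumption. The ZIP and post office name checks are not shown.

```python
import re

# Leading tokens listed in the text, plus "PSC" for the on-base box format.
MILITARY_TOKENS = set(
    "PSC HQ HHB HHC HHD HHT MSSG MACS VF WPNS MWSS VMF VMFA VFMAT HMH MALS "
    "USS QTRS QUARTERS BEQ MAG NTTC ALPHA BRAVO CHARLIE DELTA ECHO FOX GULF "
    "HOTEL NAS".split()
)

# Lettered units: "CO A".."CO Z", "A CO".."Z CO", "ACO".."ZCO", "A BTRY".."Z BTRY".
_LETTER_UNIT = re.compile(r"^(?:CO [A-Z]|[A-Z] CO|[A-Z]CO|[A-Z] BTRY)\b")


def is_military(address: str) -> bool:
    """True if the address starts with one of the military strings (sketch only)."""
    text = " ".join(address.upper().replace(".", " ").split())
    if not text:
        return False
    first_token = text.split()[0]
    return first_token in MILITARY_TOKENS or bool(_LETTER_UNIT.match(text))
```

The word-boundary requirement keeps an address such as "ACORN LN" from matching the "ACO" unit pattern. Ordinary names such as "HOTEL" can still produce false positives, which is one reason the ZIP-based recode matters as well.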
Relative position -- these are formats that have two directionals and two numbers as the address, such as "N 123 E 234". The recode looked for "N # E #" or "# N # E" for any of the eight possible combinations of directionals. These are prevalent in selected areas of Utah and Wisconsin. These are mislabeled in tables 5 and 8 as "latitude/longitude".
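The relative position recode can be sketched as a regular expression. The version below reads "the eight possible combinations" as the eight ordered pairs of one north/south and one east/west directional (N-E, N-W, S-E, S-W, E-N, E-S, W-N, W-S). That reading, the optional space between directional and number, and the function name are all assumptions.

```python
import re


def _build_relative_position_pattern() -> "re.Pattern[str]":
    alternatives = []
    for first, second in (("[NS]", "[EW]"), ("[EW]", "[NS]")):
        alternatives.append(rf"{first} ?\d+ {second} ?\d+")  # e.g. "N 123 E 234"
        alternatives.append(rf"\d+ ?{first} \d+ ?{second}")  # e.g. "123 N 234 E"
    # The whole address field must consist of the two directionals and two numbers.
    return re.compile(r"^(?:" + "|".join(alternatives) + r")$")


_RELATIVE_POSITION = _build_relative_position_pattern()


def is_relative_position(address: str) -> bool:
    """True for grid-style addresses such as 'N 123 E 234' (sketch only)."""
    text = " ".join(address.upper().split())
    return bool(_RELATIVE_POSITION.match(text))
```

A directional-prefix address such as "N 123 Main St" is deliberately not matched here, since it has only one directional and one number.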
Directional prefixes -- Instead of a standard format such as "123 N Main St", the format is "N 123 Main St". The recode was for any of the eight directionals (N, E, S, W, NE, SE, NW, and SW). These were mostly in Washington state.
Mile -- This is where the street name is a number followed by "Mile", such as "26 Mile Road". An example of an address might be "123 26 Mile Road". We found these in Michigan.
Address number is by mile -- An example would be "Mile 23 Main Hwy". To be recoded as such, the first four characters of the address had to be "MILE". These were found in Alaska.
Blank address -- In some cases, the address field is blank, where delivery is identified by addressee's name.
General delivery -- In some cases, the address is "General Delivery", "GEN DEL" or "LOCAL", where delivery is identified by addressee's name.
Rural Routes -- The following character strings were used to identify the rural route type (yes, all of these were actually used in the income tax addresses) if they occurred at the beginning of the address field: RTE, RR, R RT, R R, R T, R #, ROUTE, RURAL ROUTE, RURAL, RD, R D, RFD, R F D, STAR, MTD RT, MTD RTE, SUB RT, SUB RTE, SUBURBAN RT, KEYSTONE RT, SEARING RT, SKAAR RT, HCR, H C, HCO, HC, HWC, and R NO.
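The rural route strings above lend themselves to a longest-prefix lookup. This sketch copies the list from the text. The function name is hypothetical, and two details are assumptions: a prefix only counts when it is not followed by another letter (so "STARK ST" does not match "STAR"), and "R #" is taken to mean "R" followed by either a literal "#" or a number.

```python
import re
from typing import Optional

RURAL_ROUTE_PREFIXES = [
    "RTE", "RR", "R RT", "R R", "R T", "ROUTE", "RURAL ROUTE", "RURAL", "RD",
    "R D", "RFD", "R F D", "STAR", "MTD RT", "MTD RTE", "SUB RT", "SUB RTE",
    "SUBURBAN RT", "KEYSTONE RT", "SEARING RT", "SKAAR RT", "HCR", "H C",
    "HCO", "HC", "HWC", "R NO",
]
# Try longer strings first so "RURAL ROUTE" wins over "RURAL".
_BY_LENGTH = sorted(RURAL_ROUTE_PREFIXES, key=len, reverse=True)


def rural_route_prefix(address: str) -> Optional[str]:
    """Return the rural route string that starts the address, or None (sketch only)."""
    text = " ".join(re.sub(r"[.,]", " ", address.upper()).split())
    for prefix in _BY_LENGTH:
        if text.startswith(prefix):
            rest = text[len(prefix):]
            if not rest or not rest[0].isalpha():
                return prefix
    # "R #": assumed to cover both a literal '#' and a route number after "R".
    if re.match(r"R ?(?:#|\d)", text):
        return "R #"
    return None
```

Returning the matched string, rather than a plain yes/no, makes it possible to tabulate how often each variant appears.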
Route Number -- This represents the highway number, such as "RT 40", rather than the post office's rural route. There can be abbreviations for U.S. routes, state routes, county routes and even township routes. The recode included addresses that had " RT " anywhere in the address, and addresses in which selected character strings occurred at the beginning of the address (such as "SR 662 BOX 125") or were preceded by a numeric (such as "1234 SR 662"). The character strings included: RT, ST RTE, ST R, SR, CR, TR, HWY, etc. Although each of the character strings was recoded separately, they are included with other categories in the tabulations.
- If the highway route designation was preceded by a numeric (such as "1234 ST RT 662"), then it was included in the category "Old Type 1's". There were 188,000 (out of 84,540,000) of this highway route classification.
- If the highway route designation occurred at the beginning of the address (such as "ST RT 662"), then it was included in the category "Other". There were 14,000 (out of 1,098,000) of this highway route classification.
- If the highway route designation was not at the beginning and was not preceded by a numeric and contained "RT" somewhere in the address, then it was included in the category "Other RT in address". There were 6,000 (out of 54,000) of this highway route classification.
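The three rules above can be sketched as an ordered test. The string list is the illustrative set named in the text plus "ST RT", which appears in the examples. The source list ends in "etc.", so the set is incomplete by construction. The function name and the requirement that a string end on a word boundary are assumptions.

```python
import re
from typing import Optional

# Illustrative, incomplete list (the source ends with "etc.").
HIGHWAY_STRINGS = ["ST RTE", "ST RT", "ST R", "RT", "SR", "CR", "TR", "HWY"]
_ALT = "|".join(re.escape(s) for s in sorted(HIGHWAY_STRINGS, key=len, reverse=True))
_AFTER_NUMBER = re.compile(rf"^\d+ (?:{_ALT})\b")
_AT_START = re.compile(rf"^(?:{_ALT})\b")


def highway_route_category(address: str) -> Optional[str]:
    """Return the tabulation category for a highway route address, else None."""
    text = " ".join(address.upper().split())
    if _AFTER_NUMBER.match(text):
        return "Old Type 1's"          # e.g. "1234 ST RT 662"
    if _AT_START.match(text):
        return "Other"                 # e.g. "ST RT 662"
    if " RT " in text:
        return "Other RT in address"   # " RT " elsewhere in the address
    return None
```

The order of the tests matters: an address such as "1234 ST RT 662" also contains " RT ", so the third rule must only be reached after the first two have failed.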
Post Office Boxes -- The following character strings were used to identify post office boxes if they occurred at the beginning of the address: BOX, BX, P O B, P O BX, P O BOX, POB, P BOX, P O, PO, PO B, and PO BOX. If the string "DRAWER" or "POUCH" occurred anywhere in the address, it was recoded as a post office box.
Trailer parks -- Addresses in trailer parks sometimes include the character string "LOT" in the address. These were recoded (if not recoded in above categories).
Buildings -- A recode was created if the address contained the character string "BLDG" anywhere in the address and had not been recoded in the above categories.
City Type addresses -- These were recoded if the first character was numeric and not recoded in any of the above categories. They are shown in the tables as "Old Type 1's".
Alpha prefix -- In some areas, the house number may be preceded by a nondirectional prefix, such as "G 123 Elm St". For the cases in Michigan, apparently the prefix is the first character of the county name. These were not recoded separately, but were left in the "other" category.
VIA -- In Alaska, there are addresses such as "Red Mountain VIA Manly", where Manly is a legitimate post office name. The character string "VIA" is also used in Puerto Rico, and stands for "road". These were also left in the "other" category.
Other -- This is everything not recoded above. Most of these are street names without a house number. They may or may not also have a box number. Examples might be "Boden Rd" or "Boden Rd Bx 123". Also included are building or business names, apartment names, community or trailer park names, names of group quarters (such as a monastery, hospital, fraternity, etc.) and address types noted above that were not recoded by the above algorithm.
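Taken together, the categories behave like a cascade in which the first matching rule wins and anything left over falls into "other". The sketch below assumes that the order of presentation above is the order of precedence, which the phrase "if not recoded in above categories" suggests but does not confirm. Each rule is a deliberately shortened stand-in for the fuller string lists, and all names are hypothetical.

```python
import re

_DIR = r"(?:NE|NW|SE|SW|N|S|E|W)"

# (label, test) pairs, tried in order; each test is a simplified stand-in.
RULES = [
    ("military", lambda a: re.match(r"(?:PSC|HQ|USS|QTRS|QUARTERS|BEQ)\b", a)),
    # Simplified: accepts any pair of single-letter directionals.
    ("relative position",
     lambda a: re.match(r"[NSEW] ?\d+ [NSEW] ?\d+\b|\d+ ?[NSEW] \d+ ?[NSEW]\b", a)),
    ("directional prefix", lambda a: re.match(rf"{_DIR} \d+ ", a)),
    ("# mile", lambda a: re.search(r"\b\d+ MILE\b", a)),
    ("address number by mile", lambda a: a.startswith("MILE")),
    ("blank", lambda a: a == ""),
    ("general delivery", lambda a: a in {"GENERAL DELIVERY", "GEN DEL", "LOCAL"}),
    ("rural route", lambda a: re.match(r"(?:RURAL ROUTE|ROUTE|RTE|RR|RFD|HCR|HC)\b", a)),
    ("post office box",
     lambda a: re.match(r"(?:P ?O ?BOX|POB|BOX)\b", a) or "DRAWER" in a or "POUCH" in a),
    ("trailer park", lambda a: re.search(r"\bLOT\b", a)),
    ("building", lambda a: "BLDG" in a),
    ("city type", lambda a: a[:1].isdigit()),
]


def classify(address: str) -> str:
    """Return the first matching category label, or 'other' (sketch only)."""
    a = " ".join(address.upper().replace(".", " ").split())
    for label, rule in RULES:
        if rule(a):
            return label
    return "other"
```

Under this ordering, "123 26 Mile Road" is caught by the "# mile" rule before it can reach the city type rule, and "Boden Rd" falls through every test into "other".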
E. Tabulation Results
Table 2 shows a tabulation of the sample test file by the summary recode and by mailing state code for AL-WY, PR (Puerto Rico, Virgin Islands, etc.), FR (other Foreign) and the U.S. total.
Of the 104,416 U.S. returns in the test file, 81.3 percent [0.1] were city type, 9.0 percent [0.1] were rural routes and 7.7 percent [0.1] were post office boxes. There were 0.4 percent military, 0.4 percent relative position, 0.1 percent "# mile", 0.2 percent directional prefix, 0.2 percent with a blank address and 1.1 percent were all others.
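This section does not define the bracketed figures. If they are standard errors in percentage points, and if the test file can be treated as a simple random sample (both are assumptions), they can be approximated with the usual formula for the standard error of a proportion. The function name is hypothetical.

```python
import math


def standard_error_percent(percent: float, n: int) -> float:
    """Standard error, in percentage points, of a sample percentage.

    Assumes simple random sampling: SE = sqrt(p * (1 - p) / n).
    """
    p = percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


# U.S. figures quoted above, with n = 104,416 returns.
for share in (81.3, 9.0, 7.7):
    print(share, round(standard_error_percent(share, 104_416), 1))
```

Under these assumptions, all three national figures round to 0.1, in line with the bracketed values. State-level errors are larger because each state contributes far fewer returns.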
The usage of post office boxes in the IRS file (7.7 percent [0.1]) is more than double the usage in the census (3.7 percent). Two factors contribute to this difference. First, in the more rural areas, it may be more convenient to use a post office box or there may be no house delivery. Second, some people use post office boxes for various pieces of mail even though they have regular delivery to the house (with a city-type address).
The relative usage of addresses classified as other is also striking -- 1.0 percent in the IRS vs. 4.7 percent in the census. In D.C., the pattern is reversed: there are 6.1 percent [1.4] "others" in the IRS vs. 0.0 percent in the census. These 6.1 percent include examples of mail deliveries to place of work.
The composition varies dramatically by state. There were nine states with 90 percent or greater city type addresses, and 10 states with 80 to 89 percent. There were 12 states with 70 to 79 percent (under the national average), nine states with 60 to 69 percent, and five states with 50 to 59 percent. There were six states with less than 50 percent.
The rates of rural route and post office box usage also vary by state. For the six states with less than 50 percent city type, the percent rural route and post office box usage is as follows:
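[Table residue: rural route and post office box percentages for the six states. Only the fragments "6 [1.4]", "2 [0.6]" and "8 [1.1]" survive.]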
Also, Alaska has 4.3 percent [1.2] military, Michigan has 1.2 percent [0.2] directional prefix, Connecticut has 2.2 percent [0.4] other, the District of Columbia has 5.8 percent [1.4] other, Maine has 6.8 percent [1.1] other, New Hampshire has 8.4 percent [1.2] other, and Vermont has 5.2 percent [1.4] other. Washington has 5.6 percent [0.5] directional prefix and Wisconsin has 2.3 percent [0.3] relative position.
Utah has 44.9 percent [2.1] relative position.
Relatively speaking, the problem areas are a small proportion overall. However, they are concentrated in localities and failure to account for them will preclude development of reliable migration data for these areas.
Using a KEY-4 to probability geographic code cross-tabulation of all tax returns, we were able to compile tables 3 and 4. These tables show the number of counties and number of places by population size and the percent of each of four address types.
Counties - Fewer than half of the counties (1,305 out of 3,023) had more than 50 percent city type addresses; 553 had more than 50 percent rural routes; 318 had more than 50 percent P.O. boxes; and 10 had more than 50 percent "others".
Places - As expected, cities had a much larger concentration of city type addresses, especially the larger places. Of all places sized 25,000 to 50,000, 85 percent had a city type usage rate in excess of 90 percent, and 97 percent of all places sized 50,000 or more had a city type usage rate in excess of 90 percent. For smaller places, the percent with a high concentration of city type deliveries drops off dramatically. These smaller places typically had higher post office box usage (especially), somewhat higher rural route usage and, to some extent, other address types.
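Counts of this kind (areas whose share of an address type exceeds a threshold) follow from per-area tallies by address type. The sketch below uses made-up county names and counts purely for illustration. The function name and data layout are assumptions, not the structure of tables 3 and 4.

```python
from typing import Dict


def areas_over_threshold(counts_by_area: Dict[str, Dict[str, int]],
                         address_type: str,
                         threshold: float = 0.5) -> int:
    """Count areas where address_type makes up more than `threshold` of all returns."""
    n_over = 0
    for counts in counts_by_area.values():
        total = sum(counts.values())
        if total and counts.get(address_type, 0) / total > threshold:
            n_over += 1
    return n_over


# Hypothetical tallies for three counties (not data from the source).
example = {
    "County A": {"city": 800, "rural route": 150, "po box": 50},
    "County B": {"city": 100, "rural route": 600, "po box": 300},
    "County C": {"city": 200, "rural route": 200, "po box": 600},
}
print(areas_over_threshold(example, "city"))        # counties above 50 percent city type
print(areas_over_threshold(example, "city", 0.9))   # counties above 90 percent city type
```

The same function serves both the 50 percent county cut and the 90 percent place cut by changing the threshold argument.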
F. Quality of Addresses
In addition to the variant address types, there are questions about the quality of the address information that will affect the ability to geographically code. First, the address is supplied by the taxpayer. It can contain address parts in any combination, in any order, and can contain bits of different types of addresses (especially prevalent in rural areas). It can contain numerous variations of name spellings and a plethora of nonstandard abbreviations. Second, the address information is handwritten, then read by a data entry person, and then keyed. Even carefully written or printed addresses can be easily misread or miskeyed. The script form of "Ct" (for court) can easily be misread as "Cl". Another example is misreading a printed "M" (as in Mill) as an "H". The data entry persons work under strict production schedules, and address keying quality is of less importance than the quality of the other information on the form, such as income and tax amounts. (If the address is deliverable, then it is good enough.)