ASCII data set information
Deciding which data files to use
Using Microsoft Access™
Using SAS™
Using other database programs
Data file directory
File documentation [5.32 MB]
Census 2000 Data Products support
A data set consists of a geographic identifier (Geo ID) file and thirty-eight data files. The Geo ID file should initially appear last in the data directory and contains "geo" as part of its filename. The Geo ID file is not a "header file": it is linked horizontally with the data files, record to record, not placed on top of them vertically. Any data file used must be linked to the Geo ID file on the unique key field LOGRECNO [1] because the data files do not contain any geographic identifiers. Each data file contains a different set of demographic data tables.
None of the files contains a header record (a first record or row with field names). Microsoft Access™ and SAS™ templates and instructions are provided to assist in importing the ASCII text files into these programs. The Geo ID file is fixed width with no field delimiters, while all thirty-eight data files are variable length with comma field delimiters.
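The two file formats and the LOGRECNO link described above can be sketched in Python. This is a minimal illustration, not the official import procedure. The column positions in GEO_FIELDS and the leading field order in DATA_KEYS are assumptions that should be checked against the record layout in Chapter 7, and the sample records are made up.

```python
import csv

# ASSUMPTION: 0-based (start, end) slices for the leading Geo ID fields.
# Verify every position against the record layout in Chapter 7.
GEO_FIELDS = {
    "FILEID": (0, 6),
    "STUSAB": (6, 8),
    "SUMLEV": (8, 11),
    "GEOCOMP": (11, 13),
    "CHARITER": (13, 16),
    "CIFSN": (16, 18),
    "LOGRECNO": (18, 25),
}

# ASSUMPTION: each data record starts with these linking fields,
# followed by the table cells.
DATA_KEYS = ["FILEID", "STUSAB", "CHARITER", "CIFSN", "LOGRECNO"]


def parse_geo_line(line):
    """Slice one fixed-width Geo ID record into named fields."""
    return {name: line[a:b].strip() for name, (a, b) in GEO_FIELDS.items()}


def parse_data_row(row):
    """Split one comma-delimited data record into key fields and cells."""
    record = dict(zip(DATA_KEYS, row))
    record["cells"] = row[len(DATA_KEYS):]
    return record


def link(geo_lines, data_lines):
    """Pair each data record with its Geo ID record via LOGRECNO."""
    geo_by_key = {}
    for line in geo_lines:
        geo = parse_geo_line(line)
        geo_by_key[geo["LOGRECNO"]] = geo
    linked = []
    for row in csv.reader(data_lines):
        data = parse_data_row(row)
        geo = geo_by_key.get(data["LOGRECNO"])
        if geo is not None:
            linked.append((geo, data))
    return linked


# Made-up sample records, built field by field so the widths are visible.
sample_geo = [
    "".join(["SF4   ", "CA", "040", "00", "001", "00", "0000001"]),
    "".join(["SF4   ", "CA", "050", "00", "001", "00", "0000002"]),
]
sample_data = [
    "SF4,CA,001,01,0000001,100,40",
    "SF4,CA,001,01,0000002,60,25",
]

linked = link(sample_geo, sample_data)
for geo, data in linked:
    print(geo["SUMLEV"], geo["LOGRECNO"], data["CHARITER"], data["cells"])
```

Because no file has a header record, the field names must be supplied by the program, as done here with GEO_FIELDS and DATA_KEYS. All values are kept as strings so that leading zeros in codes such as LOGRECNO survive.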
The field SUMLEV [2] in the Geo ID file identifies the summary level (area type) of each record. A combination of the geographic identifier codes for each element in the complete summary level description is used to identify the specific area being tabulated. The 100-percent housing unit and population counts are also contained in the Geo ID file.
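Selecting records for one area type can be sketched as a simple filter on SUMLEV. The example below assumes the Geo ID records have already been parsed into dictionaries. The function name and the sample records are hypothetical, and the codes used ("040", "050", and GEOCOMP "00") should be confirmed against Chapter 4 and the Chapter 7 footnotes.

```python
def select_summary_level(geo_records, sumlev, geocomp="00"):
    """Return Geo ID records for one summary level and geographic component.

    Codes are compared as strings so that leading zeros are preserved.
    """
    return [
        rec for rec in geo_records
        if rec["SUMLEV"] == sumlev and rec["GEOCOMP"] == geocomp
    ]


# Made-up records for illustration only.
geo_records = [
    {"LOGRECNO": "0000001", "SUMLEV": "040", "GEOCOMP": "00"},
    {"LOGRECNO": "0000002", "SUMLEV": "040", "GEOCOMP": "01"},
    {"LOGRECNO": "0000003", "SUMLEV": "050", "GEOCOMP": "00"},
]

selected = select_summary_level(geo_records, "050")
print([rec["LOGRECNO"] for rec in selected])
```

The LOGRECNO values of the selected records can then be used to pull the matching rows from any linked data file.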
The same set of tables is repeated (iterated) for each of 336 different population groups in separate data sets. The same geographic file is used in each data set. The field CHARITER contains the code of the population group being tabulated. A complete code list for this field appears in Appendix H of the file documentation. The value of CHARITER in the geographic file is always "001", so always use the value of CHARITER in the linked data file when doing a query.
There is a one-to-one correspondence based on LOGRECNO between the Geo ID file and each data file for the "All Persons" iteration. This is not necessarily the case for other iterations, because areas where a population threshold is not met are not included in the data files for that iteration.
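One way to see which areas were dropped from an iteration is to compare the LOGRECNO values in the Geo ID file with those in a data file. The sketch below is illustrative only: the function name and the sample values are hypothetical.

```python
def areas_missing_from_iteration(geo_logrecnos, data_logrecnos):
    """Return LOGRECNO values found in the Geo ID file but not in a data file.

    The order of the Geo ID file is preserved in the result.
    """
    present = set(data_logrecnos)
    return [key for key in geo_logrecnos if key not in present]


# Made-up values: the data file for some iteration lacks one area.
geo_keys = ["0000001", "0000002", "0000003"]
data_keys = ["0000001", "0000003"]

missing = areas_missing_from_iteration(geo_keys, data_keys)
print(missing)
```

In a database query, the same effect shows up as a difference between an inner join (only areas with data) and a left join from the Geo ID table (all areas, with empty data fields where the threshold was not met).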
[2] Chapter 4 (Summary Level Sequence Chart) of the file documentation contains a code list for SUMLEV (summary level) and a list of available geographic component codes for each summary level or area type. See the Footnote Section of Chapter 7 (Data Dictionary) for a code list for GEOCOMP. Chapter 4 includes separate charts for the state files and the national file.
Chapter 7 includes the geographic ID file's record layout and a complete list of demographic data tables and data items. See Appendix A for definitions of geographic terms. It is recommended that GIS users also see notes on using boundary files.
Which data files within a data set to use
In this step, note the file number(s) within a data set that contain the tables of interest. Data files are numbered from 01 to 38. See Chapter 3, Subject Locator, in the file documentation to identify the table numbers of interest. See Figure 2-2 in Chapter 2, How to Use This File, to identify the data file(s) that contain these tables. See the Table Matrix Section of Chapter 7, Data Dictionary, for a complete list of the data items contained in these tables. Each table is shown in full, including the title, universe, all headings, and data items.
Which data sets (population groups) to use
In this step, note the three-digit code for each population group that you are interested in. See Appendix H, Characteristic Iterations, in the file documentation for a list of population groups along with their codes. These codes appear in the CHARITER field of the data files.
Data file naming convention
Use the information from the previous two sections to determine which files to download. The data file directory will contain a long list of files named as shown below.
stccc01 through stccc38 - Data files
stgeo - Geographic file (should be last in the list)
st is the state postal abbreviation, ccc is the characteristic iteration code (CHARITER), and the last two digits are the data file number within the data set.
Example - If you are interested in table PCT01 (this table is in data file 01) for the Total Population and for Hispanic or Latino (of any race) for California and Oregon, you would download the following files from the California and Oregon directories: cageo, ca00101, ca40001, orgeo, or00101, or40001.
If you are interested in the same information for areas within the entire United States, you would download usgeo, us00101, and us40001 from the National directory. Note that the National file does not contain exactly the same set of areas as the state files.
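The naming convention can be sketched as a small helper that builds the list of files to download. The function name is hypothetical, and the names are generated without a file extension because the examples above give none. The iteration codes "001" and "400" are taken from the example above.

```python
def files_to_download(area, iteration_codes, file_numbers):
    """Build file names for one area (state postal abbreviation or "us").

    iteration_codes: three-digit CHARITER codes as strings, e.g. "001".
    file_numbers: data file numbers within the data set, 1 through 38.
    """
    names = [f"{area}geo"]  # the Geo ID file is needed for every data set
    for code in iteration_codes:
        for number in file_numbers:
            if not 1 <= number <= 38:
                raise ValueError(f"data file number out of range: {number}")
            names.append(f"{area}{code}{number:02d}")
    return names


# Table PCT01 is in data file 01; "001" and "400" are the two
# population groups from the example in the text.
wanted = []
for area in ["ca", "or"]:
    wanted.extend(files_to_download(area, ["001", "400"], [1]))
print(wanted)
```

For the California and Oregon example this produces cageo, ca00101, ca40001, orgeo, or00101, or40001, and calling it with "us" produces usgeo, us00101, us40001.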
The steps above complete the data file selection process.
To get started, download the Summary File 4 template file (Access 2000 format). Next, open it in Microsoft Access™ and follow the import procedures to load the ASCII data and attach the header (field name) information.
Download the SAS programs or contact your local state data center (SDC) for an alternate version of the SAS™ code. SPSS™ code may also be available from your local SDC. The SAS programs convert the ASCII text data files to SAS data sets. Minor modifications, such as changing the input and output file names and the directories used to store data, may be needed.
SF4GEO.SAS - Converts the geographic identifier file
SF4xx.SAS - Converts the matching (by number) data file and merges it with the SAS data set created by SF4GEO.SAS. There are thirty-eight of these files, numbered from SF401.sas to SF438.sas. See the additional SAS instructions for more information.
This section assumes familiarity with operations in database management programs such as opening a data table and appending records to it as well as setting up a relationship between two data tables based on a common field.
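As an illustration of the two operations mentioned above, the sketch below uses SQLite through Python's standard sqlite3 module: records are appended to two tables, and the tables are related on the common field LOGRECNO. The table names, the NAME column, the data column PCT001001, and all values are hypothetical. Real field names come from Chapter 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per file. Codes are stored as TEXT to keep leading zeros.
conn.execute(
    "CREATE TABLE geo (LOGRECNO TEXT PRIMARY KEY, SUMLEV TEXT, NAME TEXT)"
)
conn.execute(
    "CREATE TABLE data01 ("
    " CHARITER TEXT,"
    " LOGRECNO TEXT REFERENCES geo(LOGRECNO),"
    " PCT001001 INTEGER)"  # hypothetical data item name
)

# Append made-up records to each table.
conn.executemany(
    "INSERT INTO geo VALUES (?, ?, ?)",
    [("0000001", "040", "Sample State"), ("0000002", "050", "Sample County")],
)
conn.executemany(
    "INSERT INTO data01 VALUES (?, ?, ?)",
    [("001", "0000001", 100), ("001", "0000002", 60)],
)

# Relate the tables on LOGRECNO. CHARITER is read from the data table.
rows = conn.execute(
    "SELECT g.NAME, d.CHARITER, d.PCT001001"
    " FROM data01 AS d JOIN geo AS g ON g.LOGRECNO = d.LOGRECNO"
    " WHERE g.SUMLEV = ?"
    " ORDER BY g.LOGRECNO",
    ("040",),
).fetchall()
print(rows)
conn.close()
```

The same pattern applies in other database programs: load each file into its own table, then join on LOGRECNO and filter on SUMLEV from the geographic table.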