For technical reasons, the data were typically downloaded from the Unisys system in two forms, each assuming that all of the data were in a single character set:
Excess-3 character set
FIELDATA character set
If the character set of the underlying data is known, recovered files can be converted to usable formats (e.g., ASCII) through known mappings
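As a minimal sketch of such a mapping, the Python below translates a sequence of 6-bit FIELDATA codes to ASCII through a lookup table. The table is an assumption based on the commonly documented UNIVAC 1100-series code points, and it is deliberately partial (space, letters, and digits only); it should be checked against an authoritative code chart before use. The names `FIELDATA_TO_ASCII` and `decode_codes` are hypothetical, and the sketch assumes the codes have already been unpacked to one integer per character.

```python
# Partial FIELDATA-to-ASCII table (assumption: UNIVAC 1100-series code points).
# Only the space, the letters, and the digits are filled in; punctuation is omitted.
FIELDATA_TO_ASCII = {0o05: " "}
FIELDATA_TO_ASCII.update({0o06 + i: chr(ord("A") + i) for i in range(26)})
FIELDATA_TO_ASCII.update({0o60 + i: chr(ord("0") + i) for i in range(10)})


def decode_codes(codes, table, unknown="?"):
    """Translate an iterable of 6-bit character codes into an ASCII string.

    Codes that are missing from the table are replaced with `unknown`
    so that unmapped characters remain visible in the output.
    """
    return "".join(table.get(code, unknown) for code in codes)


# Example: six letter codes, a space, and four digit codes.
sample = [0o10, 0o12, 0o23, 0o30, 0o32, 0o30, 0o05, 0o61, 0o71, 0o66, 0o60]
print(decode_codes(sample, FIELDATA_TO_ASCII))
```

Keeping the mapping in a plain dictionary means an Excess-3 table, or a corrected FIELDATA table, can be swapped in without changing the decoding function.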
Challenges
Data in a record might employ multiple character sets
The record layout documenting the variables, their lengths, and the character set employed may no longer exist or may be incorrect
A file created when the data were downloaded, showing the first 100 records under 5 different character set assumptions, can help reveal variable lengths and the character set used
Existing microdata and/or published data can help in determining variable content
Data files may have other problems, such as missing variables or observations, corrupted data, or contents that do not match the description in the paper records.
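The idea of viewing the first records under several character set assumptions can be sketched as follows. This is an illustration only, not the procedure actually used: it assumes fixed-length records, and because Python ships no Excess-3 or FIELDATA codec, the built-in `ascii` and `cp037` (EBCDIC) codecs stand in for the candidate character sets. The function name `preview_records` and its parameters are hypothetical.

```python
def preview_records(raw, record_length, encodings, max_records=100):
    """Return one text line per (record, encoding) pair for side-by-side review.

    raw           -- the file contents as bytes
    record_length -- assumed fixed record length in bytes
    encodings     -- names of the candidate codecs to try
    """
    lines = []
    n_records = min(max_records, len(raw) // record_length)
    for i in range(n_records):
        record = raw[i * record_length:(i + 1) * record_length]
        for enc in encodings:
            # Undecodable bytes become a replacement character instead of raising.
            text = record.decode(enc, errors="replace")
            # Show non-printable characters as dots so column alignment stays visible.
            shown = "".join(ch if ch.isprintable() else "." for ch in text)
            lines.append(f"{i:4d} {enc:<8} |{shown}|")
    return lines


# Example: the first record is ASCII text, the second is the same text in EBCDIC.
raw = b"AB12" + bytes([0xC1, 0xC2, 0xF1, 0xF2])
for line in preview_records(raw, record_length=4, encodings=["ascii", "cp037"]):
    print(line)
```

Reading down the output, the assumption under which a field renders as sensible letters or digits suggests both the character set of that field and where its boundaries fall.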
Census Research Data Centers (RDCs) are U.S. Census Bureau facilities, staffed by a Census Bureau employee, which meet all physical and computer security requirements for access to restricted-use data. At RDCs, qualified researchers with approved projects receive restricted access to selected non-public Census Bureau data files.
Source: U.S. Census Bureau | Center for Economic Studies | (301) 763-6460 | E-mail CES |
Last Revised:
October 23, 2012