In this paper, we describe and analyze a new dataset of matched ACS and IRS Form 1040 occupation reports. This dataset allows us to validate and assess the quality of the IRS’s large Form 1040 occupational write-in database by comparing it with the high-quality ACS write-in and coding process. We analyze the similarity between the two sources along both the token and semantic dimensions. In the token dimension, we find a bimodal distribution of response quality: over 50 percent of the ACS sample is a high-quality token match with its IRS counterpart, but there is also a substantial set of apparent non-matches.
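
To make the notion of a token-level match concrete, the sketch below scores a pair of occupation write-ins by the Jaccard index over their lowercased word tokens. It is illustrative only: the choice of Jaccard similarity, the tokenization rule, the function name `token_jaccard`, and the example strings are all assumptions made here, and may differ from the measure used in the analysis.

```python
import re


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the sets of lowercased alphanumeric tokens."""
    tokens_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not tokens_a and not tokens_b:
        # Two empty write-ins are treated as identical (an assumption).
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical write-in pairs, not drawn from either dataset.
print(token_jaccard("Registered Nurse", "registered nurse"))  # 1.0
print(token_jaccard("truck driver", "retired"))               # 0.0
```

Under a measure of this kind, a bimodal distribution would show up as one cluster of pairs scoring near 1 and another near 0, with comparatively few pairs in between.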