A Large Scale, High Quality U.S. Occupational Database: Results from Merged IRS and ACS Write-Ins

October 30, 2024

Written by:

Victoria L. Bryant, Thomas N. Hertz, Kevin Pierce, Julia Beckhusen, Liana Christin Landivar, Lynda Laughlin, Carl Sanders, David B. Grusky, Michael Hout, Ananda Martin-Caughey, and Javier Miranda

Working Paper Number: SEHSD-WP2024-26

In this paper, we describe and analyze a new dataset of matched American Community Survey (ACS) and IRS Form 1040 occupation reports. The dataset allows us to validate and assess the quality of the IRS’s large database of Form 1040 occupational write-ins by comparing it with the high-quality ACS write-in and coding process. We measure the similarity between the two sources along both the token and semantic dimensions. In the token dimension, we find a bimodal distribution of response quality: over 50 percent of the ACS sample is a high-quality token match with its IRS counterpart, but a substantial share of pairs appear not to match at all.
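The abstract does not say how token similarity between an ACS write-in and its IRS counterpart is scored. The sketch below shows one possible measure under a stated assumption: a Jaccard index over lowercase word tokens. The function names and example write-ins are hypothetical illustrations. They are not the paper's metric and are not drawn from the matched data.

```python
import re


def tokenize(write_in: str) -> set[str]:
    """Lowercase a free-text occupation write-in and split it into distinct word tokens."""
    return set(re.findall(r"[a-z0-9]+", write_in.lower()))


def token_jaccard(acs_write_in: str, irs_write_in: str) -> float:
    """Hypothetical token-similarity score: shared distinct tokens / all distinct tokens.

    Assumption: a Jaccard index is used purely for illustration; the paper's
    actual token measure is not specified here.
    """
    a, b = tokenize(acs_write_in), tokenize(irs_write_in)
    if not a and not b:
        return 0.0  # two empty write-ins carry no occupational information
    return len(a & b) / len(a | b)


# Hypothetical ACS/IRS write-in pairs (invented examples, not real records).
pairs = [
    ("Registered Nurse", "registered nurse"),
    ("Software engineer", "Engineer - software"),
    ("Retail sales manager", "Retired"),
]
for acs, irs in pairs:
    print(f"{acs!r} vs {irs!r}: {token_jaccard(acs, irs):.2f}")
```

Under a measure like this, pairs with near-identical wording score close to 1 and pairs with no words in common score 0. These are the two ends of the token-match distribution that the abstract describes.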