Every year the American Community Survey (ACS) collects data from millions of individuals on a variety of topics, including the industry and occupation in which they work. These data are collected through a series of open-ended questions. Clerical coders then read the open-ended responses and assign a numeric code for the industry and for the occupation.
Coding industry and occupation for the ACS is a massive operation: more than 2 million industry and occupation codes are assigned every year. To reduce costs, a process was developed that assigns industry and occupation codes automatically from the open-ended responses using a logistic regression model. This paper discusses the development of this model and its early results. Beginning in 2012, 56% of industry codes and 43% of occupation codes are expected to be assigned through this automated coding process.
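To make the idea of model-based coding concrete, the following is a minimal sketch and not the production ACS system. It assumes a bag-of-words representation of the write-in text, one binary logistic regression per code (one-vs-rest), and a confidence threshold below which a response is left for a clerical coder. The feature set, the threshold value, the training responses, and the labels `CODE_A` and `CODE_B` are all hypothetical placeholders chosen for illustration.

```python
import math
from collections import defaultdict

# Toy write-in responses with placeholder codes (hypothetical, not real ACS data).
TOY_TRAINING = [
    ("registered nurse hospital", "CODE_A"),
    ("nurse at a clinic", "CODE_A"),
    ("hospital staff nurse", "CODE_A"),
    ("truck driver", "CODE_B"),
    ("delivery truck driver", "CODE_B"),
    ("long haul driver", "CODE_B"),
]


def tokenize(text):
    """Split a write-in response into lowercase word tokens."""
    return text.lower().split()


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_one_vs_rest(examples, epochs=200, lr=0.5):
    """Fit one binary logistic regression per code with plain SGD.

    Returns a dict mapping each code to (token_weights, bias).
    """
    codes = sorted({code for _, code in examples})
    models = {}
    for code in codes:
        weights = defaultdict(float)
        bias = 0.0
        for _ in range(epochs):
            for text, label in examples:
                tokens = tokenize(text)
                y = 1.0 if label == code else 0.0
                z = bias + sum(weights[t] for t in tokens)
                error = sigmoid(z) - y  # gradient of log loss w.r.t. z
                bias -= lr * error
                for t in tokens:
                    weights[t] -= lr * error
        models[code] = (dict(weights), bias)
    return models


def assign_code(models, text, threshold=0.8):
    """Return (code, probability) if the best model is confident enough.

    When no model reaches the (assumed) threshold, return (None, probability)
    so the response can be referred to a clerical coder.
    """
    tokens = tokenize(text)
    best_code, best_p = None, 0.0
    for code, (weights, bias) in models.items():
        p = sigmoid(bias + sum(weights.get(t, 0.0) for t in tokens))
        if p > best_p:
            best_code, best_p = code, p
    if best_p >= threshold:
        return best_code, best_p
    return None, best_p


if __name__ == "__main__":
    models = train_one_vs_rest(TOY_TRAINING)
    for response in ["nurse", "truck driver", "astronaut"]:
        print(response, "->", assign_code(models, response))
```

In a sketch like this, the threshold governs the trade-off between the share of responses coded automatically and the share referred to clerical coders, which is the kind of split the percentages above describe.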