A Machine Learning Approach for Counting Language Minority Groups in the United States

April 25, 2025

Written by:

Joseph Kang and Adam C. Hall

RRS2025-03

Abstract

The U.S. Voting Rights Act (VRA) prohibits discrimination at the polls based on language minority status. The VRA requires the U.S. Census Bureau to use data on the voting-age population, including the number of citizens, individuals with limited English proficiency, and individuals with limited education, to identify these language minority groups. In the 2021 cycle of determining which jurisdictions (states, counties, cities) must provide voting materials in languages in addition to English, Census Bureau statisticians developed both frequentist and Bayesian models to estimate the population sizes of language minority groups. In this paper, we present a new machine learning model that outperformed the 2021 statistical models for some language minority groups. Our model is built within the random forest (RF) framework and uses the beta-binomial posterior as the objective function for constructing the RF trees. This choice is in the spirit of soft computing: the new RF method relaxes the objective function typically used in RF to accommodate the unique structure of the VRA data.
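
To make the idea of a beta-binomial objective for tree construction concrete, the following is a minimal sketch and not the authors' implementation. It assumes each observation is a pair of counts (successes out of trials), places a hypothetical Beta(a, b) prior on a node's shared proportion, and scores a candidate split by the sum of the children's log beta-binomial marginal likelihoods. The function names, the prior defaults, and the choice of the marginal likelihood as the score are all assumptions made for illustration; the paper's actual objective may differ.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function, log B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def node_score(successes, trials, a=1.0, b=1.0):
    """Log beta-binomial marginal likelihood of a node (assumed criterion).

    Counts are pooled within the node. Binomial coefficients are omitted
    because they do not depend on how observations are partitioned.
    """
    y = sum(successes)
    n = sum(trials)
    return log_beta(a + y, b + n - y) - log_beta(a, b)


def best_split(x, successes, trials, a=1.0, b=1.0):
    """Scan thresholds on one feature and return (threshold, score).

    The score of a split is the sum of the two child node scores;
    larger is better. Returns (None, -inf) if no split is possible.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_threshold, best_score = None, float("-inf")
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        # Skip positions that would separate tied feature values.
        if x[left[-1]] == x[right[0]]:
            continue
        score = (
            node_score([successes[i] for i in left],
                       [trials[i] for i in left], a, b)
            + node_score([successes[i] for i in right],
                         [trials[i] for i in right], a, b)
        )
        if score > best_score:
            best_score = score
            best_threshold = (x[left[-1]] + x[right[0]]) / 2
    return best_threshold, best_score


# Toy data (invented for illustration): two low-rate and two high-rate units.
x = [0.1, 0.2, 0.8, 0.9]
successes = [1, 2, 18, 19]
trials = [20, 20, 20, 20]
threshold, score = best_split(x, successes, trials)
```

On this toy data the scan places the threshold between the low-rate and high-rate units. A full random forest would add bootstrap resampling, random feature subsets, and recursive partitioning on top of such a criterion.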