
2025 FedCASIC Virtual Conference


Firefox and Google Chrome are the preferred web browsers for WebEx. WebEx may not function properly in Internet Explorer (IE) or Safari.


    WebEx Event number (if needed): 2828 473 2191
    WebEx Event password (if needed): Census#1

    Day 1: Tuesday, April 22


    10:00 am - 11:00 am, April 22
    Three new survey modes and their impact on data quality
    Frederick Conrad, University of Michigan

    11:15 am - 12:45 pm, April 22
    Concurrent Sessions
    WebEx Event number (if needed): 2828 473 2191
    WebEx Event password (if needed): Census#1

    Blocking Strategies for Optimizing Database Structures
    Theodore Charm, U.S. Census Bureau
    Researchers commonly use blocking as a strategy to enhance the efficiency of record linkage operations, where blocking restricts comparisons to records for which certain discriminating identifiers agree. Using a dataset with a defined truth deck, this paper identifies and compares several blocking strategies for de-duplication of person records at a nationwide scale and for modern database structures. It includes both deterministic traditional methods and locality sensitive hashing using vector representations of the sample data. This study evaluates the performance metrics for the blocking strategies. Specifically, it computes the number of candidate record pairs and percentage of real duplicate sample pairs placed in the same block across several blocking strategies. It also examines the number of overlapping record pairs between blocking passes and runtime to evaluate the efficiency of the blocking approaches. This study finds that both traditional methods and locality sensitive hashing perform well in creating candidate record pairs that include duplicate pairs at a nationwide scale, and the efficiency of locality sensitive hashing can be enhanced by hyperparameter tuning.
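
    The sketch below is illustrative only, not the paper's code: it shows the traditional deterministic side of the comparison, where records sharing a blocking key (here, a hypothetical surname-prefix-plus-birth-year key) are the only ones paired for comparison.

```python
# Illustrative deterministic blocking for de-duplication; fields are hypothetical.
from collections import defaultdict
from itertools import combinations

records = [
    {"id": 1, "last": "Garcia", "first": "Ana",  "dob": "1980-04-12"},
    {"id": 2, "last": "Garcia", "first": "Anna", "dob": "1980-04-12"},
    {"id": 3, "last": "Lee",    "first": "Sam",  "dob": "1975-09-30"},
]

def blocking_key(record):
    # e.g., first three letters of the surname plus the birth year
    return (record["last"][:3].upper(), record["dob"][:4])

blocks = defaultdict(list)
for r in records:
    blocks[blocking_key(r)].append(r["id"])

# Candidate pairs are generated only within blocks, never across the full file.
candidate_pairs = [pair for ids in blocks.values() for pair in combinations(ids, 2)]
print(candidate_pairs)  # [(1, 2)]
```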

    Close Enough: A Python Package for Advanced String Comparisons
    Neeka Sewnath, Census EWD Statistics Modernization Branch
    The ability to link records from multiple datasets is essential for maintaining data quality and minimizing respondent burden. Record linkage is straightforward when data sources share common keys, but very difficult when keys are absent. Typically, agencies rely on specialized code for comparing the similarity of strings of characters. Federal statistical agencies using Python to do string comparisons often must choose between highly efficient packages with only a few string comparators, less efficient packages with wider sets of methods, and packages that are not well-maintained. Close Enough is a Python package designed for fast and accurate string comparisons. Utilizing Python implementations provided in FEBRL (Freely Extensible Biomedical Record Linkage) and Jellyfish, Close Enough consolidates and improves upon existing code for string comparisons. Close Enough's Python code is compiled using Cython and is written to use multiprocessing when applied to columns of data, enabling faster execution and scalability for large datasets. By leveraging open-source methods, the Close Enough package supports collaboration across agencies. Early testing of Close Enough has demonstrated promising results when benchmarked against FEBRL and Python's TextDistance package. While full evaluations and performance results are forthcoming, preliminary analysis suggests significant time savings in record linkage activities while maintaining highly accurate string matching.
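
    As a rough, hypothetical illustration of the kind of column-wise, multiprocessing-friendly comparison described above (this is not the Close Enough code), the sketch below scores pairs of strings with a character-bigram Jaccard comparator in parallel.

```python
# Simple character-bigram Jaccard similarity applied to string pairs in parallel.
from multiprocessing import Pool

def bigrams(s):
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def similarity(pair):
    a, b = bigrams(pair[0]), bigrams(pair[1])
    return len(a & b) / len(a | b) if a | b else 1.0

pairs = [("Jonathon Smith", "Jonathan Smyth"), ("Acme Corp", "ACME Corporation")]

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        print(pool.map(similarity, pairs))
```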

    Triple A: AIES Ad hoc Address Matching for Large Companies
    Clayton Knappenberger, U.S. Census Bureau
    The U.S. Census Bureau recently launched the new Annual Integrated Economic Survey (AIES), which combined seven annual economic surveys into a single streamlined survey with the goal of making responding easier for businesses. A large focus in the first year of AIES was allowing companies to report data via spreadsheet. Given that respondent-provided spreadsheets varied in quality, the Census Bureau experimented with neural network record linkage tools that would allow Census to automatically match a spreadsheet of company-provided locations against the Census Bureau's internal list of locations. This presentation will outline a prototype version of this location matching tool that takes a company-provided list of locations, validates it, and fixes common errors using publicly available tools like the Census Geocoder, and then uses Siamese Long Short-Term Memory (LSTM) neural networks to match the locations against an internal list of Census locations for that company. We will also discuss some of the practical challenges in adopting this tool more widely in AIES data collection, and review next steps for future research.
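
    A minimal sketch of the Siamese LSTM pattern named above, assuming TensorFlow/Keras; it is not the Census Bureau's prototype, and the character tokenization, training data, and Census Geocoder validation step are omitted.

```python
import tensorflow as tf

MAX_LEN, VOCAB = 64, 128  # assumed character-level input length and vocabulary size

def build_siamese_lstm(embed_dim=32, lstm_units=64):
    # One shared encoder: both address strings pass through the same weights.
    encoder = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB, embed_dim, mask_zero=True),
        tf.keras.layers.LSTM(lstm_units),
    ])
    left = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
    right = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
    # Squared element-wise difference of the two encodings feeds a match score.
    diff = tf.keras.layers.Subtract()([encoder(left), encoder(right)])
    sq_diff = tf.keras.layers.Multiply()([diff, diff])
    score = tf.keras.layers.Dense(1, activation="sigmoid")(sq_diff)
    model = tf.keras.Model(inputs=[left, right], outputs=score)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_siamese_lstm()
model.summary()
```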

    Augmenting traditional surveillance methods with Social Media data
    Jonah Bregstone, RTI
    Social media platforms provide valuable data for survey research as it becomes increasingly costly for traditional survey methods to produce representative results. This presentation explores how social media listening can augment traditional surveillance methods, focusing on applications within FDA's food-safety monitoring programs. Using industry-standard marketing analytics platforms, our methodology encompassed comprehensive social media data collection, coding, and sentiment analysis. By repurposing these commercial tools typically used by food manufacturers, we developed an effective framework to aid the FDA's food-safety surveillance and industry monitoring. To facilitate the FDA's goals, an interactive dashboard was developed for FDA officials, integrating real-time social media metrics with traditional surveillance data.

    WebEx Event number (if needed): 2825 705 1214
    WebEx Event password (if needed): Census#1

    Leveraging images to improve School Pulse Panel data quality: Lessons learned from cognitive testing
    Caitlyn Keeve, U.S. Census Bureau
    Using images to assist respondents in answering questions is one method of questionnaire design that may help respondents answer topic-specific questions. The School Pulse Panel (SPP), conducted by the U.S. Census Bureau for the National Center for Education Statistics, is an online self-response questionnaire that surveys public K-12 schools on high-priority, education-related topics once a month during the school year. The questions can use highly technical terminology, which may lead to respondent difficulties including inaccurate answers. In fall 2024, we used cognitive testing to assess the use of images with item descriptions for school transportation questions. Participants were asked to think aloud while answering select-all-that-apply questions about traffic measures (e.g., speed safety cameras, raised intersection crossings) and bike infrastructure (e.g., sharrows, buffered lanes) surrounding their school. During the think-aloud, interviewers noted usability issues (e.g., image size and clarity, length of descriptions, items per page) and/or difficulty with the questions that appeared to affect the survey experience. After completing the survey, participants were asked whether they found the images and item descriptions helpful and how they would have responded if the images had not been provided. This presentation documents the methodology, design, and qualitative results of using images and item descriptions in topic-specific questions. We also highlight differences among participants' reactions to the images and review strategies that may enhance usability, efficiency, and response rates in fielding questions with images on the SPP survey. To conclude, we briefly discuss what other online self-response surveys should consider when deciding to use images.

    Endorsement of Select All versus Forced Choice Response Options in Behavioral, Factual, and Attitudinal Questions in a Web Survey
    Robin Kaplan, U.S. Bureau of Labor Statistics
    A common practice when asking a battery of items in web surveys is to use a select all that apply format. This format asks survey respondents to endorse individual items within a single question, which reduces the amount of space or questions within a survey and decreases respondent burden. While the select all that apply format is popular for its efficiency, an unintended consequence is the potential for reduced data quality. The alternative is a forced choice format, where respondents are asked to consider and respond yes or no to each item individually, which has been shown to result in greater item-level endorsement. One hypothesis is that the forced choice format results in greater item-level endorsement because respondents engage in deeper cognitive processing. However, others have suggested that the greater item-level endorsement may be due to acquiescence bias. Another explanation is that question type may moderate these effects, where the amount of endorsement may depend on whether the question asks about behaviors, facts, or attitudes. The present research randomly assigned 1,008 web-survey participants to answer a battery of questions using either a select all or a forced choice format across different question types (behavioral, factual, or attitudinal). Response distributions to the questions based on format and question type will be presented. We discuss findings in the context of adding web modes to existing surveys.

    Online question design for fixed-length data
    Elizabeth Nichols, U.S. Census Bureau
    Dates, times, and telephone numbers are examples of fixed-length data. For each of these items, the number of characters does not vary. Most fixed-length data are chunked visually into parts to help people remember the information. For example, a 10-digit telephone number has three digits for the area code, three for the prefix, and four for the line number. A 16-digit credit card number is often presented visually as four separate chunks of four digits each. For online input of fixed-length data, there are several design options: an open field with no formatting; separate fields - one for each chunk; and fields with masking. Masking provides a visual cue within the field indicating the expected value, such as a mask of (_ _ _) _ _ _-_ _ _ _ for a telephone number. The visual cues could appear when the user enters the field, as the user starts typing, or even after the user finishes typing. Staff at the U.S. Census Bureau conducted a series of A/B experiments with a nonprobability panel to test the usability of four different fixed-length field designs. In the experiments, the data were provided to the participant, and they entered the data into the assigned design. To compare the usability of the designs, we measured accuracy, efficiency, and satisfaction. The masking design was the most usable of the designs tested as measured by these three criteria. This presentation shares our online form design findings and recommendations for fixed-length data.
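
    As a trivial illustration of the masking idea (not one of the tested Census designs), the snippet below renders whatever digits have been typed so far into a telephone-number template.

```python
# Fill typed digits into a fixed-length mask, leaving unfilled positions visible.
def apply_mask(typed, mask="(___) ___-____"):
    digits = iter(ch for ch in typed if ch.isdigit())
    return "".join(next(digits, "_") if ch == "_" else ch for ch in mask)

print(apply_mask("301555"))      # (301) 555-____
print(apply_mask("3015550123"))  # (301) 555-0123
```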

    Comparing Web-Scraped Establishment Survey Frames of Industrial Hemp Growers in Seven States: Costs, Contact Data, and Accuracy of Frame
    Michael Gerling, USDA-NASS; Chad Garber, USDA-NASS
    The United States Department of Agriculture's National Agricultural Statistics Service (NASS) conducts over 300 agricultural surveys annually to provide official statistics on U.S. agriculture. In 2020, the Agricultural Marketing Service (AMS) collaborated with NASS to conduct the first national survey of industrial hemp growers. The first step was to develop a list frame of these operations for the proposed 2021 Hemp Survey. NASS developed three frames: (1) a list frame derived from administrative data sources provided by AMS; (2) a web-scraped list frame developed by a contractor using automated processes; and (3) a web-scraped list frame constructed within NASS. Both web-scraped list frames were designed to assess the undercoverage of the administrative data sources list frame. This presentation highlights comparisons among the three frames, including the number of records scraped, completeness of the information, and the overlap between survey list frames. Implications for the use of web-scraped list frames are also discussed. Key Words: list frame, web scraping, list building, agriculture, industrial hemp.


    2:00 pm - 3:30 pm, April 22
    Concurrent Sessions
    WebEx Event number (if needed): 2828 473 2191
    WebEx Event password (if needed): Census#1

    Top N: A Tool to Simplify Coding of Open-Ended Text Survey Responses
    Curtiss Chapman, U.S. Census Bureau
    While closed-ended survey responses can be analyzed in relatively quick and direct ways, analysis of open-ended responses can require more time and technical knowledge. Depending on the size of the survey, open-ended items may include from hundreds to millions of unique responses, all of which require time-consuming manual labeling to summarize. Beyond time requirements, coding open-ended responses efficiently can require advanced technical skills. Modern natural language processing (NLP) techniques can improve the speed and quality of analysis, but not all staff have the knowledge necessary to implement these techniques. Census Bureau staff created the Top N Tool to address both of these issues. It significantly reduces the time required to manually label open-ended survey responses while also implementing NLP tools in a way that is accessible to most any user. The tool identifies the most common words among sets of text responses and adds columns to the dataset for each one. Responses with words in common tend to share themes and may thus share labels in common. When manually labeling, one may sort responses by the presence of each common word, which groups responses with common themes together. The labeling process is then dramatically expedited, as large numbers of responses may be given the same label simultaneously. To employ the tool, analysts need only know how to sort a spreadsheet. The Top N Tool, therefore, allows survey owners the latitude to include valuable open-ended response items on their surveys with less worry about incurring a lengthy and difficult-to-staff analysis phase. This presentation will demonstrate the tool and provide some example case studies.
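
    This is not the Top N Tool itself, but a small pandas sketch of the same idea: find the most frequent words across responses and add one indicator column per word, so that sorting groups thematically similar responses for batch labeling. The responses shown are invented.

```python
from collections import Counter
import pandas as pd

df = pd.DataFrame({"response": [
    "parking was hard to find", "no parking near the office",
    "staff were very helpful", "helpful and friendly staff"]})

tokens = df["response"].str.lower().str.findall(r"[a-z]+")
top_words = [w for w, _ in Counter(t for row in tokens for t in row).most_common(3)]

for word in top_words:
    df[f"has_{word}"] = tokens.apply(lambda t: word in t)

# Sorting on the indicator columns groups responses that share common words.
print(df.sort_values([f"has_{w}" for w in top_words], ascending=False))
```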

    Enhancing the Industry and Occupation Autocoding Process in the American Community Survey through LLMs and Semantic Search
    Jackson Chen, Reveal Global Consulting; Yezzi Angi Lee, Reveal Global Consulting
    The American Community Survey (ACS), conducted by the U.S. Census Bureau, collects comprehensive demographic, social, housing, and economic data, providing critical insights for policy making, resource allocation, and community planning. One key component of the ACS is the collection of industry and occupation information through open-ended questions, which are assigned the best Census Industry and Occupation codes using a combination of autocoding and clerical coding. The automated coding system is designed to enhance efficiency and accuracy, supplementing traditional manual coding methods and reducing the burden on clerical coders. This research explores the development of an autocoding process leveraging Large Language Models (LLMs) and semantic search techniques. Specifically, we implemented an improved semantic search engine powered by LLM embeddings to enhance coding precision. To ensure data quality and maintain alignment with clerical standards, we designed a robust Quality Control (QC) framework to validate current autocoder outputs, establish baseline metrics, and identify discrepancies. This presentation will detail the methodology, experimental results, challenges, and next steps for improvement of the ACS Autocoder. Additionally, we will highlight how the QC framework supports error detection and informed decision-making, contributing to the continuous improvement of the autocoder's performance.
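
    A schematic sketch of embedding-based semantic search, not the ACS autocoder: write-ins and code descriptions are embedded, and candidate codes are ranked by cosine similarity. The embed() function is a hypothetical stand-in that returns random unit vectors; a real system would call an LLM embedding model.

```python
import numpy as np

def embed(text):
    # Placeholder only: a per-text random unit vector, not a real embedding.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

codes = {"4700": "retail salesperson", "2310": "elementary school teacher"}
code_vectors = {code: embed(desc) for code, desc in codes.items()}

def top_codes(write_in, k=1):
    query = embed(write_in)
    scores = {code: float(query @ vec) for code, vec in code_vectors.items()}
    return sorted(scores.items(), key=lambda item: -item[1])[:k]

print(top_codes("I teach third grade"))
```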

    Automating Offense Text Standardization for Criminal Justice Data
    Anthony Berghammer, RTI International; Peter Baumgartner, RTI International
    The Bureau of Justice Statistics' National Pretrial Reporting Program (NPRP) faces significant challenges in standardizing free-text offense descriptions, with over 100,000 unique text combinations reported across 125 counties. Manual coding of this data into 78 standardized categories from the National Corrections Reporting Program (NCRP) is time-consuming and resource-intensive. To address this, RTI developed the Offense Text Auto Coder (OTAC), a machine learning algorithm designed to automate this process. OTAC assigns probabilities to each NCRP code for a given input text, recommending the most likely matches for human validation. The tool achieves an 84% accuracy rate (0.83 MCC) across 78 categories, with a top-5 accuracy exceeding 89%. By automating much of the standardization process, OTAC reduces the volume of texts requiring full human review to just 10%, significantly decreasing the time and cost associated with manual coding. This presentation will explore the development, performance, and operational benefits of OTAC, highlighting its potential to streamline data processing for large-scale federal justice programs.
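
    As a much-simplified stand-in for OTAC (not RTI's model), the sketch below trains a TF-IDF plus logistic-regression classifier on toy offense texts and returns the top-k most probable codes for human validation; the texts and code labels are invented.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["poss cntrl subst", "burglary 2nd degree", "dui",
         "possession controlled substance"]
codes = ["DRUG_POSSESSION", "BURGLARY", "DUI", "DRUG_POSSESSION"]

clf = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, codes)

def top_k(text, k=3):
    # Rank all codes by predicted probability and return the k most likely.
    proba = clf.predict_proba([text])[0]
    order = np.argsort(proba)[::-1][:k]
    return [(clf.classes_[i], round(float(proba[i]), 3)) for i in order]

print(top_k("poss of controlled subst"))
```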

    So you want to share your autocoder model, but what about disclosure risk? Real-world applications of differentially private machine learning
    Alexander J. Preiss, RTI International
    Federal statistical agencies are increasingly using machine learning "autocoder" models to classify text obtained from survey respondents into standardized categories. Partner agencies and other stakeholders often request access to autocoder models to perform classifications on their own data. However, machine learning models can leak information about the individual data records on which they were trained, which introduces disclosure risk. Differentially private (DP) machine learning algorithms seek to solve this problem by providing strong guarantees for protecting the privacy of individuals whose data is used to train models. However, industry-standard DP algorithms, like differentially private stochastic gradient descent (DP-SGD), often perform poorly in real-world applications with high class imbalance, thus limiting their adoption. In this work, we propose a new scalable DP mechanism for deep learning models, SWAG-PPM (stochastic weight averaging Gaussian pseudo posterior mechanism), which downweights by-record likelihood contributions proportionally to their disclosure risks. As a motivating example, we test SWAG-PPM on an autocoder model developed at the U.S. Bureau of Labor Statistics to classify illness and injury narratives. We show that SWAG-PPM performs nearly as well as a non-private baseline model, while greatly outperforming DP-SGD. The combination of scalability, utility, and a strong privacy guarantee makes SWAG-PPM feasible in real-world settings.

    Session 2B: Data Quality
    WebEx Event number (if needed): 2825 705 1214
    WebEx Event password (if needed): Census#1

    Evaluating impacts of mischievous reporting in customer satisfaction survey results
    Jean E. Fox, U.S. Bureau of Labor Statistics; Andrew Caporaso, U.S. Bureau of Labor Statistics
    Mischievous reporting can occur when respondents lack motivation and oversight to provide good data, and instead provide random or intentionally inaccurate responses. It differs from satisficing, where respondents aim to select reasonable responses expediently. Mischievous reporting can affect all survey questions, but its presence may be most evident in open-ended questions, with responses that are nonsensical, gibberish, unrelated to the survey, and/or threatening. Retaining data from such respondents can jeopardize research validity. Prior research suggests mischievous and "bogus" survey respondents answer questions differently than quality respondents (Litman et al., 2023; Kennedy et al., 2020; Cimpian & Timmer, 2020), so including such responses may introduce bias. This study looked at the impact of mischievous respondents in a customer satisfaction (CSAT) survey with limited information and control over who participates. This analysis uses data from a BLS CSAT survey collected in 2024. Visitors participated in the survey in one of two ways: (1) They accepted an intercept invitation; (2) They selected the "Help us improve this site" button. We surmise that these two populations differ in their motivations and in the proportion of mischievous respondents. We did not offer an incentive to participate, so this common motivator for bogus participation was absent. We categorized open-ended responses based on how serious each response was, then evaluated differences in responses to the other questions. This presentation will discuss the effects of retaining data from mischievous reporters on key survey outcomes and discuss methods for identifying such responses in similar web surveys with limited paradata and background information on respondents.

    We Asked Humans to Rate Their Trust in Autonomous Systems, and AI Answered
    Kathryn Ballard, NASA Langley Research Center
    Concerns about data quality in online surveys have grown in recent years, and many methods have been developed to identify bots or careless responders. This presentation discusses a recent crowdsourced survey where participants were paid to provide their experiences and rate their trust in several autonomous systems, from driving with GPS and cruise control to using AI tools. Ironically, many responses appeared to be AI-generated content. Even free-response data that seemed innocuous at first became a red flag due to duplicated word choice and sentence structure across many participants. Lessons learned from this study include changes to the survey screening questions and cross-referencing the output from popular chatbots during analysis.

    Dr. Strangesample or: How I Learned to Stop Worrying and Love Alias Email Addresses
    Alfred "Dave" Tuttle, U.S. Census Bureau
    Email has become a common means of delivering invitations to complete surveys, reflecting the importance of internet-based activities for individuals and institutions alike. At the same time, users of the internet must be increasingly vigilant against phishing scams and dubious marketing spam that target individuals through their email accounts. Some email platforms, such as Outlook, Gmail, iCloud, and Mozilla Thunderbird, enable their customers to create and use alias emails to obscure their real addresses. Users of these services can use an alias address whenever they are asked to provide an email address by a third party, as when creating an account for a new service or being asked to provide contact information by a survey organization. The email platforms relay messages between their users' email accounts and third parties and the latter see only the alias email addresses. What are the implications of alias email addresses for surveyors who rely on email to deliver survey invitations and reminders? One concern is that survey respondents who provide alias addresses may respond to surveys differently from other respondents, such as a greater likelihood of skipping questions they may find sensitive. Another concern is whether email platforms providing alias addresses deliver invitations reliably. In this presentation, we will share results from a web survey sent to members of a voluntary online panel which included a subset of respondents with alias email addresses. We will investigate whether respondents whose email addresses follow known alias formats differ from other respondents in terms of unit- and item-level response rates, demographic characteristics, and email failure/bounceback rates.
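
    A hedged sketch of how addresses might be screened for known alias formats; the relay domains and plus-tag rule below are illustrative assumptions rather than an authoritative list, and this is not the Census Bureau's method.

```python
import re

RELAY_DOMAINS = {"privaterelay.appleid.com", "mozmail.com", "duck.com"}  # examples only
PLUS_TAG = re.compile(r"^[^@]+\+[^@]+@")  # e.g., name+tag@example.com

def looks_like_alias(address):
    address = address.strip().lower()
    domain = address.rsplit("@", 1)[-1]
    return domain in RELAY_DOMAINS or bool(PLUS_TAG.match(address))

for addr in ["jane+surveys@gmail.com", "abc123@privaterelay.appleid.com", "jane@agency.gov"]:
    print(addr, looks_like_alias(addr))
```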

    Use of MTurk samples in the development and validation of a U.S. Census personnel selection tool
    J. Peter Leeds, U.S. Census Bureau
    This presentation documents research by the U.S. Census Bureau to develop and validate a new occupational questionnaire for selecting Census taker supervisors. The researchers employed a multi-stage design to rapidly gather a large, representative sample of 2,199 respondents from Amazon Mechanical Turk (MTurk). Innovative analytical techniques, including Differential Item Functioning (DIF), Differential Prediction (DIP), and Measurement Invariance analyses, were used to identify and mitigate bias in the assessment items, ensuring fairness across protected class groups. The study's key innovations include: 1. Demonstrating the feasibility and efficiency of using MTurk for large-scale research while addressing limitations associated with participant sampling and response consistency. 2. Developing a 5-item scale that showed very low bias, strong correlation to supervisor performance, and measurement invariance across protected class groups. 3. Applying comprehensive bias assessment and fairness analyses to create a valid assessment tool for high-stakes government hiring. 4. Providing insights on the relationship between self-reported supervisory experience and item performance, informing the development of fair and effective assessment tools. These findings inform the Census Bureau's efforts to enhance hiring practices in its supervisory workforce, serving as a model for other government agencies seeking to improve their hiring practices.


    3:45 pm - 5:00 pm, April 22
    Concurrent Sessions
    WebEx Event number (if needed): 2828 473 2191
    WebEx Event password (if needed): Census#1

    Mitigating Health Administrative Data Sparsity Using Machine Learning for Enhanced Survey Collection and Processing
    Irina Belyaeva, Ph.D., U.S. Census Bureau
    Health administrative data, a cornerstone of surveys on healthcare utilization and outcomes, often suffer from sparsity due to missing or incomplete records. Such data gaps can compromise the reliability and representativeness of survey findings. This study investigates the application of machine learning (ML) techniques to mitigate data sparsity, enabling more robust and accurate survey outputs. We evaluate advanced ML-based imputation methods, including k-nearest neighbors (KNN), random forests, and deep learning models, to reconstruct missing data in health administrative datasets commonly used in surveys. Our findings reveal that deep learning-based imputation significantly outperforms traditional methods in managing large-scale datasets with complex, nonrandom missingness patterns. Additionally, incorporating interpretability frameworks ensures transparency and fosters trust in the imputation process, aligning with operational standards for survey data management. By mitigating data sparsity, ML-driven approaches enhance overall data completeness, leading to more accurate population health metrics, policy development, and efficient resource allocation. Integrating automated imputation within survey workflows further reduces respondent burden and strengthens survey quality and reliability. This study not only demonstrates the pivotal role of ML in modernizing survey methodologies, but also offers evidence-based strategies to tackle incomplete health administrative data. Embedding these advanced solutions into survey pipelines offers a scalable way to bolster survey integrity and advance data-driven decision-making across public health and related fields.
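
    A minimal example of one of the imputation methods named above (k-nearest neighbors), assuming scikit-learn; the toy matrix stands in for a sparse administrative extract and is not the study's data.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [34.0, 120.0, 1.0],
    [41.0, np.nan, 0.0],
    [29.0, 115.0, np.nan],
    [38.0, 132.0, 1.0],
])

# Each missing value is filled from the 2 most similar complete records.
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(X))
```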

    Tracking Cross-State Achievement Gap Trajectory in NAEP Assessments with Dynamic Time Warping Methods
    Qiwei Britt He, Georgetown University
    Educational assessment plays a critical role in understanding trends in student performance and informing the dissemination of key federal survey data. The National Assessment of Educational Progress (NAEP) provides a rich dataset for examining longitudinal trends in student achievement across states. This study applies Dynamic Time Warping (DTW) to track changes in NAEP assessment scores by demographic variables over the past two decades in Grade 4 and Grade 8 mathematics and reading assessments across 50 states. DTW, a sequence mining method, identifies nuanced temporal patterns in performance trends that may not be apparent with traditional analytic techniques. This presentation will illustrate how DTW can be leveraged for data dissemination and visualization, offering new ways to communicate complex longitudinal trends in federal survey data. Attendees will gain insights into how advanced computational methods can enhance the interpretation and usability of NAEP data, particularly in supporting researchers and policymakers in their efforts to understand achievement gaps.
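
    For readers unfamiliar with the method, here is a compact textbook DTW implementation (not the study's code) that aligns two score trajectories of unequal length; the series are invented, NAEP-style numbers.

```python
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

state_a = [235, 238, 240, 241, 239]
state_b = [230, 236, 239, 240]
print(dtw_distance(state_a, state_b))
```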

    Accessing Census Data: Tools and Updates
    Tyson Weister, U.S. Census Bureau
    The Census Bureau's primary platform for data dissemination, data.census.gov, provides free and reliable access to a wide array of demographic, social, and economic data from across the country. These data serve as a key benchmark for understanding community demographics and help support initiatives designed to reach various populations using detailed collection methods. Data.census.gov allows users to explore census data in multiple formats, such as tables, maps, geographic profiles, population pyramids, and charts. This session will highlight the platform's features and functionalities, focusing on the wide range of data available and the tools developed to make data more accessible. Demonstrations will showcase key enhancements, including navigation, searching, downloading, exporting, and mapping, illustrating how the Census Bureau is continually improving data dissemination. These updates, informed by user feedback, reflect our commitment to making Census data more accessible and user-friendly.

    WebEx Event number (if needed): 2825 705 1214
    WebEx Event password (if needed): Census#1

    Adapting Survey Software for Remote Extended Reality Research
    Paul Merritt, National Institute of Standards and Technology
    Extended reality (XR) technologies allow researchers to create survey experiences with complex, interactive stimuli. However, these studies require bringing participants into the laboratory to complete tasks in person where survey questions are administered as separate tasks. We present our work integrating software applications and traditional survey tools to create remote XR surveys and discuss how our methods improve response formats and data quality in software-application based surveys. We present an example from our research where we nest a WebXR application, built primarily in A-Frame, into a Qualtrics survey and use Qualtrics to capture user behavior within the application. We integrate our web application into the survey itself, expanding the types of stimuli that may be examined in surveys and improving the formats available to capture participant responses and precise user metrics. Compared to traditional survey and laboratory methods, our approach reduces reliance on convenience sampling and permits frequent data submissions without lag or breakdowns. We improve survey results by collecting context-relevant data, including environmental variables and device performance indicators. We define our user metrics in Qualtrics and store all study data, from the web application and Qualtrics, in a single dataset. Our integrated approach yields a nuanced understanding of participant responses and may be extended to other applications beyond those in XR. With our method, researchers dependent on a particular software application can conduct their surveys inside or outside of the laboratory, use online platforms to pre-test and refine study designs, reduce time between design iterations and prototyping, and more easily meet their sampling requirements.

    Advancing Survey Sampling Methods in Extended Reality Research
    Monika Bochert, National Institute of Standards and Technology; Jack Wagner, National Institute of Standards and Technology
    Extended reality (XR) technologies in survey research provide novel opportunities to design engaging and immersive environments and support the study of how humans respond to a broader range of stimuli. XR research is often limited to specialized lab environments, constraining participant samples. The shift to remote XR platforms creates new opportunities to recruit more representative samples. However, achieving representative sampling still requires addressing variations in human perception, particularly vision. Without deliberate sampling design, survey research risks systematically missing the intended study population and undermining data quality. We present how we use traditional online survey approaches to examine experimental research questions with improved sampling methods. We discuss how experimental researchers can adapt online surveys to broaden participation, as well as pre-test the influence of individual differences by examining design features including color contrast and visual organization. We will share lessons learned, examples of design solutions, and how these efforts drive progress in survey sampling, data quality, and minimizing sampling biases in federal survey research.

    Digital Accessibility Standards for Extended Reality Survey Design
    Kaylee Ives, National Institute of Standards and Technology
    The use of extended reality (XR) in survey research has accelerated in recent years, advancing science in many fields including medicine, manufacturing, and education. However, there are few standards to guide the design of XR-based studies. While pioneering, the World Wide Web Consortium XR accessibility user requirements have a greater focus on motor and mobility accessibility than visual. Researchers have a unique opportunity to examine accessible design of software application-based tasks by creating remote studies and deploying them over survey platforms. We present our work extending existing web design accessibility standards to our own survey-based XR experiment. We discuss how we apply digital accessibility standards and best practices from the Web Content Accessibility Guidelines (WCAG), the World Wide Web Consortium (W3C), Digital.gov, and a selection of research on perceptual accessibility to our interactive survey interface. We share how these same methods may assist all computer-based survey designs and highlight a selection of helpful tools. We explain how we prioritized accessible design choices including contrast, object features and palettes, textures, and information preservation. With these design choices, we aim to reduce confounding perceptual variables and establish a framework for accessible software application-based survey design.


    Day 2: Wednesday, April 23


    10:00 am - 11:00 am, April 23
    Plenary Session
    WebEx Event number (if needed): 2824 630 8639
    WebEx Event password (if needed): Census#2

    Trent Buskirk, Old Dominion University

    11:15 am - 12:45 pm, April 23
    Concurrent Sessions
    WebEx Event number (if needed): 2824 630 8639
    WebEx Event password (if needed): Census#2

    Techniques for linking agencies and facilities across frames
    Clare Speer, U.S. Census Bureau; Zhi Keng He, U.S. Census Bureau
    This presentation discusses methodology for linking law enforcement and criminal justice agencies and facilities across frames and other data sources. Facility and agency frames from the Bureau of Justice Statistics and the Office of Juvenile Justice and Delinquency Prevention were matched to records from the Census Bureau's Governments Master Address File and the Master Address File. Subject matter experts validated select matches based on triage categories related to measures of match quality. A web application was developed to increase validation efficiency, algorithmically identify records that could best benefit from review, and ultimately support continuous improvement and integration of new data sources.

    Identifying geographic boundaries for law enforcement agencies
    Lizabeth Remrey, U.S. Bureau of Justice Statistics; Kayla Patti, U.S. Census Bureau
    This presentation discusses a Bureau of Justice Statistics project to identify geographic boundaries for law enforcement agencies and use those boundaries to identify jurisdictional and population characteristics for the geographies. Probabilistic record linkage was used to link Law Enforcement Agency Roster (LEAR) data with the Census Bureau's Governments Master Address File (GMAF). Links were carefully validated, then GMAF geographic data were used to link LEAR records to TIGER/Line shapefile boundary data. Applications of the geographic and population data will be discussed, as well as how these tools will increase the accessibility of crime data.

    Coverage of Justice-Facility Group Quarters in the 2020 Census
    Michaellyn Garcia, U.S. Census Bureau
    This presentation summarizes research measuring the coverage of prisons and detention facilities in the 2020 Census. Justice-facility frames published by Department of Justice (DOJ) statistical agencies were linked with Group Quarters records from the Census Bureau's Master Address File. The integrated frame was used to assess the coverage and enumeration of facilities in the 2020 Census. The coverage exercise confirms relatively high coverage of populations in prisons and detention facilities, while identifying some coverage gaps between the Department of Justice and the Census Bureau's data frames that could be improved by continual administrative updates over time. Strategies for evergreening integrated frame maintenance are discussed.

    CJARS Justice Outcomes Explorer
    Keith Finlay, U.S. Census Bureau
    The Justice Outcomes Explorer (JOE) is a Census Bureau experimental data product that measures the economic and health outcomes of people who were charged with criminal offenses, released from prison, or began probation or parole sentences. The outcomes measured include employment, earnings, government program participation, and mortality. JOE is a collaboration between the U.S. Census Bureau and the University of Michigan that uses the Criminal Justice Administrative Records System (CJARS) to better understand how people involved in the justice system reintegrate into society.

    WebEx Event number (if needed): 2825 971 4905
    WebEx Event password (if needed): Census#2

    Redesigning the Census Bureau's Contact History Instrument to Improve Survey Paradata Collection
    Matthew Virgile, U.S. Census Bureau
    Since 2004, the U.S. Census Bureau's Contact History Instrument (CHI) has been used for Computer-Assisted Personal Interview (CAPI) surveys. In the current CHI design, interviewers record contact attempt information in CHI after they attempt to contact potential respondents. Similar information, such as whether the attempt was a personal visit or telephone call, is also recorded in the CAPI survey instrument before CHI. However, the two instruments do not interact. Census Bureau staff are refining and testing a future version of CHI to integrate with CAPI survey instruments, so that data collection of contact attempts would no longer require a separate instrument. Anticipated benefits of this redesign include more accurate case history information, and less time required by interviewers to enter the information. CHI is one of many data collection systems which are being redesigned as part of a broader program at the Census Bureau, known as the Data Ingest and Collection for the Enterprise (DICE) program, to simplify and modernize data collection. Our presentation will cover design for the integrated instrument, including how case history information will be collected for CAPI contact attempts in the Current Population Survey (CPS), followed by other Census surveys. We will also cover challenges to date and how these have been addressed, including: (i) development of standardized screens that are applicable to many CAPI surveys, but also permit flexibility so they can be tailored to the unique needs of a specific survey; (ii) accounting for collection of case history information at varying levels of sample units such as a household, person, facility, or neighborhood observations; and (iii) design of case management functionality beyond contact attempts.

    Understanding the Motivations for Business Participation in Voluntary Web-Based Business Surveys
    Jason Kosakow, Federal Reserve Bank of Richmond
    Several regional banks within the Federal Reserve system maintain online panels among businesses to better understand current and emerging business trends. Results from these surveys provide critical information that is used in monetary policy discussions. However, regional banks have experienced low recruitment response rates and panel attrition. Regional banks are not able to offer monetary incentives to participate, so each regional bank dedicates significant time in crafting language to recruit new businesses and creates programs to retain participants in the panel. To better understand how regional banks could refine their panel management strategies, we explored the motivations of existing business panel members on why they participate. We added an open-ended question on why the business participates, along with a closed-ended question assessing the importance of various factors on their participation. These questions were added to several regional business surveys across the Federal Reserve System, yielding almost 1,000 responses. This research uncovered multiple reasons for participation in the online business survey panels. The benefit of receiving the results from the survey was cited frequently, as the results allowed the respondent to reflect on how their business was faring compared to others. Businesses also wanted to be helpful in giving the Federal Reserve accurate information about economic conditions, and often cited the survey as an opportunity to be heard. Response behavior among businesses has scarcely been explored, and this research offers insights that can help survey methodologists and panel managers in crafting targeted messaging and participatory benefits to attract and retain businesses for web-based surveys.

    AI-Assisted Probing Can Improve Survey Data Collection
    Soubhik Barari, NORC at the University of Chicago
    We describe how generative artificial intelligence (AI) can allow survey researchers to harness the benefits of conversational interviewing in self-administered data collection contexts. In particular, we show how the integration of large language models (LLMs) into web surveys to perform active probing can enhance data quality in several ways: improve response quality (e.g., completeness, relevance), bolster construct validity (e.g., via personalization and clarification), and extract additional qualitative data (e.g., motives, details) relevant to research objectives. We illustrate best practices from NORC studies evaluating the impacts of AI-assisted probing on data quality metrics across a variety of question types. Finally, we provide a series of recommendations for practitioners seeking to implement AI-assisted probing in federal survey data collection efforts.

    Generative AI for Surveys: Prioritizing User and Respondent Experience
    Elizabeth Dean, NORC at the University of Chicago
    Generative artificial intelligence (AI) can be used to improve survey response quality, personalize the respondent experience, analyze qualitative responses, and identify falsified data, among many other applications. AI tools such as these can significantly change the experience of designing surveys and responding to surveys. In this presentation we center the user experience of AI applications. Using recent NORC examples, we show how respondents perceive interviews with conversational agents and consider the researcher's experience using emergent AI-based survey design and analysis tools. Our goal in this presentation is to establish best practices for prioritizing user and respondent experience in the design and implementation of generative AI tools in survey data collection.


    2:00 pm - 3:30 pm, April 23
    WebEx Event number (if needed): 2824 630 8639
    WebEx Event password (if needed): Census#2

    Supporting Flexible Computations: Integrated Formula Analysis and Calculation Tool
    David Rozenshtein, PhD, Omnicom Consulting Group, Inc.
    As part of an on-going central statistical systems modernization project, we have developed an integrated formula analysis and calculation tool (IFACT) to support the needs of calculating detailed accounts. IFACT allows analysts to specify computation formulas in an intuitive, MS Excel-like notation, and then evaluates them over the data in the database. IFACT supports: full arithmetic (+, -, *, and /); binary comparators (=, !=, <, etc.); logical conditions (using NOT, AND, and OR) that use 3-valued logic to properly account for missing values (NULLs); aggregate and scalar functions; multiple formula preferences; ability to conditionally reconfigure formulas; etc. The IFACT engine translates sets of formulas into labeled directed acyclic graph structures, loads them into the database tables, captures formula interdependencies (in order to properly sequence the computation), and then acts as an interpreter over these formulas, calculating results. Because formulas are not part of the system source code, analysts can modify them, and thus system behavior, without any reprogramming. Importantly, IFACT supports full auditability over its computations. The tool is written entirely in SQL except for a small component that translates the IFACT formula files into an XML representation.
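
    A toy sketch of the general pattern described above, written in Python rather than SQL and not taken from IFACT: formula dependencies are extracted, ordered topologically, and evaluated over a dictionary of values. NULL handling, aggregates, and formula preferences are omitted, and the formulas are hypothetical.

```python
import re
from graphlib import TopologicalSorter

formulas = {
    "gross_output": "sales + inventory_change",
    "value_added": "gross_output - intermediate_inputs",
}
data = {"sales": 500.0, "inventory_change": 20.0, "intermediate_inputs": 310.0}

# Each formula depends on the identifiers appearing in its expression.
deps = {name: set(re.findall(r"[a-z_]+", expr)) for name, expr in formulas.items()}
order = [n for n in TopologicalSorter(deps).static_order() if n in formulas]

values = dict(data)
for name in order:
    # eval() keeps this illustration short; a real engine would interpret a parse tree.
    values[name] = eval(formulas[name], {"__builtins__": {}}, values)

print(values["gross_output"], values["value_added"])  # 520.0 210.0
```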

    Enhancing Survey Data Quality: Integrated Validation, Auto-Edit, and Search Tool
    Alice Ramey, U.S. Bureau of Economic Analysis
    As part of an on-going central statistical systems modernization project, we have developed an integrated validation, auto-edit, and search tool (IVEST) to support processing of federal surveys. The IVEST system allows analysts to specify a variety of criteria for searching through survey data. These criteria are then used to validate, correct, and/or enhance the data when certain errors are discovered or specific conditions are met. The system also supports a sophisticated multi-level approval structure for overriding rule violations when necessary. The rule language supported by IVEST has a natural, user-friendly syntax, yet is expressive enough to allow for any conditions normally expressible within SQL. At its essence IVEST is a code generator, itself implemented in SQL, that translates IVEST rules into efficient SQL queries. Because IVEST rules are not hard-coded into the system source code, analysts can modify them, and thus system behavior, without changes to underlying programs.
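
    The snippet below is a toy illustration of the rule-to-SQL idea, not IVEST's rule language or code generator; the table, columns, and rule are hypothetical.

```python
def rule_to_sql(rule, table="survey_responses"):
    # Render each (column, operator, value) condition and flag violating records.
    conditions = " AND ".join(f"{col} {op} {val}" for col, op, val in rule["conditions"])
    return (f"SELECT id, '{rule['name']}' AS violated_rule "
            f"FROM {table} WHERE {conditions};")

rule = {
    "name": "payroll_exceeds_revenue",
    "conditions": [("payroll", ">", "revenue"), ("revenue", ">", 0)],
}
print(rule_to_sql(rule))
```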

    Supporting Data Non-Disclosure: Secondary Suppression Analysis, Suggestion, and Audit Tool
    Sandip Mehta, Omnicom Consulting Group, Inc.; Melanie Carrales, U.S. Bureau of Economic Analysis
    Secondary suppression is used in supporting non-disclosure of data cells in a multi-dimensional table space, a notoriously difficult problem. As part of an on-going central statistical systems modernization project, we have developed a suite of tools to support secondary suppression. The three most significant tools of this suite are: a tool for analyzing a current state of suppression, including reporting on "broken" cells; a tool for choosing candidates (based on a variety of criteria) for additional suppression necessary to protect currently suppressed cells; and a suppression audit tool for showing why certain suppressions were chosen by the system and the various dependencies that exist among suppressed cells. Our suppression tools are both periodicity-aware and history-aware, i.e., suppressions are coordinated between annual and quarterly data, as well as with prior vintages/revisions. They also allow for analyst overrides (both positive and negative) of system-selected suppressions and support an iterative collaborative process between the analysts and the system in establishing the final secondary suppression pattern.
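
    As a much-simplified illustration of complementary suppression (not the tool described above), the sketch below makes a single pass over a small two-way table: whenever a row or column contains exactly one suppressed cell, that cell could be recovered from the marginal total, so the smallest unsuppressed cell in the same line is suppressed as well. Production tools iterate this and optimize the choice of complements.

```python
import numpy as np

table = np.array([[12.0, 40.0, 7.0],
                  [30.0, 25.0, 18.0]])
primary = {(0, 2)}  # primary suppression

def add_complements(table, suppressed):
    suppressed = set(suppressed)
    for axis in (0, 1):                      # rows, then columns
        for k in range(table.shape[axis]):
            line = [(i, j) for i in range(table.shape[0])
                    for j in range(table.shape[1]) if (i, j)[axis] == k]
            hits = [c for c in line if c in suppressed]
            if len(hits) == 1:               # a lone suppressed cell is "broken"
                candidates = [c for c in line if c not in suppressed]
                suppressed.add(min(candidates, key=lambda c: table[c]))
    return suppressed

print(sorted(add_complements(table, primary)))
```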

    Analyzing and Automating Processing Workflows
    Benjamin Kavanaugh, U.S. Bureau of Economic Analysis
    By the nature of their business, some of the systems we have built as part of an on-going central statistical systems modernization project have very complex computation processes that involve many thousands of distinct interdependent tasks, authored by multiple groups of analysts and users who normally are not in synch with each other, at least during the early stages of processing. Figuring out the correct sequence of execution of these tasks, and the overall synchronization state of the system, is an activity not well suited to manual control. We have developed a system that takes in information about computation tasks and their interdependencies, builds the dependency graph, and then automatically manages the overall computation process (including incrementally recalculating only the necessary tasks) and reports on the system synchronization state. It collects performance statistics of executions and provides time estimates on pending tasks.
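
    A small sketch of dependency-driven sequencing using Python's standard-library graphlib (this is not the system described above, and the task names are hypothetical): tasks become ready only once their prerequisites are done, and everything returned by a single get_ready() call could run in parallel.

```python
from graphlib import TopologicalSorter

tasks = {                     # task -> the tasks it depends on
    "load_raw": set(),
    "edit": {"load_raw"},
    "impute": {"edit"},
    "aggregate": {"impute", "edit"},
    "publish": {"aggregate"},
}

ts = TopologicalSorter(tasks)
ts.prepare()
while ts.is_active():
    ready = list(ts.get_ready())   # independent tasks; could be dispatched in parallel
    print("run:", ready)
    ts.done(*ready)
```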

    Aggregating time-series data in multiple dimensions, which themselves change over time
    Benjamin Cowan, U.S. Bureau of Economic Analysis
    Mathematical aggregations are a common form of computation encountered in survey data processing and analysis systems. As part of an on-going central statistical systems modernization project, we have developed a variety of metadata-driven aggregators that operate on multiple dimensions. These aggregators dynamically adapt to: the dimensions that are present; the taxonomies that exist and their membership and inter-element relationships; how these taxonomies change over time; which taxonomies are used for which dimensions; which aggregation steps involve which subsets of dimensions; the dependencies among the aggregation steps; etc. All specifications are represented in metadata authored and controlled by the analysts outside of the system source code. This allows analysts to change the behavior of the system without changing its programming. The aggregators synchronize multidimensional aggregations of time series data with evolving taxonomical structures. They support a completeness check to ensure that all children of a given aggregate have values, as well as direct specification of values for aggregates that fail to compute naturally. The aggregators also have mechanisms to control computational explosions that are common in multi-dimensional situations. The aggregation engines are written in SQL and are essentially applications of several breadth-first graph processing algorithms - some based on level-by-level graph rolls, and others based on pre-computed partial and full closures.
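
    A toy sketch of a level-by-level roll-up over a parent-child taxonomy (not the aggregators described above); the codes and values are invented, and the completeness check mirrors the one mentioned: a parent is computed only when all of its children have values.

```python
parent = {"111": "11", "112": "11", "11": "ALL", "21": "ALL"}  # child -> parent
values = {"111": 40.0, "112": 60.0, "21": 25.0}                # leaf values

changed = True
while changed:
    changed = False
    for child, par in parent.items():
        children = [c for c, p in parent.items() if p == par]
        if par not in values and all(c in values for c in children):
            values[par] = sum(values[c] for c in children)  # completeness check passes
            changed = True

print(values)  # adds "11" = 100.0 and "ALL" = 125.0
```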


    3:45 pm - 5:15 pm, April 23
    WebEx Event number (if needed): 2824 630 8639
    WebEx Event password (if needed): Census#2

    Automated Scraping of PDF School Transcripts for Surveys
    Emily Hadley, RTI International
    We present TranscriptGenie, a prototype application developed to address the need for efficient and accurate data extraction from PDF school transcripts for large surveys. Secondary and postsecondary transcript data are crucial for understanding student educational journeys and outcomes. Yet extracting meaningful data from PDF school transcripts has long been a labor-intensive process that is often fraught with challenges due to variability in transcript formats, embedded tables, and diverse data structures. TranscriptGenie seeks to overcome these challenges through a combination of innovative techniques to enable automation of transcript data extraction. In this session, we will provide a comprehensive overview of the tool's development process by highlighting the requirements that drove its design and the novel solutions that underpin its capabilities. This includes integrating generative AI technology to handle text variations and leveraging natural language processing techniques for data annotation. We will discuss how this tool is designed to comply with security standards required for handling sensitive education data. We will provide insights into how TranscriptGenie employs a graph database to efficiently manage and query the extracted data and how this graph database improves the data validation process. Finally, we will discuss next steps needed for deployment and broader implications for transcript analysis in surveys.

    Dynamic Schema Modeling of Heterogeneous Data Structures
    Christopher N. Carrino, U.S. Census Bureau
    Survey data is provided in various formats and structures across different survey programs. The formats range from two dimensional structures like CSV files, SAS files and relational database tables, to multi-dimensional structures serialized in XML, JSON, Avro, and RDF/OWL formats. It is a challenge for a statistical agency to validate the disparate formats in a consistent and standardized way. This research proposes a platform agnostic schema model that allows for standardized validation regardless of the underlying format. The modeling method proposed here can be used to validate the data inputs and outputs of survey programs, and offers significant resource and processing efficiency. Additionally, this modeling method allows for extension to other data science applications including data set summarization and data simulation.

    Automating Codebase Translation from SAS to Python with LLMs
    Cameron Milne, Reveal Global Consulting
    The U.S. Census Bureau's Nurses Survey (NSSRN) is conducting a codebase migration from SAS - a proprietary software language widely used by statistical agencies in the Federal government - to Python, an open-source language. Migrating an existing codebase is a challenging effort requiring expertise in both the source and target languages, a level of effort from federal employees that can put significant strain on mission-critical activities, and an understanding of where redevelopment is necessary to take advantage of the target language's strengths. Current approaches in unsupervised translation often rely on massive amounts of parallel data, which are not available for SAS and Python, and are mostly restricted to compiled languages such as C++ or Java. This initiative implements an automated SAS-Python translation application that uses Large Language Models (LLMs) to generate high-quality translations for human review to save time and resources. The application mixes rules-based approaches with LLM-generated translations for improved consistency across multiple runs, reduced model laziness when generating repetitive code, and nuanced translations where redevelopment is optimal. Our pipeline also works without reliance on output dataset performance or compiler feedback, which is common in many Agent-LLM systems. This presentation will outline how the pipeline works from parsing, chunking, context window balancing, and generation. We will also detail the challenges we encounter specific to handling SAS code and how they are addressed. Our framework aims to support code migration efforts across federal agencies as they transition from legacy software to open-source solutions.
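
    As a rough illustration of the parsing-and-chunking step (not the production pipeline), the regex below splits a SAS program at DATA/PROC step boundaries so each chunk can be sent to an LLM separately; real code must also handle macros, comments, and steps that lack RUN statements. The SAS snippet is invented.

```python
import re

sas_code = """
data work.respondents; set raw.nssrn; if age >= 18; run;
proc freq data=work.respondents; tables state; run;
proc means data=work.respondents; var hours_worked; run;
"""

# A step starts at DATA or PROC and ends at the next RUN; (case-insensitive, multiline).
steps = re.findall(r"(?is)\b(?:data|proc)\b.*?\brun\s*;", sas_code)
for i, step in enumerate(steps, 1):
    print(f"--- chunk {i} ---\n{step.strip()}")
```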

    Survey Modernization Assisted by Generative AI
    Alice Ramey, U.S. Bureau of Economic Analysis
    BEA is modernizing its electronic survey collection system, eFile. Historically, BEA's 17 surveys have been collected electronically using PDF files that transmit data to BEA's internal databases using JavaScript. This collection method has become outdated and problematic as it requires the forms to be submitted using (free) Adobe Reader software. BEA decided to convert all 17 surveys to a web-based format using the open-source software SurveyJS. SurveyJS requires that the content of the PDF forms be converted to a JSON file format. While the SurveyJS form builder is generally very user-friendly and flexible, BEA discovered it is still time intensive to copy and fix the formatting of the content from our PDF files while putting them into JSON format. The conversion team investigated using ChatGPT and other artificial intelligence (AI) options to convert survey forms into JSON format. When provided with an example JSON form and an unconverted PDF form, ChatGPT was able to create a JSON file of the PDF survey with a high level of accuracy, performing better than other AI options. Manual editing is still required, but the use of generative AI to execute the bulk of the conversion is a major efficiency improvement.


     

    Contact us:

    For questions about the FedCASIC workshops or technical issues with the FedCASIC website, please send an email to FedCASIC@bls.gov.


    Source: U.S. Census Bureau, ADSD
    Last Revised: April 21st, 2025