
Finding ‘Anomalies’ Illustrates 2020 Census Quality Checks Are Working


We’re in the midst of data processing for the 2020 Census. As Acting Census Bureau Director Ron Jarmin acknowledged in a recent blog, we’ve discovered some “anomalies” along the way that we’re looking into and resolving.

Today, I’d like to unpack what that means. The word “anomaly” can sound alarming. In fact, the scientific advisory group JASON recently recommended that we consider avoiding the word because of the unwarranted alarm it causes, especially when used without context.

Instead of potentially causing confusion by introducing another word, this blog endeavors to explain this technical term and provide the needed context.

“Anomaly” just means that we’ve found something in our quality review process that doesn’t look quite right. Anomalies found in processing are not errors in the census, but they can turn into errors if we don’t review and resolve them. It’s a feature of our quality check process to find them, and it gives us the opportunity to fix any issues we confirm.

No matter what they are called, these anomalies are a signal that the quality checks on the census are working. Let’s dive deeper into what they are and how we are addressing them.

Finding Anomalies

With an accurate census count as the primary goal, our subject matter experts meticulously go through the response data, comparing population totals against other data sources, such as the 2010 Census, the 2020 population estimates, and the American Community Survey. They also ensure that processing ran as designed. As we review the data, we look for outliers — numbers that don’t fit what we might reasonably expect.

Where we find outliers, we dig deeper to find out what’s going on. If we determine that a fix is needed to correct an error, we fix it.
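As a rough illustration of this kind of check, the sketch below compares a tallied count for each area against a single benchmark figure and flags large relative differences for review. The class name, field names, sample figures, and the 10 percent tolerance are all assumptions made for the example; they are not the Census Bureau's actual thresholds or data structures.

```python
from dataclasses import dataclass


@dataclass
class AreaTotals:
    area: str
    census_count: int  # population tallied from response data
    benchmark: int     # comparison figure, e.g. a population estimate


def flag_outliers(rows, tolerance=0.10):
    """Return (area, relative difference) for rows outside the tolerance.

    The tolerance is a hypothetical value chosen for illustration.
    """
    flagged = []
    for row in rows:
        if row.benchmark == 0:
            # No meaningful ratio; send any nonzero count to manual review.
            if row.census_count != 0:
                flagged.append((row.area, None))
            continue
        relative_diff = (row.census_count - row.benchmark) / row.benchmark
        if abs(relative_diff) > tolerance:
            flagged.append((row.area, round(relative_diff, 3)))
    return flagged


# Made-up figures: only Area B is far from its benchmark.
rows = [
    AreaTotals("Area A", 10_200, 10_000),
    AreaTotals("Area B", 15_400, 10_000),
    AreaTotals("Area C", 4_900, 5_000),
]
outliers = flag_outliers(rows)
print(outliers)
```

A flagged area is only a prompt for a closer look; the number may turn out to be correct.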

Examining outliers is a normal part of data processing and the quality checks we do for any census or survey.

To date we have encountered 33 anomalies, which fall into three main categories. In his last blog, Acting Director Jarmin gave some high-level examples of their causes. In this blog, I’ll explain more specifically what and how many we’ve seen, and what we have done to fix them.

“Standard” or Coding-Related Anomalies

The biggest category is “standard” anomalies. These arise in processing any census or survey.

These routine anomalies relate to coding — how the response data appear and are processed in our data files and in the resulting tallies.

First, let me give some background on how coding works. We spend a lot of time ahead of the census thinking through how to deal with both responses that come in and missing information.

  • We establish business rules. These rules define “if X happens, then Y should happen.”
  • We write specifications. These specifications translate the business rules into instructions for how the data should be processed in our systems.
  • We write code. The code turns responses and missing information into data, following the specifications.
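To make the three steps concrete, here is a minimal sketch of how a business rule might end up as code. The rules shown for deriving age are hypothetical and chosen only for illustration; the function name and its parameters are assumptions, not the Census Bureau's actual specification. The only fixed fact used is that Census Day for the 2020 Census was April 1, 2020.

```python
from datetime import date
from typing import Optional

CENSUS_DAY = date(2020, 4, 1)  # reference date for the 2020 Census


def derive_age(birth_year: Optional[int],
               birth_month: Optional[int] = None,
               birth_day: Optional[int] = None,
               reference: date = CENSUS_DAY) -> Optional[int]:
    """Derive age on the reference date under three hypothetical rules."""
    # Rule 1 (hypothetical): no year of birth, so no age can be derived here.
    if birth_year is None:
        return None
    # Rule 2 (hypothetical): year only, so use the difference in years.
    if birth_month is None or birth_day is None:
        return reference.year - birth_year
    # Rule 3: full date of birth; subtract one year if the birthday
    # has not yet occurred by the reference date.
    had_birthday = (birth_month, birth_day) <= (reference.month, reference.day)
    return reference.year - birth_year - (0 if had_birthday else 1)


print(derive_age(1980, 3, 15))  # birthday before Census Day
print(derive_age(1980, 6, 15))  # birthday after Census Day
print(derive_age(1980))         # month and day missing
```

Each branch corresponds to one "if X happens, then Y should happen" statement, which is what makes a missing or incorrect branch possible to find and fix later.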

Having worked on a number of decennial censuses, the career processing staff at the Census Bureau know that in any large, complex data collection, code can never anticipate every data situation, no matter how well we test it. Knowing this, we meticulously run quality checks looking for outliers in the data.

Where we spot an outlier, we go back and look at the specifications and code to identify where things may have gone wrong. If the code or specifications were incorrect, we write a fix and then test that fix. When we confirm the fix works, we implement it to make sure the data display correctly.

So far in 2020 Census processing, 27 of the 33 anomalies we’ve found are of this type. Let me give a couple of examples.

  • Miscalculating age for missing birthdays. We found that our system was miscalculating ages for people who provided their year of birth but left the month and day of birth blank. We fixed this with a simple code correction. Making sure ages calculate correctly helps us with other data processing steps for matching and removing duplicate responses.
  • Incorrectly sorting out self-responses from group quarters residents. The 2020 Census allowed people to respond online or by phone without using the pre-assigned Census ID that links their response to their address. As a result, some people who live in group quarters facilities, such as nursing homes, were able to respond on their own even though they were also counted through the separate Group Quarters Enumeration operation. This also makes their address show up as a duplicate — as both a group quarters facility and a housing unit. Our business rules sort out these duplicate responses and addresses by accepting the response coming from the group quarters operation and removing the response and address appearing as a housing unit. We found an error in how this rule was being carried out. The code was correctly removing the duplicate address but wasn’t removing the duplicate response. We fixed this with another code correction, which enables us to avoid overcounting these residents. 
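The group quarters rule described above can be sketched as follows. This is a simplified illustration, not the production code: the record layout, the `person_key` used to match the same person across operations, and the `"GQ"` and `"SELF"` source labels are all assumptions made for the example.

```python
def resolve_gq_duplicates(responses, addresses):
    """Keep group quarters responses; drop matching self-responses.

    Also drops the housing-unit address attached to each dropped
    self-response. Assumes, for simplicity, that such an address holds
    no other valid responses.
    """
    gq_people = {r["person_key"] for r in responses if r["source"] == "GQ"}

    def is_duplicate(r):
        return r["source"] == "SELF" and r["person_key"] in gq_people

    duplicate_addresses = {r["address_id"] for r in responses if is_duplicate(r)}
    # Both removals are needed: the address AND the response.
    kept_responses = [r for r in responses if not is_duplicate(r)]
    kept_addresses = [a for a in addresses if a not in duplicate_addresses]
    return kept_responses, kept_addresses


responses = [
    {"person_key": "p1", "address_id": "gq-1", "source": "GQ"},
    {"person_key": "p1", "address_id": "hu-9", "source": "SELF"},
    {"person_key": "p2", "address_id": "hu-2", "source": "SELF"},
]
addresses = ["gq-1", "hu-9", "hu-2"]

kept_responses, kept_addresses = resolve_gq_duplicates(responses, addresses)
print(kept_responses)
print(kept_addresses)
```

In terms of this sketch, the anomaly was equivalent to filtering the address list but returning the response list unfiltered, which would have left person `p1` counted twice.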

Anomalies From Unanticipated Respondent Actions

Another category of anomalies results from respondent actions that we did not anticipate. The COVID-19 pandemic seemed to exacerbate these.

So far in 2020 Census processing, we have encountered five anomalies of this type. The most notable example is related to the count of students living in college dorms at some universities.

When the pandemic hit, we strongly encouraged colleges and universities to provide responses for their residents electronically instead of through one of the in-person options we offered. A small number of colleges and universities mistakenly reported the combined student population of all their dorms as the count for each individual dorm. If not fixed, this could have inflated the population count on those campuses.

We spotted this error because months before we asked colleges and universities to respond for their students, we asked them to estimate how many students lived on their campuses. We compared these estimates to the response data they ultimately provided, and the numbers stood out as outliers. We confirmed what had happened and then implemented a code fix to correctly distribute the population among those dorms.
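The sketch below shows one way such a check and correction could work. It is an assumption-laden illustration: the post does not say how the population was redistributed, so the example splits the repeated total in proportion to the earlier per-dorm estimates. The function names, dorm names, figures, and the 25 percent tolerance are hypothetical, and both dictionaries are assumed to contain the same dorms.

```python
def looks_like_repeated_total(reported, estimated, tolerance=0.25):
    """True when every dorm reports the same count, close to the campus estimate."""
    counts = list(reported.values())
    if len(counts) < 2:
        return False
    same_everywhere = len(set(counts)) == 1
    estimated_total = sum(estimated.values())
    near_campus_total = (
        estimated_total > 0
        and abs(counts[0] - estimated_total) <= tolerance * estimated_total
    )
    return same_everywhere and near_campus_total


def redistribute(reported, estimated):
    """Split the repeated campus total across dorms by estimated share."""
    campus_total = next(iter(reported.values()))
    estimated_total = sum(estimated.values())
    shares = {d: campus_total * estimated[d] / estimated_total for d in reported}
    # Largest-remainder rounding keeps the dorm counts summing to the total.
    counts = {d: int(s) for d, s in shares.items()}
    leftover = campus_total - sum(counts.values())
    by_remainder = sorted(shares, key=lambda d: shares[d] - counts[d], reverse=True)
    for dorm in by_remainder[:leftover]:
        counts[dorm] += 1
    return counts


estimated = {"North": 300, "South": 500, "West": 200}  # earlier estimates
reported = {"North": 980, "South": 980, "West": 980}   # same total repeated

if looks_like_repeated_total(reported, estimated):
    corrected = redistribute(reported, estimated)
    print(corrected)
```

Left uncorrected, the reported figures in this made-up case would sum to 2,940 residents, nearly three times the campus estimate of 1,000.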

Unanticipated Census Taker Actions

The last category of anomalies results from unanticipated census taker actions. So far in 2020 Census processing, we have encountered only one anomaly of this type.

We had a census taker who incorrectly flagged a group quarters facility, which wiped out the response data for the entire facility. We identified the issue during our quality checks and were able to reinstate the response data for all the residents.

Summary

I am pleased to report that we have not found any anomalies that are impossible to fix. We have fixed or are fixing every anomaly that our systems and processes have identified so far, and we will continue to look for and fix any that arise as we continue processing the data.

In fact, we completed the second phase of our data processing (validation of the Decennial Response File 2) on Feb. 24. In this phase, we removed duplicate responses that we received and addressed any anomalies that needed to be corrected. We have begun work on the third phase (Census Unedited File processing), and we will keep checking for anomalies throughout it.

Finding these anomalies illustrates that our quality checks are working — ensuring we can count everyone once, only once, and in the right place.

Related blogs


Random Samplings Blog
Encontrar ‘anomalías’ demuestra que los controles de calidad del Censo del 2020 funcionan
On March 9, 2021, the U.S. Census Bureau published a blog (in English) about the “anomalies” we found while processing 2020 Census data.

