We’re in the midst of data processing for the 2020 Census. As Acting Census Bureau Director Ron Jarmin acknowledged in a recent blog, we’ve discovered some “anomalies” along the way that we’re looking into and resolving.
Today, I’d like to unpack what that means. The word “anomaly” can sound alarming. In fact, the scientific advisory group JASON recently recommended that we consider avoiding the word because of the unwarranted alarm it causes, especially when used without context.
Instead of potentially causing confusion by introducing another word, this blog endeavors to explain this technical term and provide the needed context.
“Anomaly” just means that we’ve found something in our quality review process that doesn’t look quite right. Anomalies found in processing are not errors in the census, but they can turn into errors if we don’t review and resolve them. It’s a feature of our quality check process to find them, and it gives us the opportunity to fix any issues we confirm.
No matter what they are called, these anomalies are a signal that the quality checks on the census are working. Let’s dive deeper into what they are and how we are addressing them.
With an accurate census count as the primary goal, our subject matter experts meticulously go through the response data, comparing population totals against other data sources, such as the 2010 Census, the 2020 population estimates, and the American Community Survey. They also ensure that processing ran as designed. As we review the data, we look for outliers — numbers that don’t fit what we might reasonably expect.
Where we find outliers, we dig deeper to find out what’s going on. If we determine that a fix is needed to correct an error, we fix it.
Examining outliers is a normal part of data processing and the quality checks we do for any census or survey.
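To make the outlier screen concrete, here is a minimal sketch of the kind of comparison described above: tallied counts checked against benchmark sources such as a prior census and population estimates, with large relative deviations flagged for review. The data, threshold, and flagging rule are illustrative assumptions, not the Census Bureau's actual methodology.

```python
# Hypothetical outlier screen: flag areas whose tallied count deviates
# from every available benchmark by more than a chosen fraction.
# All names, numbers, and the 10% threshold are illustrative only.

def flag_outliers(counts, benchmarks, threshold=0.10):
    """Return areas whose count differs from each benchmark that covers
    them by more than `threshold` (as a fraction of the benchmark)."""
    flagged = []
    for area, count in counts.items():
        refs = [b[area] for b in benchmarks if area in b]
        if refs and all(abs(count - r) / r > threshold for r in refs):
            flagged.append(area)
    return flagged

# Illustrative inputs: one tally and two benchmark sources.
tally_2020 = {"County A": 50_000, "County B": 120_000, "County C": 9_000}
census_2010 = {"County A": 48_500, "County B": 118_000, "County C": 15_000}
pop_estimates = {"County A": 49_900, "County B": 121_500, "County C": 14_800}

# County C's tally sits far below both benchmarks, so it gets flagged
# for a deeper look; the other counties track their benchmarks closely.
print(flag_outliers(tally_2020, [census_2010, pop_estimates]))
```

Requiring disagreement with every benchmark, rather than just one, mirrors the idea that a number is suspicious when it fails to fit any of the sources we might reasonably compare it against.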
To date, we have encountered 33 anomalies, which fall into three main categories. In his last blog, Acting Director Jarmin gave some high-level examples of their causes. In this blog, I’ll explain more specifically what kinds we’ve seen, how many, and what we have done to fix them.

The biggest category is “standard” anomalies. These arise in processing any census or survey.
These routine anomalies relate to coding — how the response data appear and are processed in our data files and in the resulting tallies.
First, let me give some background on how coding works. We spend a lot of time ahead of the census thinking through how to deal with both responses that come in and missing information.
Having worked on a number of decennial censuses, the career processing staff at the Census Bureau understand that, for any large, complex data collection, coding can never anticipate every data situation, no matter how well we test it. Knowing this, we meticulously run quality checks looking for outliers in the data.
Where we spot an outlier, we go back and look at the specifications and code to identify where things may have gone wrong. If the code or specifications were incorrect, we write a fix and then test that fix. When we confirm the fix works, we implement it to make sure the data display correctly.
So far in 2020 Census processing, 27 of the 33 anomalies we’ve found are of this type.
Another category of anomalies results from respondent actions that we did not anticipate. The COVID-19 pandemic seemed to exacerbate these.
So far in 2020 we have encountered five anomalies of this type. The most notable example is related to the count of students living in college dorms at some universities.
When the pandemic hit, we strongly encouraged colleges and universities to provide responses for their residents electronically instead of through one of the in-person options we offered. A small number of colleges and universities mistakenly reported the total student population of all their dorms for each individual dorm. If not fixed, this could have inflated the population count on those campuses.
We spotted this error because months before we asked colleges and universities to respond for their students, we asked them to estimate how many students lived on their campuses. We compared these estimates to the response data they ultimately provided, and the numbers stood out as outliers. We confirmed what had happened and then implemented a code fix to correctly distribute the population among those dorms.
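The dorm anomaly and its repair can be sketched in a few lines. Everything here is a hypothetical illustration: the dorm names, counts, capacities, the 25% tolerance, and the proportional redistribution are my assumptions, not the Bureau's actual code.

```python
# Hypothetical illustration of the dorm-count anomaly:
# a campus-wide total mistakenly repeated for each dorm,
# caught by comparison against a pre-collected estimate.

def campus_total(dorm_counts):
    return sum(dorm_counts.values())

# Correct reporting: each dorm reports only its own residents.
correct = {"Hall 1": 300, "Hall 2": 450, "Hall 3": 250}

# Anomalous reporting: the campus-wide total (1,000) entered per dorm.
anomalous = {"Hall 1": 1000, "Hall 2": 1000, "Hall 3": 1000}

# The advance estimate the school gave of students living on campus.
advance_estimate = 1000

def looks_inflated(dorm_counts, estimate, tolerance=0.25):
    """Flag a campus whose summed dorm counts far exceed its estimate."""
    return campus_total(dorm_counts) > estimate * (1 + tolerance)

# One hypothetical repair: spread the campus total across dorms in
# proportion to their share of beds (capacities are illustrative).
capacities = {"Hall 1": 320, "Hall 2": 460, "Hall 3": 260}

def redistribute(total, capacities):
    cap_sum = sum(capacities.values())
    return {d: round(total * c / cap_sum) for d, c in capacities.items()}

print(looks_inflated(correct, advance_estimate))
print(looks_inflated(anomalous, advance_estimate))
print(redistribute(1000, capacities))
```

The advance estimates matter here: without a number to compare against, the tripled campus total would look like just another large campus rather than an outlier.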
The last category of anomalies results from unanticipated census taker action. So far in 2020 we have encountered only one anomaly of this type.
We had a census taker who incorrectly flagged a group quarters facility, which wiped out the response data for the entire facility. We identified the issue during our quality checks and were able to reinstate the response data for all the residents.
I am pleased to report that we have not found any anomalies that are impossible to fix. We have fixed or are fixing every anomaly that our systems and processes have identified so far, and we will continue to look for and fix any that arise as we continue processing the data.
In fact, we completed the second phase of our data processing (validation of the Decennial Response File 2) on Feb. 24. In this phase, we removed duplicate responses and addressed the anomalies that needed correction. We have now begun work on the third phase (Census Unedited File processing).
Finding these anomalies illustrates that our quality checks are working — ensuring we can count everyone once, only once, and in the right place.