Census Data Processing 101

Written by: Michael Thieme

Last week, U.S. Census Bureau Acting Director Ron Jarmin published a blog post about our progress on processing the data for the 2020 Census. This week I’d like to go a little deeper into how census data processing works to ensure the census is accurate. The bottom line is that producing results from the 2020 Census is an enormously complex process that takes time.

First, I want to explain that modern census processing is complex because of its numerous moving parts, with phases that must happen in a certain order. Our processing operation must place everyone correctly in a geographic location and count everyone in that location accurately.

From enumeration in remote Alaska, to interviewing at doorsteps across the country, to counting federally affiliated Americans stationed around the world, to collecting data from people who self-respond to the census, each data collection effort produces responses in specific formats and on distinct schedules. As Acting Director Jarmin said last week, responses from these operations can come in one (or more) of five possible data collection modes: a paper census form, an online form, a telephone interview, a high-quality administrative record, or an in-person interview.

To make responding as easy and quick as possible, we allowed people to respond with or without the Census ID we mailed to them. If a household responded with its unique Census ID, we were able to link the response to the home’s physical address right away. But we chose to give households the option to respond without their Census ID, meaning that by design, we’re processing an unprecedented number of 2020 Census responses without IDs. This means it takes longer to match responses to the right address and count people in the right place.
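
To illustrate why responses without a Census ID take longer to place, here is a minimal Python sketch. The record layout, the `MASTER_ADDRESSES` table, and the `normalize_address` helper are hypothetical and invented for this example; they are not the Census Bureau's actual matching system, which is far more involved.

```python
import re

# Hypothetical address inventory keyed by Census ID (illustrative values only).
MASTER_ADDRESSES = {
    "ID-0001": "12 Oak Street Apt 3",
    "ID-0002": "400 Main Street",
}


def normalize_address(text):
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(cleaned.split())


# Reverse lookup from normalized address text to Census ID.
ADDRESS_INDEX = {
    normalize_address(addr): cid for cid, addr in MASTER_ADDRESSES.items()
}


def link_response(response):
    """Return (census_id, how_linked) for one response dictionary."""
    census_id = response.get("census_id")
    if census_id in MASTER_ADDRESSES:
        # Fast path: the ID points straight at a known address.
        return census_id, "linked by ID"
    # Slower path: match on the address text the respondent typed in.
    key = normalize_address(response.get("address", ""))
    if key in ADDRESS_INDEX:
        return ADDRESS_INDEX[key], "linked by address match"
    return None, "needs further review"
```

A response with an ID resolves in a single lookup. A response without one depends on how well the typed address lines up with the inventory, and anything that does not match has to be set aside for additional work.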

Add to this that we also have to correctly process everyone living in group quarters (college dorms, nursing homes, prisons, etc.), those living in transient situations (e.g., van life), and those experiencing homelessness, and you have what data experts call a tough “data carpentry” challenge of building all the pieces to fit together properly. Fortunately, our technology, data, and subject matter professionals have built an excellent system to meet that challenge. Over the past three censuses, we designed census processing to integrate deep human expertise and experience with complex computer business rules, maximizing the strengths of each.

First Four Processing Phases

There are four phases to our processing approach leading up to the release of the first census results – state population totals that are used to apportion the seats in the U.S. House of Representatives. Each phase creates a data file.

1)  Decennial Response File 1 – We begin by linking the complete inventory of every residential address in the nation to every response we received during data collection. This phase produces what we call the Decennial Response File 1 (DRF1) and it involves a series of steps:

  • We determine the final classification of each address as either a housing unit or a group quarters location, and we identify each unique person present on the response.
  • We incorporate the outcomes produced in our quality check re-interviews. (That is, if a census taker’s interview with a household did not pass our quality check, we delete those responses and use the information a separate census taker collected in a reinterview with the household.) 
  • We link paper continuation forms (for large households that will not fit on a single form) to the correct parent form.
  • Finally, we standardize the response data to a common format across the modes of collection.

We produce the DRF1 state-by-state, following a detailed process specifically designed to detect, identify, and resolve data anomalies. We’ll talk more about dealing with anomalies in an upcoming blog post.
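
The continuation-form and standardization steps above can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the `StandardResponse` record, the field names for paper and online input, and the `attach_continuation` helper are hypothetical, not the real DRF1 layout.

```python
from dataclasses import dataclass


@dataclass
class StandardResponse:
    """Hypothetical common format shared by every collection mode."""
    address_id: str
    mode: str
    person_names: list


def from_paper(form):
    """Paper form (assumed layout): names sit in person_1, person_2, ... fields."""
    names = [form[k] for k in sorted(form) if k.startswith("person_") and form[k]]
    return StandardResponse(form["addr"], "paper", names)


def from_online(payload):
    """Online response (assumed layout): names sit in a nested household list."""
    names = [person["name"] for person in payload["household"]]
    return StandardResponse(payload["address_id"], "online", names)


def attach_continuation(parent, continuation):
    """Append the people on a continuation form to its parent response."""
    if parent.address_id != continuation.address_id:
        raise ValueError("continuation form belongs to a different address")
    return StandardResponse(
        parent.address_id,
        parent.mode,
        parent.person_names + continuation.person_names,
    )
```

Once every mode is converted to the same record type, later phases can work with one format instead of five.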

2)  Decennial Response File 2 – The second phase of census processing is where we remove duplicate responses we received. We call this phase the Decennial Response File 2 (DRF2), and in it we run what is called the Primary Selection Algorithm (PSA). The PSA resolves situations where we have two or more responses from the same address. In this phase, we also incorporate data from quality checks on self-responses and results from nationwide matching (when we check for people duplicated at different addresses), and we resolve situations where people indicated in a response that they have a usual home elsewhere. This processing must be done on the full nation at once, as some duplication occurs between states. The same rigorous level of review continues in DRF2 to ensure correct software execution and data accuracy.
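
As a rough picture of what resolving two or more responses from the same address means, consider the Python sketch below. The selection rule in it (keep the response listing the most people, then the earliest one received) is made up for this example and is NOT the actual Primary Selection Algorithm, whose rules are not described in this post. The field names are hypothetical as well.

```python
from collections import defaultdict


def select_primary(responses):
    """Keep one response per address using an invented rule (not the real PSA)."""
    by_address = defaultdict(list)
    for response in responses:
        by_address[response["address_id"]].append(response)

    selected = {}
    for address_id, group in by_address.items():
        # Hypothetical rule: most people listed first, then earliest receipt date.
        selected[address_id] = min(
            group, key=lambda r: (-len(r["people"]), r["received"])
        )
    return selected


responses = [
    {"address_id": "A1", "people": ["Ana"], "received": "2020-04-01"},
    {"address_id": "A1", "people": ["Ana", "Ben"], "received": "2020-05-10"},
    {"address_id": "A2", "people": ["Cy"], "received": "2020-04-03"},
]
primary = select_primary(responses)
```

In this toy data, address A1 has two responses and only the one listing both people is kept. Address A2 has a single response, so there is nothing to resolve.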

3)  Census Unedited File (CUF) – Building on the DRF2, the third phase of census processing produces the Census Unedited File (CUF). This process determines the final population count for each address in the census.

  • The CUF processing assigns every address a status: occupied, vacant, non-existent, or unresolved (meaning we did not get a sufficient response or resolution of the case for that address).
  • For unresolved addresses, we use processing software that applies statistical methods to fill in the missing housing unit status and, if necessary, the missing household population.
  • The same rigorous level of review continues for the CUF as in every other phase to ensure correct software execution and data accuracy. The CUF provides the basis for the apportionment counts we produce.
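
To give a feel for filling in unresolved addresses, here is a deliberately simple Python sketch. It borrows the status and population from the nearest resolved address in list order. That donor rule, the field names, and the `impute_unresolved` function are assumptions for illustration only; this post does not specify which statistical methods the production software uses.

```python
def impute_unresolved(addresses):
    """Fill unresolved addresses from the nearest resolved neighbor in list order.

    Illustrative donor rule only; not the Census Bureau's actual method.
    """
    resolved = [
        (i, a) for i, a in enumerate(addresses) if a["status"] != "unresolved"
    ]
    if not resolved:
        raise ValueError("no resolved addresses available to act as donors")

    result = []
    for i, address in enumerate(addresses):
        if address["status"] == "unresolved":
            # Pick the resolved address closest in position (ties go to the earlier one).
            _, donor = min(resolved, key=lambda pair: abs(pair[0] - i))
            address = {
                **address,
                "status": donor["status"],
                "population": donor["population"],
                "imputed": True,
            }
        result.append(address)
    return result


addresses = [
    {"id": "A1", "status": "occupied", "population": 3},
    {"id": "A2", "status": "unresolved", "population": None},
    {"id": "A3", "status": "vacant", "population": 0},
    {"id": "A4", "status": "unresolved", "population": None},
]
completed = impute_unresolved(addresses)
```

After this step every address has a status and a population count, and the imputed ones are flagged so they can be told apart from reported data.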

4)  Data for Apportionment – The fourth phase is the final review, preparation, and delivery of the apportionment data to the president. These state population counts determine how many seats each state gets in the U.S. House of Representatives. Watch this space for an upcoming blog post that will provide additional information on that important and ceremonial release.
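
For readers curious how population counts turn into seats, the sketch below implements the method of equal proportions, the formula used for House apportionment: every state starts with one seat, and each remaining seat goes to the state with the highest priority value, its population divided by the square root of n(n + 1), where n is the number of seats it already holds. The three "states" and their populations are made-up toy numbers, and the `apportion` function is only an illustration, not our production code.

```python
import heapq
from math import sqrt


def apportion(populations, seats):
    """Assign seats with the method of equal proportions."""
    if seats < len(populations):
        raise ValueError("need at least one seat per state")

    # Every state is guaranteed one seat.
    allocation = {state: 1 for state in populations}

    # Max-heap (via negated values) of each state's priority for its next seat.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)

    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        allocation[state] += 1
        n = allocation[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return allocation


# Toy example with invented populations and 10 seats.
toy_result = apportion({"A": 6000, "B": 3000, "C": 1000}, seats=10)
```

In this toy example the ten seats split 6, 3, and 1.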

Beyond Apportionment Data

After delivering the data for apportionment, our work continues. We go through additional phases of data processing as we prepare to release more detailed statistics for smaller levels of geography. We know states are particularly eager to receive the local-level data they need for redistricting. We’ll talk more about the steps to create those data and subsequent statistical products in future blogs.

In summary, processing a census is complex work that takes time, computing power, and subject matter expertise. We’re currently in the second of the first four processing phases – producing the DRF2. With each phase, we’re rigorously reviewing the resulting data files to ensure we count everyone accurately and in the correct geographic location. 

Random Samplings Blog
Census Data Processing 101
Michael Thieme describes how census data processing works to ensure the census is accurate.

Page Last Revised - March 22, 2022