Last week, U.S. Census Bureau Acting Director Ron Jarmin published a blog post about our progress on processing the data for the 2020 Census. This week I’d like to go a little deeper into how census data processing works to ensure the census is accurate. The bottom line is that producing results from the 2020 Census is an enormously complex process that takes time.
First, I want to explain that modern census processing is complex because of its numerous moving parts, with phases that must happen in a certain order. Our processing operation must place everyone correctly in a geographic location and count everyone in that location accurately.
From enumerating remote Alaska, to interviewing at doorsteps across the country, to counting federally affiliated Americans stationed around the world, to collecting data from people who self-respond to the census, each data collection effort produces responses in specific formats and on distinct schedules. As Acting Director Jarmin said last week, responses from these operations can come in one (or more) of five possible data collection modes: a paper census form, an online form, a telephone interview, a high-quality administrative record, or an in-person interview.
To make responding as easy and quick as possible, we allowed people to respond with or without the Census ID we mailed to them. If a household responded with its unique Census ID, we were able to link the response to the home’s physical address right away. But we chose to give households the option to respond without their Census ID, meaning that by design, we’re processing an unprecedented number of 2020 Census responses without IDs. This means it takes longer to match responses to the right address and count people in the right place.
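To illustrate the difference between the two kinds of responses, here is a minimal Python sketch. It is not our production system: the names `CensusResponse`, `link_responses`, and `id_to_address` are hypothetical, and the real matching of ID-less responses involves far more than is shown here. The sketch only shows why a response with a Census ID can be linked right away, while one without an ID has to wait for a separate address-matching step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CensusResponse:
    """Hypothetical, simplified response record (illustration only)."""
    response_id: str
    census_id: Optional[str]  # None when the household responded without an ID
    reported_address: str     # address as typed or spoken by the respondent


def link_responses(responses, id_to_address):
    """Split responses into those linked by Census ID and those needing matching.

    id_to_address is an assumed lookup from Census ID to the address
    on file for that ID.
    """
    linked = {}          # response_id -> address on file
    needs_matching = []  # responses that must go through address matching
    for r in responses:
        address = id_to_address.get(r.census_id) if r.census_id else None
        if address is not None:
            linked[r.response_id] = address
        else:
            # No ID, or an ID we cannot find: the reported address has to be
            # matched to the address inventory later, which takes longer.
            needs_matching.append(r)
    return linked, needs_matching
```

In this sketch, every response that lands in `needs_matching` represents extra work, which is why a large share of ID-less responses lengthens processing.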
We also have to correctly process everyone living in group quarters (college dorms, nursing homes, prisons, etc.), those living in transient situations (e.g., van life), and those experiencing homelessness. Together, this is what data experts call a tough “data carpentry” challenge: building all the pieces so they fit together properly. Fortunately, our technology, data, and subject matter professionals have built an excellent system to meet that challenge. Over the past three censuses, we designed census processing to integrate deep human expertise and experience with complex computer business rules, maximizing the strengths of each.
There are four phases to our processing approach leading up to the release of the first census results – state population totals that are used to apportion the seats in the U.S. House of Representatives. Each phase creates a data file.
1) Decennial Response File 1 – We begin by linking the complete inventory of every residential address in the nation to every response we received during data collection. This phase produces what we call the Decennial Response File 1 (DRF1) and it involves a series of steps:
2) Decennial Response File 2 – The second phase of census processing is where we remove duplicate responses. We call this phase the Decennial Response File 2 (DRF2), and in it we run what is called the Primary Selection Algorithm (PSA). The PSA resolves situations where we have two or more responses from the same address. In this phase, we also incorporate data from quality checks on self-responses and results from nationwide matching (when we check for people duplicated at different addresses), and we resolve situations where people indicated in a response that they have a usual home elsewhere. This processing must be done for the entire nation at once, because some duplication occurs across state lines. The same rigorous level of review continues in DRF2 to ensure correct software execution and data accuracy.
3) Census Unedited File (CUF) – Building on the DRF2, the third phase of census processing produces the Census Unedited File (CUF). This process determines the final population count for each address in the census.
4) Data for Apportionment – The fourth phase is final review, preparation and delivery of the apportionment data to the president. These state population counts determine how many seats each state gets in the U.S. House of Representatives. Watch this space for an upcoming blog that will provide additional information on that important and ceremonial release.
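Looking back at phase 2, the idea of resolving two or more responses from the same address can be sketched in a few lines of Python. This is an illustration only, not the Primary Selection Algorithm itself: the actual PSA rules are not described in this blog, so the selection rule below (prefer the response with the most items answered, then the earliest received) is a stand-in assumption, and the names `AddressResponse` and `select_primary` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressResponse:
    """Hypothetical, simplified response already linked to an address."""
    response_id: str
    address_id: str
    items_answered: int  # assumed completeness measure
    received_order: int  # lower means received earlier


def select_primary(responses):
    """Return one response per address using a stand-in rule.

    Assumption: prefer the most complete response, and break ties by
    taking the one received first. The real PSA rules may differ.
    """
    by_address = defaultdict(list)
    for r in responses:
        by_address[r.address_id].append(r)

    return {
        address_id: max(
            group,
            key=lambda r: (r.items_answered, -r.received_order),
        )
        for address_id, group in by_address.items()
    }
```

Grouping by address first is the important part of the sketch: whatever rule is used, it is applied to all responses for an address together, so every response must already be linked to its address before this step can run.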
After delivering the data for apportionment, our work continues. We go through additional phases of data processing as we prepare to release more detailed statistics for smaller levels of geography. We know states are particularly eager to receive the local-level data they need for redistricting. We’ll talk more about the steps to create those data and subsequent statistical products in future blogs.
In summary, processing a census is complex work that takes time, computing power, and subject matter expertise. We’re currently in the second of the first four processing phases – producing the DRF2. With each phase, we’re rigorously reviewing the resulting data files to ensure we count everyone accurately and in the correct geographic location.