In keeping with the U.S. Census Bureau’s long-established commitment to being entirely transparent in the production of our statistics and data products, I’m writing to provide an update on data processing for the 2020 Census. In every decennial census, we are the first to identify and analyze the quality of our data, including the extent to which we overcount or undercount key population groups in our country. We cannot do this in detail until we complete the Post-Enumeration Survey later this year; however, we already know a lot about the accuracy and completeness of our population counts in the 2020 Census. I blogged with some initial impressions in early November, and we’ve made a lot of progress since then. But as reported in the media, some issues have surfaced as well. Most of these issues are typical and are similar to those we’ve encountered in prior decennial censuses. Others are new, arising from planned improvements for the 2020 Census, and some are related to the difficulties experienced collecting data during the COVID-19 pandemic.
The main consequence of these issues concerns the schedule. The Census Bureau takes its constitutional and statutory duties very seriously. Even with the pandemic delaying data collection, we hoped to deliver the state population counts for apportionment by the statutory deadline of Dec. 31, 2020. Even with data collection not ending until mid-October, this was technically possible as long as we didn’t encounter any significant processing issues. However, we were also realistic, knowing that all prior decennial censuses encountered such issues. We devoted additional resources, including staff working weekends and holidays, to meet the deadline. Even with these additional resources, we knew that we would need to stay on what we referred to as the “happy path,” where each stage of processing would be completed with no issues, in order to process the data in two and a half months rather than the five months that our final Operational Plan called for. The path we actually experienced was much more like those we’ve experienced in prior censuses than the “happy path” we had hoped would allow us to deliver the apportionment data on time. The result is that our current schedule points to April 30, 2021, for the completion of the apportionment counts.
The desire to meet the statutory deadline meant that there was unprecedented attention from Census Bureau and U.S. Department of Commerce leadership and from outside observers on the data processing schedule. It is important to know that while we had the goal of finishing by the statutory deadline, or as close to it as possible, the Census Bureau’s most important objective, the one that has driven our entire approach to the 2020 Census, is to deliver a complete and accurate census. That is, to count every person residing in the country once, only once, and in the right place. To achieve this objective, we carefully research every processing issue we find, develop and test a fix, and then implement it. Because this can be a time-consuming process, the “happy path” that would have met the statutory deadline was not achievable.
The issues we’ve uncovered are varied in their underlying cause, their magnitude, and the complexity of their remedies. More detailed information about these issues and how we’re addressing them will be coming soon from experts far more qualified to comment on them than I. But I want to talk briefly about some broad classes of issues we’re seeing, some more concerning than others.
First, there are what one might call “standard” problems that arise in processing any large survey. These might include what we’ve been referring to as processing anomalies such as basic errors in processing code, mismatches between code and business rules, errors in data handoffs between systems, and misalignment of processing business rules between phases. These make up the majority of the issues we’ve encountered in processing the 2020 Census. It is common to encounter issues like these when one runs the entire country through post-collection processing and these issues are relatively straightforward to address. Thus, we’re not concerned about them impacting the quality of the final data. And while we try to apply lessons learned from prior censuses and surveys to minimize the prevalence and impact of these types of issues, design changes to the 2020 Census intended to make it easier for everyone to respond can create new and unanticipated complexities for data processing as the number of response modes (e.g., internet, phone, and paper self-response; administrative records; and visits by enumerators) has increased relative to 2010.
The other class of issues arises from the nature of the responses to the 2020 Census. Again, we’ve experienced most of these before, but some have been exacerbated by design changes for 2020 and, most importantly, by the pandemic. These include whether households respond (response rates), the completeness of their responses (I mentioned the issue of item nonresponse in my November blog), the ability of our enumerators to contact nonresponding households (e.g., in areas impacted by natural disasters), and the cooperation of those households.
Enumerating Group Quarters (GQs) facilities is a challenge in every decennial census, but we are seeing additional complications brought on by the COVID-19 pandemic. GQs are facilities such as college dormitories, prisons and nursing homes. We delayed this and other field operations due to the pandemic. This delay, and the fact that some facilities emptied in the spring due to the pandemic, has caused issues with our GQs enumeration. Even though these issues affect a relatively small part of the total count, they can have a big impact on the count for the communities in which they’re located. As a result, we re-contacted thousands of facilities and have brought in new data sources such as the Integrated Postsecondary Education Data System (for college dormitories) to resolve these issues.
Another issue we experience with every decennial census is duplicate responses. The addition of the internet response option and the ability to respond without a Census ID# have increased duplicate responses, as we expected. Our data processing in the 2010 Census was able to handle this challenge, and we are again well-equipped to handle duplicates in the 2020 Census.
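For readers curious what flagging duplicates can look like in practice, here is a minimal sketch in Python. It is purely illustrative and is not the Census Bureau's actual procedure: the record fields (`response_id`, `address`, `mode`), the helper names, and the matching rule (same address after normalizing case and spacing) are all assumptions made for this example.

```python
from collections import defaultdict


def normalize_address(address: str) -> str:
    """Collapse case and whitespace so trivially different spellings match."""
    return " ".join(address.upper().split())


def find_duplicate_responses(responses):
    """Group responses by normalized address and return only the groups
    that contain more than one response (hypothetical matching rule)."""
    groups = defaultdict(list)
    for response in responses:
        key = normalize_address(response["address"])
        groups[key].append(response["response_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up sample records: two responses for the same address, one unique.
responses = [
    {"response_id": "R1", "address": "12 Main St", "mode": "internet"},
    {"response_id": "R2", "address": "12  main st", "mode": "paper"},
    {"response_id": "R3", "address": "40 Oak Ave", "mode": "phone"},
]

duplicates = find_duplicate_responses(responses)
print(duplicates)
```

In this sketch the two responses for the same address are grouped together for review, while the unique response is left alone. Deciding which response to keep, or how to reconcile the people listed on each, is a separate step that this example does not attempt.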
Counting everyone once, only once, and in the right place is a daunting challenge even in the best of circumstances, and the circumstances presented for the 2020 Census were not the best. At the end of the day, the key question is, did those circumstances impact the fitness of the data we will release based on the 2020 Census? Knowing that the COVID-19 pandemic might pose data quality issues for the 2020 Census, I chartered the 2020 Data Quality Executive Guidance Group last April to ensure that we had the right focus and resources dedicated to detecting and addressing data quality issues. Since then, other senior leaders and I have met regularly with various teams working on the 2020 Census to discuss a range of quality-related issues. During data collection, these discussions centered on how game-time decisions to cope with the pandemic, hurricanes, wildfires and civil unrest might affect data quality and what we could do to mitigate any impacts. During post-collection processing, we’ve reviewed processing anomalies, discussed remedies, and reviewed early quality indicators. Most importantly, we’ve ensured that our dedicated staff have the time and resources to do the job right.
To increase the transparency of our efforts, we will be releasing additional blogs from Census Bureau experts that dive more deeply into the issues discussed above. Also, we are working with a team of experts from the American Statistical Association on quality indicators (I mentioned our intention to engage them in my November blog) and with members of JASON, who have been reviewing our processes, procedures and key decisions since the beginning of data collection. JASON is an independent group of technical experts that advises the federal government on sensitive matters in science and technology. These efforts will give the public an unprecedented behind-the-scenes look at the 2020 Census and should provide additional confidence in assessing the fitness of the 2020 Census data.
As we complete each major stage of processing and release 2020 Census data products to the public, we will be releasing quality indicators appropriate for each release. Later this month, we will begin processing the Census Unedited File (CUF) from which the apportionment counts are tabulated. For the release of the apportionment data by the end of April, we plan to release state-level quality indicators based on the CUF. Once the CUF is complete, we will begin processing the Census Edited File from which the Public Law 94-171 redistricting data are tabulated. We hope to have an update on this schedule soon. Here again, we are developing appropriate quality indicators to accompany this release. Please continue to watch for more updates from my colleagues in the coming weeks.