Our statistical products – like median household income or monthly retail sales – don't just emerge out of thin air. Underlying all our data products are one or more sources of more detailed data from which we compute the statistics you see on our website or in the news media. The sources are typically responses to our household and business censuses and surveys, or records in government or private-sector databases.
The most important factor underlying the quality of the statistics we produce is the quality of the source data from which they're computed. For decades, the primary method for gathering source data at federal statistical agencies has been the sample survey. Such surveys yield statistics about a population by randomly sampling a much smaller subset of units, unlike a census, which measures all units in the population. Pioneered at the U.S. Census Bureau by Morris Hansen, W. Edwards Deming and others in the middle of the 20th century, sample surveys revolutionized federal statistics by giving agencies scientifically sound and economical methods to compute statistics of interest across many topics.
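To make the contrast between a sample survey and a census concrete, here is a minimal sketch in Python. It is an illustration only, not the Census Bureau's production methodology: the population values are made up, and the function name `srs_mean_estimate` is hypothetical. It draws a simple random sample without replacement and estimates the population mean along with a standard error.

```python
import random
import statistics


def srs_mean_estimate(population, n, seed=0):
    """Estimate a population mean from a simple random sample of size n.

    Hypothetical helper for illustration; real surveys typically use more
    complex designs (stratification, clustering, unequal weights).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # sampling without replacement
    estimate = statistics.mean(sample)
    # Standard error of the mean with the finite population correction.
    fpc = 1 - n / len(population)
    se = (fpc * statistics.variance(sample) / n) ** 0.5
    return estimate, se


# Made-up "population" of 10,000 household incomes (an assumption).
population = [20_000 + 10 * i for i in range(10_000)]

# A census measures every unit; a survey measures only the sampled ones.
census_mean = statistics.mean(population)
estimate, se = srs_mean_estimate(population, n=500, seed=42)

print(f"census mean:     {census_mean:,.0f}")
print(f"survey estimate: {estimate:,.0f} (standard error {se:,.0f})")
```

In this sketch the survey measures only 5 percent of the units, yet the estimate should land close to the census value, and the standard error quantifies how close.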
However, the workhorse surveys that served the country well for decades are showing signs of stress. Households and businesses are becoming more reluctant to fill out government (and other) surveys. This trend has been underway for some time, and we see it across many countries. Moreover, it appears to have accelerated after the pandemic. Declining response rates reduce the quality of statistics computed from surveys and increase survey costs. So far in the United States, agencies like the Census Bureau have been able to maintain high quality by incorporating additional data, adjusting survey weights, oversampling targeted groups, or using other methods. Statistical agencies in some countries have been forced to withhold data from publication due to quality concerns arising from low response rates.
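As one illustration of what "adjusting survey weights" can mean, here is a minimal sketch of a weighting-class nonresponse adjustment, a standard textbook technique. It is not a description of the Census Bureau's actual procedures. The function name, the field names (`cell`, `weight`, `responded`), and the data are all hypothetical. Within each adjustment cell, respondents' weights are inflated so that they also represent the nonrespondents in that cell.

```python
from collections import defaultdict


def adjust_for_nonresponse(units):
    """Return respondents with weights inflated to cover nonrespondents.

    `units` is a list of dicts with hypothetical keys:
      'cell'      - adjustment cell (e.g., a region or size class)
      'weight'    - base sampling weight
      'responded' - True if the unit answered the survey
    """
    total_weight = defaultdict(float)  # all sampled units, by cell
    resp_weight = defaultdict(float)   # respondents only, by cell
    for u in units:
        total_weight[u["cell"]] += u["weight"]
        if u["responded"]:
            resp_weight[u["cell"]] += u["weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            # Inflation factor: weighted sample total / weighted respondents.
            factor = total_weight[u["cell"]] / resp_weight[u["cell"]]
            adjusted.append({**u, "weight": u["weight"] * factor})
    # Note: a cell with no respondents contributes nothing here; in
    # practice such cells would need to be merged with a neighboring cell.
    return adjusted


# Made-up sample: cell A has a 50% response rate, as does cell B.
sample = [
    {"cell": "A", "weight": 100.0, "responded": True},
    {"cell": "A", "weight": 100.0, "responded": True},
    {"cell": "A", "weight": 100.0, "responded": False},
    {"cell": "A", "weight": 100.0, "responded": False},
    {"cell": "B", "weight": 50.0, "responded": True},
    {"cell": "B", "weight": 50.0, "responded": False},
]

respondents = adjust_for_nonresponse(sample)
for r in respondents:
    print(r["cell"], r["weight"])
```

The adjusted weights sum to the same total as the original sample weights, so weighted totals still represent the full population. The adjustment removes nonresponse bias only to the extent that respondents and nonrespondents within a cell are alike, which is one reason falling response rates put pressure on quality.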
Over the same period that we’ve been experiencing declining survey response rates, we’ve seen an explosion of digital data that can be used to supplement or even replace existing survey collections. I can’t delve into all the possible benefits and pitfalls of using “big data” for official statistics in this blog, but good discussions can be found in Groves (2011), Jarmin (2019), Abraham et al. (2022), and NASEM (2023). Noting both the costs and benefits, the Census Bureau is carefully exploring several alternative data sources that can help us in a variety of ways across our broad economic and social measurement mission.
Before I describe some of the ways we’re tapping new data sources, it’s important to note that the Census Bureau has been using nonsurvey data in our work for decades. Administrative data from other government agencies such as the Internal Revenue Service, the Social Security Administration, the Postal Service, and state unemployment insurance offices provide crucial information that underlies our Frames, from which we draw samples for our household and business surveys. These Frames contain high-quality information covering the universes of the primary units that are the subjects of our statistical products: locations (business, residential, and government), organizations (businesses and governments), and households and individuals. They are thus invaluable when we’re evaluating alternative data sources, as they help us understand the coverage and characteristics of the units included, for example the number and locations of retail stores in a third-party dataset. We also use administrative data directly in estimation, as in products like our Small Area Income and Poverty Estimates, Business Dynamics Statistics, Business Formation Statistics, OnTheMap, and the 2020 Census.
Let me now turn to some exciting examples of how we’re using novel nonsurvey data sources to make our statistics more accurate, more timely, and more sustainable.
These short descriptions don’t do justice to all the work our teams have accomplished on these and similar efforts across the Census Bureau. For example, we’ve been working with Circana for over 10 years now. New data sources are never “shovel-ready,” as they say. It takes time to evaluate and understand new data sources and how their characteristics affect the statistics we publish. My last blog post described our work building a modern Business Ecosystem at the Census Bureau. This is crucial as we expand our uses of alternative data sources, due to both the scale of the data (e.g., retail transactions or satellite imagery) and the complexity of statistical computations that utilize AI and blended data. Tools like Valhalla will help us more efficiently incorporate new data sources and make them ready for use. The goal of all this is to provide you with higher quality and more sustainable statistics. We’d love your feedback and will keep you informed as we move forward.