
How We Complete the Census When Demographic and Housing Characteristics Are Missing

August 02, 2021
By Roberto Ramirez, assistant division chief, Special Population Statistics and Christine Borman, statistician demographer, Count Review Office, Population Division, U.S. Census Bureau

Estimated reading time: 8 minutes

Although we strive to obtain all demographic and housing data from every individual in the census, missing data are part of every census process. Fortunately, we have long-established procedures we’ve used in previous censuses and surveys to fill in these missing pieces. 

As this blog shows, the process is complex, but it reflects the extensive, standard statistical methodology we use to account for missing or conflicting data.

We collected demographic and housing data in a few ways:

  • Most people responded online, by phone or by mail and provided their demographic and housing characteristics.
  • A census taker collected the information from the household or group quarters.
  • A census taker collected the information from a proxy, such as a neighbor, landlord or building manager, after multiple failed attempts to reach a household member.
  • We used information the household provided in a previous census, survey, tax return or other government program.

Once we’ve collected all the data we can, we use statistical techniques, such as edits and characteristic imputation, for the small number of missing, invalid or inconsistent housing or demographic characteristics. With editing, we compare an individual’s responses to those of other household members or the overall group quarters to look for invalid or inconsistent information. With characteristic imputation, we fill in missing information by using a combination of sources, including other information in that individual’s or other family members’ census responses, responses from that individual or family member from another census or survey, or other existing records or information from similar nearby neighbors. More information is available in our blog: How We Complete the Census When Households or Group Quarters Don’t Respond.

It is important to note that edits and characteristic imputation occur after total population counts are finalized — these processes do not affect the number of people counted in the 2020 Census. Also keep in mind that the Census Bureau receives administrative records from many sources, but the data we collect cannot and will not be shared with anyone else, including other government agencies, or used for anything other than statistical purposes. Statistical purposes never include identifying respondents or the data they provided. 

Handling Missing Demographic and Housing Characteristics

Why do we impute missing demographic and housing data? We’ve done it for a long time, and it’s an established process for most statistical agencies around the world. In fact, the Census Bureau has used characteristic imputation since the 1960 Census to ensure that each person and housing unit has demographic and housing data for each item on the census questionnaire. Imputation has been shown to improve data quality and accuracy compared with leaving blank the fields that respondents did not fill in.

The Public Law (P.L.) 94-171 Redistricting Data Summary File, which will be released by August 16, is the first detailed data file from the 2020 Census. The 2020 Census data are used to distribute hundreds of billions of dollars in federal funding to state and local governments and communities across the country, and states typically use them to redraw congressional and state legislative districts.

Edits and characteristic imputation are part of our quality control and assurance measures and can be divided into three general stages — edits, assignment and allocation. 

Edits

In this phase, we take the responses people reported and run a series of checks:

  • We detect and correct out-of-range or inconsistent age values (e.g., date of birth and reported age do not agree).
  • We convert date of birth responses to age values.
  • We remove invalid responses, such as multiple relationship checkboxes selected or an age greater than 115 years.
  • We perform consistency edits to ensure the relationships between household members are consistent with the age and sex reported for them.
  • We remove duplicate Hispanic origin and race responses, such as the White checkbox being selected and “White” being written into the White text box. One of the two “White” responses is removed.
  • We convert race and Hispanic origin checkbox responses into standard output codes (e.g., the “Mexican, Mexican Am., Chicano” checkbox is converted to a numeric code). Assigning codes to words and phrases in each response helps capture what the response is about, which allows researchers to analyze and summarize the results.
  • We code Hispanic origin and race write-in responses by converting text responses into numeric codes. (We’ll talk more about the coding process in an upcoming blog.)
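As a rough illustration of the age checks in the list above, here is a minimal Python sketch. It is not the Census Bureau's production logic. The function names, the status labels, and the choice of April 1, 2020 (Census Day) as the reference date for computing age are assumptions made for this example; only the 115-year upper limit comes from the list above.

```python
from datetime import date

CENSUS_DAY = date(2020, 4, 1)  # assumed reference date for computing age
MAX_VALID_AGE = 115            # ages above this are treated as invalid


def age_from_dob(dob, as_of=CENSUS_DAY):
    """Convert a date of birth to age in whole years as of the reference date."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)


def edit_age(reported_age, dob):
    """Return (age, status) after range and consistency checks.

    Hypothetical statuses: "valid", "inconsistent", "missing".
    """
    candidates = []
    if dob is not None:
        candidates.append(age_from_dob(dob))
    if reported_age is not None:
        candidates.append(reported_age)

    # Remove out-of-range values (negative, or greater than 115 years).
    valid = [a for a in candidates if 0 <= a <= MAX_VALID_AGE]

    if not valid:
        return None, "missing"
    if len(set(valid)) > 1:
        # Date of birth and reported age disagree; a later stage resolves this.
        return None, "inconsistent"
    return valid[0], "valid"
```

In this sketch, a record that reports an age of 130 with no date of birth comes out as "missing", so a later stage would have to supply the value.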

The ideal situation is that every person counted in the 2020 Census fills out their census questionnaire completely and provides valid and consistent responses. When they do, we call it “as reported” and these responses do not get assigned or allocated. 

Assignment

Assignment occurs when missing responses can be determined based on other information provided for that same person.

In the 2010 Census, we used responses from the 2000 Census to fill in missing Hispanic origin and race information. A major improvement for the 2020 Census is the expanded use of administrative records to assign demographic and housing characteristics during characteristic imputation.

In 2020, we used 2010 Census responses to fill in missing values for sex, age, Hispanic origin and race. Plus, we used information from the American Community Survey, Social Security Administration (such as records from Social Security card applications), other federal administrative records, and commercial housing tax and deed information to assign missing characteristics.

Below are specific examples of how we assign each of the key demographic and housing characteristics collected in the 2020 Census:

  • Sex. We use the respondent’s first name to try to fill in missing sex. We also assign sex to maintain household consistency. For example, if sex is missing for the householder’s opposite-sex spouse or unmarried partner, we assign the sex that fits with that response. In addition, we assign sex from prior census responses and other existing records.
  • Age. If a person reported their date of birth but not their age, we calculate the age from the date of birth. If date of birth is missing too, we can often use what they reported on another census or in other federal administrative records. If the age calculated from the reported date of birth is inconsistent with the reported age, we choose the value that is more consistent with the person’s relationship to the householder. If this occurs for a person in group quarters, we choose the reported age or date of birth closest to the median age for all of the people in the group quarters unit.
  • Hispanic origin and race. If Hispanic origin is missing, we use responses from the race question. For example, if a respondent reported “Cuban” in the race question, then we would code a response of “Yes, Cuban” for the Hispanic origin question. Similarly, if race is missing, we use responses from the Hispanic origin question. We also assign Hispanic origin and race from prior American Community Survey or 2010 Census responses and other administrative records.
  • Relationship to householder. If relationship is missing, we use administrative records that indicate a parent/child relationship to assign it where possible.
  • Tenure (own or rent). We use information from administrative records and tax assessor records to assign missing tenure. For example, we identify housing units that receive rental assistance through public housing or other federal programs and assign them as renter-occupied units.
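To make the idea of assignment more concrete, the Python sketch below walks through the Hispanic origin example from the list above. It is only an illustration, not the actual procedure. The lookup table, the field names, the status labels, and the order in which sources are tried are all assumptions made for this example.

```python
# Hypothetical lookup: race responses that imply a Hispanic origin response.
# Only the "Cuban" example comes from the text; a real table would be far larger.
RACE_TO_HISPANIC_ORIGIN = {"cuban": "Yes, Cuban"}


def assign_hispanic_origin(person, prior_records=()):
    """Return (value, status) for one person's Hispanic origin.

    person and each item in prior_records are plain dicts (assumed layout).
    The order of the fallbacks below is an assumption for illustration.
    """
    # 1. A reported response is kept as is.
    if person.get("hispanic_origin"):
        return person["hispanic_origin"], "as reported"

    # 2. Use the same person's response to the race question.
    race = (person.get("race") or "").strip().lower()
    if race in RACE_TO_HISPANIC_ORIGIN:
        return RACE_TO_HISPANIC_ORIGIN[race], "assigned from race response"

    # 3. Use the same person's prior census, survey, or administrative records.
    for record in prior_records:
        if record.get("hispanic_origin"):
            return record["hispanic_origin"], "assigned from prior record"

    # 4. Nothing available for this person; left for allocation.
    return None, "unresolved"
```

Every source in this sketch belongs to the same person, which is what distinguishes assignment from allocation.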

Allocation

We turn to allocation when we can’t determine missing responses from other information provided for that same person living in a household or group quarters. The primary method of allocation is to use information from similar nearby households. We use allocation to determine responses for:

  • Individual demographic characteristics that are missing and could not be assigned.
  • Entire households if all of the demographic characteristics are missing for every person in a household. First, we look to prior survey or census responses and other existing records for the housing unit, but if those are unusable or unavailable, we impute the information from similar nearby households.
  • Group quarters by using people with reported data in similar nearby group quarters (e.g., another college dorm).
  • Tenure by using data from similar nearby occupied housing units.
  • Detailed vacancy status by using data from other nearby vacant units.

There’s one other allocation method we can use for individuals missing Hispanic origin or race before looking at data from nearby neighbors. For people living in households, we fill it in using information from other household members. For example, we use information from a parent if they report their race but do not provide it for their child. 
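The two-step order described above, household members first and nearby households second, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Census Bureau's method. The record layout and the "distance" field are hypothetical, and the sketch ranks neighbors by distance alone because the criteria for what makes a household "similar" are not spelled out here.

```python
def allocate_race(person, household, nearby_households):
    """Return (value, status) for a person whose race could not be assigned.

    household: list of person dicts, including `person` (assumed layout).
    nearby_households: list of dicts with hypothetical keys "distance"
    (any sortable number) and "members" (list of person dicts).
    """
    # Step 1: use a reported value from another member of the same household.
    for member in household:
        if member is not person and member.get("race"):
            return member["race"], "allocated from household member"

    # Step 2: fall back to the closest nearby household with a reported value.
    for neighbor in sorted(nearby_households, key=lambda h: h["distance"]):
        for member in neighbor["members"]:
            if member.get("race"):
                return member["race"], "allocated from nearby household"

    return None, "unresolved"
```

A child whose parent reported a race would be resolved in step 1, which matches the parent and child example above.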

Next Steps

Once all the data have been processed, missing data imputed, and internal quality checks completed, the next step is to apply differential privacy to prevent unauthorized disclosure of confidential data. Upcoming blogs will provide more details about these files and products.

We plan to provide characteristic imputation rates by key demographic and housing items in 2022.

We emphasize that characteristic imputation is implemented only long after all data collection has ended, once all attempts to obtain a response have been exhausted.

We prefer when information about a household or group quarters facility comes directly from the household or the people at the group quarters facility who reported their demographic data. When they do not respond, this technique helps us deliver more complete and accurate statistics and statistical products.
