The PPMF is the only 2020 Census data product that allows data users to create their own custom tabulations.
The microdata files that make up the PPMF were used to create the Redistricting Data Summary File (P.L. 94-171), the Demographic and Housing Characteristics File (DHC), and the Demographic Profile. Tabulations from the PPMF are entirely consistent with tables from these data products.
The PPMF allows for the tabulation of data with more detail and at lower levels of geography than is available in other 2020 Census data products. Data users should therefore exercise caution when generating tables not included in the tabular 2020 Census data products: the disclosure protections were not tuned for custom tabulations, so such tables may contain inaccuracies.
The 2020 Census Detailed DHC-A and Detailed DHC-B are the only decennial sources with data on detailed race and ethnicity groups and American Indian and Alaska Native tribes and villages. The PPMF does not include data on detailed race and ethnicity.
The PPMF is the successor to the Public Use Microdata Sample (PUMS) files released in previous decades, and is made possible by the new disclosure avoidance methodology used for the 2020 Census. Unlike the PUMS files, which included only a small percentage of records, the PPMF is a 100 percent microdata file for the 2020 Census.
It is important to note that while the data in the PPMF look like individual records, all of the data are protected against disclosure.
The microdata records generated by the 2020 Census Disclosure Avoidance System (DAS) ensure the confidentiality of respondent information through the application of differentially private statistical noise. The microdata included in the PPMF do not include any actual census responses. The records are simply in the microdata format that the Census Bureau’s production system uses to produce the published tables.
The PPMF data are contained in a single zip file. Once unzipped, the files are comma-separated value (.csv) files encoded in UTF-8 (Unicode Transformation Format).
There are two files: one for people (2020_ppmf_person.csv) and one for housing units (2020_ppmf_unit.csv). Each file includes data for all 50 states, the District of Columbia, and Puerto Rico.
Note: Because the files are so large, you may experience long download times and interruptions. Consider using a statistical programming language to open these files and perform analysis.
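As one way to work with files of this size, the sketch below streams a CSV file one row at a time using Python's standard library and counts records by one or more grouping columns, so the whole file never has to fit in memory. The function name `tabulate` and the column name `TABBLKST` in the usage comment are assumptions for illustration only; consult the technical documentation for the actual PPMF column names.

```python
import csv
from collections import Counter


def tabulate(csv_path, group_cols):
    """Count records for each combination of values in group_cols.

    The file is streamed row by row, so memory use grows with the
    number of distinct groups rather than the number of records.
    """
    counts = Counter()
    # The unzipped files are UTF-8 CSV; newline="" is what the csv module expects.
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [c for c in group_cols if c not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"columns not found in file: {missing}")
        for row in reader:
            counts[tuple(row[c] for c in group_cols)] += 1
    return counts


# Usage (hypothetical column name; replace with one from the documentation):
# state_counts = tabulate("2020_ppmf_person.csv", ["TABBLKST"])
```

Values are read as strings, which preserves any leading zeros in geographic codes.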
The PPMF allows for the tabulation of data with more detail and at lower levels of geography than the tabular 2020 Census data products include. The disclosure protections were not tuned for custom tabulations, so data users should exercise caution when generating tables that are not included in those products, because such tables may contain inaccuracies.
Data users should similarly exercise caution when performing microdata analysis on the PPMF records because the disclosure protections may impact the accuracy of their findings, particularly when analyzing patterns at lower levels of geography.
More information is available in the technical documentation.
As with all Census Bureau data products, the PPMF uses disclosure avoidance methods to protect respondent confidentiality. These methods add statistical noise (small, random additions or subtractions) to every published statistic, which reduces the likelihood that characteristics of a specific person or household can be accurately inferred from any combination of the published data.
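To illustrate the general idea of statistical noise, the sketch below adds zero-mean integer noise to a few made-up counts. It is not the Census Bureau's Disclosure Avoidance System, whose algorithm and parameters are described in the technical documentation. The two-sided geometric noise distribution, the `scale` parameter, the function names, and the example counts are all assumptions chosen for illustration.

```python
import math
import random


def two_sided_geometric(rng, scale):
    """Draw zero-mean integer noise as the difference of two geometric draws."""

    def geometric():
        # Inverse-CDF sampling of a geometric variable on 0, 1, 2, ...
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
        return int(math.floor(-scale * math.log(u)))

    return geometric() - geometric()


def noisy_counts(true_counts, scale, seed=None):
    """Return a copy of true_counts with independent noise added to each value.

    In this toy sketch a noisy value can be negative; no post-processing is done.
    """
    rng = random.Random(seed)
    return {key: value + two_sided_geometric(rng, scale)
            for key, value in true_counts.items()}


# Made-up counts for three hypothetical areas.
example = {"area_a": 120, "area_b": 8, "area_c": 0}
protected = noisy_counts(example, scale=2.0, seed=1)
```

With a fixed `scale`, the noise is a larger share of a small count than of a large one, which is one reason tabulations for small areas or detailed categories call for extra caution.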
The Census Bureau worked closely with the data user community to implement these protections.
The 2020 Census is the first census for which disclosure avoidance-related variability can be quantified, because it uses a more sophisticated approach to disclosure avoidance. More information and guidance are available in the technical documentation.