In August 2002, the Bureau of the Census began taking additional steps in the Current Population Survey (CPS) Public Use Files to further protect the confidentiality of the individuals in the sample. The method chosen was to "mask" the age variable (PRTAGE): depending on the demographic characteristics of all members of the household, the ages of selected household members were adjusted to increase confidentiality protection.
This masking, or age perturbation, produced some inconsistent male/female sex ratios, especially for people aged 65 and over, in the 2003-2010 CPS files. Age perturbation was applied to both internal files (used to generate Census Bureau reports and tables) and public use files. The following table displays 2009 CPS Annual Social and Economic Supplement (ASEC) data from the internal file for people aged 60 and older, with and without the age perturbation; the sex ratio is the number of males divided by the number of females. An example of the inconsistency appears at age 65, where the sex ratio estimate is 0.97 using unperturbed data but 1.04 using perturbed data. This problem is described in more depth in a January 2010 National Bureau of Economic Research Working Paper (No. 15703) by J. Trent Alexander, Michael Davern and Betsey Stevenson, "Inaccurate age and sex data in the Census PUMS files: Evidence and Implications."
| Age | Perturbed: Male | Perturbed: Female | Perturbed: Sex Ratio | Unperturbed: Male | Unperturbed: Female | Unperturbed: Sex Ratio |
|---|---|---|---|---|---|---|
| 60 | 1,735 | 1,849 | 0.94 | 1,735 | 1,849 | 0.94 |
| 61 | 1,670 | 1,787 | 0.93 | 1,670 | 1,787 | 0.93 |
| 62 | 1,509 | 1,740 | 0.87 | 1,509 | 1,740 | 0.87 |
| 63 | 1,253 | 1,431 | 0.88 | 1,253 | 1,431 | 0.88 |
| 64 | 1,255 | 1,305 | 0.96 | 1,255 | 1,305 | 0.96 |
| 65 | 1,399 | 1,341 | 1.04 | 1,317 | 1,356 | 0.97 |
| 66 | 1,259 | 1,284 | 0.98 | 1,220 | 1,371 | 0.89 |
| 67 | 1,093 | 1,248 | 0.88 | 1,103 | 1,290 | 0.86 |
| 68 | 965 | 1,216 | 0.79 | 974 | 1,175 | 0.83 |
| 69 | 916 | 1,104 | 0.83 | 918 | 1,012 | 0.91 |
| 70 | 877 | 1,004 | 0.87 | 883 | 995 | 0.89 |
| 71 | 743 | 996 | 0.75 | 786 | 989 | 0.79 |
| 72 | 773 | 999 | 0.77 | 755 | 991 | 0.76 |
| 73 | 731 | 910 | 0.80 | 750 | 937 | 0.80 |
| 74 | 646 | 901 | 0.72 | 636 | 893 | 0.71 |
| 75 | 705 | 880 | 0.80 | 721 | 887 | 0.81 |
| 76 | 596 | 954 | 0.62 | 621 | 933 | 0.67 |
| 77 | 735 | 828 | 0.89 | 652 | 770 | 0.85 |
| 78 | 698 | 776 | 0.90 | 644 | 735 | 0.88 |
| 79 | 433 | 725 | 0.60 | 547 | 778 | 0.70 |
| 80 to 84 | 2,248 | 3,428 | 0.66 | 2,254 | 3,391 | 0.66 |
| 85 plus | 1,492 | 2,886 | 0.52 | 1,530 | 2,977 | 0.51 |
| Total | 23,731 | 29,591 | 0.80 | 23,731 | 29,591 | 0.80 |
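The sex ratios in the table can be reproduced directly from the male and female counts. The minimal sketch below copies a few rows from the table (ages 64, 65, 66 and 79 only) and recomputes each ratio as males divided by females, rounded to two decimals, alongside the gap between the perturbed and unperturbed ratios. The names `TABLE`, `sex_ratio` and `compare` are illustrative only and are not part of any Census Bureau tool.

```python
# Recompute male/female sex ratios from a subset of the 2009 CPS ASEC rows above.
# TABLE, sex_ratio and compare are hypothetical names used only for this illustration.
TABLE = {
    # age: (perturbed_male, perturbed_female, unperturbed_male, unperturbed_female)
    64: (1255, 1305, 1255, 1305),
    65: (1399, 1341, 1317, 1356),
    66: (1259, 1284, 1220, 1371),
    79: (433, 725, 547, 778),
}


def sex_ratio(male: int, female: int) -> float:
    """Return males divided by females, rounded to two decimals as in the table."""
    return round(male / female, 2)


def compare(table):
    """Yield (age, perturbed ratio, unperturbed ratio, gap) for each row, sorted by age."""
    for age, (pm, pf, um, uf) in sorted(table.items()):
        p = sex_ratio(pm, pf)
        u = sex_ratio(um, uf)
        yield age, p, u, round(p - u, 2)


if __name__ == "__main__":
    for age, p, u, gap in compare(TABLE):
        print(f"age {age}: perturbed {p:.2f}, unperturbed {u:.2f}, gap {gap:+.2f}")
```

At age 64 the two columns are identical, so the gap is zero; at age 65 the perturbed ratio rises above 1.00 while the unperturbed ratio stays below it, which is the inconsistency described above.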
The Census Bureau is examining alternative age perturbation procedures. Until a suitable procedure is found, we will continue to use the existing procedure on public use files. We expect to implement a new procedure in January 2011.
Careful review of ASEC data for the ages most affected by the disclosure avoidance steps showed a few cases with statistically significant differences between the perturbed and unperturbed estimates. In all of these cases, however, the significance arises not because the difference is large, but because the high correlation between the two estimates yields a small confidence interval for their difference. Almost all the differences for income and poverty estimates were within a 90-percent confidence interval. In general, the results of an income or poverty analysis based on perturbed age will not differ statistically from those based on unperturbed age. Because the differences found in the ASEC review were relatively small, the Census Bureau decided not to re-release, using the new method, the 147 CPS files already in the public domain.
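The role of correlation can be illustrated with the standard formula for the variance of a difference between two estimates, Var(A - B) = Var(A) + Var(B) - 2ρ·SE(A)·SE(B), together with the two-sided 90-percent critical value of 1.645. The sketch below is a simplified normal-approximation illustration, not the Census Bureau's own variance methodology; the estimates, standard errors and correlations in it are hypothetical values, not CPS figures, and the function names are illustrative.

```python
import math

Z_90 = 1.645  # two-sided critical value for a 90-percent confidence interval


def se_of_difference(se_a: float, se_b: float, rho: float) -> float:
    """Standard error of (A - B) when estimates A and B have correlation rho."""
    var = se_a ** 2 + se_b ** 2 - 2.0 * rho * se_a * se_b
    return math.sqrt(max(var, 0.0))  # guard against tiny negative values from rounding


def is_significant(est_a: float, est_b: float, se_a: float, se_b: float,
                   rho: float, z: float = Z_90) -> bool:
    """True if |A - B| exceeds z times the standard error of the difference."""
    return abs(est_a - est_b) > z * se_of_difference(se_a, se_b, rho)


if __name__ == "__main__":
    # Hypothetical example: two estimates 0.3 apart, each with a standard error of 1.0.
    a, b, se = 10.3, 10.0, 1.0
    for rho in (0.0, 0.9, 0.99):
        print(rho, round(se_of_difference(se, se, rho), 3), is_significant(a, b, se, se, rho))
```

The same 0.3-point gap is not significant when the estimates are uncorrelated or moderately correlated, but becomes significant when the correlation approaches 1, because the standard error of the difference shrinks toward zero. This mirrors the situation described above, where perturbed and unperturbed estimates are built from nearly the same records.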
In rare instances, users may have a specific analysis that requires the unmasked age data. ASEC files from 2003 onward that contain both the actual and the "masked" ages are available in the Census Research Data Centers (RDCs). The Bureau of the Census will take reasonable steps to ensure that users have access to these files for appropriate analysis. Additionally, the Bureau of the Census will work directly with users for whom the RDCs do not provide a practical alternative for conducting their research.
A review of median household income and mean earnings of men and women by age and race and Hispanic origin revealed three statistically significant differences between perturbed and unperturbed estimates in 2008 (see INCOME TABLE [Excel 38k]). All significant differences occurred for mean earnings of men. In each case, the perturbed earnings estimate was higher than the unperturbed estimate:
A review of poverty rates by race and Hispanic origin revealed six statistically significant differences between estimates using perturbed and unperturbed ages in 2008 (see POVERTY TABLE [Excel 43k]). All significant differences occurred in the two narrowest age categories: 65 to 69 and 70 to 74. The perturbed poverty rate was lower than the unperturbed poverty rate in three cases:
The perturbed poverty rate was higher than the unperturbed poverty rate in three cases: