The CPS is subject to two sources of nonresponse. The larger is noninterviewed households. To compensate for this data loss, the weights of noninterviewed households are distributed among interviewed households. The second source of data loss is item nonresponse, which occurs when a respondent either does not know the answer to a question or refuses to provide the answer. Item nonresponse in the CPS is modest.
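The redistribution of noninterview weights can be illustrated with a minimal sketch. It assumes a simple proportional adjustment within an adjustment cell; the text above does not say how cells are defined or how the adjustment is computed, so the field names `cell`, `weight`, and `interviewed` and the function `redistribute_weights` are hypothetical.

```python
from collections import defaultdict


def redistribute_weights(households):
    """Spread the weight of noninterviewed households over interviewed ones.

    Assumption (not from the source): the adjustment is proportional within
    each hypothetical 'cell', so the total weight of a cell is preserved.
    """
    total = defaultdict(float)        # weight of all households per cell
    interviewed = defaultdict(float)  # weight of interviewed households per cell
    for h in households:
        total[h["cell"]] += h["weight"]
        if h["interviewed"]:
            interviewed[h["cell"]] += h["weight"]

    adjusted = []
    for h in households:
        if not h["interviewed"]:
            continue  # noninterviewed households drop out of the file
        factor = total[h["cell"]] / interviewed[h["cell"]]
        adjusted.append({**h, "weight": h["weight"] * factor})
    return adjusted
```

With two interviewed households of weight 100 and one noninterviewed household of weight 50 in the same cell, each interviewed household is scaled by 250/200 = 1.25, so the cell total of 250 is unchanged.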
One of three imputation methods is used to compensate for item nonresponse in the CPS. Before the edits are applied, the daily data files are merged and the combined file is sorted by state and PSU within state. This sort ensures that allocated values come from geographically related records; that is, missing values for records in Maryland will not receive values from records in California. This matters because many labor force and industry and occupation characteristics are geographically clustered.
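The merge-and-sort step might look like the following sketch. The record layout (dictionaries with `state` and `psu` keys) and the function name `merge_and_sort` are assumptions made for illustration, not the actual CPS file format.

```python
from itertools import chain


def merge_and_sort(daily_files):
    """Combine the daily files and sort by state, then PSU within state.

    Each daily file is assumed to be a list of record dictionaries with
    hypothetical 'state' and 'psu' keys.
    """
    merged = chain.from_iterable(daily_files)
    # sorted() is stable, so records in the same PSU keep their arrival order.
    return sorted(merged, key=lambda r: (r["state"], r["psu"]))
```

Because any edit that processes records sequentially sees them in this order, a record is preceded by records from the same or a nearby PSU in the same state, which is what keeps allocated values geographically related.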
The edits effectively blank all entries in inappropriate questions (e.g., entries recorded after an incorrect path of questions was followed) and ensure that all appropriate questions have valid entries. For the most part, illogical or out-of-range entries have been eliminated with the use of electronic instruments; however, the edits still address these possibilities, which may arise from data transmission problems and occasional instrument malfunctions. The main purpose of the edits, however, is to assign values to questions where the response was "Don’t know" or "Refused." This is accomplished by using one of the three imputation techniques described below.
The edits are run in a deliberate and logical sequence. Demographic variables are edited first because several of those variables are used to allocate missing values in the other modules. The labor force module is edited next since labor force status and related items are used to impute missing values for industry and occupation codes and so forth.
The three imputation methods used by the CPS edits are described below:
All CPS items that require imputation for missing values have an associated hot deck. The initial values for the hot decks are the ending values from the preceding month. As a record passes through the editing procedures, it will either donate a value to each hot deck in its path or receive a value from the hot deck. For instance, in a hypothetical case, the hot deck for question X is defined by the characteristics Black/non-Black, male/female, and age 16−25/25+. Further assume a record has the values White, male, and age 64. When this record reaches question X, the edits determine whether it has a valid entry. If so, that record’s value for question X replaces the value in the hot deck reserved for non-Black, male, and age 25+. Conversely, if the record is missing a value for question X, it is assigned the value in the hot deck designated for non-Black, male, and age 25+.
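The donate-or-receive logic of the hypothetical question X example can be sketched as follows. This is an illustration under stated assumptions, not the production edit: the record fields, the function names `cell_for` and `run_hot_deck`, the `allocated` flag, and the treatment of age 25 as part of the younger group are all assumptions, and the deck is assumed to be fully seeded with the preceding month's ending values.

```python
def cell_for(record):
    """Map a record to its hot deck cell (race group, sex, age group)."""
    race = "Black" if record["race"] == "Black" else "non-Black"
    # Assumption: age 25 falls in the younger group; the text leaves the
    # boundary between "16-25" and "25+" ambiguous.
    age = "16-25" if record["age"] <= 25 else "25+"
    return (race, record["sex"], age)


def run_hot_deck(records, deck, item="X"):
    """Pass records through the edit in file order.

    A record with a valid entry donates it to its cell; a record with a
    missing entry (None) receives the value currently stored in its cell.
    'deck' is updated in place and must hold a starting value for every cell.
    """
    edited = []
    for rec in records:
        rec = dict(rec)  # leave the caller's record unchanged
        cell = cell_for(rec)
        if rec.get(item) is not None:
            deck[cell] = rec[item]      # donate
            rec["allocated"] = False
        else:
            rec[item] = deck[cell]      # receive
            rec["allocated"] = True
        edited.append(rec)
    return edited
```

Because each donor overwrites the cell value, a record with a missing entry receives the value from the most recently processed record in the same cell, or the preceding month's ending value if no donor has yet passed through.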
As stated above, the various edits are logically sequenced, in accordance with the needs of subsequent edits. The edits and codes, in order of sequence, are:
Household edits and codes. This processing step performs edits and creates recodes for items pertaining to the household. It classifies households as interviews or noninterviews and edits items appropriately. Hot deck allocations defined by geography and other related variables are used in this edit.
Demographic edits and codes. This processing step ensures consistency among all demographic variables for all individuals within a household. It ensures all interviewed households have one and only one reference person and that entries stating marital status, spouse, and parents are all consistent. It also creates families based upon these characteristics. It uses longitudinal editing, hot deck allocation defined by related demographic characteristics, and relational imputation.