Statistical Quality Standard E1: Analyzing Data
Purpose: The purpose of this standard is to ensure that statistical analyses, inferences, and comparisons used to develop information products are based on statistically sound practices.
Scope: The Census Bureau's statistical quality standards apply to all information products released by the Census Bureau and the activities that generate those products‚ including products released to the public‚ sponsors‚ joint partners‚ or other customers. All Census Bureau employees and Special Sworn Status individuals must comply with these standards‚ including contractors and other individuals who receive Census Bureau funding to develop and release Census Bureau information products.
In particular‚ this standard applies to the analyses performed to generate information products. It includes analyses:
- Used to produce Census Bureau information products (e.g.‚ reports‚ news releases‚ conference papers‚ journal articles‚ and maps)‚ regardless of data source.
- Conducted using census data‚ survey data‚ administrative records data‚ or any data linked with any of these sources.
- Performed during research to develop improved methodologies for frame construction‚ survey design‚ sampling‚ data collection‚ data capture‚ processing‚ estimation‚ analysis‚ or other statistical processes.
- Performed to evaluate the quality of Census Bureau data‚ methodologies‚ and processes.
- Conducted to guide decisions about processes or information products of the Census Bureau’s programs.
The global exclusions to the standards are listed in the Preface. No additional exclusions apply to this standard.
Key Terms: Bonferroni correction‚ cluster‚ covariance‚ direct comparison‚ goodness–of–fit‚ hypothesis testing‚ implied comparison‚ multivariate analysis‚ outliers‚ parameter‚ peer review‚ regression‚ sample design‚ Scheffe’s method‚ sensitivity analysis‚ significance level‚ statistical inference‚ and Tukey’s method.
Requirement E1–1: Throughout all processes associated with analyzing data‚ unauthorized release of protected information or administratively restricted information must be prevented by following federal laws (e.g.‚ Title 13‚ Title 15‚ and Title 26)‚ Census Bureau policies (e.g.‚ Data Stewardship Policies)‚ and additional provisions governing the use of the data (e.g.‚ as may be specified in a memorandum of understanding or data–use agreement). (See Statistical Quality Standard S1‚ Protecting Confidentiality.)
Requirement E1–2: A plan must be developed prior to the start of the analysis that addresses‚ as appropriate:
- A description of the analysis‚ addressing issues such as:
- Research questions or hypotheses.
- Relevant literature.
- A description of the data‚ addressing issues such as:
- The data source(s).
- Key variables and how they relate to the concept(s) in the hypotheses.
- Design and methods used to collect and process the data.
- Limitations of the data.
- A description of the methodology‚ addressing issues such as:
- Analysis methods (e.g., demographic and economic analysis techniques, ANOVA, regression analysis, log–linear analysis, nonparametric approaches, box plots, and scatter plots).
- Key assumptions used in the analysis.
- Tests (e.g., z–tests, F–test, chi–square, and R–squared) and significance levels used to judge significance, goodness–of–fit, or degree of association.
- Limitations of the methodology.
- Appropriateness of the data and underlying assumptions and verification of the accuracy of the computations.
- During a data analysis project‚ the focus of the analysis may change‚ as the researcher learns more about the data. The analysis plan should be updated‚ as appropriate‚ to reflect major changes in the direction of the analysis.
- Statistical Quality Standard A1‚ Planning a Data Program‚ addresses overall planning requirements‚ including schedule and estimates of costs.
Requirement E1–3: Statistically sound practices that are appropriate for the research questions must be used when analyzing the data.
Examples of statistically sound practices include:
- Reviewing data to identify and address nonsampling error issues (e.g.‚ outliers‚ inconsistencies within records‚ missing data‚ and bias in the frame or sample from which data are obtained).
- Validating assumptions underlying the analysis‚ where feasible.
- Developing models appropriate for the data and the assumptions. (See Statistical Quality Standard D2‚ Producing Estimates from Models.)
- Using multiple regression and multivariate analysis techniques‚ when appropriate‚ to examine relationships among dependent variables and independent variables.
- Using a trend analysis or other suitable procedure when testing for structure in the data over time (e.g.‚ regression‚ time series analysis‚ or nonparametric statistics).
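The first practice listed above, reviewing data for outliers, could be sketched as follows. This is a minimal illustration and not a prescribed method: the function name `flag_outliers`, the interquartile-range rule, and the multiplier of 1.5 are all assumptions made for this example. Flagged values still need subject-matter review before any are edited or excluded.

```python
from statistics import quantiles


def flag_outliers(values, multiplier=1.5):
    """Return (index, value) pairs lying outside the interquartile-range fences.

    Hypothetical helper: the 1.5 * IQR rule is one common screening
    convention, not a rule taken from the standard.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the reported values
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Illustrative data only: the last value is far from the rest.
reported = [10, 12, 11, 13, 12, 11, 95]
print(flag_outliers(reported))
```

A screen like this only identifies candidates for review. Whether a flagged value is an error or a legitimate extreme is a separate judgment, and the outcome belongs in the analysis documentation.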
Sub–Requirement E1–3.1: The data analysis must account for the sample design (e.g.‚ unequal probabilities of selection‚ stratification‚ and clustering) and estimation methodology.
- If it has been documented that a particular methodological feature has no effect on the results of the analysis, then it is not necessary to account for that feature in the analysis (e.g., if weighted and unweighted data produce similar results, then the analysis may use the unweighted data; if the variance properties for clustered data are similar to those for unclustered data, then the analysis need not account for clustering).
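As a rough illustration of the weighted-versus-unweighted comparison mentioned under Sub–Requirement E1–3.1, the sketch below computes both means and their relative difference, along with Kish's approximate design effect from unequal weights. The function names and the sample values are hypothetical. The standard does not define how close "similar" must be, so any tolerance applied to these numbers is an assumption the analyst has to justify and document.

```python
def weighted_mean(values, weights):
    """Mean of values using the survey weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def kish_design_effect(weights):
    """Kish's approximation of the design effect due to unequal weighting:
    n * sum(w^2) / (sum(w))^2. Equals 1.0 when all weights are equal.
    It does not capture clustering or stratification effects.
    """
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def compare_weighting(values, weights):
    """Return unweighted mean, weighted mean, and their relative difference."""
    unweighted = sum(values) / len(values)
    weighted = weighted_mean(values, weights)
    rel_diff = abs(weighted - unweighted) / abs(weighted)
    return unweighted, weighted, rel_diff


# Illustrative data only: the last unit carries three times the weight.
values = [10.0, 20.0, 30.0, 40.0]
weights = [1.0, 1.0, 1.0, 3.0]
unweighted, weighted, rel_diff = compare_weighting(values, weights)
print(unweighted, weighted, round(rel_diff, 4), round(kish_design_effect(weights), 4))
```

In this made-up example the two means differ by about 17 percent of the weighted mean, so the weights clearly matter and the analysis would need to use them.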
Requirement E1–4: Any conclusions derived from sample data must be supported by appropriate measures of statistical uncertainty.
Examples of measures of statistical uncertainty that support conclusions include:
- Confidence or probability intervals with specified confidence levels (e.g.‚ 90% or 95%).
- Margins of error for specified confidence levels, provided the sample size is sufficiently large that the implied confidence interval has coverage close to the nominal level.
- P–values for hypothesis tests, such as are implied when making comparisons between groups or over time. Comparisons with p–values greater than 0.10, if reported, should come with a statement that the difference is not statistically different from zero.
- Confidence intervals, probability intervals, or p–values should be statistically valid and account for the sample design (e.g., accounting for covariances when the estimates are based on clustered samples). If based on a model, then the key assumptions of the model should be checked and not contradicted by the observed data. (See Statistical Quality Standard D2, Producing Estimates from Models.)
Note: Although the p–value does not indicate the size of an effect (or the size of the difference in a comparison), p–values below 0.01 constitute strong evidence against the null hypothesis, p–values between 0.01 and 0.05 constitute moderate evidence, and p–values between 0.05 and 0.10 constitute weak evidence.
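The measures of uncertainty listed under Requirement E1–4 can be illustrated with a comparison of two estimates. The sketch below is a minimal example that assumes a normal approximation for the difference and takes standard errors (and any covariance) that already reflect the sample design. The names `Comparison` and `compare_estimates` are hypothetical. The treatment of p–values that fall exactly on 0.01, 0.05, or 0.10 is also an assumption, since the note does not specify it.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class Comparison:
    difference: float
    margin_of_error: float
    interval: tuple
    p_value: float
    evidence: str


def compare_estimates(est_a, se_a, est_b, se_b, cov_ab=0.0, confidence=0.90):
    """Two-sided z-test and confidence interval for est_a - est_b.

    se_a, se_b, and cov_ab are assumed to come from a variance method
    that accounts for the sample design (e.g., replication).
    """
    diff = est_a - est_b
    se_diff = sqrt(se_a ** 2 + se_b ** 2 - 2.0 * cov_ab)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    moe = z_crit * se_diff
    z = diff / se_diff
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

    # Evidence labels follow the note above; boundary handling is assumed.
    if p_value < 0.01:
        evidence = "strong"
    elif p_value < 0.05:
        evidence = "moderate"
    elif p_value <= 0.10:
        evidence = "weak"
    else:
        evidence = "not statistically different from zero"

    return Comparison(diff, moe, (diff - moe, diff + moe), p_value, evidence)


# Illustrative numbers only.
result = compare_estimates(52.0, 1.0, 49.0, 1.0)
print(result)
```

With these made-up inputs the difference is 3.0 with a 90% margin of error of about 2.33, and the p–value of roughly 0.03 falls in the "moderate evidence" range.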
Sub–Requirement E1–4.1: The same significance level or confidence level must be used throughout an analysis. Table A shows the requirements for specific information products:
Table A: Significance and Confidence Levels by Information Product
| Information product | Significance level | Confidence level |
|---|---|---|
| Census Bureau publications | | |
| All other information products (e.g., working papers, professional papers, and presentations) | 0.10 or less | 0.90 or more |
Requirement E1–5: The data and underlying assumptions must be appropriate for the analyses and the accuracy of the computations must be verified.
Examples of activities to check the appropriateness of the data and underlying assumptions and the accuracy of the computations:
- Checking that the appropriate equations were used in the analysis.
- Reviewing computer code to ensure that the appropriate data and variables are used in the analysis and the code is correctly programmed.
- Performing robustness checks (e.g., checking that unexpected results are not attributable to errors, examining plots of residuals to assess the fit of models, and comparing findings against historical results for reasonableness).
- Performing sensitivity analyses using alternative assumptions to assess the validity of measures‚ relationships‚ and inferences.
- Requesting peer reviews by subject matter‚ methodological‚ and statistical experts to assess analysis approach and results.
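A sensitivity analysis of the kind listed above can be as simple as recomputing an estimate under alternative assumptions and reporting how far the result moves. The sketch below does this for a mean with missing responses. The function name, the three scenarios, and the bounds are hypothetical choices for illustration, not assumptions drawn from the standard.

```python
from statistics import mean


def sensitivity_to_missing(values, low, high):
    """Recompute a mean under three assumptions about missing values (None).

    complete_case   : drop missing values
    missing_at_low  : every missing value equals the plausible minimum
    missing_at_high : every missing value equals the plausible maximum
    Returns the scenario estimates and the spread between them.
    """
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)
    scenarios = {
        "complete_case": mean(observed),
        "missing_at_low": mean(observed + [low] * n_missing),
        "missing_at_high": mean(observed + [high] * n_missing),
    }
    spread = max(scenarios.values()) - min(scenarios.values())
    return scenarios, spread


# Illustrative data only: two of six responses are missing.
responses = [4, 6, None, 5, None, 5]
scenarios, spread = sensitivity_to_missing(responses, low=0, high=10)
print(scenarios, round(spread, 3))
```

A large spread relative to the estimate suggests that conclusions depend heavily on the treatment of missing data, which is worth stating alongside the results.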
Requirement E1–6: Documentation needed to replicate and evaluate the analysis must be produced. The documentation must be retained‚ consistent with applicable policies and data–use agreements‚ and must be made available to Census Bureau employees who need it to carry out their work. (See Statistical Quality Standard S2‚ Managing Data and Documents.)
Examples of documentation include:
- Plans‚ requirements‚ specifications‚ and procedures relating to the analysis.
- Computer code (e.g.‚ SAS code).
- Data files with weighted and unweighted data.
- Outlier analysis results‚ including information on the cause of outliers‚ if available.
- Error estimates‚ parameter estimates‚ and overall performance statistics (e.g.‚ goodness–of–fit statistics).
- Results of diagnostics relating to the analysis.
- The documentation must be released on request to external users‚ unless the information is subject to legal protections or administrative restrictions that would preclude its release. (See Data Stewardship Policy DS007‚ Information Security Management Program.)
- Statistical Quality Standard F2‚ Providing Documentation to Support Transparency in Information Products‚ contains specific requirements about documentation that must be readily accessible to the public to ensure transparency of information products released by the Census Bureau.