WebEx Event number (if needed): 2824 630 8639
WebEx Event password (if needed): Census#2
Supporting Flexible Computations: Integrated Formula Analysis and Calculation Tool
David Rozenshtein, PhD, Omnicom Consulting Group, Inc.
As part of an ongoing central statistical systems modernization project, we have developed an integrated formula analysis and calculation tool (IFACT) to support the needs of calculating detailed accounts. IFACT allows analysts to specify computation formulas in an intuitive, MS Excel-like notation, and then evaluates them over the data in the database. IFACT supports: full arithmetic (+, -, *, and /); binary comparators (=, !=, <, etc.); logical conditions (using NOT, AND, and OR) that use 3-valued logic to properly account for missing values (NULLs); aggregate and scalar functions; multiple formula preferences; the ability to conditionally reconfigure formulas; etc. The IFACT engine translates sets of formulas into labeled directed acyclic graph structures, loads them into the database tables, captures formula interdependencies (in order to properly sequence the computation), and then acts as an interpreter over these formulas, calculating results. Because formulas are not part of the system source code, analysts can modify them, and thus modify system behavior, without any reprogramming. Importantly, IFACT supports full auditability over its computations. The tool is written entirely in SQL, except for a small component that translates the IFACT formula files into an XML representation.
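As an illustration of the dependency-sequencing step (table and column names here are hypothetical, not the actual IFACT schema), a recursive SQL query can assign each formula in the loaded DAG to a computation level, so that a formula is evaluated only after all of its inputs:

    -- Hypothetical schema: formula(formula_id, ...) holds one row per formula;
    -- formula_dependency(formula_id, depends_on) holds one edge of the DAG,
    -- meaning formula_id uses the result of depends_on as an input.
    WITH RECURSIVE eval_order (formula_id, level) AS (
        -- Level 0: formulas that reference only base data, no other formulas.
        SELECT f.formula_id, 0
        FROM   formula f
        WHERE  NOT EXISTS (SELECT 1 FROM formula_dependency d
                           WHERE  d.formula_id = f.formula_id)
        UNION ALL
        -- A formula sits one level above each of its inputs.
        SELECT d.formula_id, e.level + 1
        FROM   formula_dependency d
               JOIN eval_order e ON e.formula_id = d.depends_on
    )
    -- MAX(level) per formula yields a valid evaluation order: computing
    -- levels in ascending order never touches an unevaluated input.
    SELECT   formula_id, MAX(level) AS eval_level
    FROM     eval_order
    GROUP BY formula_id
    ORDER BY eval_level, formula_id;

Because the graph is acyclic, the recursion terminates, and the interpreter can then sweep the levels in ascending order.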
Enhancing Survey Data Quality: Integrated Validation, Auto-Edit, and Search Tool
Alice Ramey, U.S. Bureau of Economic Analysis
As part of an ongoing central statistical systems modernization project, we have developed an integrated validation, auto-edit, and search tool (IVEST) to support the processing of federal surveys. The IVEST system allows analysts to specify a variety of criteria for searching through survey data. These criteria are then used to validate, correct, and/or enhance the data when certain errors are discovered or specific conditions are met. The system also supports a sophisticated multi-level approval structure for overriding rule violations when necessary. The rule language supported by IVEST has a natural, user-friendly syntax, yet is expressive enough to allow for any condition normally expressible within SQL. At its essence, IVEST is a code generator, itself implemented in SQL, that translates IVEST rules into efficient SQL queries. Because IVEST rules are not hard-coded into the system source code, analysts can modify them, and thus modify system behavior, without changes to the underlying programs.
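A rough sketch of the code-generation idea (the rule, rule id, and table and column names are invented for illustration): given an analyst rule such as "total_receipts must be >= total_payroll", the generator might emit a query that records each violating response:

    -- Hypothetical generated SQL for the rule above (rule id 'R042' invented).
    INSERT INTO rule_violation (survey_id, respondent_id, rule_id, found_at)
    SELECT s.survey_id, s.respondent_id, 'R042', CURRENT_TIMESTAMP
    FROM   survey_response s
    WHERE  s.total_receipts < s.total_payroll;  -- negation of the rule condition

Note that rows where either value is NULL evaluate to UNKNOWN and are not flagged by this query; in a sketch like this, missing-value checks would be expressed as separate rules.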
Supporting Data Non-Disclosure: Secondary Suppression Analysis, Suggestion, and Audit Tool
Sandip Mehta, Omnicom Consulting Group, Inc.; Melanie Carrales, U.S. Bureau of Economic Analysis
Secondary suppression is used to support non-disclosure of data cells in a multi-dimensional table space, a notoriously difficult problem. As part of an ongoing central statistical systems modernization project, we have developed a suite of tools to support secondary suppression. The three most significant tools of this suite are: a tool for analyzing the current state of suppression, including reporting on "broken" cells; a tool for choosing candidates (based on a variety of criteria) for the additional suppression necessary to protect currently suppressed cells; and a suppression audit tool for showing why certain suppressions were chosen by the system and the various dependencies that exist among suppressed cells. Our suppression tools are both periodicity-aware and history-aware, i.e., suppressions are coordinated between annual and quarterly data, as well as with prior vintages/revisions. They also allow for analyst overrides (both positive and negative) of system-selected suppressions and support an iterative, collaborative process between the analysts and the system in establishing the final secondary suppression pattern.
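As one concrete example of the kind of check the analysis tool might perform (the schema is hypothetical, and "broken" is taken here in its simplest additive sense): a suppressed cell is recoverable if it is the only suppressed child under a published parent, since subtracting the published siblings from the parent total exposes its value:

    -- Hypothetical schema: cell(cell_id, parent_cell_id, is_suppressed, value),
    -- where a parent's value is the sum of its children's values.
    SELECT p.cell_id      AS exposed_parent,
           MIN(c.cell_id) AS broken_cell      -- the lone suppressed child
    FROM   cell c
           JOIN cell p ON p.cell_id = c.parent_cell_id
    WHERE  c.is_suppressed = 1
      AND  p.is_suppressed = 0                -- parent total is published
    GROUP BY p.cell_id
    HAVING COUNT(*) = 1;                      -- exactly one suppressed child

The full problem is far harder than this sketch suggests, because each cell participates in many overlapping aggregates across dimensions, periodicities, and vintages; closing such recovery paths is precisely what the candidate-selection tool is for.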
Analyzing and Automating Processing Workflows
Benjamin Kavanaugh, U.S. Bureau of Economic Analysis
By the nature of their business, some of the systems we have built as part of an ongoing central statistical systems modernization project have very complex computation processes, involving many thousands of distinct interdependent tasks authored by multiple groups of analysts and users who normally are not in sync with each other, at least during the early stages of processing. Figuring out the correct execution sequence for these tasks, and the overall synchronization state of the system, is an activity not well suited to manual control. We have developed a system that takes in information about computation tasks and their interdependencies, builds the dependency graph, and then automatically manages the overall computation process (including incrementally recalculating only the necessary tasks) and reports on the system's synchronization state. It also collects performance statistics from past executions and provides time estimates for pending tasks.
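A minimal sketch of the incremental-recalculation step (task and dependency table names are assumptions for illustration): starting from tasks whose inputs have changed, a recursive query walks the dependency graph downstream to find exactly the set of tasks that must be re-run, leaving all others untouched:

    -- Hypothetical schema: task(task_id, is_stale, ...);
    -- task_dependency(task_id, depends_on) means task_id consumes
    -- the output of depends_on.
    WITH RECURSIVE must_rerun (task_id) AS (
        SELECT task_id FROM task WHERE is_stale = 1
        UNION                              -- UNION (not UNION ALL) so each
        SELECT d.task_id                   -- task is visited only once
        FROM   task_dependency d
               JOIN must_rerun m ON m.task_id = d.depends_on
    )
    SELECT task_id FROM must_rerun;

Everything outside this set is, by construction, already in sync and can be skipped.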
Aggregating Time-Series Data in Multiple Dimensions, Which Themselves Change Over Time
Benjamin Cowan, U.S. Bureau of Economic Analysis
Mathematical aggregations are a common form of computation encountered in survey data processing and analysis systems. As part of an ongoing central statistical systems modernization project, we have developed a variety of metadata-driven aggregators that operate on multiple dimensions. These aggregators dynamically adapt to: the dimensions that are present; the taxonomies that exist, and their membership and inter-element relationships; how these taxonomies change over time; which taxonomies are used for which dimensions; which aggregation steps involve which subsets of dimensions; the dependencies among the aggregation steps; etc. All specifications are represented in metadata authored and controlled by the analysts outside of the system source code. This allows analysts to change the behavior of the system without changing its programming. The aggregators synchronize multidimensional aggregations of time-series data with evolving taxonomical structures. They support a completeness check to ensure that all children of a given aggregate have values, as well as direct specification of values for aggregates that fail to compute naturally. The aggregators also have mechanisms to control the computational explosions that are common in multi-dimensional situations. The aggregation engines are written in SQL and are essentially applications of several breadth-first graph processing algorithms: some based on level-by-level graph roll-ups, and others based on pre-computed partial and full closures.
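To give a flavor of the closure-based variant (table and column names are assumptions, not the production schema): with a pre-computed, time-vintaged taxonomy closure, every aggregate for a period can be produced in a single pass as the sum of the leaf values of its descendants under the taxonomy valid for that period:

    -- Hypothetical schema: taxonomy_closure(ancestor_id, descendant_id,
    -- valid_from, valid_to) lists every ancestor/descendant pair together
    -- with the periods for which that relationship holds;
    -- leaf_value(series_id, period, value) holds the leaf-level data.
    SELECT   tc.ancestor_id AS series_id,
             v.period,
             SUM(v.value)   AS agg_value
    FROM     taxonomy_closure tc
             JOIN leaf_value v
               ON  v.series_id = tc.descendant_id
               AND v.period BETWEEN tc.valid_from AND tc.valid_to
    GROUP BY tc.ancestor_id, v.period;

The level-by-level roll-up variant would instead iterate parent-child sums up the taxonomy, trading the single-pass query for not having to materialize a full closure when the dimension space is large.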