The U.S. Census Bureau

Software, Databases, Product Development and Integration

Marketing and Data Dissemination Roundtable Meeting

Washington, DC, April 4-5, 2000

Statistics Canada: Anil Arora; arorani@statcan.ca

Background

Technology has traditionally played a pivotal role in the processing and dissemination of statistical output, whether in the form of mainframes crunching gigabytes of micro-level data or PCs using desktop software to produce simple aggregate tables for external users. In the pre-Internet years, we kept most of the technology (and, in turn, direct access to data) well out of reach of the external user, restricting access to in-house experts who understood the systems, could translate a request, and could then formulate a query to retrieve the expected results. The same (or another) expert delivered the results and was there to support the user and guide him or her in the proper and intended use of the data set.

Today's external user expects 24/7 accessibility and friendly (personalized and intimate) on-line interfaces that intuitively teach first-time users and allow them to formulate complex queries. The user expects to retrieve pertinent data and meta-data in fewer than four clicks, within 5 or 6 seconds, and in output formats that work with the exact flavour of the software on his or her PC. Over the past four years, the popularity of the Internet (and, in turn, user demand) has forced a profound change in the way we use automated tools and in how we drive software development at Statistics Canada. Our web site (www.statcan.ca) is accessed by over 15,000 users daily and is considered our primary mode of dissemination.

The Internet has given us the incentive (and the pressure) to rework our legacy systems to suit this self-serve medium. We've turned systems 180 degrees: from expert-driven to novice-ready, from the back office to open accessibility, from internal use only to wide-open external access, and from proprietary software to open-standards software. Our On-line Catalogue is one such example: it started as a tool for internal reference only, and its database has been (and continues to be) reworked so that external users can access information on all of our products and services on-line. Systems never designed to talk to each other now depend on each other and share information that is stored only once, presenting a seamless front to the external user (for example, our On-line Catalogue and the e-commerce facility).

Dissemination Databases

Data

Statistics Canada has made a conscious decision to place all aggregate data into our corporate data warehouse, CANSIM (Canadian Socio-economic Information Management system). The Census and Trade databases are currently separate entities; they will be incorporated into CANSIM II in the near future.

CANSIM and CANSIM II: Relational database with 800,000 time series (in a multi-dimensional table structure) of socio-economic data. The database is, or will be, linked to the IMDB, IPS (On-line Catalogue), related publications, research papers, articles, Canadian Statistics Tables, and The Daily. Platform: Unix, Oracle 8.1. CANSIM has been in existence for over 30 years; CANSIM II (a relational database with multi-dimensional tables) will be available internally within Statistics Canada in April 2000, with external access in the fall of 2000. CANSIM II will be Statistics Canada's data warehouse for all aggregate data and the engine for various output media (tables, dynamic publications, CD-ROMs, custom retrievals). Current dynamic publishing tools include SGML output in the CALS table format for the online table viewer, and PageMaker and Excel for the text and charts used to create PDF and paper formats. OLAP tools include Beyond 20/20 and Dynamicube.
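To make the idea of time series held in a multi-dimensional table structure more concrete, the following is a minimal sketch in Python, using the standard sqlite3 module as a stand-in for Oracle. It assumes a hypothetical table with two dimensions (geography and sex); the schema, the names build_table and get_series, and the dimension members are illustrative only and do not describe the actual CANSIM II design. Each combination of dimension members defines one series, and each observation is stored once per series and period.

```python
import sqlite3
from itertools import product

# Hypothetical dimensions for ONE multi-dimensional table (illustrative members only).
DIMENSIONS = {
    "geography": ["Canada", "Ontario"],
    "sex": ["Males", "Females"],
}


def build_table(conn: sqlite3.Connection) -> None:
    """Create a toy schema: one row per series, one row per (series, period) observation."""
    conn.executescript(
        """
        CREATE TABLE series (
            series_id INTEGER PRIMARY KEY,
            geography TEXT NOT NULL,
            sex       TEXT NOT NULL,
            UNIQUE (geography, sex)
        );
        CREATE TABLE observation (
            series_id INTEGER NOT NULL REFERENCES series (series_id),
            period    TEXT NOT NULL,
            value     REAL,
            PRIMARY KEY (series_id, period)
        );
        """
    )
    # Every combination of dimension members defines one time series (one "cell" of the table).
    for geo, sex in product(DIMENSIONS["geography"], DIMENSIONS["sex"]):
        conn.execute("INSERT INTO series (geography, sex) VALUES (?, ?)", (geo, sex))
    conn.commit()


def get_series(conn: sqlite3.Connection, geography: str, sex: str) -> list:
    """Return (period, value) pairs for the series at the given coordinates, oldest first."""
    cur = conn.execute(
        """
        SELECT o.period, o.value
        FROM observation AS o
        JOIN series AS s ON s.series_id = o.series_id
        WHERE s.geography = ? AND s.sex = ?
        ORDER BY o.period
        """,
        (geography, sex),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_table(conn)
    # Dummy values, only to exercise the schema; not real data.
    (sid,) = conn.execute(
        "SELECT series_id FROM series WHERE geography = ? AND sex = ?", ("Ontario", "Females")
    ).fetchone()
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?, ?)", [(sid, "1999-02", 2.0), (sid, "1999-01", 1.0)]
    )
    print(get_series(conn, "Ontario", "Females"))  # [('1999-01', 1.0), ('1999-02', 2.0)]
```

Keying observations on a series identifier, rather than repeating the dimension values on every row, keeps each coordinate stored once; a real warehouse of this size would add further dimensions, indexes, and attached meta-data.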

International Trade Database: Relational database with commodity-level data for exports and imports. Platform: Unix, Oracle 8.1. We plan to move this database onto the same server as the CANSIM II database and the CDPS database (Phase 1) in order to make maximum use of Oracle's Power Unit licensing approach.

Meta-data

Statistics Canada has decided to integrate all of its meta-data holdings (for dissemination purposes, as a first step) into one database, the Integrated Meta Database (IMDB), which is currently being developed.

Integrated Meta Database: Statistics Canada's comprehensive database of dissemination-related meta-data. The IMDB contains the survey program meta-data that users need to know for the proper use of data (sample sizes, sampling strategy, methodology, qualitative indicators, questionnaires, contact information, etc.). Platform: Unix, Oracle 8.1. The first phase of development covers meta-data at the survey/program level; subsequent phases will add variable-level meta-data. The IMDB will be linked to CANSIM II, the On-line Catalogue, Publications, The Daily, Canadian Statistics Tables, etc. The template that subject matter areas use to maintain meta-data will be integrated with the template for the CDPS.

Databases involved in interaction between Statistics Canada and external users for dissemination purposes

The Internet site is designed to be the primary integrated source through which external users access our holdings. Through hyperlinks, the user is presented with a seamless interface that gives the illusion of one central system. In reality, the many systems involved (front-end and back-end) are at various stages of physical and logical integration. The following systems perform this interaction function to varying degrees.

INTERNET Site: Statistics Canada's web site consists of 12 servers (Unix and NT) with over 80,000 HTML pages. Plans are underway to dynamically generate the entire web site using database publishing tools.

Corporate Database on Products and Services (CDPS): Accessible only internally, it contains information on all of Statistics Canada's products and services, including the ongoing costs and yearly revenues associated with the production of each product or service. Platform: Unix, Oracle 8.1. It is currently being redesigned to group products into families, reduce meta-data maintenance for each record, and allow web-based order processing of electronic documents and hard goods (by feeding price and commodity information to the e-commerce facility). The current template that subject matter areas use to maintain meta-data is built with ColdFusion and is being redesigned using Oracle Forms. The template is planned to integrate with the Integrated Meta Database (IMDB).

Information on Products and Services (IPS/On-line Catalogue): A subset of the CDPS containing the information relevant to external users searching for or ordering one of our products or services over the web (abstract, price, medium, release dates, related information such as other products or databases, etc.). Each night an SGML-coded document is created, and HTML pages are loaded onto our web site, where they can be searched using the OpenText search engine. Both a simple search and an expert (fielded) search are available to users. The On-line Catalogue record is used to integrate various other sources on the site: IMDB, Publications, Research papers, Articles, the CANSIM database, the Trade database, and the historical documents database (Bibliocat).
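The relationship between the CDPS and the On-line Catalogue (an external subset of an internal database, published as a marked-up document each night) can be sketched as a projection followed by an export. The sketch below is a hypothetical illustration in Python: the field names, the CdpsRecord type, PUBLIC_FIELDS and export_catalogue are assumptions, and it emits XML through the standard xml.etree.ElementTree module as a stand-in for the SGML actually used. The point it illustrates is that internal-only fields such as costs and revenues never reach the public document.

```python
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class CdpsRecord:
    """Hypothetical internal CDPS record; field names are illustrative, not the real schema."""
    product_id: str
    title: str
    abstract: str
    price: float
    medium: str
    release_date: str
    annual_cost: float     # internal only: must not be published
    annual_revenue: float  # internal only: must not be published


# Assumed list of fields that the external On-line Catalogue may show.
PUBLIC_FIELDS = ("product_id", "title", "abstract", "price", "medium", "release_date")


def to_catalogue_entry(record: CdpsRecord) -> dict:
    """Project an internal record onto the public subset used by the catalogue."""
    data = asdict(record)
    return {name: data[name] for name in PUBLIC_FIELDS}


def export_catalogue(records: list) -> str:
    """Build one marked-up document from all records, as a nightly batch job might."""
    root = ET.Element("catalogue")
    for record in records:
        entry = ET.SubElement(root, "product")
        for name, value in to_catalogue_entry(record).items():
            # ElementTree escapes special characters such as & and < in text content.
            ET.SubElement(entry, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Dummy record for illustration only.
    demo = CdpsRecord("X-0001", "Example title", "Example abstract", 25.0,
                      "Internet", "2000-04-04", 1000.0, 2000.0)
    print(export_catalogue([demo]))
```

Keeping the list of public fields in one place means the internal database can grow new confidential attributes without any risk of them leaking into the nightly export.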

Corporate Sales Support System (CSSS): Available internally, it is a database of approximately 80,000 clients that tracks all transactions between them and Statistics Canada. Modules include Accounts Receivable, Accounts Payable, Inventory, Subscription, Power Tools, and Order Processing. Platform: Progress RDBMS on an Alpha-Unix server. The system and its processes are currently being re-engineered to allow clients to conduct business over the web. A number of web-based systems have been created in the last year to reduce the overall cost and effort of transaction processing (DSP, the client registration payment system, on-line ordering of hard goods, e-commerce). In the future, transaction processing will be distributed and web-based, and the repository database will be integrated with the CDPS, Statistics Canada's financial system (CDFS), and divisional transaction-processing systems.

The Daily: Statistics Canada's daily newspaper summarizing and highlighting the day's releases. The HTML document provides links to the IPS/On-line Catalogue, Canadian Statistics Tables, electronic publications, the IMDB, CANSIM, Trade, and research papers and articles (as relevant to the particular release). The process is heavily automated, with an SGML version that feeds various media (HTML, PDF, paper, text-to-speech, etc.). Issues back to 1995 are directly accessible from the web site.
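The Daily's approach of keeping one structured version and generating each output medium from it is a single-source publishing pattern. A minimal sketch in Python follows, assuming a hypothetical Release record, two illustrative renderers (HTML and plain text) and a placeholder URL; the actual SGML markup and the other media it feeds (PDF, paper, text-to-speech) are not modelled here.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Release:
    """Hypothetical structured source for one item in The Daily."""
    title: str
    summary: str
    links: list = field(default_factory=list)  # (label, url) pairs


def render_html(r: Release) -> str:
    """Render the release as an HTML fragment, escaping all source text."""
    items = "".join(
        f'<li><a href="{html.escape(url)}">{html.escape(label)}</a></li>'
        for label, url in r.links
    )
    return (f"<h2>{html.escape(r.title)}</h2>\n"
            f"<p>{html.escape(r.summary)}</p>\n"
            f"<ul>{items}</ul>")


def render_text(r: Release) -> str:
    """Render the same release as plain text (e.g. for e-mail or a text-only feed)."""
    lines = [r.title.upper(), "", r.summary]
    lines += [f"  * {label}: {url}" for label, url in r.links]
    return "\n".join(lines)


# Adding a new output medium means adding one renderer; the source stays unchanged.
RENDERERS = {"html": render_html, "text": render_text}


def publish(r: Release, formats=("html", "text")) -> dict:
    """Produce every requested format from the single structured source."""
    return {fmt: RENDERERS[fmt](r) for fmt in formats}


if __name__ == "__main__":
    # Placeholder content and URL, for illustration only.
    item = Release("Example release", "One-paragraph summary.",
                   [("CANSIM", "https://example.org/cansim")])
    for fmt, output in publish(item).items():
        print(f"--- {fmt} ---\n{output}")
```

Because every medium is derived from the same record, a correction made once in the source propagates to all outputs on the next run.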

Integration

Integration is underway at various levels:

As with all approaches and strategies, there are advantages and disadvantages, both short-term and long-term.

Issues

The level of impact the Internet has had on dissemination systems, and the strategies adopted to deal with it.

Establishing common output standards and working coherently to influence private-sector product development: OLAP (DTDs), common software?

Reacting to changes in the pricing practices of companies such as Oracle (how to respond to pricing policies; in-house investment).

Dynamic publishing on an international scale: database-driven?

How do we share best practices, new in-house developments/tools, experiences, and approaches?

Government on-line type initiatives (issues of relevance, visibility, cost-recovery).



Source: U.S. Census Bureau, Marketing Services Office,
Research, Planning and Evaluation Staff

Created: May 26, 2000
Last revised: June 8, 2000