1342-1343

Previous Table of Contents Next

Page 1342

  • Reference table population software: Reference tables serve as the list of legal values in the data warehouse. You have two basic options for reference table population. The first option is to obtain the data from some other data source (for example, one of your OLTP systems). If you do not have this data in another online system, you might have to build a series of forms to enable your users to maintain these tables manually in the data warehouse. Either way, these software modules generally are relatively simple to develop.
  • Aggregation routines: If you use aggregation tables to keep information for a longer period of time, you need to build software routines to perform these aggregations. These routines typically use the summation function of a SQL query to aggregate the information over a where clause that covers the date range of interest. Typically, there will be a series of aggregation routines ”one for each set of aggregations you need to perform. One set of routines can be run monthly to calculate monthly sums from the weekly sums, for example. Another set of routines can calculate the weekly sums from the daily sums. You determine which set of aggregations to use based on the periods of information useful to your users.
  • Summarization routines: You use these routines to generate data tables that match the queries of your user community that are both common and time consuming. These routines are relatively simple to design because you already have the SQL from the user queries. All you do is execute that select SQL as part of an insert into... select... statement. Generally, you should purge these tables routinely to keep down their size . You should base the purging interval on the amount of time covered in the various user reports and the queries that use this summarized information. You have to use a bit of judgment when designing these routines to determine which summaries make sense to implement. You do not want to waste a lot of disk space and nighttime processing time (which often can be limited in data warehouses that receive their information late at night and need to be up by the beginning of the next business day) to calculate summaries that are used only rarely.
  • Archive routines: Many data warehouses do not need archive routines; the information is archived in the parent OLTP systems and can be restored if needed from those systems. Many OLTP systems do not provide an efficient archive mechanism, however. Also, the time to recover that information and put it into a data warehouse format if needed in the future may be prohibitive. Therefore, you might have to design routines that take the information in your tables and put it into a format that can be stored offline. You should consider two formats of these routines. The most basic format extracts the information to be purged from the system (usually based on it being older than a certain date) and stores it in an ASCII text file. You then can archive this file to tape. If you need this information in the future, you can use Oracle's Loader product to copy the information back into database tables. Another option that recently became available is to copy the information from your base tables to a series of archive tables in a separate tablespace. You then can back up these archive tables by using Oracle's Export utility or a simple operating system copy operation to

Page 1343

  • a CD. If you choose the CD option, you can set up this tablespace as a read-only tablespace that then can be mounted directly on many operating systems. You then can simply mount the appropriate archive CD and use standard reports and queries against the Oracle database to access its information without having to run an Import or Loader job.
  • Purge routines: Eventually, you probably will have to get rid of some information from your data warehouse system. This usually takes the form of a series of SQL delete... statements run against the various base, aggregation, and summarization tables. You delete rows based on a where... clause in the delete statement. These purge routines often are run with the table-population routines to form a series of batch jobs. After you calculate your weekly aggregations, for example, you might purge the data from the base tables for records that contain information more than a certain number of days old. These often are the hardest routines to sell to the user community because, in an ideal world, users would have all the information they could ever want online.

Now take a look at the format of these software routines and processes. The good news is that you can choose from a number of alternatives. That also can be the bad news, however, because you might have to wade through a series of claims from vendors and attend a number of demonstrations while trying to keep up with a tight project schedule. You always can use the tools Oracle provides as part of its basic DBMS as a starting point in your development process. The combination of Oracle's Loader, Import, Plus, and PL/SQL offerings (both scripts and stored procedures) can handle the requirements in most of the data warehouse situations I have run across. Of course, you still need to build data-extraction routines that obtain the information from your OLTP systems. You generally will use the same tools used for other programming purposes on these systems. Suppose that you have a large group of COBOL programmers maintaining your OLTP system on a mainframe. You probably will wind up using COBOL to write the data-extraction routines (unless there are format issues, time issues, and so on).

Although you can use a series of standard programming tools to develop the complex series of scripts needed to maintain your data warehouse, other alternatives are appearing on the market. Oracle recently introduced its Data Marts product, which gives you a series of tools that makes it easier to develop and maintain a data warehouse. The extraction routines are built using a graphical exiting tool that pops up dialog boxes for the various types of data transformations. There is even a tool to help you capture the schemas of the source and target databases. Oracle Data Marts and similar products will continue to evolve , as will other programming tools (for example, Visual Basic), to increase the productivity of developers working in the data warehouse environment.

Other Types of Databases

Other forms of analytical databases are available as well. The Oracle relational database management system (RDBMS) is based on a series of data tables that store information. To access this information, you issue a series of queries against the database. The challenge here is that

Previous Table of Contents Next


Oracle Unleashed
Oracle Development Unleashed (3rd Edition)
ISBN: 0672315750
EAN: 2147483647
Year: 1997
Pages: 391

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net