Designing the Load Programs


The final step in the ETL process is loading the BI target databases, which can be accomplished in either of two ways: (1) by inserting the new rows into the tables or (2) by using the DBMS load utility to perform a bulk load. It is much more efficient to use the load utility of the DBMS, and most organizations choose that approach.

Once the extract and transformation steps are accomplished, it should not be too complicated to complete the ETL process with the load step. However, it is still necessary to make design decisions about referential integrity and indexing.

Referential Integrity

Because of the huge volumes of data, many organizations prefer to turn off RI to speed up the load process. However, in that case the ETL programs must perform the necessary RI checks; otherwise , the BI target databases can become corrupt within a few months or even weeks. Acting on the idea that RI checking is not needed for BI applications (because no new data relationships are created and only existing operational data is loaded) does not prevent database corruption! Corruption of BI target databases often does occur, mainly because operational data is often not properly related in the first place, especially when the operational data is not in a relational database. Even if the operational data comes from a relational database, there is no guarantee of properly enforced RI because too many relational database designs are no more than unrelated flat files in tables.

graphics/hand_icon.gif

When RI is turned off during the ETL load process (as it should be, for performance reasons), it is recommended to turn it back on again after the load process has completed in order to let the DBMS determine any RI violations between dependent tables.

Indexing

Poorly performing databases are often the result of poorly performing indexing schemes. It is necessary to have efficiently performing indices, and to have many of them, because of the high volume of data in the BI target databases. However, building index entries while loading the BI tables slows down the ETL load process. Thus, it is advisable to drop all indices before the ETL load process, load the BI target databases, and then recreate the indices after completing the ETL load process and checking RI.



Business Intelligence Roadmap
Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
EAN: 2147483647
Year: 2003
Pages: 202

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net