ETL Design Activities


The activities for ETL design do not need to be performed linearly. Figure 9.7 indicates which activities can be performed concurrently. The list below briefly describes the activities associated with Step 9, ETL Design.

  1. Create the source-to-target mapping document.

    Use the source data analysis results and the business rules from the previous steps and incorporate them into the transformation specifications. Document the transformation specifications in a source-to-target mapping matrix or spreadsheet.

  2. Test the ETL tool functions.

    It is very important to test the ETL tool functions before designing the ETL process flow and before deciding how to set up the staging area. For example, it would be worthless to install a currently popular ETL tool that cannot read flat files on a mainframe if 90 percent of your source data is in flat files on your mainframe. Therefore, test the ETL tool functions and determine whether supplemental code must be written to perform some complicated and lengthy transformations that the tool cannot handle.

  3. Design the ETL process flow.

    The most challenging aspect of ETL design is creating an efficient ETL process flow. Because most data staging windows are very smallonly a few hours per nightthe ETL process must be streamlined as much as possible. That means breaking the ETL process into small program components so that as many as possible can be run in parallel.

  4. Design the ETL programs.

    Since most organizations require several years of historical data to be loaded with the first BI application release, there are three sets of ETL programs to consider: the initial load, the historical load, and the incremental load. The incremental load will probably be a delta load and will therefore be the most complicated to design. Modularize the ETL programs as much as possible, and create programming specifications for each ETL program module.

  5. Set up the ETL staging area.

    Determine whether you need a centralized staging area on a dedicated server or whether it would make more sense to implement a decentralized staging area in your environment. Deciding factors are the type and location of source files and source databases, as well as the functions, capabilities, and licensing terms of the ETL tool.

Figure 9.7. ETL Design Activities

graphics/09fig07.gif

graphics/hand_icon.gif

Do not create a separate staging area for each data mart. A decentralized, coordinated staging area is not the same thing as separate, uncoordinated staging areas for different BI target databases and different BI applications.



Business Intelligence Roadmap
Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
EAN: 2147483647
Year: 2003
Pages: 202

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net