Chapter Eleven. Step 11: ExtractTransformLoad Development


Chapter Eleven. Step 11: Extract/Transform/Load Development

graphics/ch11.gif

Chapter Overview

This chapter covers the following topics:

  • Things to consider about extract/transform/load (ETL) development

  • Common data transformation activities and the chronic underestimating of data transformation efforts

  • Three types of reconciliation totals that should be produced by the ETL process: record counts, domain counts, and amount counts

  • Peer reviews

  • Six applicable ETL testing procedures: unit testing, integration testing, regression testing, performance testing, quality assurance (QA) testing, and acceptance testing

  • How to create a formal test plan

  • Brief descriptions of the activities involved in ETL development, the deliverables resulting from those activities, and the roles involved

  • The risks of not performing Step 11

Things to Consider

Source Data Extracts

Who will write the ETL programs? Have those developers written ETL programs before? Do they understand the ETL process?

Do ETL programs already exist from a previous release or another BI application? How many of them have to be expanded?

Can we ask the programmers of the operational systems to give us the extract files, or do we have to get the source data ourselves ?

What do we need to know about the operational systems before we can get the data? What operational programs have to finish running before we can extract the data from the source files and source databases?

ETL Tool

Have we worked with this ETL tool before, or is it new to us?

Has the ETL team been sufficiently trained on the ETL tool?

Can the ETL tool perform all the required transformations, or will we have to write some custom code? In what language (C++, COBOL)?

ETL Process Dependencies

What are the dependencies among program modules? In what sequence do we have to run our ETL programs (or the ETL tool modules)?

How many program modules can we run in parallel?

What are the dependencies among the tables? Do some tables have to be loaded before others?

How many tables can we load in parallel?

Testing

Will we conduct peer reviews? Are we using extreme programming (XP) techniques?

How many testers will we have on the project?

Will the subject matter expert and business representative participate in testing?

Who will be the testing coordinator ? Who will log the test results and maintain the test log?

What type of testing do we need to perform? Integration or regression testing? Performance testing? QA testing? Acceptance testing?

Which business people will participate in acceptance testing? Only the business representative? The subject matter expert? Other business people?

Technical Considerations

What technical platform issues do we have to take into consideration?

How is the staging area set up? On a dedicated server?

Will the ETL process be split between the mainframe and one or more servers?

What environments does the ETL tool run in?

What type of middleware do we need?

The use of ETL tools has become very widespread, but organizations that use them discover very quickly that these tools have their limitations. Depending on the complexity of the required source data transformations and on the age and condition of the source files, custom code often has to be written to augment the ETL tool functionality.



Business Intelligence Roadmap
Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
EAN: 2147483647
Year: 2003
Pages: 202

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net