Chapter 6: Developing the ETL System

What could possibly be more fun?

Overview

The Extract, Transformation, and Load (ETL) application is the foundation of the DW/BI system. Business users dont see the ETL system, and they dont care about it. That is, they dont care unless the data isnt available first thing in the morning. They dont care unless the data is unreliable. If the business users care about ETL, its probably bad news for the DW/BI team.

Building an ETL system is largely a custom process. ETL tools, like SQL Server 2005 Integration Services, provide a huge productivity boost over hand-coding . But theres no getting around the fact that companies have diverse source systems. Integrating and aligning the data from those systems is an exercise in accounting for idiosyncrasies.

When Microsoft decided to rebuild Data Transformation Services (DTS), it took an If it aint broke, dont fix it approach. Comparing DTS with Integration Services, we can infer that the pale yellow background in the designer wasnt broken. Other than that, Integration Services is astonishingly different from DTS. Integration Services is a real ETL tool, and you can use it to build real, enterprise-class ETL systems. This chapter describes how to do exactly that.

Integration Services can be used in a wide variety of scenarios beyond ETL. In this chapter, we focus exclusively on how to implement the ETL system for a Kimball Method data warehouse. We come back to other Integration Services features and uses in Chapters 15 and 17. Still other uses of Integration Services fall outside the scope of this book, as of course do other uses of the relational database.

Although we expect readers to skip around in this book, you should read Chapter 5 before you start this chapter. And before you start developing your ETL system, you should think through most of the design issues discussed in that chapter. Look back at Figure 5.1 to re- orient yourself to where ETL system development falls in the Kimball Method Business Dimensional Lifecycle methodology.

This chapter makes significant use of the Adventure Works case study, and the associated packages available on the books web site at www.MsftDWToolkit.com. You can read the chapter on its own, but sample code is worth ten thousand words: Youll learn more if you pick through the sample packages. This is a long chapter, as befits the area that takes the majority of project implementation resources. Most readers will get a good enough idea of how Integration Services works by reading the first 15-20 pages of the chapter. ETL architects and developers should read it all.

Reference 

Theres more to say about ETL and Integration Services than we can possibly fit into a single chapter, however lengthy. Two key resources are The Data Warehouse ETL Toolkit (Kimball and Caserta, Wiley, 2004) and Professional SQL Server 2005 Integration Services , by Knight, Chaffin, Barnes, Warren, and Young (Wrox, 2006).

The chapter sections reference the Kimball Method 38 Subsystems of ETL (The 38 Subsystems of ETL by Ralph Kimball, Intelligent Enterprise , December 4, 2004). You can find a link to this article at www.kimballgroup.com/html/articlesbydate/articles2004.html .

This long chapter contains four main sections:

  • Getting Started: In the first portion of the chapter, you learn how to use BI Studio to create Integration Services projects, data sources, and connections. You also learn about package templates and create a master package.

  • Dimension Processing: This section shows you the basics of dimension processingextract, transform, and loadand walks you through the steps to build a package to populate a simple dimension. You learn how to manage slowly changing dimensions in Integration Services.

  • Fact Processing: This part of the chapter helps you understand how fact table extracts and cleaning differ from dimension processing. You learn about how to implement the surrogate key pipeline in Integration Services, and how to design alternative approaches to managing challenging fact table loads, like snapshots and late-arriving facts. This section presents the two most important approaches for incrementally processing Analysis Services databases.

  • Tying It All Together: See the details of our recommended audit system, and learn how to use package configurations to communicate between master and child packages.



Microsoft Data Warehouse Toolkit. With SQL Server 2005 and the Microsoft Business Intelligence Toolset
The MicrosoftВ Data Warehouse Toolkit: With SQL ServerВ 2005 and the MicrosoftВ Business Intelligence Toolset
ISBN: B000YIVXC2
EAN: N/A
Year: 2006
Pages: 125

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net