Chapter 5: Designing the ETL System

Measure twice; cut once

Overview

Some people like to plan, specify, and document systems; most dont. Weve observed that Extract, Transformation, and Load (ETL) system development draws folks in the latter category. Weve found so few people who write adequate design specifications for their ETL systems that weve practically stopped asking to see our clients planning documents. Either theyre unaware of the impending complexity, or they dont have the planning tools. We intend not to let this happen to you.

In this chapter we begin by providing an overview of SQL Server Integration Services, including a description of the processing and transformation tasks that are available. The purpose of this introductory material is to provide you with enough background about how Integration Services works so you can make informed decisions about the design of your ETL system. The next chapter, Developing the ETL System, provides more detailed guidance on how to use Integration Services.

The second half of this chapter describes how to develop the specification for your ETL system. Begin with high-level planning; make some system-wide decisions about how to approach various issues; and then perform a thorough analysis of the source data. Well talk about how to set up the ETL system and staging areas. Finally, you will have gathered enough information and made enough decisions to develop the detailed ETL system specification. Throughout, we will be referring to the Adventure Works Cycles case study.

As Figure 5.1 illustrates, and common sense dictates, the ETL portion of the project is part of the data track. Remember, however, that these boxes arent to scale. The ETL effort requires the most time invested in the data track, and often in the entire project.

image from book
Figure 5.1: The Business Dimensional Lifecycle

In this chapter, you learn:

  • What SQL Server Integration Services is, and its role in the DW/BI system. Youll receive an overview of its most important features for ETL system design.

  • The components of a solid ETL design specification, and how to create this document:

    • High-level map

    • Detailed data profiling report

    • Source-to-target mapping

    • Strategy document for extracting data

    • Strategy document for updating Analysis Services database

  • The details of ETL system physical design and architecture.



Microsoft Data Warehouse Toolkit. With SQL Server 2005 and the Microsoft Business Intelligence Toolset
The MicrosoftВ Data Warehouse Toolkit: With SQL ServerВ 2005 and the MicrosoftВ Business Intelligence Toolset
ISBN: B000YIVXC2
EAN: N/A
Year: 2006
Pages: 125

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net