| ||
Measure twice; cut once
Some people like to plan, specify, and document systems; most dont. Weve observed that Extract, Transformation, and Load (ETL) system development draws folks in the latter category. Weve found so few people who write adequate design specifications for their ETL systems that weve practically stopped asking to see our clients planning documents. Either theyre unaware of the impending complexity, or they dont have the planning tools. We intend not to let this happen to you.
In this chapter we begin by providing an overview of SQL Server Integration Services, including a description of the processing and transformation tasks that are available. The purpose of this introductory material is to provide you with enough background about how Integration Services works so you can make informed decisions about the design of your ETL system. The next chapter, Developing the ETL System, provides more detailed guidance on how to use Integration Services.
The second half of this chapter describes how to develop the specification for your ETL system. Begin with high-level planning; make some system-wide decisions about how to approach various issues; and then perform a thorough analysis of the source data. Well talk about how to set up the ETL system and staging areas. Finally, you will have gathered enough information and made enough decisions to develop the detailed ETL system specification. Throughout, we will be referring to the Adventure Works Cycles case study.
As Figure 5.1 illustrates, and common sense dictates, the ETL portion of the project is part of the data track. Remember, however, that these boxes arent to scale. The ETL effort requires the most time invested in the data track, and often in the entire project.
In this chapter, you learn:
What SQL Server Integration Services is, and its role in the DW/BI system. Youll receive an overview of its most important features for ETL system design.
The components of a solid ETL design specification, and how to create this document:
High-level map
Detailed data profiling report
Source-to-target mapping
Strategy document for extracting data
Strategy document for updating Analysis Services database
The details of ETL system physical design and architecture.
| ||