Summary | The MicrosoftВ Data Warehouse Toolkit: With SQL ServerВ 2005 and the MicrosoftВ Business Intelligence Toolset

The goal of this chapter is to encourage you to plan your ETL system. To develop a good plan, you need a basic understanding of how an ETL system is built in SQL Server 2005. The main ETL tool is Integration Services, which contains a rich set of functionality for developing and executing an ETL system. This chapter introduces you to Integration Services features, without going into any details about how exactly to use them.

The first step in developing your ETL system specification is to start from the draft source to target mappings that are created during the modeling process. While youre developing your data model, you must be looking at the source systems in order to validate that your data model is feasible . At this point you havent done detailed examination of the source systems, but youve identified source tables for your dimensional model, and youve identified some data quality issues. Document these relationships and issues in a high-level map. Youll edit this map throughout the specification process in order to correct your early assumptions.

The next big step is to profile the source system data. We recommend that you purchase a tool to perform data profiling, as its very tedious to do by hand. The data profiling results are the main set of information that you need to complete the source-to-target mapping on a detailed level, table by table and column by column.

There are several system-wide decisions to make and document: How often will you load data, and how much history will you maintain? Will you partition your fact tables, and if so how will you maintain those partitions? What strategies will you use for extracting data from the source? How will you handle de-duplication of people and organizations? How will you manage your dimension processing? And what strategy will you use for updating your Analysis Services databases and predefined reports ?

All of the information weve described here should be collected into an ETL system design specification. This doesnt have to be a fancy document; just collect the information weve described and slap a table of contents on it. A few years from now, when the team needs to revise the system, your protg will marvel at your foresight in pulling together all of this information.