Implementing Staging Tables


It is possible to load data directly from the extraction process of data sources into the data warehouse without a separate staging area. In fact, the opportunity to avoid a staging database is even greater when using SSIS, which allows you to apply multiple transformations to data as it passes through memory between the source and the destination. Whether you need to stage the data as part of your ETL process is a common design decision that you have to make.

A common practice and the architecture of data warehouse processes include storing the transaction-level data into staging tables. Select source data is loaded into temporary (or persisted) staging tables. The data staging tables become the source data for the next process, where the data is further transformed, merged, standardized, cleansed, and prepared for loading into data warehouse models. Some corporations might create a single enterprise data warehouse as a central repository of all its business data. Subject area subsets of the enterprise data warehouse are often created as data marts to serve departmental and functional area reporting and analysis. Figure 13-2 illustrates a common configuration model for staging.

image from book
Figure 13-2: Common data staging configuration model

Regardless of the final destination, the staging table provides the fastest and easiest way to extract data from source systems and temporarily store it in a different database where additional transformation, derivation, augmentation, and other tasks can be performed.

Some of the reasons you might want to use staging tables include:

  • Only short windows of time are required to extract data from source systems.

  • Data from multiple source systems do not exist all at the same time and must be collected at different times and merged once.

  • Source data systems might not have changed-data identification, and staging tables enable change detection between existing data warehouse data and the current extract.

  • Recovery from failure is typically easier when performed from staging table sources.

  • Memory constraints for transformations such as sorting and others might be handled better with disk-based staging.

  • Audit tracking and lineage is also more easily managed when data is staged across its various transformations.

  • Staging allows for easier surrogate key generation and management for star schema-modeled data warehouses.




Microsoft SQL Server 2005 Integration Services Step by Step
MicrosoftВ® SQL Server(TM) 2005 Integration Services Step by Step
ISBN: 0735624054
EAN: 2147483647
Year: 2007
Pages: 152

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net