Types of Staging Schemes


Once you have decided to stage data for data warehouse maintenance, you can consider various types of staging table patterns. Each affects the design of your SSIS packages and process design. Look at each of these kinds of staging design options and when to use each type.

Staging Data from Multiple Sources

In staging data from multiple sources, the data is temporarily stored in a staging database before the ETL process. SSIS is used to extract the data from the source systems to the staging database. Additional SSIS packages are used to transform the data from the staging database into the final data warehouse database destination. This is a common approach, especially if multiple-source data requires a great amount of standardization, matching, and merging and usually isn't available or doesn't all exist at the same time. Figure 13-3 illustrates multiple-source staging.

image from book
Figure 13-3: Staging from multiple sources

Staggered Staging

In staggered staging, the data can be from multiple or single sources. Sometimes the ETL process can be very expensive in terms of processing time, and sometimes you might not have the luxury to repeat the entire process because of a failure at some point during execution. This staging approach allows you to limit recovery reprocessing. Clearly, this approach also results in additional data disk space but enables much greater and faster correction and recovery for high data volumes and long-running SSIS packages. An illustration of staggered staging might look as shown in Figure 13-4.

image from book
Figure 13-4: Staggered staging model

Persisted Staging

In persisted staging, an archive copy of the staging database is created on a routine basis. The primary reason for using this alternative scheme is to allow auditing of the extract and transformation processes for more than the current period and processing cycle. This allows for smaller data volumes for each current period processing while maintaining previous period processing data lineage for auditing and troubleshooting. An illustration of persisted staging is shown in Figure 13-5.

image from book
Figure 13-5: Persisted staging model

Accumulated Staging

Ideally, your source application systems can provide a way to determine whether data required for the data warehouse has changed. Then, you could set up a process in SSIS for these data transaction details (inserts, updates, and deletes). In these circumstances, you can use the accumulated staging approach:

  • If the source system does not have a built-in delta detection mechanism or changed-data capture capability.

  • If the source system does not provide the kind of transaction applied for each delta-detected record.

If your source application system does not have built-in changed-data detection, all the records from the required tables are extracted and compared against the accumulated staging data to determine changes.

If your source application system does have changed-data detection mechanisms and you can extract only the changed and new records, the next step is to determine the transaction type that occurred for these changed records (inserts, updates, or deletes).

For example, in Figure 13-6, when comparing the incoming employee table with accumulated staging, the record EmpId 200 can be determined as Modified (changed data for an existing dimension member) whereas the EmpId 202 record can be determined as Inserted (a new, previously nonexisting dimension member).

image from book
Figure 13-6: Accumulated staging model

Chunked Accumulated Staging

Common in retail or other high transaction-type business data warehouse designs, the chunked accumulated staging scheme enables you to extract a huge volume of data and spread the extract load across the day on the source system. The accumulated information is then transformed and loaded to the destination data warehouse as required by business needs. For example, extraction could happen every hour throughout the day, and transformation and loading could be done once at the end of the day. Figure 13-7 provides an illustration of the chunked accumulated staging approach.

image from book
Figure 13-7: Chunked accumulated staging model

Other Destination Considerations

You've just learned various designs for staging data before loading the data warehouse. In some BI and data warehouse database implementations, there can be an additional step that involves distributing the data to separate data marts. For example, the data from the central data warehouse might need to be distributed to data marts at different geographical locations. In some ways, the data warehouse is functioning as a staging area for the distributed data marts. This is not technically a stage, but it still needs to be taken into consideration for choosing appropriate conditions for loading the data warehouse to ensure that it is appropriately populated for destination marts. Figure 13-8 is an illustration of the multiple destination scheme approach.

image from book
Figure 13-8: Multiple destination with staging model




Microsoft SQL Server 2005 Integration Services Step by Step
MicrosoftВ® SQL Server(TM) 2005 Integration Services Step by Step
ISBN: 0735624054
EAN: 2147483647
Year: 2007
Pages: 152

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net