Lesson 1: Managing Slowly Changing Dimensions

When it is necessary to make changes to the attributes of the dimensions in your data warehouse, the integrity of the data can be compromised. It is important to manage these changes to ensure that the data remains consistent and that user queries continue to return correct results.

After this lesson, you will be able to:

  • Describe strategies for managing slowly changing dimensions

Estimated lesson time: 30 minutes

Slowly Changing Dimensional Attributes

Dimensions in your data warehouse can cause problems associated with attributes that change slowly over time, as illustrated in Figure 14.1.

click to view at full size

Figure 14.1 Slowly changing dimensional attributes

When a schema accommodates any amount of history, changes to master file records on the source system become an issue.

For example:

  • Product may change packaging.
  • Store management may be replaced.
  • Reorganizations may redefine regions.

Your schema must accurately describe the past relative to business requirements. As you move back and forth in time, an appropriate version of the dimension must be accessible. For example, if the packaging for a product is changed at the beginning of this year, then records of sales for the product for last year must reflect last year s packaging. Likewise, records of sales for the product for this year must reflect this year's packaging when you query the data warehouse.

Slowly Changing Dimension Solutions

Three primary solutions have been defined for managing slowly changing dimensions. Each solution option has a specific application. In some cases, a table may have a mix of solutions applied to it. Ultimately, the business needs dictate which solution set you should use.

Each of the following solution options addresses a different aspect of the slowly changing dimension problem:

  • Type 1: Overwrite the dimension record—Using this option, changes to attributes are written in place.
  • Type 2: Write another dimension record—This option creates a separate instance in the dimension table each time one or more specific attributes change.
  • Type 3: Values in the dimension record—When you select this option, one or more attributes are added to the dimension table for the number of tracked change events for a given attribute.

Type 1: Overwriting the Dimension Record

The first solution option involves updating the changed attribute in place without creating a new instance in the table when a dimension attribute changes. Figure 14.2 illustrates the Type 1 approach. This approach is most appropriate when corrections (not routine changes) occur.

click to view at full size

Figure 14.2 Type 1: Overwriting the dimension record

This approach requires a relaxing of the original belief that once a data warehouse table instance is written, it is never altered.

In practical terms, this approach may be appropriate. Figure 14.2 shows that the size was incorrectly entered as 10 oz. The Type 1 approach will cause all previous history to reflect the correct size.

The advantages and disadvantages of this solution are as follows:

Advantages

The Type 1 solution is the simplest to implement.

Disadvantages

The Type 1 solution has the following disadvantages:

  • No notion of history. This approach actually rewrites history.
  • Future queries will not necessarily return the same results.
  • OLAP cubes may require reprocessing.

Type 2: Writing Another Dimension Record

The second solution option involves writing a new instance to the table when a dimension attribute changes. Figure 14.3 illustrates the Type 2 approach.

click to view at full size

Figure 14.3 Type 2: Writing another dimension record

Type 2 is the most common approach found in data warehouse implementations. An effective date attribute enables point-in-time historical reporting.

Figure 14.3 shows that a new instance (key 731) was created to accommodate the change in size. The original instance was left unchanged.

Design Characteristics of the Type 2 Approach

The Type 2 solution has the following design characteristics:

  • It dictates the use of surrogate keys.
  • Source system keys are rarely changed when field values are modified.

  • Effective date or start and end date attributes are typically added to the table to allow point-in-time selection.

Advantages

The Type 2 solution offers the following advantages:

  • Accurately tracks history
  • Capable of tracking an unlimited number of changes

Disadvantages

The Type 2 solution has the following disadvantages:

  • Difficult to navigate unless you employ a query tool or application with version recognition capability
  • May greatly expand the number of rows in a dimension table

Type 3: Values in the Dimension Record

The third solution option involves updating the changed attribute in place and moving the old value to a separate attribute in the instance when a dimension attribute changes. Figure 14.4 illustrates the Type 3 approach. The Type 3 approach is appropriate if the attributes change very infrequently.

click to view at full size

Figure 14.4 Type 3: Values in the dimension record

The Type 3 approach is similar to Type 1, except that a limited amount of change history is captured. As with the Type 1 solution, the attribute is changed in place. However, Type 3 provides attributes for previous values and change dates.

In Figure 14.4, the following updates are applied to the instance:

  • The product size value changes to 12 oz., and the current product size date value changes to 10-15-1998.
  • The old product size (10 oz.) and current product size date (05-01-1995) values move to the previous product size and date attributes, respectively.
  • The old previous product size (11 oz.) and previous product size date (03-20-1994) values move to the second previous product size and date attributes, respectively.

Design Characteristics of the Type 3 Approach

The Type 3 solution has the following design characteristics:

  • One or more attributes are added to the dimension table for the number of tracked change events for a given attribute.
  • A date attribute is added for each Type 3 tracked attribute to indicate the dates on which changes took place.

Advantages

The Type 3 solution offers the following advantages:

  • No additional records need to be written when changes occur.
  • No key structure modifications need to be made to the dimension table.
  • All available history can be viewed by retrieving one table row.

Disadvantages

The Type 3 solution has the following disadvantages:

  • Difficult (if not impossible) to establish a point-in-time view of dimensional information using standard query tools.
  • Not recommended if changes are frequent.
  • It is very complex for end-user applications.
  • No OLAP solution is available.

Lesson Summary

There are various solutions for managing the slowly changing dimensions in your data warehouse. It is important to assess the impact of changes to the dimensional attributes in your data warehouse and use one of these solutions to handle these changes correctly.



Microsoft Corporation - Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
ISBN: 0735606706
EAN: 2147483647
Year: 1999
Pages: 114

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net