When it is necessary to make changes to the attributes of the dimensions in your data warehouse, the integrity of the data can be compromised. It is important to manage these changes to ensure that the data remains consistent and that user queries continue to return correct results.
After this lesson, you will be able to:
- Describe strategies for managing slowly changing dimensions
Estimated lesson time: 30 minutes
Dimensions in your data warehouse can cause problems associated with attributes that change slowly over time, as illustrated in Figure 14.1.
Figure 14.1 Slowly changing dimensional attributes
When a schema accommodates any amount of history, changes to master file records on the source system become an issue.
For example:
Your schema must accurately describe the past relative to business requirements. As you move back and forth in time, an appropriate version of the dimension must be accessible. For example, if the packaging for a product is changed at the beginning of this year, then records of sales for the product for last year must reflect last year s packaging. Likewise, records of sales for the product for this year must reflect this year's packaging when you query the data warehouse.
Three primary solutions have been defined for managing slowly changing dimensions. Each solution option has a specific application. In some cases, a table may have a mix of solutions applied to it. Ultimately, the business needs dictate which solution set you should use.
Each of the following solution options addresses a different aspect of the slowly changing dimension problem:
The first solution option involves updating the changed attribute in place without creating a new instance in the table when a dimension attribute changes. Figure 14.2 illustrates the Type 1 approach. This approach is most appropriate when corrections (not routine changes) occur.
Figure 14.2 Type 1: Overwriting the dimension record
This approach requires a relaxing of the original belief that once a data warehouse table instance is written, it is never altered.
In practical terms, this approach may be appropriate. Figure 14.2 shows that the size was incorrectly entered as 10 oz. The Type 1 approach will cause all previous history to reflect the correct size.
The advantages and disadvantages of this solution are as follows:
Advantages
The Type 1 solution is the simplest to implement.
Disadvantages
The Type 1 solution has the following disadvantages:
The second solution option involves writing a new instance to the table when a dimension attribute changes. Figure 14.3 illustrates the Type 2 approach.
Figure 14.3 Type 2: Writing another dimension record
Type 2 is the most common approach found in data warehouse implementations. An effective date attribute enables point-in-time historical reporting.
Figure 14.3 shows that a new instance (key 731) was created to accommodate the change in size. The original instance was left unchanged.
The Type 2 solution has the following design characteristics:
Source system keys are rarely changed when field values are modified.
Advantages
The Type 2 solution offers the following advantages:
Disadvantages
The Type 2 solution has the following disadvantages:
The third solution option involves updating the changed attribute in place and moving the old value to a separate attribute in the instance when a dimension attribute changes. Figure 14.4 illustrates the Type 3 approach. The Type 3 approach is appropriate if the attributes change very infrequently.
Figure 14.4 Type 3: Values in the dimension record
The Type 3 approach is similar to Type 1, except that a limited amount of change history is captured. As with the Type 1 solution, the attribute is changed in place. However, Type 3 provides attributes for previous values and change dates.
In Figure 14.4, the following updates are applied to the instance:
The Type 3 solution has the following design characteristics:
Advantages
The Type 3 solution offers the following advantages:
Disadvantages
The Type 3 solution has the following disadvantages:
There are various solutions for managing the slowly changing dimensions in your data warehouse. It is important to assess the impact of changes to the dimensional attributes in your data warehouse and use one of these solutions to handle these changes correctly.