The data model for credit union transactions is for the most part fairly standard. The transactions can be analyzed by the channel, type of transaction, and source of the transaction, and by the customer responsible for the transaction with extensive attributes from the CRM system, including credit rating information, as shown in Figure 8-2.
Figure 8-2. Credit Union Transaction data model
The main modeling decision that we will need to make is how to handle the effect of dimension data changes in our data warehouse design. The key is to look carefully at the business requirements for each of the dimensions. The question that we need to answer for each dimension is "If a change to an attribute occurs, will users always want to analyze data using the latest value for that attribute, or when they look back at historical data, do they want to use the value that was in effect at that time?" The approach that we will take depends on the answer to this question.
Managing Dimensions When History Doesn't Matter
If users always want to see the latest value for an attribute, we can just update the existing dimension record with the new value. This is known as a Type 1 SCD. This type of SCD is the easiest to implement because we don't have to make any changes to our standard data model for dimension tables.
For example, the name attribute in the Channel dimension may change over time, but users really only want to see the latest names for the channels. When an update occurs in the source system, we can simply use the business key (which is a channel code in this case) to locate the record in the Channel dimension and replace the name (for example, replacing Internet with Internet Banking for the NET channel code, as shown in Table 8-1). When this dimension is used in a cube, only the latest name displays.
Managing Dimensions to Preserve History
We know from the business requirements that the approach just described is not appropriate for all the attributes in the Customer dimension. When a customer's credit rating changes, we cannot just replace the existing credit rating on the record. The reason for this is that when we look at the transactions in the cube, all the old transactions would be shown associated with the new credit rating rather than the credit rating that was in effect at the time of the transaction. What we need is a design that can accommodate the historical values for credit ratings. This type of dimension is known as a Type 2 SCD.
When the Customer dimension is initially loaded as shown in Table 8-2, it will have exactly one record for each business key (Customer Number in this case). If Emily Channing's credit rating changes from High to Medium, for example, we will need to insert a new record that has the same business key (1000) but a new generated surrogate key (see Table 8-3).
Although the data model looks somewhat unorthodox because we have multiple records for the same customer number, we can reassure you that this design is a cornerstone of the dimensional approach and is used in most data warehouses of any significant size.
Some interesting questions arise now that we have multiple records for the same business key:
The first question is fairly easy to handle. We have a few choices of how to approach this, such as adding a status column or date columns that represent the date range for which the dimension record is valid.
The second question is even easier to handle. The major reason that surrogate keys were introduced for dimensional models is to support SCDs that need to display historical information. Because we have used these surrogate keys in all our designs, there is actually nothing else we need to do as long as our ETL process is working correctly (and we will spend most of the technical solution stage explaining how this process should work). The following example helps illustrate this.
Suppose that Emily Channing's credit rating was changed from High to Medium on 3/April/2006. Before that date, there was only one dimension record for customer number 1000, and it had a customer surrogate key of 1. All the transactions in the fact table for Emily Channing that were loaded up until that point are associated with a surrogate key of 1.
After we add the new dimension record for customer number 1000 with a surrogate key of 4, any new transactions after that date will use the latest customer record and have a customer key of 4, as shown in Table 8-4.
You can see how this would work using an SQL query that shows the sum of amounts grouped by credit rating. If we were to execute the following SQL, the result would be as shown in Table 8-5:
SELECT C.CreditRating, SUM(Amount) AS TotalAmount FROM FactTransactions F INNER JOIN DimCustomer C ON C.CustomerKey = F.CustomerKey GROUP BY C.CreditRating
Another common example of a Type 2 SCD occurs in Sales databases. If we were to update the sales rep name attribute on a customer record, all transactions for that customer would be assigned to the new sales rep. Because this would undoubtedly affect the sales rep's total bonus for the year, the business requirement would probably be to make sure that any transactions that occurred while the original sales rep was assigned to the customer are credited to that sales rep, and only the new transactions would be assigned to the new sales rep. This kind of situation occurs often enough that we can give you the following tip.
Tip: Any Dimension Attribute That Affects Someone's Bonus Is a Candidate for Type 2 SCD
This is a somewhat light-hearted statement but is usually born out in practice. During the modeling phase, it is common for users to claim that they are not interested in tracking attribute history. If that attribute affects how important metrics such as revenue or risk (as in the credit union example) are assigned, give the user concrete examples of what the ramifications would be.
When an Analysis Services cube is used to analyze the data, it will also take advantage of the join on the surrogate key and present the information correctly. So as you can see, Type 2 SCDs are a simple and powerful way to correctly represent historical information in a data warehouse.