Data Model


The data model for credit union transactions is for the most part fairly standard. The transactions can be analyzed by the channel, type of transaction, and source of the transaction, and by the customer responsible for the transaction with extensive attributes from the CRM system, including credit rating information, as shown in Figure 8-2.

Figure 8-2. Credit Union Transaction data model


The main modeling decision that we will need to make is how to handle the effect of dimension data changes in our data warehouse design. The key is to look carefully at the business requirements for each of the dimensions. The question that we need to answer for each dimension is "If a change to an attribute occurs, will users always want to analyze data using the latest value for that attribute, or when they look back at historical data, do they want to use the value that was in effect at that time?" The approach that we will take depends on the answer to this question.

Managing Dimensions When History Doesn't Matter

If users always want to see the latest value for an attribute, we can just update the existing dimension record with the new value. This is known as a Type 1 SCD. This type of SCD is the easiest to implement because we don't have to make any changes to our standard data model for dimension tables.

For example, the name attribute in the Channel dimension may change over time, but users really only want to see the latest names for the channels. When an update occurs in the source system, we can simply use the business key (which is a channel code in this case) to locate the record in the Channel dimension and replace the name (for example, replacing Internet with Internet Banking for the NET channel code, as shown in Table 8-1). When this dimension is used in a cube, only the latest name displays.

Table 8-1. Channel Dimension Records

Channel Surrogate Key

Channel Code

Channel Name

Channel Type

1

ATM

ATM

Direct

2

BRN

Branch

Direct

3

NET

Internet Banking

Indirect


Managing Dimensions to Preserve History

We know from the business requirements that the approach just described is not appropriate for all the attributes in the Customer dimension. When a customer's credit rating changes, we cannot just replace the existing credit rating on the record. The reason for this is that when we look at the transactions in the cube, all the old transactions would be shown associated with the new credit rating rather than the credit rating that was in effect at the time of the transaction. What we need is a design that can accommodate the historical values for credit ratings. This type of dimension is known as a Type 2 SCD.

When the Customer dimension is initially loaded as shown in Table 8-2, it will have exactly one record for each business key (Customer Number in this case). If Emily Channing's credit rating changes from High to Medium, for example, we will need to insert a new record that has the same business key (1000) but a new generated surrogate key (see Table 8-3).

Table 8-2. Original Customer Dimension Records

Customer Surrogate Key

Customer Number

Customer Name

Credit Rating

1

1000

Emily Channing

High

2

1121

Roberto Berneman

Low

3

1212

Deborah Shaffet

Medium


Table 8-3. Updated Customer Dimension Records

Customer Surrogate Key

Customer Number

Customer Name

Credit Rating

1

1000

Emily Channing

High

2

1121

Roberto Berneman

Low

3

1212

Deborah Shaffet

Medium

4

1000

Emily Channing

Medium


Although the data model looks somewhat unorthodox because we have multiple records for the same customer number, we can reassure you that this design is a cornerstone of the dimensional approach and is used in most data warehouses of any significant size.

Some interesting questions arise now that we have multiple records for the same business key:

  • How will we identify the current or latest customer record?

  • More important, how will we make sure that users see the correct view of the information when looking at transactions?

The first question is fairly easy to handle. We have a few choices of how to approach this, such as adding a status column or date columns that represent the date range for which the dimension record is valid.

The second question is even easier to handle. The major reason that surrogate keys were introduced for dimensional models is to support SCDs that need to display historical information. Because we have used these surrogate keys in all our designs, there is actually nothing else we need to do as long as our ETL process is working correctly (and we will spend most of the technical solution stage explaining how this process should work). The following example helps illustrate this.

Suppose that Emily Channing's credit rating was changed from High to Medium on 3/April/2006. Before that date, there was only one dimension record for customer number 1000, and it had a customer surrogate key of 1. All the transactions in the fact table for Emily Channing that were loaded up until that point are associated with a surrogate key of 1.

After we add the new dimension record for customer number 1000 with a surrogate key of 4, any new transactions after that date will use the latest customer record and have a customer key of 4, as shown in Table 8-4.

Table 8-4. Transaction Fact Records for Customer 1000

Transaction Date

Customer Key

Channel Key

Amount

15/Dec/2005

1

2

4214.12

7/Feb/2006

1

3

213.11

12/Apr/2006

4

2

3431.11


You can see how this would work using an SQL query that shows the sum of amounts grouped by credit rating. If we were to execute the following SQL, the result would be as shown in Table 8-5:

SELECT C.CreditRating, SUM(Amount) AS TotalAmount   FROM FactTransactions F INNER JOIN DimCustomer C ON C.CustomerKey = F.CustomerKey GROUP BY C.CreditRating 


Table 8-5. Transactions Group by Credit Rating for Customer 1000

Credit Rating

Total Amount

High

4427.23

Medium

3431.11


Another common example of a Type 2 SCD occurs in Sales databases. If we were to update the sales rep name attribute on a customer record, all transactions for that customer would be assigned to the new sales rep. Because this would undoubtedly affect the sales rep's total bonus for the year, the business requirement would probably be to make sure that any transactions that occurred while the original sales rep was assigned to the customer are credited to that sales rep, and only the new transactions would be assigned to the new sales rep. This kind of situation occurs often enough that we can give you the following tip.

Tip: Any Dimension Attribute That Affects Someone's Bonus Is a Candidate for Type 2 SCD

This is a somewhat light-hearted statement but is usually born out in practice. During the modeling phase, it is common for users to claim that they are not interested in tracking attribute history. If that attribute affects how important metrics such as revenue or risk (as in the credit union example) are assigned, give the user concrete examples of what the ramifications would be.


When an Analysis Services cube is used to analyze the data, it will also take advantage of the join on the surrogate key and present the information correctly. So as you can see, Type 2 SCDs are a simple and powerful way to correctly represent historical information in a data warehouse.



Practical Business Intelligence with SQL Server 2005
Practical Business Intelligence with SQL Server 2005
ISBN: 0321356985
EAN: 2147483647
Year: 2007
Pages: 132

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net