Dimensions | Delivering Business Intelligence with Microsoft SQL Server 2005: Utilize Microsofts Data Warehousing, Mining & Reporting Tools to Provide Critical Intelligence to A

After taking a good look at measures, let's move on to another part of the OLAP cube, namely, the dimensions. As we have seen, dimensions are what give cubes their true analytical power. They enable us to slice and dice the measures in the cube until we find the meaningful business intelligence hidden among all of the numbers. Dimensions provide understandable phrases in business language to define what may be cryptic in a transactional database system.

Just as there is more to measures than initially meets the eye, so, too, dimensions come in a number of different varieties. We will examine their various classifications and capabilities but, first, let's look at the way dimensions are related to measure groups.

Managing Dimensions

Dimensions are managed using the Dimension Design tab. To display this tab, double-click on a dimension in the Solution Explorer window. The Dimension Design tab is shown in Figure 8-15 displaying the Dim Product dimension.

image from book
Figure 8-15: The Dimension Design tab with the Dim Product dimension

The Dimension Design tab includes the entire hierarchy for the dimension. The Dim Product dimension in Figure 8-15 has a three-level hierarchy. The Dimension Design tab contains information about all three levels: the product level, the product subtype level, and the product type level.

The left-hand column of the Dimension Design tab shows all of the attributes for the entire dimension hierarchy. The center column shows the structure of the dimension hierarchy. The right-hand column shows the tables from the data source view that define this dimension.

Relating Dimensions to Measure Groups

We have been working with the Max Min Manufacturing DM cube through a couple of Learn By Doing exercises. This cube now includes two different measure groups with a number of measures and calculated members. It also contains a number of dimensions. However, for all of that, this cube is still not useful because the dimensions cannot be used to analyze the measures. The dimensions are not related to the measure groups. The dimensions must be related to the measure groups before we can query data from the cube and get any meaningful results.

We can solve this situation right now with another Learn By Doing activity.

Learn by Doing—Relating Dimensions in the Max Min Manufacturing DM Cube

Features Highlighted

Changing the key attribute in a dimension
Relating dimensions to measure groups

Business Need We need to have the dimensions in the Max Min Manufacturing DM cube related to the measure groups to perform any meaningful analysis on this cube.

Steps

Open the Business Intelligence Development Studio.
Open the MaxMinManufacturingDM project.
Double-click the entry for the Dim Time dimension in the Solution Explorer window. The dimension design tab appears.
Right-click the DateOfManufacture—ProductCode—BatchNumber—MachineNumber attribute in the Attributes area and select Delete from the Context menu. The Delete Objects dialog box appears.
Click OK to confirm the deletion.
Right-click the DateOfManufacture attribute in the Attributes area and select Set Attribute Usage I Key from the Context menu. The Dim Time Design tab should appear as shown in Figure 8-16.
Click the Save All button in the toolbar, and then Close the Dim Time Dimension Design tab.
Double-click the entry for Max Min Manufacturing DM.cube to display the Cube Design tab, if it is not already visible.
Select the Dimension Usage tab of the Cube Design tab to create dimension to measure group relations. You can see that only the Dim Time dimension appears on this tab and it is only related to the Manufacturing Fact measure group. The red, wavy line tells us that even this relationship is in error because we made our modifications to the Dim Time dimension.
Click the entry in the Manufacturing Fact column and the Dim Time row. An ellipsis (…) button appears.
Click the ellipsis button. The Define Relationship dialog box appears.
Select No Relationship in the Select Relationship Type drop-down list and click OK to exit the Define Relationship dialog box. This removes the relationship invalidated by our changes.
Click the ellipsis button again. The Define Relationship dialog box appears.
Select Fact in the Select Relationship Type drop-down list and click OK to exit the Define Relationship dialog box. This creates a new, valid fact dimension relationship between the Manufacturing Fact measure group and the Dim Time dimension. Fact dimensions are explained in the section "Types of Dimensions."
Click the Add Cube Dimension button in the Dimension Usage tab toolbar. The Add Cube Dimension dialog box appears.
Select the Dim Batch, Dim Machine, and Dim Product dimensions in the Select Dimension list. Click OK to add all of these dimensions to the Dimension Usage tab.
Click the gray rectangle in the Manufacturing Fact column and the Dim Batch row. An ellipsis (…) button appears.
Click the ellipsis button. The Define Relationship dialog box appears.
Select Regular from the Select Relationship Type drop-down list. We are specifying that the Dim Batch dimension relates to the Manufacturing Fact measure group as a regular dimension.
Select Dim Batch from the Granularity Attribute drop-down list. This is the attribute in the dimension that defines the granularity of the measure group for this dimension. We can either select the primary key of the table that defines the dimension or we can select one of the attribute fields in this table. The primary key has the same name as the dimension itself.
In the Relationship grid, select BatchNumber from the drop-down list under Measure Group Columns. This creates a relationship between the BatchNumber column, which is the primary key in the table that defines the dimension, and the BatchNumber column in the table, which defines the measure group.
Click OK to exit the Define Relationship dialog box.

Repeat this process to create regular dimension relationships as follows:

Dimension	Measure Group	Granularity Attribute	Measure Group Columns
Dim Machine	Manufacturing Fact	Dim Machine	MachineNumber
Dim Product	Manufacturing Fact	Dim Product	ProductCode
Dim Time	Inventory Fact	DateOfManufacture	DateOfInventory

The following regular dimension relationships should have been created for you automatically, based on database foreign key relationships:

Dim Machine

Inventory Fact

Dim Material

Material

Dim Product

Inventory Fact

Dim Product

ProductCode

Once completed, the Dimension Usage tab should appear as shown in Figure 8-17.
Click the Save All button in the toolbar.

image from book
Figure 8-16: The modified Dim Time dimension

image from book
Figure 8-17: The Dimension Usage tab of the Max Min Manufacturing DM cube

Types of Dimensions

Dimensions can be placed in a number of classifications. These classifications tell us something about the way a dimension is created and managed. They can also shed some light on how a dimension is utilized.

Fact Dimensions

In most cases, the information used to populate a dimension resides in a dimension table. DimProduct and DimMachine are examples from our Manufacturing data mart. However, you may have noticed no DimTime table in this data mart. The reason is that the time dimension is created from the DateOfManufacture field and the YearOfManufacture, QuarterOfManufaclure, and MonthOfManufacture named calculations in the ManufacturingFact table.

Dimensions created from attributes in a fact table are known as fact dimensions. They are also known in the "biz" as degenerate dimensions. (This seems like a highly slanderous name, if you ask me.)

Fact dimensions follow a few simple rules in Analysis Services:

At most, one fact dimension can be in a measure group.
A fact dimension must be defined by one and only one measure group.
A fact dimension can have a fact relationship with only one measure group. It can, however, have a regular relationship with additional measure groups.

The Define Relationship dialog box for a fact dimension is shown in Figure 8-18.

image from book
Figure 8-18: The Define Relationship dialog box for a fact dimension

Parent-Child Dimensions

A parent-child dimension is built on a table that contains a self-referential relationship. An employee table that contains a supervisor field is a good example. Figure 8-19 shows a table with a supervisor field. The supervisor field contains the employee number of the person to whom the employee reports. This field is a foreign key field that points back to the same table.

image from book
Figure 8-19: An employee table with a self-referential field

When a table, such as the Employee table in Figure 8-19, is used to define a dimension, the Supervisor field becomes an attribute. This is a special type of attribute known as a parent attribute. A parent attribute is created by right-clicking an attribute in the Attributes column of the Dimension Design tab and selecting Set Attribute Usage I Parent from the Context menu.

A parent-child dimension creates its own hierarchy. Each step from child to parent creates another level in that hierarchy. What is unique about this is there are an undetermined number of levels to the hierarchy. The number of hierarchy levels depends on the number of links in the parent-child chain you are following.

In the Employee dimension shown in Figure 8-19, we can start at Eva, the developer, and follow the chain of supervisors up to Maya, the CEO. There are six levels to the hierarchy along this path. If, however, we start at Juan, the System Administrator, and follow the chain of supervisors up to Maya, there are four levels to the hierarchy.

Role Playing Dimensions

A role playing dimension is a dimension that is related to the same measure group multiple times. Each relationship represents a different role the dimension plays relative to the measure group. One of the best examples of this is a time dimension and a sales measure group. The time dimension is related to the sales measure group once for the date of the sale, another time for the date of shipment, and a third time for the date of payment. In this case, the time dimension plays three different roles: date of sale, date of shipment, and date of payment, relative to the sales measure group.

To create a role playing dimension, add the dimension to the Dimension Usage tab multiple times. Each instance of the dimension should be renamed to reflect one of the roles it is playing for this measure group. Then, create a relationship between each instance of the dimension and the measure group.

Reference Dimensions

A reference dimension is not related directly to the measure group. Instead, a reference dimension is related to another regular dimension which is, in turn, related to the measure group. Reference dimensions are created by selecting Referenced from the Select Relationship Type drop-down list on the Define Relationship dialog box.

Data Mining Dimensions

We can include information discovered by data mining algorithms in our cubes for analysis. This is done through the creation of data mining dimensions. We discuss data mining in Chapters 12, 13, and 14. Data mining dimensions are created by selecting Data Mining from the Select Relationship Type drop-down list on the Define Relationship dialog box.

Many-To-Many Dimensions

Many-to-many dimensions, as the name implies, support many-to-many relationships between dimension members and measure group members. For example, if an e-commerce site allows an order to have multiple ship to addresses, there is a many-to-many relationship. An order can have multiple addresses and, of course, an address can be used by more than one order.

In a relational database, a many-to-many relationship is implemented by creating an intermediate linking table. When a many-to-many relationship is implemented in an Analysis Services cube, the intermediate linking table becomes an intermediate fact table. Many-to-many dimensions are created by selecting Many-to-Many from the Select Relationship Type drop-down list on the Define Relationship dialog box.

Slowly Changing Dimensions

For our cubes to provide meaningful information from year to year, we need to have dimensions whose members are fairly constant. If the dimensions are changing drastically month to month, our analysis across the time dimension becomes worthless. Therefore, we need mainly static dimensions.

Some dimensions, however, change over time. Salespeople move from one sales territory to another. The corporate organizational chart changes as employees are promoted or resign. These are known as Slowly Changing Dimensions (SCD).

SCDs come in three varieties: Type 1, Type 2, and Type 3, as defined by the Business Intelligence community. Not exciting names, but they didn't ask for my input, so it's what we are stuck with! Anyway, let's look at what differentiates each type of SCD.

Type 1 Slowly Changing Dimensions

When a dimension is implemented as a Type 1 SCD, we don't keep track of its history as it changes. The members of the dimension represent the way things are right now. With a Type 1 SCD, it is impossible to go back and determine the state of the dimension members at any time in the past.

In actuality, all dimensions in Analysis Services cubes are allowed to change. Some dimensions track that change so a previous state of that dimension's members can be reconstructed during analysis. These are the Type 2 and Type 3 SCD, which we discuss in the sections "Type 2 Slowly Changing Dimensions" and "Type 3 Slowly Changing Dimensions." If a dimension does not track this change, it is a Type 1 SCD. In most cases, the majority of the dimensions in your cubes are going to be Type 1 SCD.

Let's consider a situation where we have four salespeople—Jackie, Andy, Sam, and Mollie—in four sales territories: A, B, C, and D. (All right, so maybe my naming isn't always that creative either!) Figure 8-20 shows the situation in May. Jackie is in sales territory A. Andy is in sales territory B. Sam is in sales territory C, and Mollie is in sales territory D.

image from book
Figure 8-20: The state of the salesperson Type 1 SCD in May

Now, let's suppose, at the beginning of August, Mollie is promoted to sales manager. Andy is moved to Mollie's former sales territory (sales territory D) and a new salesperson, Pat, is hired to take over Andy's former territory (sales territory B). The state of the salesperson dimension after the shuffling is shown in Figure 8-21. Because this is a Type 1 SCD, we do not track any history. We just overwrite the previous dimension information with the current dimension information.

image from book
Figure 8-21: The state of the salesperson Type 1 SCD after August

If we perform analysis on sales information using this cube, it will look like Andy has always been in sales territory D and Pat has always been in sales territory B. This is true if we look at sales measures for June or if we look at sales information for October. The history is lost.

Type 2 Slowly Changing Dimensions

When a dimension is implemented as a Type 2 SCD, four supplementary attributes are added to the dimension to track the history of that dimension. These four attributes are:

SCD Original ID An alternative primary key for the dimension
SCD Start Date The date this dimension member became active

SCD End Date The date this dimension member ceased being active
SCD Status The current state of this dimension member, either active or inactive

Let's return to our salesperson dimension example. Figure 8-22 shows the state of the dimension in May. Three of the four new fields are shown in the figure: Start Date, End Date, and Status. Because none of the dimension members has an End Date filled in, they remain active indefinitely into the future.

image from book
Figure 8-22: The state of the salesperson Type 2 SCD in May

After the changes to the sales territories occur at the beginning of August, you can see how the new fields are used to track both the previous state and the current state of the sales territory assignments. This is shown in Figure 8-23. Now, by using the Start Date and End Date attributes, when we perform analysis for May, we see how the salespeople were arranged in May. When we do analysis for October, we see how the salespeople were arranged in October. When we want to find the current state of the dimension, we can use the Status attribute to quickly locate the active members, rather than doing a lot of date comparisons.

image from book
Figure 8-23: The state of the salesperson Type 2 SCD after August

The Original ID attribute holds a unique identifier that is used as the primary key for this dimension. This alternate key is used in place of the ID field that would normally serve this purpose. In our examples, the Original ID field would be used as the primary key in place of the salesperson's employee number. Because a salesperson can appear in the dimension more than once, this alternate key is necessary to prevent primary key violations.

In the Type I SCD example, each employee can only appear once in the Salesperson dimension. Therefore, the salesperson's employee ID can be used as the primary. This is not the case with the Type 2 SCD.

In the Type 2 SCD, a salesperson can appear in the dimension more than once. You can see this with Andy in Figure 8-23. Andy has two dimension members: the member with an effective start date in March and the member with the effective start date in August. If we defined a primary key using the employee ID, we would receive an error when we tried to insert the second entry into this dimension. When the Original ID is used as the primary key, there are no run-time errors because the Original ID is unique to each member in the dimension.

When we use the Original ID as the primary key, we also have to use it in the foreign key fields in the fact tables. That way we can tell which measures are linked to Andy when he was in sales territory B and which measures are linked to Andy when he was in sales territory D.

Type 3 Slowly Changing Dimensions

Type 3 SCD are similar to Type 2 SCD with one exception. Type 3 SCD do not track the entire history of the dimension members. Instead, a Type 3 SCD tracks only the current state and the original state of a dimension member.

A Type 3 SCD is implemented using two additional attributes:

SCD Start Date The date the current state of this dimension member became active
SCD Initial Value The original state of this attribute

A Slowly Changing Dimension in Action

In addition to the Manufacturing data mart, we created another data mart back in Chapter 6. This is the Sales data mart. You will recall that we created both the database schema for the data mart and the OLAP cube definition at the same time using the Cube Wizard. This data mart may have a cube, but it has even less data than the Manufacturing data mart. It is completely empty. We can remedy this situation as well.

The cube we created for the Sales data mart includes an SCD to track the salespeople as they moved from sales territory to sales territory and from job to job. You can see how we load data into an SCD by using the MaxMinSalesDMDimLoad project to populate the dimension tables in the Sales data mart. You can then use the MaxMinSalesDMDimLoad project to populate the fact table. The populated version of this data mart will be used in Learn By Doing exercises later in this book. These projects are available for download from the book's website. Follow the directions that come along with these projects for information on setup and execution.

Dim Machine	Inventory Fact	Dim Material	Material
Dim Product	Inventory Fact	Dim Product	ProductCode