Lesson 4: Dimensions | Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit

The dimensions determine the possible ways that data can be organized in other words, how the data can and should be filtered, organized, and grouped. At this point in the design, the organization of the data should be clear based on the interviews and sample queries performed during the analysis process.

After this lesson, you will be able to:

Create a design that organizes data the way business needs to access it

Estimated lesson time: 30 minutes

Establishing Dimensions

To model a dimension effectively, you must address three areas:

Dimension characteristics

Dimension table characteristics

Slowly changing dimensions

Dimension Characteristics

Dimension tables must be designed with the business user and the intended analytical usage in mind. Dimensions reside in relational tables and, as such, consist of a key and supporting attributes. They should contain highly correlated attributes, high-quality verbose textual information, parsed name and address elements, and a surrogate key.

Highly Correlated Attributes

All attributes are strongly related to the stated entity. For example, everything in the customer dimension table is about the customer.

High-Quality Verbose Textual Information

The following qualities ensure that the attributes are easy and useful to the user:

Descriptive For example, name an attribute ProductCode rather than PriKey.

Fully expanded words or short phrases, not codes or abbreviations For example, TotalDiscountedAmount rather than TDA.

Quality assured No missing or null values, no misspellings or invalid or obsolete values.

Parsed Name and Address Elements

It is important to parse fields down to their simplest elements. For example, the following are recommended:

Name parsing:

Elements for individuals These might include first, middle (full name or initial), last, title (Mr., Ms., Dr.), and suffix (Jr., MD, III).

Elements for organizations These might include organization name (Wide World Importers), organization subname (Lighting Division), organization second subname (Accounts Payable Department).

Address parsing For example, house number, street name, apartment or suite number, city, state or province, postal code, country, and others.

These parsed elements allow you greater flexibility in queries, including:

Support for international mailing idiosyncrasies

Personalization of correspondence

Support for geographic-area queries

Surrogate Key

Rather than using key values from the source system, you should generate unique values to identify dimension table entries. It is most efficient to use a simple, sequential numeric value for this purpose. This key provides the following benefits:

Establishes independence from source system, which protects you from the following problems:

Modification to application may cause key structure to change.

Source application may reuse keys.

More efficient for indexing

Other Types of Dimensions

You can define other types of dimensions after defining the standard and shared dimensions. We discuss two here: degenerate dimensions and junk dimensions. These dimensions are useful for handling attributes that do not fit in the standard dimensions.

Degenerate Dimensions

Degenerate dimensions represent a business event in the fact table. Degenerate dimensions are control documents used to manage business events and the details of those events. Such control allows consolidated reporting at the business event level.

For example, a business event could consist of an order, invoice, or a shipping ticket. The control values in these events, such as invoice number, order number, or shipping ticket number, are of a higher level of granularity than the actual line items so that more than one record in the fact table may have the same control value.

Degenerate dimensions

Are not dimensional keys or part of the additive numeric facts

Do not have an associated dimension table

Are not foreign keys

Should not be thrown out or ignored

Contain control values that remain after all business event records have been defined for all dimensions

Junk Dimensions

Junk dimensions are attributes that do not correlate to business objectives used to define other dimensions. Using junk dimensions allows you to remove flags and attributes from the fact table and place them into a useful dimensional model. Such placement allows you to capture important information without increasing the size of the fact table.

You can create junk dimensions from the following sources:

Miscellaneous flags and textual attributes from the source system that do not correlate to any one of the established information-object (non-time) dimensions

Attributes that are important to the business and potential constraining value and that would otherwise be excluded from the schema

When designing a junk dimension, you can

Populate it by establishing an instance for every unique set of values. You accomplish this by creating all possible combinations of the values. However, use caution, because creating all possible combinations can produce a high number of rows.

Populate it by creating instance records as they occur. Rather than establishing all possible combinations of the values, you can create a record for combinations that exist and update new combinations as they occur. However, this may require additional resources for inserting new rows into the junk dimensions.

Assign an artificial key.

Exercise: Refining the Data Mart Design

Consider the previous dimensional schema design exercise. In this follow-up exercise, you will complete some of the design questions.

Identifying degenerate dimensions

Which table and column contain information on the required shipment date of all orders? Is the level of granularity consistent with the fact table?

How can the required shipment date be useful in analyzing data in the data mart?

When the information is not a foreign key or measure but is still required and useful to the mart design, what type of information is this called?

Conformed Dimensions

One of the dangers in implementing the data mart iteratively is the possibility of building an analytical environment that is not integrated. By defining all of the dimension tables on an enterprise basis before creating your final schemes, you can make sure that all data marts can use facts in each data mart. A shared dimension that applies to two subject areas or data marts is called a conformed dimension. By utilizing conformed dimensions, comparisons across data marts are meaningful. Conformed dimensions

Are common to multiple base schemas

Should be established at the enterprise level

Should be used in all schema designs

Allow queries to span schemas

Enable the creation of an integrated enterprise data warehouse

Figure 4.8 illustrates how conformed dimensions might be employed in a data warehouse that consists of three separate data marts.

click to view at full size

Figure 4.8 Conformed dimensions

Dimension Table Characteristics

Dimension tables have common characteristics, which makes it easier to define them. When defining dimension tables, there are several guidelines that you should consider.

Contains a Primary Key

Dimension tables should have a primary key that is derived and unique across each table.

TIP
Use single-element surrogate keys when defining dimensions.
When you define your dimensions, use a single key, not a compound key. Do not use the key from the source database. Source applications sometimes reuse keys, and if the source application is replaced, the format of the key may be incompatible with the previously defined key structure.

Has a One-to-Many Relationship to the Fact Table

Each entry in the dimension table has multiple entries corresponding to it in the fact table.

Contains at Least One Description Column

Each description column can be used to qualify a user query by using the associated primary key to find relevant fact records.

Contains Other Attribute Columns That Are Useful for Levels of Aggregation

Related attribute columns that provide differing levels of aggregation are called dimension hierarchies. You will learn more about aggregations in the next lesson of this chapter.

Contains a Limited Set of Rows That Grow Slowly Over Time

When a data warehouse or data mart grows, it should be based on additional fact table entries. Dimension-table rows should be added only as additional attributes for the dimension are introduced.

Lesson Summary

The objectives of a dimensional design are

A design that organizes data the way business needs to access it

An easy-to-understand design for example, one that uses fully parsed names instead of codes

A design that is flexible for example, one that uses surrogate keys

Careful choices at this stage will produce a design that will support the initial data warehouse implementation and also evolve as usage of the data warehouse changes over time.