The dimensions determine the possible ways that data can be organized in other words, how the data can and should be filtered, organized, and grouped. At this point in the design, the organization of the data should be clear based on the interviews and sample queries performed during the analysis process.
After this lesson, you will be able to:
- Create a design that organizes data the way business needs to access it
Estimated lesson time: 30 minutes
To model a dimension effectively, you must address three areas:
Dimension tables must be designed with the business user and the intended analytical usage in mind. Dimensions reside in relational tables and, as such, consist of a key and supporting attributes. They should contain highly correlated attributes, high-quality verbose textual information, parsed name and address elements, and a surrogate key.
Highly Correlated Attributes
All attributes are strongly related to the stated entity. For example, everything in the customer dimension table is about the customer.
High-Quality Verbose Textual Information
The following qualities ensure that the attributes are easy and useful to the user:
Parsed Name and Address Elements
It is important to parse fields down to their simplest elements. For example, the following are recommended:
These parsed elements allow you greater flexibility in queries, including:
Surrogate Key
Rather than using key values from the source system, you should generate unique values to identify dimension table entries. It is most efficient to use a simple, sequential numeric value for this purpose. This key provides the following benefits:
You can define other types of dimensions after defining the standard and shared dimensions. We discuss two here: degenerate dimensions and junk dimensions. These dimensions are useful for handling attributes that do not fit in the standard dimensions.
Degenerate Dimensions
Degenerate dimensions represent a business event in the fact table. Degenerate dimensions are control documents used to manage business events and the details of those events. Such control allows consolidated reporting at the business event level.
For example, a business event could consist of an order, invoice, or a shipping ticket. The control values in these events, such as invoice number, order number, or shipping ticket number, are of a higher level of granularity than the actual line items so that more than one record in the fact table may have the same control value.
Degenerate dimensions
Junk Dimensions
Junk dimensions are attributes that do not correlate to business objectives used to define other dimensions. Using junk dimensions allows you to remove flags and attributes from the fact table and place them into a useful dimensional model. Such placement allows you to capture important information without increasing the size of the fact table.
You can create junk dimensions from the following sources:
When designing a junk dimension, you can
Consider the previous dimensional schema design exercise. In this follow-up exercise, you will complete some of the design questions.
One of the dangers in implementing the data mart iteratively is the possibility of building an analytical environment that is not integrated. By defining all of the dimension tables on an enterprise basis before creating your final schemes, you can make sure that all data marts can use facts in each data mart. A shared dimension that applies to two subject areas or data marts is called a conformed dimension. By utilizing conformed dimensions, comparisons across data marts are meaningful. Conformed dimensions
Figure 4.8 illustrates how conformed dimensions might be employed in a data warehouse that consists of three separate data marts.
Figure 4.8 Conformed dimensions
Dimension tables have common characteristics, which makes it easier to define them. When defining dimension tables, there are several guidelines that you should consider.
Contains a Primary Key
Dimension tables should have a primary key that is derived and unique across each table.
TIP
Use single-element surrogate keys when defining dimensions.When you define your dimensions, use a single key, not a compound key. Do not use the key from the source database. Source applications sometimes reuse keys, and if the source application is replaced, the format of the key may be incompatible with the previously defined key structure.
Has a One-to-Many Relationship to the Fact Table
Each entry in the dimension table has multiple entries corresponding to it in the fact table.
Contains at Least One Description Column
Each description column can be used to qualify a user query by using the associated primary key to find relevant fact records.
Contains Other Attribute Columns That Are Useful for Levels of Aggregation
Related attribute columns that provide differing levels of aggregation are called dimension hierarchies. You will learn more about aggregations in the next lesson of this chapter.
Contains a Limited Set of Rows That Grow Slowly Over Time
When a data warehouse or data mart grows, it should be based on additional fact table entries. Dimension-table rows should be added only as additional attributes for the dimension are introduced.
The objectives of a dimensional design are
Careful choices at this stage will produce a design that will support the initial data warehouse implementation and also evolve as usage of the data warehouse changes over time.