Normalized E-R models are often not well suited to addressing the needs of a business. Business information can be requested at the detail or summary level. Summary-level information often provides input to the strategic direction of the business. In many cases, you can identify important business information by determining the business questions that are asked by managers and finding out how those questions are answered. These business queries may be satisfied by a simple set of criteria, which is often visualized as a three-dimensional cube. A cube is not limited to three dimensions; hypercubes of greater than three sides are also possible. In this lesson you will learn about dimensional models, which meet these business information requirements.
After this lesson, you will be able to:
- Describe the design characteristics of online analytical processing (OLAP) systems
Estimated lesson time: 20 minutes
Business sees information in the context of answers to questions. Figure 4.3 illustrates the business perspective on information. Information is often categorized according to criteria, as in this example by product type (Cherries) and city (Detroit). It is often summarized, as in this example by quarter (Q4). Even more significant, it is often considered in a time series.
Figure 4.3 How the business looks at information
Businesses need a multidimensional view to answer complex questions, such as:
The answers to these questions are answered by metrics. A metric is a measurable or quantitative value.
There are many non-metric-based questions that can be asked of the data. The multidimensional approach would not be the first choice if your business asks analytical questions that are not answered by metrics. Questions that are not answered by metrics typically require an answer that is a list of items rather than a numeric value. For such questions, you must fall back on the relational approach of retrieving this information through a series of queries against the dimension tables. For example:
Multidimensional OLAP systems provide a business-oriented solution for answering complex questions. This approach is highly successful when the answers are most often composed of metrics or quantitative data. A different design approach is needed to address the analytical information needs of the business. This approach is characterized by four qualities: a logical design technique for analytical querying; a predictable, standard framework; ability to withstand unexpected changes in user behavior; and existing queries and applications are unaffected by system changes.
The design is not affected when
Figure 4.4 introduces the table schema design approach used for OLAP. This table schema creates a multidimensional information structure that is compatible with business needs.
Figure 4.4 Dimensional modeling: The star schema
The star schema approach was developed to resolve issues surrounding the use of normalized E-R models for business information databases. The name comes from the similarity of the schema s diagram to a star. Star schemas have these characteristics:
In rare cases, a primary key element of a fact table may not be supported by a dimension table (this is called a degenerate dimension).
NOTE
A star schema may have as many "points" (dimensions) as necessary. It is not limited to a five- or six-point form.
It is common for business categories to be hierarchical in nature. In other words, criteria can be grouped at a higher level. These levels contain subcategories, each of which contains their own subcategories. These hierarchies define "drill-up" and "drill-down" paths for query navigation. A dimensional hierarchy is an arrangement of members of a dimension into levels based on parent-child relationships, such as Year, Quarter, Month, Day, or Country, State, and City. Members in a hierarchy are arranged from general to more specific. For example:
Here is the hierarchy in outline form:
Care must be taken to segregate incompatible hierarchies. For example, Sales and Manufacturing may have independent hierarchies that both include the same entity, such as product:
Depending on the application, you may want to represent these hierarchies by different dimensions, but it is more common to have different attributes within the same dimension.
Dimensional modeling provides two approaches: consolidated dimensional hierarchies and snowflaked hierarchies.
Consolidation is an example of how denormalization is applied in dimensional models. Consolidation involves embedding the entire hierarchy in the dimension table as shown in Figure 4.5.
Figure 4.5 Consolidated dimensional hierarchies
Here the product example is consolidated:
As in the rest of the table, these attributes contain textual descriptors.
Users can access data using a single table rather than needing to know the names of multiple hierarchical tables.
There will be repeated values in a table with a consolidated hierarchy. Compared with the storage required by the fact table, the increase is nominal. Here is a sample table showing a consolidated hierarchy:
Country | Region | City | Customer |
---|---|---|---|
USA | OR | Elgin | Hungry Coyote Import Store |
USA | OR | Eugene | Great Lakes Food Market |
USA | OR | Portland | Lonesome Pine Restaurant |
USA | OR | Portland | The Big Cheese |
USA | WA | Kirkland | Trail's Head Gourmet Provisioners |
USA | WA | Seattle | White Clover Markets |
USA | WA | Walla | Walla Lazy K Kountry Store |
USA | WY | Lander | Split Rail Beer & Ale |
A snowflake schema is an extension of a star schema such that one or more dimensions are defined by multiple tables. In a snowflake schema, only primary dimension tables are joined to the fact table. Additional dimension tables are joined to primary dimension tables. Figure 4.6 illustrates a dimension with a snowflaked hierarchy. Snowflaking is more closely related than a star schema is to techniques used in normalization. Snowflaking gets its name from the image that results when you add the hierarchical table extensions to the star schema diagram. In Figure 4.6, the product hierarchy example is snowflaked.
Figure 4.6 Snowflaked hierarchies
The following table compares the consolidated and snowflake hierarchy modeling approaches.
Criteria | Consolidated | Snowflaked |
---|---|---|
Overall row count/space required | Higher | Lower |
Model understandability | Easier | More difficult |
Number of tables | Less | More |
Query complexity | Simple | More complex |
Dimensional searching | Quicker | Slower |
Bitmapped indexing | Supports | Inhibits |
Unless absolutely necessary, snowflaking should be avoided. Although the snowflaked approach results in a lower overall row count, the disadvantages often outweigh the benefits.
This approach results in a higher overall row count, but it does provide benefits:
Figure 4.7 introduces the characteristics of the information object dimension tables of the schema. First, examine three of the points of the star (all except the time dimension, which is a special case). These are the information objects identified during the requirements-gathering process.
Figure 4.7 Information object dimension tables
These points of the star are used to describe the information objects that exist within a specific business process in this case, retail sales.
Each of the dimension tables share the following characteristics:
Regardless of the process being represented, this is the one dimension that is virtually guaranteed to be present in every star. It provides the historical element to the warehouse. The time key may (or may not) store date values. It could represent a month, fiscal period, or some other time division.
The time dimension has the following characteristics:
The fact table is the target of all analytical queries. While the dimensions provide constraints, the fact table provides the answers. The fact table has the following characteristics:
Additivity is crucial because decision support applications rarely retrieve a single fact table record; rather, they retrieve hundreds, thousands, or millions of records at a time.
In special rare types of data warehouse for example, data warehouses that track workflow fact tables for event-handling processes may not have any numerical attributes. The combination of dimensions denotes an event instance. In essence, these systems typically perform value counts to return numeric results from nonnumeric data.
One popular method of creating a dimensional table schema is to start with the normalized E-R model from an existing OLTP system.
Because both design approaches are based on an RDBMS, it is reasonable to believe that the results from one approach could be used to develop the other. Let s take a look at the steps:
The stars are based on discrete business processes.
There must be a dimension for each criterion that users may want to see data on in the OLAP environment. In other words, every time a users states, "We want to view this data by category/region/date/customer/shipper," there must be a representation of that criterion in the designed dimensions.
In general, avoid snowflaking of dimensional hierarchies. Snowflaking makes it more difficult for the end user to locate constraining values, and, in some cases, it can degrade query performance.