DIMENSIONAL ANALYSIS

only for RuBoard - do not distribute or recompile

DIMENSIONAL ANALYSIS

One approach to data warehouse design is to develop and implement a dimensional model. This has given rise to dimensional analysis (sometimes generalized as multi-dimensional analysis ).

It was noticed quite early on when data warehouses started to be developed that, whenever decision makers were asked to describe the kinds of questions they would like to get answers to regarding their organizations, they almost always wanted the following:

  • Summarized information with the ability to break the summaries into more detail

  • Analysis of the summarized information across their own organizational components such as departments or regions

  • Ability to slice and dice the information in any way they chose

  • Display of the information in both graphical and tabular form

  • Capability to view their information over time

So as an example, they might wish to see a report showing Wine Sales by Product, or a report showing Sales by Customer, or even Sales by Product by Customer

Table 2.1 shows a typical report of sales by product.

Table  2.1. The Wine Club ” Analysis of Sales by Product for July.
Product Name Quantity Sold (Cases) Cost Price Selling Price Total Revenue Total Cost Gross Profit
Chianti 321 26.63 42.95 13,787 8,548 5,239
Bardolino 1,775 15.10 31.35 55,646 26,802 28,844
Barolo 275 46.72 70.95 19,511 12,848 6,663
Lambrusco 1,105 23.25 41.45 45,802 25,691 20,111
Valpolicella 2,475 12.88 32.45 80,313 31,878 48,435

This dimensional approach led Ted Codd to make the following observation:

There are typically a number of different dimensions from which a given pool of data can be analyzed . This plural perspective, or multidimensional conceptual view, appears to be the way most business persons naturally view their enterprise.

”E. F. Codd, 1993

So the concept of dimensional analysis became a method for defining data warehouses.

The approach is to determine, by interviewing the appropriate decision makers in an organization, which is the subject area that they are most interested in, and which are the most important dimensions of analysis.

Recall that one of the characteristics of a data warehouse is that it is subject oriented. The subject area reflects the subject-oriented nature of the warehouse.

In the example above, the subject area would be Sales. The dimensions of analysis would be Customers and Products. The requirement is to analyze sales by customer and sales by product.

This requirement is depicted in the following three-dimensional cube. Figure 2.2 shows Sales (the shaded area) having axes of:

  1. Customer

  2. Product

  3. Time

Figure 2.2. Three-dimensional data cube.
graphics/02fig02.gif

Notice that time has not been examined so far. Time is regarded as a necessary dimension of analysis (recall that time variance is another characteristic of data warehouses) and so is always included as one of the dimensions of analysis.

This means that Sales can be analyzed by Customer by Product over Time. So each element of the cube (each minicube) contains a value for sales to a particular customer, of a particular product, at a particular point in time.

The multidimensional cube in Figure 2.2 shows sales as the subject with three dimensions of analysis. There is no real limit to the number of dimensions that can used in a dimensional model, although there is, of course, a limit to the number of dimensions we can draw!

Now let's return to the Wine Club.

The directors of The Wine Club need answers to questions about sales. Looking back at the five example questions, they all concerned sales: Sales by Product, Sales by Customer, Sales by Area. As in the example above, the subject area for their data warehouse is clearly Sales.

So what are the dimensions of analysis? Well, we've just mentioned three:

  1. Product

  2. Customer

  3. Area

So is that it? Not quite; we must not forget the Time dimension. So now we have it, a subject area and four dimensions of analysis.

As we cannot draw four-dimensional models, we can represent the conceptual dimensional model as shown in Figure 2.3.

Figure 2.3. Wine sales dimensional model for the Wine Club.
graphics/02fig03.gif

The diagram in Figure 2.3 is often referred to as a Star Schema because the diagram loosely resembles a star shape. The subject area is the center of the star and the dimensions of analysis form the points of the star. The subject area is often drawn long and thin because the table itself is usually long and thin in that it contains a small number of columns but a very large number of rows.

The Star Schema is the most commonly used diagram for dimensional models.

only for RuBoard - do not distribute or recompile


Designing a Data Warehouse . Supporting Customer Relationship Management
Designing A Data Warehouse: Supporting Customer Relationship Management
ISBN: 0130897124
EAN: 2147483647
Year: 2000
Pages: 96
Authors: Chris Todman

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net