Dimensions

4.5 Dimensions

With Oracle 8i, in addition to defining primary-foreign-key relationships among tables, you can also describe the relationships between the columns within one or more tables by creating a new schema object, called a dimension. Telling Oracle about such relationships in the data makes it possible for the advisor to recommend the best set of summaries to create, for the optimizer to rewrite more queries to use the summaries, and for the OLAP tools to perform rollup and drill-down operations. We will see the power of dimensions in the section 4.6, where we talk about query rewrite.

Typically, relationships between fact and dimension tables are specified using referential integrity constraints. Dimensions are used to define the following types of relationships:

  • The hierarchical relationships among the data within the dimension tables. This lets Oracle know what data can be rolled up to higher levels within the hierarchy for summarization and how to drill down to a lower level in the hierarchy to see more detail. Hierarchical relationships are described using HIERARCHY clauses in the CREATE DIMENSION statement.

  • Functional dependencies or one-to-one relationships between two columns in a dimension table. This is useful for looking up one value based on another value in the table. For example, given a date, we can determine the day of the week. Given a month number, we can determine the month name. Given a postal code, we can determine the town. Given a product_name, we know its manufacturer. One-to-one relationships are defined using the ATTRIBUTE clause in the CREATE DIMENSION statement.

4.5.1 Creating a dimension

Once the tables that contain the dimension data is created, you can create the DIMENSION object, using the CREATE DIMENSION statement. It helps to draw a bubble diagram showing the hierarchical relationships among the columns prior to defining a dimension.

Figure 4.6 shows a bubble diagram for the geography dimension. Postal codes roll up into towns, which roll up into counties, which, in turn, roll up into countries.


Figure 4.6: Bubble diagram for the geography dimension.

To convert the bubble diagram in Figure 4.6 into a dimension definition, each bubble in the diagram becomes a LEVEL in the hierarchy, as shown in the following CREATE DIMENSION statement.

 CREATE DIMENSION geography_dim LEVEL postal_code IS customer.postal_code LEVEL town        IS customer.town LEVEL county      IS customer.county LEVEL country     IS customer.country HIERARCHY loc_rollup (           postal_code   CHILD OF           town          CHILD OF           county        CHILD OF           country          ); 

The relationships between the levels are described with the HIERARCHY clause. The bubbles with arrows coming out of them in the diagram are described with the CHILD OF clause. In our example, postal code (zip code) rolls up into town; therefore, postal_code is a CHILD OF town. Town rolls up into county, therefore, town is a CHILD OF county. County rolls up into country, so county is a CHILD OF country.

For any value of a child column in a hierarchy, there must be one, and only one, value of its parent column. In the customer table used in the example, postal code 02134 is in Boston, which is in Massachusetts. Since a postal code is unique to any given town, it satisfies this rule. Postal code 02134 refers to addresses in Boston only. It cannot also be used to refer to addresses in San Francisco.

4.5.2 Defining a dimension with multiple hierarchies

Sometimes the same data can be rolled up in different ways. In our sample schema, EASYDW, as in many businesses, we use both a regular calendar and a fiscal calendar. In the regular calendar, days roll up into months, which roll up into years. In the fiscal calendar, days roll up into weeks, which roll up into fiscal quarters. This can be represented by defining multiple hierarchies within the dimension.

Figure 4.7 shows the bubble diagram for the time dimension. It contains two hierarchies, describing the different ways the time_key can be rolled up. Each arrow coming out of a bubble is described as a CHILD OF clause. Since there are two arrows coming out of the time_key bubble, there are two CHILD OF statements—one in each hierarchy. From a given date, one hierarchy tells us which week and fiscal quarter this date is in and the other tells us the month and year. At the top of every bubble diagram is the special level "ALL," representing the grand total level for that dimension. All levels of a hierarchy can be rolled up to ALL. We don't explicitly define the ALL level in the CREATE DIMENSION statement.

click to expand
Figure 4.7: Bubble diagram for the time dimension.

The following SQL segment is an example of a dimension with multiple hierarchies.

 CREATE DIMENSION time_dim  LEVEL time_key    IS time.time_key  LEVEL month       IS time.month  LEVEL quarter     IS time.quarter  LEVEL year        IS time.year  LEVEL week_number IS time.week_number  HIERARCHY calendar_rollup (                             time_key  CHILD OF                             month     CHILD OF                             Year                             )  HIERARCHY fiscal_rollup (                           time_key    CHILD OF                           week_number CHILD OF                           quarter                           ); 

Again, for each value of a child column in a hierarchy, there is only one parent value. Any given date, say March 21, 2002 falls into one, and only one fiscal week_number and in a specific quarter.

4.5.3 Defining a dimension with attributes

Dimensions can be used to define relationships between columns within the same table that are not part of the hierarchy. Often users do not know the unique identifier for a record but know the name. For example, a customer may not know the product_id, but does know the product_name of what he or she is interested in purchasing. In relational database theory these relationships are called functional dependencies, or one-to-one relationships. A functional dependency from column A to B means that for every value of A, there is only one value of B. So if you know the value of A, then you can determine the value of B.

In a dimension definition, the ATTRIBUTE clause is used to define any functional dependencies. The ATTRIBUTE clause should be used only when there is a one-to-one relationship between the two columns.

Here we see the definition of a dimension with the attribute clause. The attribute clause tells us that given a product_id, there is only one product_name and one manufacturer. However, it does not mean that given the manufacturer we can determine the product_id.

 CREATE DIMENSION product_dim  LEVEL product_id   IS product.product_id  LEVEL category     IS product.category  HIERARCHY merchandise_rollup (                                product_id   CHILD OF                                category                                )  ATTRIBUTE product_id DETERMINES (product_name, manufacturer); 

4.5.4 Defining a dimension with normalized tables

If you have a snowflake or other normalized schema, then your hierarchy may refer to columns in multiple tables. The dimension statement in Oracle allows you to declare such relationships as well.

Figure 4.8 shows a normalized time dimension. The bubbles and bold arrows define the rollup relationships as before. The dotted rectangles represent the tables where the levels come from. The dotted arrows show how the tables join to each other. In this example, there are separate tables for time, week, month, quarter, and year. The time table joins to the week table using the predicate time.week = week.week_number and to the month table using the predicate time.month = month.month.

click to expand
Figure 4.8: A normalized dimension.

To convert this diagram to a CREATE DIMENSION statement, you would first define a level for each of the bubbles, specifying both the table name and column name for that level. For example, the level week is week.week_number. Next you would define your hierarchy using the level names as before. Finally, you would specify any joins that must be performed when traversing up the hierarchy using the JOIN KEY clause. Thus you would get the following CREATE DIMENSION statement:

 CREATE DIMENSION time_dim  LEVEL time_key IS time.time_key  LEVEL month    IS month.month  LEVEL quarter  IS quarter.quarter  LEVEL year     IS year.year  LEVEL week     IS week.week_number  HIERARCHY calendar_rollup (                             time_key CHILD OF                             month    CHILD OF                             year                             JOIN KEY time.month REFERENCES month                             JOIN KEY month.year REFERENCES year                            )  HIERARCHY fiscal_rollup   (                             time_key CHILD OF                             week     CHILD OF                             quarter                             JOIN KEY time.week    REFERENCES week                             JOIN KEY week.quarter REFERENCES quarter                             ); 

In the fiscal_rollup hierarchy in this example, the JOIN KEY time.week column is used to join the time table to the week level.

As discussed in Chapter 2, normalized dimension tables incur the overhead of extra joins during query processing and hence must be used with care.

4.5.5 Validating a dimension

We have said that the relationships declared by the dimension object are not automatically checked by Oracle. In order to get accurate results with query rewrite, you must ensure that these relationships hold in your data. To help with this, Oracle provides a procedure known as DBMS_OLAP.VALIDATE_DIMENSION. You can run this procedure every time new data is loaded into your dimension tables to ensure data integrity.

For example, suppose we had the following customer dimension:

 CREATE DIMENSION customer_dim   LEVEL customer IS customer.customer_id   LEVEL town     IS customer.town   LEVEL region   IS customer.county   HIERARCHY  customer_zone (                             customer CHILD OF                             town     CHILD OF                             region                            )   ATTRIBUTE town DETERMINES postal_code   ATTRIBUTE customer DETERMINES (sex, occupation); 

According to this dimension, town determines the postal_code, and every town should fall in one, and only one, county. Let us check that this is indeed the case.

 SELECT distinct town, county, postal_code FROM customer; TOWN       COUNTY     POSTAL_COD ---------- ---------- ---------- Chandlers  Hants      SO53 1TZ Eastleigh  Hants      SO53 1LD Liverpool  Mersey     LV1 1QT London     London     W1 1QC Soton      Hants      SO6 1gh Soton2     Hants      SO1 1TF Soton3     Hants      SO1 1TF Soton4     Hants      SO1 1TF Soton5     Hants      SO1 1TF Soton6     Hants      SO1 1TF Soton7     Hants      SO1 1TF Soton8     Hants      SO1 1TF Soton9     Hants      SO1 1TF 

Now suppose some bad data got inserted into the customer table. The postal code for London was mistyped as W1 2QC and the town of Liver-pool was misclassified in the Hants county.

 INSERT INTO customer VALUES ('AB130000', 'London', 'London', 'W1 2QC', '7-JUL-74', 'UK', 'Doctor', 'F'); INSERT INTO customer VALUES ('AB130001', 'Liverpool', 'Hants', 'LV1 1QT','18-JUN-73', 'UK', 'Doctor', 'M'); COMMIT; 

If we run VALIDATE_DIMENSION, it will detect that there is a discrepancy in the HIERARCHY and ATTRIBUTE relationships declared by the customer_dim dimension.

 variable run_id number; execute dbms_olap.create_id(:run_id); execute dbms_olap.validate_dimension('CUSTOMER_DIM', 'EASYDW',                                      FALSE, TRUE, :run_id); 

The VALIDATE_DIMENSION procedure will verify the integrity of the data and place any exceptions found in the MVIEW_EXCEPTIONS table.

 SELECT distinct owner, table_name, dimension_name, relationship FROM system.MVIEW_EXCEPTIONS WHERE runid = :run_id; OWNER   TABLE_NAME  DIMENSION_NAME  RELATIONSHIP ------  ----------- --------------  ------------ EASYDW     CUSTOMER   CUSTOMER_DIM     ATTRIBUTE EASYDW     CUSTOMER   CUSTOMER_DIM     CHILD OF 

The bad_rowid column of MVIEW_EXCEPTIONS gives the table rowids with the discrepancy (i.e., all rows corresponding to the violated relationship). In our example, it would return all rows corresponding to the towns of London and Liverpool. We can now look at the actual data values by looking up the customer table, as follows. We can see that customer_ids AB130000 and AB130001 have mismatched values for town, postal_code, and county with respect to the remaining data.

 SELECT customer_id, town, county, postal_code FROM customer WHERE rowid IN (select bad_rowid FROM system.mview_exceptions); CUSTOMER_I TOWN       COUNTY     POSTAL_COD ---------- ---------- ---------- ---------- AB123477   London     London     W1 1QC AB123478   London     London     W1 1QC AB123480   London     London     W1 1QC ... AB130000   London     London     W1 2QC    <- bad data AB123496   Liverpool  Mersey     LV1 1QT AB123497   Liverpool  Mersey     LV1 1QT AB123498   Liverpool  Mersey     LV1 1QT ... AB130001   Liverpool  Hants      LV1 1QT   <- bad data 

In the next section, we will look at how query rewrite can be used to transparently rewrite queries with materialized views. We will also see how dimension objects are used.



Oracle9iR2 Data Warehousing
Oracle9iR2 Data Warehousing
ISBN: 1555582877
EAN: 2147483647
Year: 2005
Pages: 91

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net