OLAP Features | DB2(R) Universal Database for OS/390 V7.1 Application Certification Guide (IBM DB2 Certification Guide Series)


Team-Fly

	DB2 Universal Database for OS/390 v7.1 Application Certification Guide By Susan Lawson
	Table of Contents

	Chapter 4. Advanced SQL Coding

OLAP Features

Databases normally hold large amounts of data that can be updated, deleted, queried, and inserted on a daily basis. Databases in which data is constantly updated, deleted, and inserted are known as online transaction processing ( OLTP ) systems.

Databases that hold large amounts of data and do not have a heavy transaction work load but do have a large number of concurrent queries executing all the time, are known as decision support systems ( DSS ). Certain decision support systems have fewer queries, but each query can be very complex. These allow users to examine the data from different perspectives by performing online analytical processing ( OLAP ).

The functionality of the database is required to provide multidimensional views of relational data without a significant performance effect. DB2 provides a star join access path to assist in the processing of the star schema and snowflake (star schema with further normalized dimensions) data models.

Star Schemas

The concept of a star schema is illustrated in Figure 4-1. A business view of a highly normalized database often requires a number of attributes associated with one primary object. Each of these attributes is contained in separate tables.

Figure 4-1. Star schema in the DB2CERT database.

graphics/04fig01.gif

The following points are characteristic of a star schema design:

There is a large centralized fact table that is surrounded by normalized dimension tables. The fact table contains the fact or event. In Figure 4-1, the fact table is the TEST_TAKEN table. It contains detailed information on each test taken, including exactly which test was taken, in which center the test was taken, and who took the test.
There are a number of small dimension tables that typically hold descriptive information about an entity that has a relatively small number of rows. In Figure 4-1, the dimension tables are TEST, CANDIDATE, and TEST_CENTER.
The primary keys of the dimension tables involved in the star schema supply foreign key entries in the fact table. The concatenation of foreign keys from the dimension tables usually forms a small subset of the fact table. In Figure 4-1, the foreign keys are Candidate ID, Center ID, and Test ID.

This approach allows as few attributes as possible to be stored in the fact table. The benefit of this is that the fact table is usually very large, and therefore any data duplication in this table would be very costly in terms of storage and access times. If the DB2CERT database were used to store information on a university's entrance examinations, for example, the TEST_TAKEN table could grow enormously.

OLAP schemas such as the star schema, are frequently used for large databases in data warehousing and decision support systems. These schemas make it very important to access the data in these databases in the optimal manner. Otherwise, the joins involved in the schemas may result in poor performance due to the potentially large size and number of the rows and tables being joined.

OLAP Indexes

A typical star schema may include a large number of indexes. This is due to the ad hoc nature of queries in an OLAP environment. Such an environment is typically not subjected to constant insert or update activity and therefore does not have to suffer from significant performance degradation as a result of index maintenance.

The prime considerations of indexes in an OLAP environment are to facilitate the filtering of the large result set as quickly as possible and the joining of the tables. This is particularly important for the fact table, where multiple indexes are defined, especially on combinations of foreign key columns relating to the dimension tables. The benefit of multiple indexes in this environment is improved query performance against the fact table, as the query can focus on any combination of dimensions. Given that the fact table is the largest table, the application of restrictive join predicates and an efficient access path for the join to this table are of significant importance. The indexes defined on the tables could be either single-column or multicolumn indexes. However, the exploitation of DB2's star join access path utilizes the most cost-effective multicolumn index on the fact table for access.

There are also some maintenance issues to be considered when using multiple indexes in the OLAP environment. The first is that multiple indexes will require a certain amount of space, depending on the number of columns in each index and the size of the tables. The second is that there will be a significant one-time cost when building indexes, perhaps during a bulk load.

Star Joins

A typical query against databases designed with the star schema would consist of multiple local predicates referencing values in the dimension tables and containing join predicates connecting the dimension tables to the fact table, as shown in the following example. These types of queries are called star join s.

 SELECT name, YEAR_TAKEN,     AVG(SCORE) AS avgsc FROM (SELECT t.name, YEAR(tt.date_taken) AS YEAR_TAKEN, tt.score FROM test t, test_taken tt, test_center tc, candidate c WHERE c.cid = tt.cid AND    tc.tcid = tt.tcid AND    t.number = tt.number AND    t.name LIKE 'DB2%' AND    c.country='Canada' AND    TC.NOSEATS < 10) AS STAR_TABLE GROUP BY name, YEAR_TAKEN

NOTE

The nested table expression is required in this example purely to overcome the limitation that the GROUP BY clause can contain only columns and not expressions. The YEAR(TT.DATE_TAKEN) expression must be logically resolved first in the nested table expression. This is merely a syntactical limitation.

In this example, we wish to find the average score of DB2 tests taken by Canadian citizens in the small test centers year by year. A star join query can be difficult to execute efficiently . Even though the intersection of all dimensions with the fact table can produce a small result set, the predicates applied to a single dimension table are typically insufficient to reduce the enormous number of fact table rows.

If a join based upon related tables (dimension to fact table) does not provide adequate performance, then an alternative is to join unrelated tables. Joining of unrelated tables results in a Cartesian product, whereby every row of the first table is joined with every row of the second.

Performing a Cartesian join of all dimension tables before accessing the fact table may not be efficient. DB2 must decide how many dimension tables should be accessed first to provide the greatest level of filtering of fact table rows using available indexes. This can be a delicate balance, as further Cartesian products will produce a massive increase in the size of the intermediate result sets. Alternatively, minimal prejoining of unrelated dimension tables may not provide adequate filtering for the join to the fact table.


Team-Fly

Top