Materialized Query Tables

 <  Day Day Up  >  

graphics/v8_icon.gif

DB2 Version 8 offers a new capability called Materialized Query Tables , or MQTs . Although not exclusively for data warehousing, MQTs can be used to greatly improve the elegance and efficiency of DB2-based data warehouses. An MQT can be thought of as a view that has been materialized ”that is, a view whose data is physically stored instead of virtually accessed when needed. Each MQT is defined as a SQL query, similar to a view. But the MQT pre-computes the query results and stores the data. Subsequent user queries that require the data can re-use the data from the MQT instead of re-computing it, which can save time and resources.


NOTE

Materialized query tables are sometimes referred to as automatic summary tables (ASTs) because this used to be the name of this feature on DB2 for Linux, Unix, and Windows platforms.


Why Use MQTs?

graphics/v8_icon.gif

The primary use for MQTs is to optimize the performance of complex queries. Complex queries can be poor performers, because DB2 might have to access thousands of rows across multiple tables using multiple methods . By materializing the complex query into an MQT and then accessing the materialized results, the cost of materialization is borne only once ”when the MQT is refreshed.


So, you might consider using MQTs for your existing queries that are most complex and inefficient. Another approach is to consider using MQTs instead of denormalization (when denormalization is required). Simply implement the fully normalized base tables and then build MQTs where you would have denormalized . Then you get the best of both worlds ”fully normalized tables to ensure data integrity during modification and MQTs for efficient querying.

Potential Drawbacks of MQTs

But there are potential drawbacks to using MQTs. These problems cross the spectrum from data currency to resource consumption to administration.

First of all, MQTs are not magic; they need to be refreshed when the data upon which they are based changes. Therefore, the underlying data should be relatively static or, failing that, your queries should be satisfied with somewhat out-of-date data. If neither of these situations is the case, MQTs might not be a reasonable solution because the materialized data will need to be constantly refreshed.

Additionally, MQTs consume disk storage. If your shop is storage-constrained, you might not be able to create many MQTs. Remember, an MQT will query underlying tables and then physically store that data. The tradeoff for MQTs is using more disk space in return for more efficient queries.

Finally, keep in mind that MQTs need to be maintained . If data in the underlying base table(s) changes, then the MQT must periodically be refreshed with that current data. If the MQT is not used often enough, the cost and effort of maintaining the MQT may exceed the benefit in terms of performance.

How to Create MQTs

graphics/v8_icon.gif

There are two methods for creating an MQT: You can create it from scratch using CREATE TABLE or you can modify an existing table into an MQT using ALTER TABLE .


The first method uses the CREATE TABLE statement using syntax that has been augmented to look like a view definition. Consider the following, for example:

 

 CREATE TABLE DEPT_SAL   (DEPT, TOTAL_SALARY, TOTAL_BONUS, TOTAL_COMM, TOTAL_COMPENSATION, EMPLOYEES) AS   (SELECT   WORKDEPT, SUM(SALARY), SUM(BONUS) SUM(COMM),             SUM(SALARY+BONUS+COMM), COUNT(*)    FROM     DSN8810.EMP    GROUP BY WORKDEPT) DATA INITIALLY DEFERRED REFRESH DEFERRED MAINTAINED BY SYSTEM DISABLE QUERY OPTIMIZATION; 

Let's examine each section of this DDL. First of all, we are creating a table named DEPT_SAL . The first set of parameters is the list of column names. This list is optional; DB2 will use the original names of the columns from the subsequent query if no list of names is provided. In such a scenario, every expression, constant, or function must be named using an AS clause.

The actual SELECT statement that defines this MQT follows . So far, so good ”this statement looks very much like a CREATE VIEW statement, except we are creating a table.

NOTE

If the SELECT statement defining the MQT references any CAST functions or user-defined functions, the owner of the MQT must have the EXECUTE privilege on those functions.


After the SELECT statement, there are several parameters that define the nature of the MQT.

Refreshable Table Options

When you create an MQT there are several options available to specify how the data is to be populated and refreshed into the MQT. Population is deferred for MQTs defined for DB2 for z/OS. There are other options, though, for DB2 on other platforms. That is why there are two parameters for deferring data population even though this is currently the only choice. These parameters are

DATA INITIALLY DEFERRED ” This options indicates that data will not be inserted into the MQT when it is first created. Instead, either the REFRESH TABLE statement or INSERT statements must be used to populate data into the MQT.

REFRESH DEFERRED ” This option indicates that data in the table can be refreshed at any time using the REFRESH TABLE statement. The data in the table only reflects the result of the query as a snapshot at the time when the REFRESH TABLE statement is processed or when it was last updated for a user-maintained materialized query table.

Furthermore, you can specify whether the MQT is to be maintained by the system or the user. These options are

MAINTAINED BY SYSTEM ” Indicates that the MQT is maintained by the system. This option is the default and it means that the MQT does not allow LOAD , INSERT , UPDATE , DELETE , or SELECT FOR UPDATE statements. The REFRESH TABLE statement is used to populate data in the MQT.

MAINTAINED BY USER ” Indicates that the MQT is maintained by the user. The user can execute LOAD , INSERT , DELETE , UPDATE , SELECT FOR UPDATE , or REFRESH TABLE statements to populate the MQT.

Query Optimization Options

The CREATE statement also provides parameters to specify how DB2 uses the MQT for query optimization. Basically, there are two choices: You will either enable or disable query optimization. The choice you make will impact the type of SELECT that can be used by the MQT being defined.

The default option is ENABLE QUERY OPTIMIZATION . When this option is chosen , the MQT can be used for query optimization. However, if the SELECT statement used to define the MQT does not satisfy the following restrictions of query optimization, an error will occur:

  • The SELECT statement must be a subselect . This means you can specify a SELECT , a FROM , a WHERE , a GROUP BY , and HAVING . You cannot specify a UNION or UNION ALL , though.

  • The subselect cannot reference a scalar or table UDF with the EXTERNAL ACTION or NON-DETERMINISTIC attributes

  • The subselect cannot use the RAND built-in function.

  • The subselect cannot contain any predicates that include subqueries.

  • The subselect cannot contain a nested table expression or view that requires materialization.

  • The subselect cannot contain a join using the INNER JOIN syntax.

  • The subselect cannot contain an outer join.

  • The subselect cannot contain a special register.

  • The subselect cannot contain a scalar fullselect.

  • The subselect cannot contain a row expression predicate.

  • The subselect cannot contain sideway references.

  • The subselect cannot contain table objects with multiple CCSID sets.

  • If the subselect references a view, the fullselect in the view definition must follow all of the previous rules.

Optionally, you can specify DISABLE QUERY OPTIMIZATION . Of course, this means that the MQT cannot be used for query optimization, but it can be queried directly. When query optimization is disabled, be aware of the following restrictions on the SELECT statement in the MQT:

  • It cannot reference a temporary table ”neither created nor declared.

  • It cannot reference another MQT.

Attribute Copy Options

When an MQT is created, column attributes are not inherited from the underlying columns of the SELECT statement. The MQT creator must specify how to handle IDENTITY columns and default values.

The attributes of IDENTITY columns in an MQT can either be inherited or not by specifying either of the following:

  • EXCLUDING IDENTITY COLUMN ATTRIBUTES

  • INCLUDING IDENTITY COLUMN ATTRIBUTES

If neither is specified, the default is to exclude IDENTITY column attributes. When you choose to include IDENTITY column attributes, be sure that your SELECT statement maps existing columns to the IDENTITY column such that the attributes can continue to be used correctly.

The default value specification for columns can be controlled by specifying one of the following:

  • EXCLUDING COLUMN DEFAULTS ” Indicates that column defaults will not be inherited from the source table. The default values of the column of the new table are either null or there are no default values. If the column can be null, the default is the null value. If the column cannot be null, there is no default value, and an error occurs if a value is not provided for a column on INSERT for the new table.

  • INCLUDING COLUMN DEFAULTS ” Indicates that column defaults will be inherited from the source table. However, columns that are not updatable will not have a default defined.

  • USING TYPE DEFAULTS ” Indicates that the default values depend on the data type as delineated in Table 45.2.

Table 45.2. Default Data Types

Data Type

Default Value

Numeric

Fixed-length string

Blanks

Varying-length string

A string of length 0

Date

Current Date

Time

Current Time

Timestamp

Current Timestamp


Inheriting FIELDPROC s

The MQT will inherit an existing FIELDPROC from a column in the SELECT -list if that column can be mapped directly to a column of a base table or a view in the FROM clause.

WITH NO DATA

When the WITH NO DATA clause is used to define an MQT the table actually ceases to be an MQT. Using WITH NO DATA means that the table is not populated with the results of the query. Instead, the SQL statement is used to define the columns of the new table.

NOTE

To maintain compatibility with DB2 running on other platforms, the clause DEFINTION ONLY can be used as a synonym for WITH NO DATA .


The SELECT statement used when creating a table specifying the WITH NO DATA clause cannot contain any of the following:

  • Host variables

  • Parameter markers

  • Any references to remote objects

  • A ROWID column (or a column with a distinct type based on a ROWID )

  • Any columns having a BLOB , CLOB , or DBCLOB data type (or a distinct type based on any of these data types)

  • PREVIOUS VALUE or NEXT VALUE expressions

  • An INSERT statement in the FROM clause

Converting an Existing Table into an MQT

graphics/v8_icon.gif

Some shops implemented DB2 tables to operate as a type of materialized query table prior to DB2 V8. After migrating to V8, it makes sense for these shops to consider converting these existing tables into MQTs to take advantage of automatic query rewrite and the automated refresh features built in to DB2. An existing table can be converted to an MQT using the ALTER TABLE statement and the ADD MATERIALIZED QUERY clause. For example


 

 ALTER TABLE DEPT_SAL   ADD MATERIALIZED QUERY   (SELECT   WORKDEPT, SUM(SALARY), SUM(BONUS) SUM(COMM),             SUM(SALARY+BONUS+COMM), COUNT(*)    FROM     DSN8810.EMP    GROUP BY WORKDEPT) DATA INITIALLY DEFERRED REFRESH DEFERRED MAINTAINED BY USER; 

This ALTER statement causes the existing DEPT_SAL table to be converted into an MQT. It defines the query to be used, as well as the additional MQT parameters. Note that we used the MAINTAINED BY USER to allow on-going user maintenance tasks to continue to work. Of course, we can now use the REFRESH TABLE statement, too.

After converting the table to an MQT, its data will remain the same until the user refreshes it.

MQT Population and Maintenance

graphics/v8_icon.gif

As of DB2 V8, when an MQT is created, data is not initially populated into the table. The manner in which data is added to the MQT depends on the MAINTAINED BY option, as previously discussed. Most MQTs will be maintained by the system, and as such, data will be populated when the REFRESH TABLE statement is executed. The REFRESH TABLE statement is implemented "behind the scenes" by DB2 using DELETE , INSERT , and UPDATE , so MQTs follow DATA CAPTURE and AUDIT rules.


If the MQT is specified as MAINTAINED BY USER though, it can be refreshed using the LOAD utility, INSERT , UPDATE and DELETE statements, as well as the REFRESH TABLE statement.

The REFRESH TABLE statement is easy to understand and execute. You simply specify the name of the MQT to refresh, and DB2 automatically rebuilds the data for the MQT using the SELECT statement upon which the MQT was defined. For example, to refresh the sample table we have been using, you would issue the following:

 

 REFRESH TABLE DEPT_SAL; 

Running this statement causes DB2 to delete all rows in the MQT, runs the SELECT statement in the MQT, inserts the results into the MQT, and updates the DB2 catalog to modify the timestamp and cardinality metadata for the MQT. Keep in mind, though, that this all happens as one unit-of-work. So, either the entire refresh is completed, or it fails ”there will be no in-between state. Also, all of the changes made by the REFRESH TABLE statement are logged.

The SELECT will run with the same isolation level that was in effect at the time the CREATE TABLE statement for the MQT was issued. This is important because it can impact the results of the SELECT ; but the isolation level of the MQT also impacts automatic query rewrite, which is discussed next.

Automatic Query Rewrite

Up until now we have merely discussed how MQTs can be used to simplify the propagation of data from multiple tables into a physical query table. But MQTs are much more useful because the DB2 optimizer understands them.

Your queries can continue to reference the base table. But during access path selection, the optimizer will examine your query to determine whether your table(s) can be replaced by an MQT to reduce the query cost.

The process undertaken by the DB2 optimizer to recognize when an MQT can be used and then rewrite the query to use the MQT is called automatic query rewrite , or AQR .

For AQR to be invoked for a query, the result that query must be derivable from the MQT definition. The query need not match the SELECT in the MQT definition exactly. When the query can be resolved using the MQT, the query and the MQT are said to match . When DB2 finds a match, the submitted query is rewritten by the optimizer to use the MQT.

By using AQR, MQTs can be utilized without the user having to rewrite his SQL statements ”instead DB2 rewrites any SQL statement that matches. The goal is to reduce query execution cost. After the optimizer uses AQR to rewrite the query, it will compare the access path of the rewritten query against the original access path, and the one with the lowest cost will be run.

An EXPLAIN will show whether AQR was invoked to use an MQT. If the final query plan comes from a rewritten query, the PLAN_TABLE will show the new access path using the name of the matched MQTs in the TNAME column. Additionally, the TABLE_TYPE column will be set to M to indicate that an MQT was used.

NOTE

You can also determine whether AQR was used through IFCID 0022 , which contains the mini-plan performance trace record.


How to Encourage AQR

Many factors can influence whether AQR is considered for use by the optimizer. A big factor is the way in which you build your MQTs and write your queries. But other factors are involved including database design and some new special registers.

MQT DDL for AQR

The query optimization options discussed previously in this chapter will be a big determining factor in whether DB2 will consider the MQT for AQR. Fortunately, the default is ENABLE QUERY OPTIMIZATION , which specifies that the MQT can be exploited by AQR. However, if you specify DISABLE QUERY OPTIMIZATION , the MQT will not be considered for AQR.

The Impact of REFRESH TABLE on AQR

Additionally, the optimizer is somewhat aware of the freshness of system-maintained MQTs. AQR will be used for a system-maintained MQT only if a REFRESH TABLE has occurred. Of course, the MQT may not be up-to-date, but DB2 knows that the MQT was refreshed at least once.

The time between refreshes is recorded in the REFRESH_TIME column of the SYSIBM.SYSVIEWS table in the DB2 Catalog.

No such restriction exists when using user-maintained MQTs. Because such an MQT is user maintained, DB2 cannot know whether the data is fresh or not, nor will DB2 maintain the REFRESH_TIME data in the DB2 Catalog. Therefore DB2 does not restrict user-maintained MQTs usage for AQR if the MQT is defined with the ENABLE QUERY OPTIMIZATION specification.

Special Registers

There are new special registers in DB2 V8 that can be set to control AQR. These special registers are CURRENT REFRESH AGE and CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION :

CURRENT REFRESH AGE ” Used to control how "fresh" the MQT data must be for it to be considered for AQR. The refresh age of an MQT is the time between the current timestamp and the time that REFRESH TABLE was last run for this MQT. This special register can be set to the following values:

” Indicates that DB2 will not consider AQR for any MQTs. A zero in this register means that only current MQTs are considered by AQR. But DB2 V8 does not support immediately refreshed MQTs, so AQR will not be used.

ANY ” Indicates that all MQTs are considered by AQR.

NOTE

You can set the REFSHAGE DSNZPARM to specify a default value for CURRENT REFRESH AGE to be used across the DB2 subsystem (or on the CURRENT REFRESH AGE field of the DSNTIP4 panel during installation).


CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION ” Indicates the type of MQTs to be considered by AQR; that is system-maintained, user-maintained, or both. This special register can be set to the following values:

ALL ” Indicates that all MQTs will be considered by AQR.

NONE ” Indicates that no MQTs will be considered by AQR.

SYSTEM ” Indicates that only system-maintained MQTs will be considered by AQR.

USER ” Indicates that only user-maintained MQTs will be considered by AQR.

NOTE

You can set the MAINTYPE DSNZPARM to specify a default value for CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION to be used across the DB2 subsystem (or on the CURRENT MAINT TYPES field of the DSNTIP4 panel during installation).


Keep in mind, too, that MQTs having the DISABLE QUERY OPTIMIZATION specification are never eligible for AQR, regardless of the setting of CURRENT MAINT TYPES .

Types of Queries Supported by AQR

Only certain specific types of queries can be impacted by AQR. As of DB2 V8, only queries that are dynamically prepared and read only can be rewritten by AQR. Static SQL cannot be rewritten by AQR to use MQTs. If you wish to use an MQT in static SQL, you will have to manually code the SQL to access the MQT. AQR cannot be used.

The DB2 optimizer considers AQR at the query block level. So each block of a query might be rewritten. For a query block to be rewritten to use an MQT it must not contain an outer join, the RAND function, or a user-defined scalar or table function with the EXTERNAL ACTION or NON-DETERMINISTIC attribute. Furthermore, SELECT statements in an INSERT or on the SET condition of an UPDATE are not considered for AQR.

Additionally, only MQTs that have an isolation level equal to or higher than the query isolation level will be considered during the AQR process.

Let's look at a quick example. Remember our sample MQT from before:

 

 CREATE TABLE DEPT_SAL   (DEPT, TOTAL_SALARY, TOTAL_BONUS, TOTAL_COMM, TOTAL_COMPENSATION, EMPLOYEES) AS   (SELECT   WORKDEPT, SUM(SALARY), SUM(BONUS) SUM(COMM),             SUM(SALARY+BONUS+COMM), COUNT(*)    FROM     DSN8810.EMP    GROUP BY WORKDEPT) DATA INITIALLY DEFERRED REFRESH DEFERRED MAINTAINED BY SYSTEM ENABLE QUERY OPTIMIZATION; 

Now, assume that we submit the following query:

 

 SELECT   DEPT, AVG(SALARY+BONUS+COMM) FROM     DSN8810.EMP GROUP BY WORKDEPT; 

Obviously, AQR should be able to rewrite this query to utilize the DEPT_SAL MQT. This is so because the query conforms to the rules for AQR, and the information requested is a subset of the information in the MQT.

The Impact of Database Design on AQR

AQR relies on referential constraints defined between tables to help determine whether a query can be rewritten or not. The referential constraints are used by AQR when the MQT contains extra tables that are not referenced by the query.

In many shops, though, DB2 RI is not used. However, most databases have some form of referential integrity ”and if it is not defined to DB2, it is maintained by the application.

graphics/v8_icon.gif

Informational referential constraints are introduced in DB2 V8 to assist AQR. An information referential constraint is not maintained by DB2 during INSERT , UPDATE , and DELETE processing or by the LOAD and CHECK utilities. But the AQR process will consider the constraint to help determine whether queries can be rewritten.


If you have tables where referential integrity is maintained programmatically, seriously consider building information referential constraints when you build MQTs on those tables.

An informational referential constraint can be defined using either CREATE TABLE or ALTER TABLE . It is defined like a normal constraint, but with the NOT ENFORCED keyword. For example, the following column DDL extract defines an information referential foreign key constraint for the MGRNO column to the EMPNO of the EMP table:

 

 . . . MGRNO CHAR(6) NOT NULL,   CONSTRAINT MGREMP FOREIGN KEY (EMPNO)   REFERENCES EMP NOT ENFORCED, . . . 

CAUTION

Although informational referential constraints are not enforced by DB2 in most contexts, they are used by the QUIESCE and REPORT TABLESPACESET utilities. Additionally, when using LISTDEF with the RI option, information referential constraints will be considered.


ALTER and Query Rewrite

The ALTER TABLE statement can be used to change the MQT's setting to specify whether query rewrite is enabled or disabled. If you alter an MQT to specify ENABLE QUERY OPTIMIZATION then that MQT is immediately eligible for query rewrite. This is so even if the MQT is empty, so be cautious.

Usually, the proper course of action will be to ALTER the table to an MQT with query optimization disabled. Then the MQT should be refreshed ”and only then, when the data is accurate, should query optimization be enabled for the MQT.

NOTE

You can also use the ALTER TABLE statement to change an MQT into a base table.


MQT Guidelines

graphics/v8_icon.gif

Consider the following guidelines as you implement materialized query tables in your shop.


Enable Query Optimization After Creation

Create user-maintained MQTs with query optimization initially disabled. After the MQT is populated, enable the table for query optimization. If you create the table initially enabled for query optimization, DB2 might rewrite queries to use the empty MQT. And though the query would run quickly, you would not get the desired results.

As Always, Avoid SELECT *

If you create an MQT using SELECT * , the columns of the MQT will be determined at the time the MQT is created. To avoid confusion, always explicitly list the columns in the SELECT -list of the MQT definition.

Avoid Defining MQTs with HAVING

Consider leaving the HAVING clause off of MQTs defined with GROUP BY . Creating an MQT with a HAVING clause is rarely worth the effort. Generally, the HAVING clause reduces the usefulness , and therefore the benefit of the MQT.

An MQT that aggregates data using GROUP BY can be queried for all values grouped. Specifying HAVING will reduce the amount data that can be queried.

Create Indexes on MQTs

DB2 can use indexes to optimize data retrieval from a materialized query table. Build indexes on your MQTs based on their access characteristics and usage patterns.

Use Segmented Table Spaces for MQTs

Consider standardizing on segmented table spaces for your MQTs. This is wise because REFRESH TABLE will perform better if the MQT is stored in a segmented table space. When REFRESH TABLE is executed to populate an MQT, DB2 uses a mass delete to clear the existing data from the MQT. A mass delete also is more efficient for segmented table spaces because of its more efficient use of the space map.

Consider Running EXPLAIN on your REFRESH TABLE Statements

You can run an EXPLAIN on the REFRESH TABLE statement to analyze the access paths used to populate the MQT with data. The PLAN_TABLE output will contain rows for INSERT with the SELECT in the MQT definition.

MQTs Cannot Be Unique

Although most DB2 table characteristics can be applied to MQTs, you cannot specify unique constraints or unique indexes on them. Additionally, because an MQT cannot have a unique key, you cannot make an MQT the parent table in a referential constraint.

Monitor BIND Time with MQTs

Be aware that it might take longer to BIND plans and packages when MQTs are in use. Queries referencing tables on which MQTs are defined can experience increased BIND time due to the DB2 Catalog access and additional processing required during the automatic query rewrite phase.

 <  Day Day Up  >  


DB2 Developers Guide
DB2 Developers Guide (5th Edition)
ISBN: 0672326132
EAN: 2147483647
Year: 2004
Pages: 388

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net