EXPLAIN Guidelines

 <  Day Day Up  >  

EXPLAIN Guidelines

Implement the following guidelines to effectively EXPLAIN and optimize the SQL statements used in your DB2 applications.

Influence the Optimizer to Obtain Efficient Access Paths

You can influence the optimizer to choose different access paths in a variety of ways. Methods for accomplishing this task are outlined in Chapter 28, "Tuning DB2's Components." The best approach for influencing the Optimizer is to use optimization hints. This approach uses the PLAN_TABLE to define the access path you want DB2 to use.

Populate the EXPLAIN Tables in Production

Bind production packages and plans using EXPLAIN(YES) . This way, you can create a trail of access paths, cost estimates, and function resolution information that can be examined in the event of a performance problem or UDF bug.

It is a good practice to use a managed, common PLAN_TA BLE in production. You do not want to have to search through multiple PLAN_TABLE s to find explain output.

Understand How the PLAN_TABLE Is Populated

When EXPLAIN is issued a PLAN_TABLE is required to receive the EXPLAIN output. If the EXPLAIN statement is embedded in an application program, the authorization rules that apply are those defined for embedding the specified SQL statement in an application program. In addition, the authid of the owner of the plan or package must also either be the owner of the PLAN_TABLE or have an alias on another plan table (with SELECT and INSERT privileges).

Educate All DB2 Technicians in the Use of EXPLAIN

Train all technical DB2 users in the use of EXPLAIN . Although not everyone will be able to analyze the results in depth, all programmers, analysts, and systems programmers should understand, at a minimum, how to issue EXPLAIN for plans, packages, and single SQL statements, the meaning of each column in the PLAN_TABLE , and how to identify whether an index was used for a query.

Identify Modifications with Care

Use the QBLOCK_TYPE column in the PLAN_TABLE to identify INSERT , UPDATE , and DELETE statements. This column contains a description of the type of statement that was analyzed for each specific query block. Sometimes all performance tuning attention is focused on queries and not data modification statements. Be sure to review this column when you analyze PLAN_TABLE rows.

Consider Using REMARKS for Documentation

Use the REMARKS column in the PLAN_TABLE to record historical information in the PLAN_TABLE for specific access paths. One recommendation is to record in the REMARKS column the SQL statement that was EXPLAIN ed to produce the given PLAN_TABLE rows. Another recommendation is to record identifying comments. For example, if the rows represent the access path for a given query after an index was added, set the REMARKS column to something like ADDED INDEX INDEXNAME .

Keep RUNSTATS Accurate

The EXPLAIN results are only as good as the statistics in the DB2 Catalog. Ensure that RUNSTATS has been run before issuing any EXPLAIN commands. If RUNSTATS has not been run, verify that the DB2 Catalog statistics are still appropriate before running EXPLAIN .

Be Aware of Missing Pieces

Keep in mind that to analyze SQL performance properly, you will require more than just the EXPLAIN results in the PLAN_TABLE . Proper performance analysis requires the following:

  • A listing of the actual SQL statement

  • A listing of the actual DDL (or the DB2 Catalog information) for the objects being accessed and/or modified

  • The actual filter factors used when creating the access path

  • The high-level code (3GL/4GL) in which the SQL statement is embedded

  • The actual DB2 Catalog statistics that were in place at the time the EXPLAIN was performed

  • The DB2 release level and maintenance level at the time the EXPLAIN was run

  • Knowledge of the bind parameters used for the plan(s) and/or package(s) in which the SQL statement is embedded

  • Knowledge of the DB2 subsystem(s) in which the SQL statement will be executed (including settings for bufferpools, hiperpools, EDM Pool, locking parameters, and so on)

  • Knowledge of the hardware environment where the SQL is being run (including type of mainframe, number and type of processors, amount of memory, and so on)

  • Knowledge of concurrent activity in the system when the SQL statement was (or will be) executed

This additional information can be used, along with the PLAN_TABLE output, to estimate the performance of any given SQL statement.

Several other pieces of information are missing from the PLAN_TABLE , thus making the task of performance estimation significantly more difficult. The first missing EXPLAIN component is that the PLAN_TABLE does not show access paths for referentially accessed tables. For example, the following statement accesses not only the DEPT table but also the EMP table and the PROJ table because they are tied to DEPT by referential constraints:

 

 DELETE FROM    DSN8810.EMP WHERE   EMPNO = '000100'; 

EXPLAIN should record the fact that these tables are accessed because of the RI defined on the EMP table, but it does not. (This information should also be recorded in the DB2 Catalog in the SYSIBM.SYSPLANDEP table, but it is not there either.) The only way to determine the extent of referentially accessed data is with a performance monitoring tool.

When indexes are accessed as the result of a DELETE or UPDATE statement, EXPLAIN fails to record this information. RID sorts invoked (or not invoked) by list prefetch also are not reported by EXPLAIN .

Runtime modifications to the access path determined at bind time are not recorded in the PLAN_TABLE . For example, simply by examining the PLAN_TABLE , you cannot determine whether sequential detection will be invoked or whether the degree of parallelism will be reduced at runtime.

Additionally, EXPLAIN cannot provide information about the high-level language in which it is embedded. An efficient access path could be chosen for an SQL statement that is embedded improperly in an application program. Examples of inefficient SQL embedding follow:

  • The SQL statement is executed more than once unnecessarily.

  • A singleton SELECT is embedded in a loop and executed repeatedly when fetching from a cursor is more efficient.

  • Cursor OPEN s and CLOSE s are not evaluated as to their efficiency; a program might perform many opens and closes on a single cursor unnecessarily, and EXPLAIN will not record this fact.

EXPLAIN does not provide information on the order in which predicates are applied. For example, consider the following statement:

 

 SELECT  DEPTNO, DEPTNAME FROM    DSN8610.DEPT WHERE   MGRNO > '000030' AND     ADMRDEPT = 'A00'; 

Which predicate does DB2 apply first?

 

 MGRNO > '000030' 

or

 

 ADMRDEPT = 'A00' 

EXPLAIN does not provide this data. Pieces of some of this data are available in the DSN_STATEMNT_TABLE in the REASONS column. Of course, the statement table only contains general indications to help you further analyze potential problems. It does not contain detailed information. But it can help to indicate if referential constraints, UDFs, triggers, or host variables are utilized for SQL statements.

Delete Unneeded PLAN_TABLE Rows

Periodically purge rows from your PLAN_TABLE s to remove obsolete access path information. However, you might want to retain more than the most recent EXPLAIN data to maintain a history of access path selection decisions made by DB2 for a given SQL statement. Move these "history" rows to another table defined the same as the PLAN_TABLE but not used by EXPLAIN . This way, you can ensure that the PLAN_TABLE s used by EXPLAIN are as small as possible, thus increasing the efficiency of EXPLAIN processing.

Consider PLAN_TABLE Indexes

Create indexes for very large PLAN_TABLE s. Consider indexing on columns frequently appearing in predicates or ORDER BY clauses. If you join the PLAN_TABLE to the DB2 Catalog consider creating indexes on the join columns.

Run RUNSTATS on All EXPLAIN Tables

Always run RUNSTATS on the table spaces for the PLAN_TABLE , DSN_STATEMNT_TABLE , and DSN_FUNCTION_TABLE . These tables are frequently updated and queried. As such, DB2 needs current statistics to create optimal access paths for these queries. Furthermore, the statistics accumulated by RUNSTATS can help to determine if a REORG of these table spaces is required.

Be aware, though, that indexes on these tables can slow down the BIND process when EXPLAIN(YES) is specified because DB2 must update the three EXPLAIN tables and their indexes.

Consider Compressing PLAN_TABLE s

For PLAN_TABLE s that will grow to be very large, consider enabling compression to reduce the amount of disk space required for EXPLAIN data.

Specify EXPLAIN(YES) in Production

Be sure to specify EXPLAIN(YES) when binding production plans and packages. Doing so will ensure that you have an accurate recording of the access paths and function resolution details for all production programs.

Strive for the Most Efficient Access Path

As you analyze PLAN_TABLE results, remember that some access paths are more efficient than others. Only three types of access paths can be chosen: direct index lookup, index scan, or table space scan. However, these three types of accesses can be combined with other DB2 performance features (refer to Chapter 21, "The Optimizer"). A basic hierarchy of efficient access paths from most efficient (those incurring the least I/O) to least efficient (those incurring the most I/O) follows :

Index-only direct index lookup

Direct index lookup with data access

Index-only matching index scan

Index-only non-matching index scan

Matching clustered index access

Matching non-clustered index access

Non-matching clustered index access

Non-matching non-clustered index access

Partitioned table space scan skipping multiple partitions (partition scan)

Segmented table space scan (table scan)

Simple table space scan

This list represents only general cases in which a limited number of rows are to be retrieved. The hierarchy should be viewed in reverse order when most of the rows of a table are being accessed. For example, a table space scan can outperform indexed access if as little as 25% of the rows of the table are accessed to satisfy the query. Likewise, a table space scan almost always outperforms indexed access for small tables (fewer than ten pages), regardless of the number of rows to be accessed. Although keeping the preceding hierarchy in mind when evaluating EXPLAIN results is a good idea, each SQL statement should be analyzed independently to determine the optimal access paths.

When determining which path is most efficient, the answer always comes down to the number of rows required to be read and the number of rows that qualify.

In general, the optimizer does a great job for this complete task. The exceptional cases, however, will compel you to become an EXPLAIN /access path expert so that you can tune the troublesome queries.

Use Tools to Assist in EXPLAIN Analysis

Several products that augment the functionality of the EXPLAIN command are available. Examples include BMC Software's SQL-Explorer and Computer Associates' Plan Analyzer. Refer to Chapter 39, "Components of a Total DB2 Solution," for a discussion of SQL access path analysis products.

Use Cost Estimate Information with Caution

The cost estimates provided by EXPLAIN are rough estimates. Although they can be used to provide a general estimation of application performance, they are not 100% accurate. Additionally, other factors impact the performance of application programs. The cost estimates are for SQL statements only. DB2 and EXPLAIN do not provide cost estimates for work done by programs outside of DB2.

 <  Day Day Up  >  


DB2 Developers Guide
DB2 Developers Guide (5th Edition)
ISBN: 0672326132
EAN: 2147483647
Year: 2004
Pages: 388

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net