Tuning the Application

As was evident from the DB2 performance tuning pie, tuning the application design provides the single greatest benefit to overall DB2 performance. You can use several methods to accomplish this, each of which is covered in this section. Before proceeding, however, I will review the access paths, particularly the information about filter factors.

Analyzing Access Paths

To determine the actual "behind the scenes" operations being performed by DB2 for each SQL statement, you must analyze the access path chosen for the statement by the DB2 optimizer. An access path, as discussed in Chapter 21, "The Optimizer," is the method DB2 chooses to carry out the data manipulation requested in SQL statements. The DB2 EXPLAIN statement places information about the access paths in a PLAN_TABLE, which can be inspected by a technical analyst. You can use the information in Chapter 25 in conjunction with the access path data to create a complete picture of the operations being performed for each SQL statement.

Is DB2 on its own when making access path determinations? The ideal answer to this question would be "Yes." It would be wonderful if DB2 always had all the information it needed, required no external input, and never chose the wrong access path. However, we do not yet live in this ideal world. DB2 sometimes chooses an inefficient access path over another, more efficient one for the following reasons:

  • The statistics might be outdated if RUNSTATS was never run or not run recently. This causes the access paths to be chosen based on incorrect assumptions about the current environment.

  • Certain physical parameters are not yet taken into account by the optimizer when it determines access paths. Some examples are differences between physical storage devices (the model of DASD device, or faster devices), the number of data set extents, and COBOL (or other 3GL and 4GL) code.

  • Concurrent processes (scheduling) are not considered by the optimizer.

  • The DB2 optimizer is prone to the same problems associated with every computer program; it is fallible. (However, given its complexity, its success rate is admirable.)

For these reasons, you may decide to artificially influence the optimizer's decision process. Techniques for accomplishing this are addressed in the next section.

Before I move on, I will survey the factors addressed by the DB2 optimizer. The first consideration is the version of DB2 being used; it should be the same in test and production. Obviously, access path techniques are available to newer releases of DB2 that were not available to older releases.

The optimizer takes the size of the buffer pools into account when determining access paths. As the size of the buffer pools increases, DB2 assumes that read efficiency increases also.

The optimizer also takes into account the type of CPU being used during access path selection. DB2 chooses different access techniques based on the perceived performance of the processor. This is important to remember when modeling SQL in a test DB2 subsystem using production statistics. If the production DB2 subsystem has a different number of buffers or if it runs on a different CPU, the optimizer might choose a different access path in the production environment than it did in the test environment, even if the SQL and the DB2 Catalog statistics are identical.

To get around this, the following measures can be taken:

  • When evaluating access paths for SQL statements using production statistics, be sure that the test DB2 subsystem is using the same CPU or a different CPU of the same type. This may be difficult for larger shops with several DB2 subsystems running on various machines, all configured differently.

  • Specify test DB2 buffer pools to be the same as the production buffer pools to ensure that access paths do not change as a result of different buffer pool sizes. However, setting test buffer pools as high as production buffer pools can waste memory resources, and setting production buffer pools as low as test buffer pools will degrade performance.

The wisest course of action is simply to realize that access path differences will exist between DB2 subsystems and not to try to avoid access path discrepancies between DB2 subsystems. Running DB2 subsystems with artificial constraints such as those just outlined is counterproductive to optimizing DB2 performance. Just remember that a test access path determined using production statistics does not guarantee that the production access path will be identical. Besides, it is wise to continuously monitor the production access paths for all SQL statements, because they can change when plans or packages are bound or rebound, or when RUNSTATS is run for dynamic SQL.

TUNING STRATEGY

Analyze all production DB2 access paths. Some shops analyze only the access paths for static SQL embedded in application programs, but this is inadequate. Develop a plan for analyzing all components of DB2 programs, including the following:

  • The structure of the application program to ensure that proper coding techniques are used. Also be sure that otherwise efficient-looking SQL embedded in a program loop does not occur without a proper reason. In other words, a finely-tuned SQL statement inside a loop that runs two million times is likely to cause performance problems.

  • All SQL, whether static or dynamic, embedded in application programs. This includes SQL in online transactions, batch programs, client/server programs, report writers, 4GLs, CASE tools, decision support systems, and packaged applications.

  • All regularly executed or critical ad hoc, dynamic SQL. This includes, but is not necessarily limited to, SQL executed by SPUFI, QMF, DSNTIAD, DSNTIAUL, or DSNTEP2; SQL generated by any application system "on the fly"; dynamic SQL in packaged applications; SQL generated or submitted using vendor tools; data warehouse queries; and SQL shipped from remote sites, including remote mainframes, minis, and PC workstations.

  • All stored procedure and user-defined function programs that contain SQL.

  • All SQL in triggers. When a trigger is created, DB2 also creates a trigger package but no EXPLAIN data. To EXPLAIN the trigger package, you must REBIND it specifying EXPLAIN YES.

  • Whether every SQL statement in the DB2 program is followed by a check of the SQLCODE or SQLSTATE.


If you utilize triggers in your DB2 databases, you need to be aware that code exists within the triggers. This code needs to be examined regularly to ensure that it is optimal given the database design and the application processes that modify the data, causing the trigger to fire. Even if the SQL in the trigger is efficient, other processes performed by the trigger may not be efficient.

Influencing the Optimizer

There are several methods of tuning the system to change access paths or influence access path selection. This section describes several approaches to changing the access paths selected by DB2.

The DB2 optimizer is one of the most intricate pieces of software on the market. It does an admirable job of optimizing SQL requests. To achieve this level of success, the optimizer contains a great deal of performance-specific expertise. For example, the optimizer estimates both elapsed times and CPU times when choosing an access path. When a SQL statement is rebound, the optimizer might choose a new access path that increases CPU time but decreases elapsed time. Most shops choose to enhance elapsed time at the expense of additional CPU use because elapsed time has a measurable effect on user productivity. In other words, it is good to trade off CPU cycles for user satisfaction, and the DB2 optimizer attempts to accomplish this. Of course, if both CPU and elapsed time can be reduced, the optimizer will try to do so.

However, the optimizer is not infallible. Sometimes the application analyst understands the nature of the data better than DB2 (at the present time). You can coax the optimizer into choosing an access path that you know is better but that the optimizer thinks is worse. As the functionality and complexity of the optimizer are enhanced from release to release of DB2, the need to trick the optimizer in this way will diminish.

There are five ways to influence the optimizer's access path decisions:

  • Standard, DB2-based methods

  • Tweaking SQL statements

  • Specifying the OPTIMIZE FOR n ROWS clause

  • Updating DB2 Catalog statistics

  • Using OPTHINT to indicate that an access path in the PLAN_TABLE should be chosen

The following sections discuss each of these methods.

Standard Methods

Of all the methods for influencing the DB2 optimizer, standard DB2 methods are the only mandatory ones. Try all the standard methods covered in this section before attempting one of the other methods. There are several reasons for this.

The standard methods place the burden for generating optimal access paths on the shoulders of DB2, which is where it usually belongs. They also use IBM-supported techniques available for every version and release of DB2. Finally, these methods generally provide the greatest gain for the smallest effort.

There are four standard methods for tuning DB2 access paths. The first method is ensuring that accurate statistics are available using the RUNSTATS utility and the BIND or REBIND command. RUNSTATS, which is discussed in detail in Chapter 34, "Catalog Manipulation Utilities," populates the DB2 Catalog with statistics that indicate the state of your DB2 objects, including the following:

  • Their organization

  • Clustering information

  • The cardinality of tablespaces, tables, columns, and indexes

  • The range of values for columns

All of these factors are considered by the optimizer when it chooses what it deems to be the optimal access path for a given SQL statement.

TUNING STRATEGY

Execute RUNSTATS at least once for every tablespace, table, column, and index known to your DB2 subsystem. Schedule regular RUNSTATS executions for all DB2 objects that are not read-only. This keeps the DB2 Catalog information current, enabling proper access path selection.


The second standard method for tuning DB2 access paths is ensuring that the DB2 objects are properly organized. An object is disorganized when data modification statements executed against it cause data to be stored in a non-optimal fashion, such as data that is out of clustering sequence, or a row that has been relocated to a different page than its RID points to, requiring more than one physical page access to retrieve it. Reorganizing a disorganized object can enable DB2 to choose a more efficient access path. To organize these objects more efficiently, run the REORG utility, followed by RUNSTATS and REBIND. In-depth coverage of the REORG utility and guidelines for its use are in Chapter 33, "Data Organization Utilities."

TUNING STRATEGY

Use the DB2 Catalog queries in Chapter 26, "DB2 Object Monitoring Using the DB2 Catalog and RTS," to determine when your DB2 tablespaces and indexes need to be reorganized:

  • Reorganize a tablespace when the CLUSTERRATIO of its clustering index falls below 95%. (Schedule this so that it does not affect system performance and availability.)

  • Reorganize any index (or index partition) when LEAFDIST is greater than 200. If the value of FREEPAGE for the index is not 0, reorganize only when LEAFDIST is greater than 300. Of course, you should not blindly reorganize indexes when they reach these thresholds. You should weigh the observed performance degradation against the cost of running the index reorganization jobs before reorganizing your application's indexes.

  • Reorganize all DB2 tablespaces and indexes when their data set is in multiple physical extents. Before reorganizing, ensure that space allocations have been modified to cause all data to be stored in a single extent.

You may want to reorganize more frequently than indicated here by creating scheduled REORG jobs for heavily accessed or critical DB2 tablespaces and indexes. This limits performance problems due to disorganized DB2 objects and reduces the number of reorganizations that must be manually scheduled or submitted by a DBA or performance analyst.
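As a sketch, the reorganization thresholds above can be expressed as a simple decision routine. The following Python functions and parameter names are hypothetical (not DB2 objects); the input values would come from the DB2 Catalog and RTS queries mentioned above:

```python
# Hypothetical helpers applying the reorganization thresholds described
# above. Parameter names (cluster_ratio, leafdist, freepage, extents)
# are illustrative, not actual DB2 Catalog column names.

def tablespace_needs_reorg(cluster_ratio, extents):
    """REORG a tablespace when the CLUSTERRATIO of its clustering
    index falls below 95%, or when its data set spans multiple
    physical extents."""
    return cluster_ratio < 95.0 or extents > 1

def index_needs_reorg(leafdist, freepage, extents):
    """REORG an index when LEAFDIST exceeds 200 (300 if FREEPAGE is
    nonzero), or when its data set spans multiple extents."""
    threshold = 300 if freepage != 0 else 200
    return leafdist > threshold or extents > 1
```

For example, `tablespace_needs_reorg(93.5, 1)` flags a tablespace whose clustering index has slipped below the 95% threshold. As the text cautions, such flags should be weighed against the cost of the reorganization jobs, not acted on blindly.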


The third standard method for tuning DB2 access paths is to encourage parallelism. Consider changing simple and segmented tablespaces to partitioned tablespaces to encourage I/O, CPU, and Sysplex parallelism. Furthermore, it may be advantageous to repartition already partitioned tablespaces to better align ranges of values, thereby promoting better parallel access.

The fourth and final standard method for tuning DB2 access paths is ensuring that there are proper indexes by creating new indexes or dropping unnecessary and unused indexes. DB2 relies on indexes to achieve optimum performance.

Analyze the predicates in your SQL statements to determine whether there is an index that DB2 can use. Indexes can be used efficiently by DB2 if the first column of the index key is specified in an indexable predicate in the SQL statement. Refer to Chapter 2, "Data Manipulation Guidelines," for a discussion of indexable and non-indexable predicates. If no index meets these requirements, consider creating one. As you index more columns referenced in predicates, performance generally increases.

Dropping unused indexes is another critical part of application tuning. Every table INSERT and DELETE incurs I/O against every index defined for that table. Every UPDATE of indexed columns incurs I/O against every index defined on those columns. If an index is not being used, drop it. This reduces the I/O incurred for data modification SQL statements, reduces RUNSTATS resource requirements, and speeds REORG and RECOVER processing.

Tweaking the SQL Statement

If you do not want to change the DB2 Catalog statistics but the standard methods outlined in the preceding section are not helpful, you might consider tweaking the offending SQL statement. Tweaking is the process of changing a statement in a non-intuitive fashion, without altering its functionality.

At times, you may need to prevent the optimizer from considering a specific index. One method of achieving this is to append OR 0 = 1 to the predicate. For example, consider a query against the EMP table on which two indexes exist: one on EMPNO and one on WORKDEPT. Appending OR 0 = 1 (as shown next) to the WORKDEPT predicate will cause DB2 to avoid using an index on WORKDEPT.

 

 SELECT  EMPNO, WORKDEPT, EDLEVEL, SALARY
 FROM    DSN8610.EMP
 WHERE   EMPNO BETWEEN '000020' AND '000350'
 AND     (WORKDEPT > 'A01' OR 0 = 1);

The OR 0 = 1 clause does not change the results of the query, but it can change the access path chosen.

Another method of tweaking SQL to influence DB2's access path selection is to code redundant predicates. Recall from Chapter 21 that when DB2 calculates the filter factor for a SQL statement, it multiplies the filter factors for all predicates connected with AND .
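As a simplified sketch of that multiplication (using the common approximation that an equality predicate's filter factor is 1/COLCARDF; DB2's actual formulas are considerably more involved):

```python
# Simplified model of filter factor calculation for ANDed predicates.
# Assumes the textbook approximation FF = 1/COLCARDF for an equality
# predicate; DB2's real computation is more complex.

def equality_filter_factor(colcardf):
    """Approximate filter factor for COL = :value."""
    return 1.0 / colcardf

def combined_filter_factor(factors):
    """Filter factors of predicates connected with AND are multiplied."""
    result = 1.0
    for f in factors:
        result *= f
    return result

# With 10 distinct WORKDEPT values (a hypothetical cardinality), one
# equality predicate filters to 0.1; coding the same predicate three
# times drops the estimate to 0.001 without changing the results.
single = combined_filter_factor([equality_filter_factor(10)])
redundant = combined_filter_factor([equality_filter_factor(10)] * 3)
```

This is why redundant predicates can sway the optimizer: the estimated number of qualifying rows shrinks even though the answer set does not.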

TUNING STRATEGY

You can lower the filter factor of a query by adding redundant predicates. For example, change this statement:

 SELECT   LASTNAME
 FROM     DSN8810.EMP
 WHERE    WORKDEPT = :VAR

to this:

 SELECT   LASTNAME
 FROM     DSN8810.EMP
 WHERE    WORKDEPT = :VAR
 AND      WORKDEPT = :VAR
 AND      WORKDEPT = :VAR

The two predicates added to the end are redundant and do not affect the results of the SQL statement. However, DB2 calculates a lower filter factor, which increases the possibility that an index on the WORKDEPT column will be chosen. The lower filter factor also increases the possibility that the table will be chosen as the outer table, if the redundant predicates are used for a join.


TUNING STRATEGY

When redundant predicates are added to enhance performance, as outlined in the preceding strategy, be sure to document the reasons for the extra predicates. Failure to do so may cause a maintenance programmer to assume that the redundant predicates are an error and thus remove them.


Another option for squeezing a small amount of extra performance out of an SQL statement is to change the physical order of the predicates in your SQL code. DB2 evaluates predicates first by predicate type, and then according to the order in which it encounters them. The four types of SQL predicates are listed in the order that DB2 processes them:

Equality, in which a column is tested for equivalence to another column, a variable, or a literal

Ranges, in which a column is tested against a range of values (for example, greater than, less than, or BETWEEN )

IN, where a column is tested for equivalence against a list of values

Stage 2 predicates

TUNING STRATEGY

Place the most restrictive predicates at the beginning of your predicate list. For example, consider the following query:

 

 SELECT   LASTNAME
 FROM     DSN8810.EMP
 WHERE    WORKDEPT = 'A00'
 AND      SEX = 'M'

The first predicate has a lower filter factor than the second because there are fewer workers in department A00 than there are males in the entire company. This does not increase performance by much, but it can shave a little off a query's processing time.
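Comparing approximate filter factors shows why. The cardinalities below are hypothetical, and 1/COLCARDF is a simplification of DB2's actual calculation:

```python
# Hypothetical column cardinalities for the query above.
workdept_colcardf = 50   # assumed number of distinct departments
sex_colcardf = 2         # 'M' or 'F'

ff_workdept = 1.0 / workdept_colcardf   # 0.02 -- more restrictive
ff_sex = 1.0 / sex_colcardf             # 0.50 -- less restrictive

# Order predicates from most to least restrictive (lowest filter
# factor first), as this tuning strategy recommends.
predicates = sorted(
    [("WORKDEPT = 'A00'", ff_workdept), ("SEX = 'M'", ff_sex)],
    key=lambda p: p[1],
)
```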


Before deciding to tweak SQL statements to achieve different access paths, remember that you are changing SQL code in a non-intuitive fashion. For each modification you make to increase performance, document the reasons in the program, the data dictionary, and the system documentation. Otherwise, the tweaked SQL could be maintained after it is no longer required, or removed while it is still required for performance.

Also remember that the changes could enhance performance for one release of DB2 but result in no gain or decreased efficiency in subsequent releases. Re-examine your SQL for each new version and release of DB2.

OPTIMIZE FOR n ROWS

Another method of influencing access path selection is to specify OPTIMIZE FOR n ROWS for a cursor SELECT statement. This clause enables programmers to specify the estimated maximum number of rows that will be retrieved.

By indicating that a different number of rows will be returned than DB2 anticipates, you can influence access path selection. For example, consider the following statement:

 

 EXEC SQL
     DECLARE OPT_CUR CURSOR FOR
         SELECT   WORKDEPT, EMPNO, SALARY
         FROM     DSN8810.EMP
         WHERE    WORKDEPT IN ('A00', 'D11')
         OPTIMIZE FOR 5 ROWS
 END-EXEC.

The number of rows to be returned has been set to 5, even though this query could return more than 5 rows. DB2 formulates an access path optimized for 5 rows. More rows can be retrieved, but performance could suffer if you greatly exceed the estimated maximum.

This type of tuning is preferable to both updating the DB2 Catalog statistics and tweaking the SQL statement. It provides more information to DB2's optimization process, thereby giving DB2 the opportunity to establish a better access path. The crucial point, though, is that DB2 is doing the optimization; no manual updates or artificial SQL constructs are required.

TUNING STRATEGY

When using the OPTIMIZE FOR n ROWS clause, make n as accurate as possible. An accurate estimate gives DB2 the best opportunity to achieve optimum performance for the statement and also helps document the purpose of the SQL statement. Using an accurate value for n also positions your application to take advantage of future enhancements to the OPTIMIZE FOR n ROWS clause.

When coding online transactions in which 25 rows (for example) are displayed on the screen, use the OPTIMIZE FOR n ROWS clause, setting n equal to 25.


NOTE

To disable list prefetch for a statement, use OPTIMIZE FOR n ROWS with n set to 1. This technique reliably ensures that list prefetch is not used.


CAUTION

DB2 uses the value of n for the block size of a distributed network request. The smaller the value of n, the fewer rows sent across the network for each block. The only exception is that when n = 1, DB2 will set the block size to 16.


Changing DB2 Catalog Statistics

When the standard methods of influencing DB2's access path selection are not satisfactory, you can resort to updating the statistics in the DB2 Catalog. Only certain DB2 Catalog statistics can be modified using SQL UPDATE, INSERT, and DELETE statements instead of the normal method of using RUNSTATS. This SQL modification of the DB2 Catalog can be performed only by a SYSADM.

Table 28.3 lists the DB2 Catalog statistics that can be modified. You can use this table to determine which DB2 Catalog columns are updateable (using SQL) and which are used by the optimizer during sequential and parallel access path determination. Remember, for parallel queries DB2 first generates the sequential access path, and only then generates the parallel access strategy.

Table 28.3. The Updateable DB2 Catalog Statistics

Catalog Table    Column              How Used?  Description
---------------  ------------------  ---------  ------------------------------------------------------------
SYSCOLDIST       FREQUENCYF          Y          Percentage that COLVALUE in the column named in NAME occurs
                 COLVALUE            Y          Column value for this statistic
                 CARDF               Y          Number of distinct values
                 COLGROUPCOLNO       Y          The set of columns for the statistics
                 NUMCOLUMNS          Y          Number of columns for the statistics
                 TYPE                Y          Type of stats: C for cardinality, or F for frequent value
                 STATSTIME           N          Time RUNSTATS was run to generate these statistics
SYSCOLDISTSTATS  PARTITION           N          The partition to which this statistic applies
                 FREQUENCYF          N          Percentage that COLVALUE in the column named in NAME occurs
                 COLVALUE            N          Column value for this statistic
                 TYPE                N          Type of statistics (cardinality or frequent value)
                 CARDF               N          Number of distinct values
                 COLGROUPCOLNO       N          The set of columns for the statistics
                 KEYCARDDATA         N          Representation of the estimate of distinct values in this partition
                 STATSTIME           N          Time RUNSTATS was run to generate these statistics
SYSCOLSTATS      LOWKEY              P          Lowest value for the column
                 LOW2KEY             P          Second lowest value for the column
                 HIGHKEY             P          Highest value for the column
                 HIGH2KEY            P          Second highest value for the column
                 COLCARD             P          Number of distinct values for the column
                 COLCARDDATA         P          Number of distinct values for the column
                 STATSTIME           N          Time RUNSTATS was run to generate these statistics
SYSCOLUMNS       LOW2KEY             Y          Second lowest value for the column
                 HIGH2KEY            Y          Second highest value for the column
                 COLCARDF            Y          Number of distinct values for the column
                 STATSTIME           N          Time RUNSTATS was run to generate these statistics
SYSINDEXES       CLUSTERRATIOF       Y          Percentage of rows in clustered order
                 CLUSTERED           N          Indicates whether the tablespace is actually clustered
                 FIRSTKEYCARDF       Y          Number of distinct values for the first column of the index key
                 FULLKEYCARDF        Y          Number of distinct values for the full index key
                 NLEAF               Y          Number of active leaf pages
                 NLEVELS             Y          Number of index b-tree levels
                 STATSTIME           N          Time RUNSTATS was run to generate these statistics
SYSINDEXPART     DSNUM               N          Number of data sets
                 EXTENTS             N          Number of data set extents
                 LEAFFAR             N          Number of leaf pages far from previous leaf page
                 LEAFNEAR            N          Number of leaf pages near previous leaf page
                 PSEUDO_DEL_ENTRIES  N          Number of pseudo deleted index keys
                 SPACEF              N          Disk storage space
SYSINDEXSTATS    CLUSTERRATIOF       N          Percentage of rows in clustered order
                 FIRSTKEYCARDF       N          Number of distinct values for the first column of the index key
                 FULLKEYCARDF        N          Number of distinct values for the full index key
                 FULLKEYCARDDATA     N          Representation of number of distinct values of the full key
                 NLEAF               N          Number of active leaf pages
                 NLEVELS             N          Number of index b-tree levels
                 KEYCOUNTF           N          Number of rows in the partition
                 STATSTIME           N          Time RUNSTATS was run to generate these statistics
SYSLOBSTATS      AVGSIZE             N          Average size of LOB
                 FREESPACE           N          Available space in the LOB tablespace
                 ORGRATIO            N          Ratio of disorganization for LOB tablespace
SYSROUTINES      IOS_PER_INVOC       Y          Estimated number of I/Os per invocation of the routine
                 INSTS_PER_INVOC     Y          Estimated number of instructions per invocation of the routine
                 INITIAL_IOS         Y          Estimated number of I/Os for the first invocation of the routine
                 INITIAL_INSTS       Y          Estimated number of instructions for the first invocation of the routine
                 CARDINALITY         Y          Predicted cardinality of a table function
SYSTABLEPART     DSNUM               N          Number of data sets
                 EXTENTS             N          Number of data set extents
                 SPACEF              N          Disk storage space
SYSTABLES        CARDF               Y          Number of rows for a table
                 NPAGES              Y          Number of pages used by the table
                 NPAGESF             Y          Number of pages used by the table
                 PCTPAGES            N          Percentage of tablespace pages that contain rows for this table
                 PCTROWCOMP          Y          Percentage of rows compressed
                 AVGROWLEN           N          Average row length
                 SPACEF              N          Disk storage space
                 STATSTIME           N          Time RUNSTATS was run to generate these statistics
SYSTABLESPACE    NACTIVEF            Y          Number of allocated tablespace pages
                 STATSTIME           N          Time RUNSTATS was run to generate these statistics
SYSTABSTATS      CARDF               P          Number of rows for the partition
                 NPAGES              P          Number of pages used by the partition
                 NACTIVE             P          Number of active pages in the partition
                 PCTPAGES            P          Percentage of tablespace pages that contain rows for this partition
                 PCTROWCOMP          P          Percentage (x100) of rows compressed
                 STATSTIME           N          Time RUNSTATS was run to generate these statistics

Legend:

N = Not used by the optimizer

P = Used for parallel path generation

Y = Used by the optimizer


The two predominant reasons for changing DB2 Catalog statistics to influence the access path selection are to influence DB2 to use an index and to influence DB2 to change the order in which tables are joined. In each case, the tuning methods require that you "play around" with the DB2 Catalog statistics to create a lower filter factor. You should keep in mind five rules when doing so.

Rule 1: As first key cardinality ( FIRSTKEYCARDF ) increases, the filter factor decreases. As the filter factor decreases, DB2 is more inclined to use an index to satisfy the SQL statement.

Rule 2: As an index becomes more clustered, you increase the probability that DB2 will use it. To enhance the probability of an unclustered index being used, increase its cluster ratio ( CLUSTERRATIOF ) to a value between 96 and 100, preferably 100.

TUNING STRATEGY

To influence DB2 to use an index, adjust the COLCARDF , FIRSTKEYCARDF , and FULLKEYCARDF columns to an artificially high value. As cardinality increases, the filter factor decreases. As the filter factor decreases, the chance that DB2 will use an available index becomes greater. DB2 assumes that a low filter factor means that only a few rows are being returned, causing indexed access to be more efficient. Adjusting COLCARDF , FIRSTKEYCARDF , and FULLKEYCARDF is also useful for getting DB2 to choose an unclustered index because DB2 is more reluctant to use an unclustered index with higher filter factors. You also can change the value of CLUSTERRATIOF to 100 to remove DB2's reluctance to use unclustered indexes from the access path selection puzzle.


Rule 3: DB2's choice for inner and outer tables is a delicate trade-off. Because the inner table is accessed many times for each qualifying outer table row, it should be as small as possible to reduce the time needed to scan multiple rows for each outer table row. The more inner table rows, the longer the scan. But the outer table should also be as small as possible to reduce the overhead of opening and closing the internal cursor on the inner table.

It is impossible to choose the smallest table as both the inner table and the outer table. When two tables are joined, one must be chosen as the inner table, and the other must be chosen as the outer table. My experience has shown that as the size of a table grows, the DB2 optimizer favors using it as the outer table in a nested loop join. Therefore, changing the cardinality ( CARDF ) of the table that you want as the outer table to an artificially high value can influence DB2 to choose that table as the outer table.

Rule 4: As column cardinality ( COLCARDF ) decreases, DB2 favors the use of the nested loop join over the merge scan join. Lower the value of COLCARDF to favor the nested loop join.

Rule 5: HIGH2KEY and LOW2KEY can be altered to more accurately reflect the overall range of values stored in a column. This is particularly useful for influencing access path selection for data with a skewed distribution.

The combination of HIGH2KEY and LOW2KEY provides a range of probable values accessed for a particular column. The absolute highest and lowest values are discarded to create a more realistic range. For certain types of predicates, DB2 uses the following formula when calculating filter factor:

 

 Filter factor = (Value-LOW2KEY) / (HIGH2KEY-LOW2KEY) 

Because HIGH2KEY and LOW2KEY can affect the size of the filter factor, the range of values that they provide can significantly affect access path selection.
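For illustration, here is that formula applied to hypothetical key values (a sketch only; DB2 applies this interpolation to particular predicate and data types):

```python
# Interpolation-style filter factor, per the formula above.
# The key values and predicate value here are hypothetical.

def interpolation_filter_factor(value, low2key, high2key):
    return (value - low2key) / (high2key - low2key)

# With LOW2KEY = 45 and HIGH2KEY = 1249, a predicate value of 647
# lands exactly halfway through the assumed range of values:
ff = interpolation_filter_factor(647, 45, 1249)   # 0.5
```

Because LOW2KEY and HIGH2KEY appear in both the numerator and denominator, manually adjusting either one shifts the computed filter factor, which is exactly the lever the following tuning strategies exploit.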

TUNING STRATEGY

For troublesome queries, check whether the distribution of data in the columns accessed is skewed. If you query SYSIBM.SYSCOLDIST, as discussed in Chapter 26, the 10 most frequently occurring values are shown for indexed columns. To be absolutely accurate, however, obtain a count for each column value, not just the top 10:

 

 SELECT     COL, COUNT(*)
 FROM       your.table
 GROUP BY   COL
 ORDER BY   COL

This query produces an ordered listing of column values. You can use this list to determine the distribution of values. If a few values occur much more frequently than the other values, the data is not evenly distributed. In this circumstance, consider using dynamic SQL, hard coding predicate values, or binding with REOPT(VARS). This enables DB2 to use nonuniform distribution statistics when calculating filter factors.


TUNING STRATEGY

Referring back to the results of the query in the preceding tuning strategy, if a few values cluster at the beginning or end of the report, consider changing LOW2KEY and HIGH2KEY to different values. DB2 uses LOW2KEY and HIGH2KEY when calculating filter factors. So, even though the valid domain of small integers is -32768 to +32767, the valid range for access path selection is defined by LOW2KEY and HIGH2KEY, which may set the range to +45 to +1249, for example. As the range of values decreases, the filter factor decreases because there are fewer potential values in the range.


TUNING STRATEGY

If neither dynamic SQL nor hard-coded predicates are practical, change HIGH2KEY to a lower value and LOW2KEY to a higher value to reduce the range of possible values, thereby lowering the filter factor. Alternatively, or additionally, you can increase COLCARDF , FIRSTKEYCARDF , and FULLKEYCARDF .


Remember that modifying DB2 Catalog statistics is not a trivial exercise. Simply making the changes indicated in this section might be insufficient to resolve your performance problems, because some statistical values have implicit relationships. When one value changes, DB2 assumes that the others have changed also. For example, consider these relationships:

  • When you change COLCARDF for a column in an index, be sure to also change the FIRSTKEYCARDF of any index in which the column participates as the first column of the index key, and the FULLKEYCARDF of any index in which the column participates.

  • Provide a value to both HIGH2KEY and LOW2KEY when you change cardinality information. When COLCARDF is not -1, DB2 assumes that statistics are available. DB2 factors these high and low key values into its access path selection decision. Failure to provide both a HIGH2KEY and a LOW2KEY can result in the calculation of inaccurate filter factors and the selection of inappropriate access paths.
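These implicit relationships can be captured in a small bookkeeping helper. The following sketch is hypothetical and does not touch a real DB2 Catalog; the dictionary keys simply mirror catalog column names for readability:

```python
# Hypothetical sanity check for a proposed set of manual statistic
# updates, encoding the implicit relationships listed above.

def check_stat_updates(updates):
    """Return warnings for incomplete catalog statistic updates."""
    warnings = []
    if "COLCARDF" in updates:
        if not {"FIRSTKEYCARDF", "FULLKEYCARDF"} <= updates.keys():
            warnings.append("Also change FIRSTKEYCARDF and FULLKEYCARDF "
                            "of any index the column participates in.")
        if not {"HIGH2KEY", "LOW2KEY"} <= updates.keys():
            warnings.append("Provide both HIGH2KEY and LOW2KEY, or filter "
                            "factors may be calculated inaccurately.")
    return warnings
```

For example, passing only `{"COLCARDF": 1000}` produces both warnings, while a complete set of related updates produces none.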

Before deciding to update DB2 Catalog statistics to force DB2 to choose different access paths, heed the following warnings.

First, never change the DB2 Catalog statistics without documenting the following:

  • Why the statistics will be modified

  • How the modifications will be made and how frequently the changes must be run

  • The current values for each statistic and the values they will be changed to

Second, be aware that when you change DB2 Catalog statistics, you are robbing Peter to pay Paul. In other words, your changes might enhance the performance of one query at the expense of the performance of another query.

DB2 maintenance (PTFs, new releases, and new versions) might change the access path selection logic in the DB2 optimizer. As a result of applying maintenance, binding or rebinding static and dynamic SQL could result in different access paths, thereby invalidating your hard work. In other words, IBM might eventually correct the problem in the optimizer logic that you solved using trickery.

Choosing the correct values for the statistics and keeping the statistics accurate can be an intimidating task. Do not undertake this endeavor lightly. Plan to spend many hours changing statistics, rebinding plans, changing statistics again, rebinding again, and so on.

The situation that caused the need to tinker with the statistics in the DB2 Catalog could change. For example, the properties of the data could vary as your application ages. Distribution, table and column cardinality, and the range of values stored could change. If the statistics are not changing because they have been artificially set outside the jurisdiction of RUNSTATS , these newer changes to the data cannot be considered by the DB2 optimizer, and an inefficient access path could be used indefinitely.

TUNING STRATEGY

When DB2 Catalog statistics have been changed to influence access path selection, periodically execute RUNSTATS and rebind to determine if the artificial statistics are still required. If they are, simply reissue the DB2 Catalog UPDATE statements. If not, eliminate this artificial constraint from your environment. Failure to implement this strategy eventually results in inefficient access paths in your environment (as DB2 and your applications mature).


Only a SYSADM can update the DB2 Catalog. SYSADMs have a great amount of authority, so it is generally a good idea to limit the number of SYSADMs in your shop. When the DB2 Catalog needs to be altered, an undue burden is placed on the SYSADMs.

When the DB2 Catalog has been updated using SQL, all subsequent RUNSTATS executions must be followed by a series of SQL statements to reapply the updates to the DB2 Catalog.

TUNING STRATEGY

If possible, give a single production userid SYSADM authority for modifying DB2 Catalog statistics. This userid has the following requirements:

  • Should not have online TSO logon capabilities because only batch jobs need to be run using it

  • Should be under the same strict controls placed on production jobs at your site

  • Should be used to run only DB2 Catalog update jobs

A DBA or some other knowledgeable user can then create UPDATE statements to change the DB2 Catalog statistics as desired. A batch job running under the authid for the production SYSADM can then run the UPDATE statements in production. Because the SYSADM userid has no logon capabilities, the possibility for abuse is limited to the controls placed on the production environment (such as who can update production job streams, who can submit them, or what review process is in place).


Using Optimization Hints ( OPTHINT ) to Force an Access Path

You also can influence access paths using the OPTHINT feature. Actually, though, this method does not "influence" the access path; instead it directs DB2 to use a specific access path instead of determining a new access path using statistics. IBM refers to this process as specifying optimization hints.

CAUTION

The same basic cautions that apply to modifying DB2 Catalog statistics also apply to optimization hints. Only experienced analysts and DBAs should attempt to use optimization hints. However, optimization hints are much easier to apply than updating DB2 Catalog statistics.


Optimization hints are implemented using the PLAN_TABLE . However, before you can use optimization hints, the DB2 DSNZPARM parameter for optimization hints ( OPTHINTS ) must be set to YES . If it is set to NO , you cannot use optimization hints.

There are two ways to use the PLAN_TABLE to provide an optimization hint to DB2:

  • Alter the PLAN_TABLE to use an access path that was previously created by the DB2 optimizer

  • INSERT rows to the PLAN_TABLE to create a new access path independently

In general, favor the first method over the second method. It is a difficult task to create an accurate access path in the PLAN_TABLE . If you do not get every nuance of the access path correct, it is possible that DB2 will ignore the optimization hint and calculate an access path at bind time. However, if you use an access path that was originally created by DB2, you can be reasonably sure that the access path will be valid.

CAUTION

Sometimes an access path created for an older version of DB2 will not be valid in a newer version of DB2. Of course, the opposite is true, too. Some access paths for a newer version of DB2 will not work for older versions.


You should consider using optimization hints for all of the same reasons you would choose to modify DB2 Catalog statistics or tweak SQL. The general reason is to bypass the access path chosen by DB2 and use a different, hopefully more efficient, access path.

In addition to this reason, optimization hints are very useful as you migrate from release to release of DB2. Sometimes a new release or version of DB2 causes different access paths to be chosen for queries that were previously running fine. Or perhaps new statistics were accumulated between binds, causing access paths to change. By saving old access paths in a PLAN_TABLE , you can use optimization hints to direct DB2 to use the old access paths instead of the new, and perhaps undesirable, access paths caused by the new release or statistics.
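One hedged way to save the old access paths, assuming you have created a copy table (here called PLAN_TABLE_SAVE , a hypothetical name) with the same columns as the PLAN_TABLE , is to snapshot the relevant rows before migrating or rebinding:

 INSERT INTO PLAN_TABLE_SAVE
 SELECT *
 FROM   PLAN_TABLE
 WHERE  APPLNAME = 'PLANNAME';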

Always test and analyze the results of any query that uses optimization hints to be sure that the desired performance is being achieved.

Defining an Optimization Hint

To specify that an optimization hint is to be used, you will have to update the PLAN_TABLE . The first step is to make sure that your PLAN_TABLE includes the following columns:

 

 OPTHINT              CHAR(8)   NOT NULL WITH DEFAULT
 HINT_USED            CHAR(8)   NOT NULL WITH DEFAULT
 PRIMARY_ACCESSTYPE   CHAR(1)   NOT NULL WITH DEFAULT

For more information on the PLAN_TABLE and a definition of all PLAN_TABLE columns, refer to Chapter 25, "Using EXPLAIN."

To set an optimization hint, you need to first identify (or create) the PLAN_TABLE rows that refer to the desired access path. You will then need to update those rows in the PLAN_TABLE , specifying an identifier for the hint in the OPTHINT column. For example,

 

 UPDATE PLAN_TABLE
    SET OPTHINT = 'SQLHINT'
 WHERE  PLANNO = 50
   AND  APPLNAME = 'PLANNAME';

Of course, this is just an example. You may need to use other predicates to specifically identify the PLAN_TABLE rows to include in the optimization hint. Some columns that might be useful, depending on your usage of dynamic SQL and packages, include QUERYNO , PROGNAME , VERSION , and COLLID .

CAUTION

If you change a program that uses static SQL statements, the statement number might change, causing rows in the PLAN_TABLE to be out of sync with the modified application.


You can use the QUERYNO clause in SQL statements to ease correlation of SQL statements in your program with your optimization hints. Statements that use the QUERYNO clause are not dependent on the statement number. To use QUERYNO , you will need to modify the SQL in your application to specify a QUERYNO , as shown in the following:

 

 SELECT MGRNO
 FROM   DEPT
 WHERE  DEPTNO = 'A00'
 QUERYNO 200;

You can then UPDATE the PLAN_TABLE more easily using QUERYNO and be sure that the optimization hint will take effect, as shown in the following:

 

 UPDATE PLAN_TABLE
    SET OPTHINT = 'SQLHINT'
 WHERE  QUERYNO = 200
   AND  APPLNAME = 'PLANNAME';

When the PLAN_TABLE is correctly updated (as well as possibly the application), you must REBIND the plan or package to determine if the hint is being used by DB2. When rebinding you must specify the OPTHINT parameter:

 

 REBIND PLAN(PLANNAME) . . . OPTHINT(SQLHINT)

Be aware that the optimization hints may not actually be used by DB2. For optimization hints to be used, the hint must be correctly specified, the REBIND must be accurately performed, and the environment must not have changed. For example, DB2 will not use an access path specified using an optimization hint if it relies on an index that has since been dropped.

Use EXPLAIN(YES) to verify whether the hint was actually used. If the hint was used, the HINT_USED column for the new access path will contain the name of the optimization hint (such as SQLHINT in the previous example).
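For example, a query such as the following (adjust the predicates to identify your own plan or package) shows which statements actually used the hint:

 SELECT QUERYNO, PROGNAME, OPTHINT, HINT_USED
 FROM   PLAN_TABLE
 WHERE  APPLNAME = 'PLANNAME'
 ORDER BY QUERYNO, QBLOCKNO, PLANNO;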

Miscellaneous Guidelines

The following miscellaneous guidelines provide you with useful general tips for improving DB2 performance.

Favor Optimization Hints Over Updating the DB2 Catalog

Optimization hints to influence access paths are less intrusive and easier to implement than changing columns in the DB2 Catalog. However, use optimization hints only as a last resort. Do not use optimization hints as a crutch to arrive at a specific access path. Optimization hints are best used when an access path changes and you want to go back to a previous, efficient access path.

Limit Ordering to Avoid Scanning

The optimizer is more likely to choose an index scan when ordering is important ( ORDER BY , GROUP BY , or DISTINCT ) and the index is clustered by the columns to be sorted.

Maximize Buffers and Minimize Data Access

If the inner table fits in 2% of the buffer pool, the nested loop join is favored. Therefore, to increase the chances of nested loop joins, increase the size of the buffer pool (or decrease the size of the inner table, if possible).
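For example, the following DB2 command (the buffer pool name and size are illustrative) increases the virtual buffer pool size:

 -ALTER BUFFERPOOL(BP1) VPSIZE(20000)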

Consider Deleting Nonuniform Distribution Statistics

To decrease wild fluctuations in the performance of dynamic SQL statements, consider removing the nonuniform distribution statistics (NUDS) from the DB2 Catalog. Although dynamic SQL makes the best use of these statistics, the overall performance of some applications that heavily use dynamic SQL can suffer. The optimizer might choose a different access path for the same dynamic SQL statement, depending on the values supplied to the predicates. In theory, this should be the desired goal. In practice, however, the results might be unexpected.

For example, consider the following dynamic SQL statement:

 

 SELECT   EMPNO, LASTNAME
 FROM     DSN8810.EMP
 WHERE    WORKDEPT = ?

The access path might change depending on the value supplied for WORKDEPT because the optimizer calculates a different filter factor for each value, based on the distribution statistics. The less frequently a value occurs, the lower its filter factor. A low filter factor makes DB2 expect fewer rows to be returned, which increases the chance that an index will be used and affects the choice of inner and outer tables for joins.

These statistics are stored in the SYSIBM.SYSCOLDIST and SYSIBM.SYSCOLDISTSTATS tables and can be removed using SQL DELETE statements.
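For example, the following statement (table and column names are illustrative) removes the NUDS for a single column:

 DELETE FROM SYSIBM.SYSCOLDIST
 WHERE  TBOWNER = 'DSN8810'
   AND  TBNAME  = 'EMP'
   AND  NAME    = 'WORKDEPT';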

This suggested guideline does not mean that you should always delete the NUDS. My advice is quite to the contrary. When using dynamic SQL, allow DB2 the chance to use these statistics. Delete these statistics only when performance is unacceptable. (They can always be repopulated later with RUNSTATS .)

Consider Collecting More Than Just the Top Ten NUDS

If non-uniform distribution impacts more than just the top ten most frequently occurring values, you should consider using the FREQVAL option of RUNSTATS to capture more than 10 values. Capture only as many as will prove to be useful for optimizing queries against the non-uniformly distributed data.
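For example, a RUNSTATS control statement along the following lines (object names follow the DB2 sample database and are illustrative) captures the 25 most frequently occurring values for the column:

 RUNSTATS TABLESPACE DSN8D81A.DSN8S81E
   TABLE(DSN8810.EMP)
   COLGROUP(WORKDEPT) FREQVAL COUNT 25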

DB2 Referential Integrity Use

Referential integrity (RI) is the implementation of constraints between tables so that values from one table control the values in another. Recall that a referential constraint between a parent table and a dependent table is defined by a relationship between the columns of the tables. The parent table's primary key columns control the values permissible in the dependent table's foreign key columns. For example, in the sample table, DSN8810.EMP , the WORKDEPT column (the foreign key) must reference a valid department as defined in the DSN8810.DEPT table's DEPTNO column (the primary key).

You have two options for implementing RI at your disposal: declarative and application. Declarative constraints provide DB2-enforced referential integrity and are specified by DDL options. All modifications, whether embedded in an application program or ad hoc, must comply with the referential constraints.
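For example, the employee-to-department relationship could be declared as follows. This is a sketch only; the sample database defines this constraint with its own constraint name and delete rule:

 ALTER TABLE DSN8810.EMP
   ADD CONSTRAINT RED
       FOREIGN KEY (WORKDEPT)
       REFERENCES DSN8810.DEPT (DEPTNO)
       ON DELETE SET NULL;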

Application-enforced referential integrity is coded into application programs. Every program that can update referentially constrained tables must contain logic to enforce the referential integrity. This type of RI is not applicable to ad hoc updates.

With DB2-enforced RI, CPU use is reduced because RI checking is performed by the Data Manager component of DB2, whereas application-enforced RI checking is performed by the RDS component of DB2. Additionally, with application-enforced RI, rows accessed for RI checking must be passed back from DB2 to the application. DB2-enforced RI does not require this passing of data, further reducing CPU time.

In addition, DB2-enforced RI uses an index (if one is available) when enforcing the referential constraint. In application-enforced RI, index use is based on the SQL used by each program to enforce the constraint.

TUNING STRATEGY

DB2-enforced referential integrity is generally more efficient than application-enforced RI. When you build new applications, use DB2-enforced referential integrity and consider retrofitting older applications that require performance tuning.

Declarative RI has the further benefit that, unlike application-enforced RI, it cannot be bypassed.

Triggers also can be used to implement complex RI and data integrity rules. Triggers, like declarative RI, cannot be bypassed by ad hoc SQL. All SQL data modification, whether static or dynamic, planned or ad hoc, must conform to the trigger logic.


TUNING STRATEGY

If no ad hoc updating is permitted, consider using application-based RI in the following two situations:

  • If an application program can be written so that a single check is made for the parent table row when multiple inserts to the child table are performed.

  • If the application processing needs are such that the parent table is read before inserting the child row (even a single child); in this case, DB2-enforced RI would simply repeat the read that the application must perform anyway.

Of course, application-enforced RI still has the negative aspect of not being enforced for ad hoc data modifications.


TUNING STRATEGY

Consider not implementing DB2-enforced or application-enforced RI in the following cases:

  • If DB2 tables are built from another system that is already referentially intact

  • If application tables are accessed as read-only


General Application Tuning

This chapter has concentrated on some of the more complex methods of tuning your DB2 applications. A wealth of less complex information about building efficient SQL is also available. For this type of general SQL coding advice, and guidelines for coding efficient, performance-oriented SQL (DCL, DDL, and DML), refer to Chapters 2 through 10.



DB2 Developer's Guide (5th Edition)
ISBN: 0672326132
Year: 2004
Pages: 388
