< Day Day Up > |
As was evident from the DB2 performance tuning pie, tuning the application design provides the single greatest benefit to overall DB2 performance. You can use several methods to accomplish this, each of which is covered in this section. Before proceeding, however, I will review the access paths, particularly the information about filter factors. Analyzing Access PathsTo determine the actual "behind the scenes" operations being performed by DB2 for each SQL statement, you must analyze the access path chosen for the statement by the DB2 optimizer. An access path, as discussed in Chapter 21, "The Optimizer," is the method DB2 chooses to carry out the data manipulation requested in SQL statements. The DB2 EXPLAIN statement places information about the access paths in a PLAN_TABLE , which can be inspected by a technical analyst. You can use the information in Chapter 25 in conjunction with the access path data to create a complete picture of the operations being performed for each SQL statement. Is DB2 on its own when making access path determinations? The ideal answer to this question would be "Yes." It would be wonderful if DB2 always had all the information it needed, required no external input, and never chose the wrong access path. However, we do not yet live in this ideal world. DB2 sometimes chooses an inefficient access path over another, more efficient one for the following reasons:
For these reasons, you may decide to artificially influence the optimizer's decision process. Techniques for accomplishing this are addressed in the next section. Before I move on, I will survey the factors addressed by the DB2 optimizer. The first consideration is the versions of DB2 that are being used ”they should be the same in test and production. Obviously, there are access path techniques available to newer releases of DB2 than were available to older releases. The optimizer takes the size of the buffer pools into account when determining access paths. As the size of the buffer pools increases, DB2 assumes that read efficiency increases also. The optimizer also takes into account the type of CPU being used during access path selection. DB2 chooses different access techniques based on the perceived performance of the processor. This is important to remember when modeling SQL in a test DB2 subsystem using production statistics. If the production DB2 subsystem has a different number of buffers or if it runs on a different CPU, the optimizer might choose a different access path in the production environment than it did in the test environment, even if the SQL and the DB2 Catalog statistics are identical. To get around this, the following measures can be taken:
The wisest course of action is simply to realize that access path differences will exist between DB2 subsystems and not to try to avoid access path discrepancies between DB2 subsystems. Running DB2 subsystems with artificial constraints such as those just outlined is counterproductive to optimizing DB2 performance. Just remember that a test access path determined using production statistics does not guarantee that the production access path will be identical. Besides, it is wise to continuously monitor the production access paths for all SQL statements, because they can change when plans or packages are bound or rebound, or when RUNSTATS is run for dynamic SQL.
If you utilize triggers in your DB2 databases, you need to be aware that code exists within the triggers. This code needs to be examined regularly to ensure that it is optimal given the database design and the application processes that modify the data, causing the trigger to fire. Even if the SQL in the trigger is efficient, other processes performed by the trigger may not be efficient. Influencing the OptimizerThere are several methods of tuning the system to change access paths or influence access path selection. This section describes several observations on changing the access paths selected by DB2. The DB2 optimizer is one of the most intricate pieces of software on the market. It does an admirable job of optimizing SQL requests . To achieve this level of success, the optimizer contains a great deal of performance-specific expertise. For example, the optimizer estimates both elapsed times and CPU times when choosing an access path. When a SQL statement is rebound, the optimizer might choose a new access path that increases CPU time but decreases elapsed time. Most shops choose to enhance elapsed time at the expense of additional CPU use because elapsed time has a measurable effect on user productivity. In other words, it is good to trade off CPU cycles for user satisfaction, and the DB2 optimizer attempts to accomplish this. Of course, if both CPU and elapsed time can be reduced, the optimizer will try to do so. However, the optimizer is not infallible. Sometimes the application analyst understands the nature of the data better than DB2 (at the present time). You can influence the optimizer into choosing an access path that you know is a better one but the optimizer thinks is a worse one. As the functionality and complexity of the optimizer is enhanced from release to release of DB2, the need to trick the optimizer in this way will diminish. There are five ways to influence the optimizer's access path decisions:
The next section discusses each of these methods. Standard MethodsOf all the methods for influencing the DB2 optimizer, standard DB2 methods are the only mandatory ones. Try all the standard methods covered in this section before attempting one of the other methods. There are several reasons for this. The standard methods place the burden for generating optimal access paths on the shoulders of DB2, which is where it usually belongs. They also use IBM-supported techniques available for every version and release of DB2. Finally, these methods generally provide the greatest gain for the smallest effort. There are four standard methods for tuning DB2 access paths. The first method is ensuring that accurate statistics are available using the RUNSTATS utility and the BIND or REBIND command. RUNSTATS , which is discussed in detail in Chapter 34, "Catalog Manipulation Utilities," populates the DB2 Catalog with statistics that indicate the state of your DB2 objects, including the following:
All of these factors are considered by the optimizer when it chooses what it deems to be the optimal access path for a given SQL statement.
The second standard method for tuning DB2 access paths is ensuring that the DB2 objects are properly organized. Disorganized objects, if properly reorganized, might be chosen for an access path. An object is disorganized when data modification statements executed against the object cause data to be stored in a non-optimal fashion, such as non-clustered data or data that exists on a different page than its RID, thereby spanning more than one physical page. To organize these objects more efficiently , run the REORG utility, followed by RUNSTATS and REBIND . In-depth coverage of the REORG utility and guidelines for its use are in Chapter 33, "Data Organization Utilities."
The third standard method for tuning DB2 access paths is to encourage parallelism. Consider changing simple and segmented tablespaces to partitioned tablespaces to encourage I/O, CPU, and Sysplex parallelism. Furthermore, it may be advantageous to repartition already partitioned tablespaces to better align ranges of values, thereby promoting better parallel access. The fourth and final standard method for tuning DB2 access paths is ensuring that there are proper indexes by creating new indexes or dropping unnecessary and unused indexes. DB2 relies on indexes to achieve optimum performance. Analyze the predicates in your SQL statements to determine whether there is an index that DB2 can use. Indexes can be used efficiently by DB2 if the first column of the index key is specified in an indexable predicate in the SQL statement. Refer to Chapter 2, "Data Manipulation Guidelines," for a discussion of indexable and non-indexable predicates. If no index meets these requirements, consider creating one. As you index more columns referenced in predicates, performance generally increases. Dropping unused indexes is another critical part of application tuning. Every table INSERT and DELETE incurs I/O to every index defined for that table. Every UPDATE of indexed columns incurs I/O to every index defined for that column. If an index is not being used, drop it. This reduces the I/O incurred for data modification SQL statements, reduces RUNSTATS resource requirements, and speeds REORG and RECOVER processing. Tweaking the SQL StatementIf you do not want to change the DB2 Catalog statistics but the standard methods outlined in the preceding section are not helpful, you might consider tweaking the offending SQL statement. Tweaking is the process of changing a statement in a non-intuitive fashion, without altering its functionality. At times, you may need to disable a specific index from being considered by the optimizer. One method of achieving this is to append OR 0 = 1 to the predicate. For example, consider a query against the EMP table on which two indexes exist: one on EMPNO and one on WORKDEPT . Appending OR 0 = 1 (as shown next) to the WORKDEPT predicate will cause DB2 to avoid using an index on WORKDEPT . SELECT EMPNO, WORKDEPT, EDLEVEL, SALARY FROM DSN8610.EMP WHERE EMPNO BETWEEN '000020' AND '000350' AND (WORKDEPT > 'A01' OR 0 = 1); The OR 0 = 1 clause does not change the results of the query, but it can change the access path chosen. Another method of tweaking SQL to influence DB2's access path selection is to code redundant predicates. Recall from Chapter 21 that when DB2 calculates the filter factor for a SQL statement, it multiplies the filter factors for all predicates connected with AND .
Another option for getting a small amount of performance out of an SQL statement is to change the physical order of the predicates in your SQL code. DB2 evaluates predicates first by predicate type, then according to the order in which it encounters the predicates. The four types of SQL predicates are listed in the order that DB2 processes them:
Before deciding to tweak SQL statements to achieve different access paths, remember that you are changing SQL code in a nonintuitive fashion. For each modification you make to increase performance, document the reasons in the program, the data dictionary, and the system documentation. Otherwise, the tweaked SQL could be maintained after it is no longer required, or modified away when it still is required for performance. Also remember that the changes could enhance performance for one release of DB2 but result in no gain or decreased efficiency in subsequent releases. Re-examine your SQL for each new version and release of DB2. OPTIMIZE FOR n ROWSAnother method of influencing access path selection is to specify OPTIMIZE FOR n ROWS for a cursor SELECT statement. This clause enables programmers to specify the estimated maximum number of rows that will be retrieved. By indicating that a different number of rows will be returned than DB2 anticipates, you can influence access path selection. For example, consider the following statement: EXEC SQL DECLARE OPT_CUR FOR SELECT WORKDEPT, EMPNO, SALARY FROM DSN8810.EMP WHERE WORKDEPT IN ('A00', 'D11') OPTIMIZE FOR 5 ROWS END-EXEC. The number of rows to be returned has been set to 5, even though this query could return more than 5 rows. DB2 formulates an access path optimized for 5 rows. More rows can be retrieved, but performance could suffer if you greatly exceed the estimated maximum. This type of tuning is preferable to both updating the DB2 Catalog statistics and tweaking the SQL statement. It provides more information to DB2's optimization process, thereby giving DB2 the opportunity to establish a better access path. The crucial point, though, is that DB2 is doing the optimization; no manual updates or artificial SQL constructs are required.
NOTE When using OPTIMIZE FOR n ROWS to disable list prefetch, set the value of n to 1. This technique works well to ensure that list prefetch is not used. CAUTION DB2 uses the value of n for the block size of a distributed network request. The smaller the value of n , the fewer rows sent across the network for each block. The only exception is that when n =1, DB2 will set the block size to 16. Changing DB2 Catalog StatisticsWhen the standard methods of influencing DB2's access path selection are not satisfactory, you can resort to updating the statistics in the DB2 Catalog. Only certain DB2 Catalog statistics can be modified using SQL UPDATE , INSERT , and DELETE statements instead of the normal method using RUNSTATS . This SQL modification of the DB2 Catalog can be performed only by a SYSADM. Table 28.3 lists the DB2 Catalog statistics that can be modified. You can use this table to determine which DB2 Catalog columns are updateable (using SQL) and which are used by the optimizer during sequential and parallel access path determination. Remember, for parallel queries, the sequential access path is generated and only then is the parallel access strategy generated. Table 28.3. The Updateable DB2 Catalog Statistics
The two predominant reasons for changing DB2 Catalog statistics to influence the access path selection are to influence DB2 to use an index and to influence DB2 to change the order in which tables are joined. In each case, the tuning methods require that you "play around" with the DB2 Catalog statistics to create a lower filter factor. You should keep in mind five rules when doing so.
The combination of HIGH2KEY and LOW2KEY provides a range of probable values accessed for a particular column. The absolute highest and lowest values are discarded to create a more realistic range. For certain types of predicates, DB2 uses the following formula when calculating filter factor: Filter factor = (Value-LOW2KEY) / (HIGH2KEY-LOW2KEY) Because HIGH2KEY and LOW2KEY can affect the size of the filter factor, the range of values that they provide can significantly affect access path selection.
Remember that modifying DB2 Catalog statistics is not a trivial exercise. Simply making the changes indicated in this section might be insufficient to resolve your performance problems because of DB2's knowledge of the DB2 Catalog statistics. Some statistical values have implicit relationships. When one value changes, DB2 assumes that the others have changed also. For example, consider these relationships:
Before deciding to update DB2 Catalog statistics to force DB2 to choose different access paths, heed the following warnings. First, never change the DB2 Catalog statistics without documenting the following:
Secondly, be aware that when you change DB2 Catalog statistics, you are robbing from Peter to pay Paul. In other words, your changes might enhance the performance of one query at the expense of the performance of another query. DB2 maintenance (PTFs, new releases, and new versions) might change the access path selection logic in the DB2 optimizer. As a result of applying maintenance, binding or rebinding static and dynamic SQL operations could result in different access paths, thereby invalidating your hard work. In other words, IBM might get around to correcting the problem in the logic of the optimizer (that you solved using trickery ). Choosing the correct values for the statistics and keeping the statistics accurate can be an intimidating task. Do not undertake this endeavor lightly. Plan to spend many hours changing statistics, rebinding plans, changing statistics again, rebinding again, and so on. The situation that caused the need to tinker with the statistics in the DB2 Catalog could change. For example, the properties of the data could vary as your application ages. Distribution, table and column cardinality, and the range of values stored could change. If the statistics are not changing because they have been artificially set outside the jurisdiction of RUNSTATS , these newer changes to the data cannot be considered by the DB2 optimizer, and an inefficient access path could be used indefinitely.
Only a SYSADM can update the DB2 Catalog. SYSADM s have a great amount of authority, so it is generally a good idea to limit the number of SYSADM s in your shop. When the DB2 Catalog needs to be altered, an undue burden is placed on the SYSADM s. When the DB2 Catalog has been updated using SQL, all subsequent RUNSTATS executions must be followed by a series of SQL statements to reapply the updates to the DB2 Catalog.
Using Optimization Hints ( OPTHINT ) to Force an Access PathYou also can influence access paths using the OPTHINT feature. Actually, though, this method does not "influence" the access path; instead it directs DB2 to use a specific access path instead of determining a new access path using statistics. IBM refers to this process as specifying optimization hints. CAUTION The same basic cautions that apply to modifying DB2 Catalog statistics also apply to optimization hints. Only experienced analysts and DBAs should attempt to use optimization hints. However, optimization hints are much easier to apply than updating DB2 Catalog statistics. Optimization hints are implemented using the PLAN_TABLE . However, before you can use optimization hints, the DB2 DSNZPARM parameter for optimization hints ( OPTHINTS ) must be set to YES . If it is set to NO , you cannot use optimization hints. There are two ways to use the PLAN_TABLE to provide an optimization hint to DB2:
In general, favor the first method over the second method. It is a difficult task to create an accurate access path in the PLAN_TABLE . If you do not get every nuance of the access path correct, it is possible that DB2 will ignore the optimization hint and calculate an access path at bind time. However, if you use an access path that was originally created by DB2, you can be reasonably sure that the access path will be valid. CAUTION Sometimes an access path created for an older version of DB2 will not be valid in a newer version of DB2. Of course, the opposite is true, too. Some access paths for a newer version of DB2 will not work for older versions. You should consider using optimization hints for all of the same reasons you would choose to modify DB2 Catalog statistics or tweak SQL. The general reason is to bypass the access path chosen by DB2 and use a different, hopefully more efficient, access path. In addition to this reason, optimization hints are very useful as you migrate from release to release of DB2. Sometimes, a new release or version of DB2 can cause different access paths to be chosen for queries that were running fine. Or perhaps new statistics were accumulated between binds causing access paths to change. By saving old access paths in a PLAN_TABLE , you can use optimization hints to direct DB2 to use the old access paths instead of the new, and perhaps undesirable, access paths due to the new release orstatistics. Always test and analyze the results of any query that uses optimization hints to be sure that the desired performance is being achieved. Defining an Optimization HintTo specify that an optimization hint is to be used, you will have to update the PLAN_TABLE . The first step is to make sure that your PLAN_TABLE includes the following columns: OPTHINT CHAR(8) NOT NULL WITH DEFAULT HINT_USED CHAR(8) NOT NULL WITH DEFAULT PRIMARY_ACCESSTYPE CHAR(1) NOT NULL WITH DEFAULT For more information on the PLAN_TABLE and a definition of all PLAN_TABLE columns, refer to Chapter 25, "Using EXPLAIN." To set an optimization hint, you need to first identify (or create) the PLAN_TABLE rows that refer to the desired access path. You will then need to update those rows in the PLAN_TABLE , specifying an identifier for the hint in the OPTHINT column. For example, UPDATE PLAN_TABLE SET OPTHINT = 'SQLHINT' WHERE PLANNO = 50 AND APPLNAME = 'PLANNAME'; Of course, this is just an example. You may need to use other predicates to specifically identify the PLAN_TABLE rows to include in the optimization hint. Some columns that might be useful, depending on your usage of dynamic SQL and packages, include QUERYNO , PROGNAME , VERSION , and COLLID . CAUTION If you change a program that uses static SQL statements, the statement number might change, causing rows in the PLAN_TABLE to be out of sync with the modified application. You can use the QUERYNO clause in SQL statements to ease correlation of SQL statements in your program with your optimization hints. Statements that use the QUERYNO clause are not dependent on the statement number. To use QUERYNO , you will need to modify the SQL in your application to specify a QUERYNO , as shown in the following: SELECT MGRNO FROM DEPT WHERE DEPNO = 'A00' QUERYNO 200; You can then UPDATE the PLAN_TABLE more easily using QUERYNO and be sure that the optimization hint will take effect, as shown in the following: UPDATE PLAN_TABLE SET OPTHINT = 'SQLHINT' WHERE QUERYNO = 200 AND APPLNAME = 'PLANNAME'; When the PLAN_TABLE is correctly updated (as well as possibly the application), you must REBIND the plan or package to determine if the hint is being used by DB2. When rebinding you must specify the OPTHINT parameter: REBIND PLAN PLANNAME . . . OPTHINT(SQLHINT) Be aware that the optimization hints may not actually be used by DB2. For optimization hints to be used, the hint must be correctly specified, the REBIND must be accurately performed, and the environment must not have changed. For example, DB2 will not use an access path specified using an optimization hint if it relies on an index that has since been dropped. Use EXPLAIN(YES) to verify whether the hint was actually used. If the hint was used, the HINT_USED column for the new access path will contain the name of the optimization hint (such as SQLHINT in the previous example). Miscellaneous GuidelinesThe following miscellaneous guidelines provide you with useful general tips for improving DB2 performance. Favor Optimization Hints Over Updating the DB2 CatalogOptimization hints to influence access paths are less intrusive and easier to implement than changing columns in the DB2 Catalog. However, use optimization hints only as a last resort. Do not use optimization hints as a crutch to arrive at a specific access path. Optimization hints are best used when an access path changes and you want to go back to a previous, efficient access path. Limit Ordering to Avoid ScanningThe optimizer is more likely to choose an index scan when ordering is important ( ORDER BY , GROUP BY , or DISTINCT ) and the index is clustered by the columns to be sorted. Maximize Buffers and Minimize Data AccessIf the inner table fits in 2% of the buffer pool, the nested loop join is favored. Therefore, to increase the chances of nested loop joins, increase the size of the buffer pool (or decrease the size of the inner table, if possible). Consider Deleting Nonuniform Distribution StatisticsTo decrease wild fluctuations in the performance of dynamic SQL statements, consider removing the nonuniform distribution statistics (NUDS) from the DB2 Catalog. Although dynamic SQL makes the best use of these statistics, the overall performance of some applications that heavily use dynamic SQL can suffer. The optimizer might choose a different access path for the same dynamic SQL statement, depending on the values supplied to the predicates. In theory, this should be the desired goal. In practice, however, the results might be unexpected. For example, consider the following dynamic SQL statement: SELECT EMPNO, LASTNAME FROM DSN8810.EMP WHERE WORKDEPT = ? The access path might change depending on the value of WORKDEPT because the optimizer calculates different filter factors for each value, based on the distribution statistics. As the number of occurrences of distribution statistics increases, the filter factor decreases. This makes DB2 think that fewer rows will be returned, which increases the chance that an index will be used and affects the choice of inner and outer tables for joins. These statistics are stored in the SYSIBM.SYSCOLDIST and SYSIBM.SYSCOLDISTSTATS tables and can be removed using SQL DELETE statements. This suggested guideline does not mean that you should always delete the NUDS. My advice is quite to the contrary. When using dynamic SQL, allow DB2 the chance to use these statistics. Delete these statistics only when performance is unacceptable. (They can always be repopulated later with RUNSTATS .) Consider Collecting More Than Just the Top Ten NUDSIf non-uniform distribution impacts more than just the top ten most frequently occurring values, you should consider using the FREQVAL option of RUNSTATS to capture more than 10 values. Capture only as many as will prove to be useful for optimizing queries against the non-uniformly distributed data. DB2 Referential Integrity UseReferential integrity (RI) is the implementation of constraints between tables so that values from one table control the values in another. Recall that a referential constraint between a parent table and a dependent table is defined by a relationship between the columns of the tables. The parent table's primary key columns control the values permissible in the dependent table's foreign key columns. For example, in the sample table, DSN8810.EMP , the WORKDEPT column (the foreign key) must reference a valid department as defined in the DSN8810.DEPT table's DEPTNO column (the primary key). You have two options for implementing RI at your disposal: declarative and application. Declarative constraints provide DB2-enforced referential integrity and are specified by DDL options. All modifications, whether embedded in an application program or ad hoc, must comply with the referential constraints. Application-enforced referential integrity is coded into application programs. Every program that can update referentially constrained tables must contain logic to enforce the referential integrity. This type of RI is not applicable to ad hoc updates. With DB2-enforced RI, CPU use is reduced because the Data Manager component of DB2 performs DB2-enforced RI checking, whereas the RDS component of DB2 performs application-enforced RI checking. Additionally, rows accessed for RI checking when using application-enforced RI must be passed back to the application from DB2. DB2-enforced RI does not require this passing of data, further reducing CPU time. In addition, DB2-enforced RI uses an index (if one is available) when enforcing the referential constraint. In application-enforced RI, index use is based on the SQL used by each program to enforce the constraint.
General Application TuningThis chapter has concentrated on some of the more complex methods of tuning your DB2 applications. A wealth of less complex information about building efficient SQL is also available. For this type of general SQL coding advice, and guidelines for coding efficient, performance-oriented SQL (DCL, DDL, and DML), refer to Chapters 2 through 10. |
< Day Day Up > |