DB2 for iSeries Considerations | Intelligent Enterprises of the 21st Century

i5/OS for iSeries automates many of the configuration tasks for you, such as physical data placement or memory management. In addition to these tasks, DB2 UDB for iSeries handles query optimization as well. You may encounter situations where you would want to change or influence the default behaviors of these optimizations, and hence this section is focused on improving and monitoring query performance.

Note

DB2 UDB for iSeries is tightly integrated with the operating system; thus, tuning the performance at the operating system level may also impact the database. For a detailed look at improving performance at the operating system level, refer to the Performance section under System Administration in the IBM eServer iSeries Information Center.

Monitoring Memory Usage

On iSeries, tuning memory usage of any application is managed by the operating system. Hence, memory usage of the database is managed by the operating system, and you should not have the need to do this. To monitor and tune requires considerable knowledge of the operating system and its architecture. Refer to the iSeries Information Center for more information on this.

Performance Monitors and Commands

When tuning performance, it is necessary to analyze what is running to determine where performance bottlenecks may be occurring. The iSeries platform offers several avenues for investigating performance of SQL.

Often, you will have the need to investigate the performance characteristics of a workloadfor example, a set of queries. This may be required while you are developing applications, or while you are tasked to investigate a performance problem. The iSeries environment provides you with a database monitor tool to allow you to start capturing performance measurements at a certain point in time, and to stop measuring at a later point.

The commands that allow you to start and stop database monitors are STRDBMON and ENDDBMON, respectively. When you start a monitor, you have the option of collecting query statistics for all jobs on the system or just a particular job. All data collected is placed in a specified table, which can later be queried to analyze the data collected. The type of information collected is

System and job name
SQL statement and sub-select number
Start and end timestamp
Estimated processing time
Total rows in table queried
Number of rows selected
Estimated number of rows selected
Estimated number of joined rows
Key columns for advised index
Total optimization time
Join type and method
ODP implementation

This information can be used to target queries that require performance tuning. For example, you can look for long-running queries, or queries that have a high number of rows selected or joined.

The query optimizer (see the next section) is invoked for each statement that is captured. One of the tasks of the query optimizer is to evaluate if additional indexes can improve performance, and if so, provide advice on what indexes to create. This information is captured by the monitor and can be very useful in improving performance.

The table in which the data is captured is specified in the STRDBMON command. The definition for the table can be found in the IBM eServer iSeries Information Center in the Performance and Optimization section under Database. Commonly used queries against database monitor captured data can be found at www.iseries.ibm.com/db2/dbmonqrys.htm.

Note

Due to the high volume of information captured, when running the database performance monitor, system resources can become constrained. The iSeries server does provide a memory-resident database monitor tool, which can be accessed through APIs. This tool greatly reduces the overhead on system resources. Details can be found in the Information Center.

The iSeries Navigator is a very user-friendly tool in helping with performance monitoring. It basically provides a GUI interface to the STRDBMON and ENDDBMON commands. In addition, it provides a standard set of reports, which can be modified if you want, that can be run against the data collected from the monitor. Figure 12.10 shows how a monitor can be started from the iSeries Navigator.

Figure 12.10. Starting a monitor from the iSeries Navigator.

To stop the monitor, you can select the monitor, right-click, and select End. Once you have ended the monitor, you can analyze the results, again by right-clicking on the monitor and selecting Analyze Results. This will present you with a window that allows you to select the information you are interested in. Figure 12.11 shows the some of choices you have available to you.

Figure 12.11. Sample of available monitor reports through iSeries Navigator.

In contrast to the database monitor that captures data on real time workloads, the PRTSQLINF command is provided to capture information about SQL statements embedded in a program, an SQL package, or a service program. This command is the iSeries version of the db2expln command on the LUW platforms. Remember that when a stored procedure or trigger is created, it is compiled and a program is created. Similarly, when a function is created, the resulting object is a service program.

So if you wanted to analyze the SQL statements in a particular stored procedure, you would need to obtain the corresponding program that was created and specify it in the PRTSQLINF command. The information is analyzed and printed (sent to spooled output). The type of information captured is the same as in the database monitor.

Figure 12.12 shows an example of how to query the system catalogs to determine the system name for an SQL procedure, and then uses it as input for the PRTSQLINF command.

Figure 12.12. An example of PRTSQLINF output.

 Query the system catalog: SELECT external_name   FROM qsys2.sysroutine  WHERE specific_schema = 'DB2ADMIN'    AND specific_name = 'BONUS_INCR'; Issue the PRTSQLINF command: PRTSQLINF OBJ(DB2ADMIN/BONUS_INCR) OBJTYPE(*PGM) Output found under WRKSPLF:                                     Display Spooled File  File . . . . . :    BONUS_INCR Page/Line  1/1  Control . . . . . Columns     1 - 130  Find . . . . . . *...+....1....+....2....+....3....+....4....+....5....+....6....+....7....+.... 8....+....9....+....0  5722SS1 V5R3M0 040528  Print SQL information  Program DB2ADMIN/BONUS_INCR 06/11/04 08:52:57 Page   1  Object name...............DB2ADMIN/BONUS_INCR  Object type...............*PGM    CRTSQLCI        OBJ(DB2ADMIN/BONUS_INCR)        SRCFILE(QTEMP/QSQLSRC)        SRCMBR(BONUS_INCR)        COMMIT(*NONE)        OPTION(*SQL *PERIOD *NOCNULRQD)        TGTRLS(V5R3M0)        ALWCPYDTA(*OPTIMIZE)        CLOSQLCSR(*ENDACTGRP)        RDB(*LOCAL)        DATFMT(*ISO)        TIMFMT(*ISO)        DFTRDBCOL(DB2ADMIN)        DYNDFTCOL(*NO)        SQLPKG(DB2ADMIN/BONUS_INCR)        ALWBLK(*ALLREAD)        DLYPRP(*YES)        DYNUSRPRF(*USER)        SRTSEQ(*HEX)        LANGID(ENU)        RDBCNNMTH(*DUW)        TEXT('SQL PROCEDURE BONUS_INC                           ')        STATEMENT TEXT CCSID(37)        SQLPATH("DB2ADMIN" "SYSIBM")        DECRESULT(31 31 0)  DECLARE C_SALES CURSOR WITH HOLD FOR SELECT WORKDEPT , BONUS , EMPNO FROM      EMPLOYEE ORDER BY WORKDEPT    SQL4021  Access plan last saved on 06/10/04 at 23:42:56.    SQL4020  Estimated query run time is 0 seconds.    SQL4027  Access plan was saved with DB2 UDB Symmetric Multiprocessing installed on the system.    SQL4002  Reusable ODP sort used.    SQL4010  Table scan access for table 1.    SQL4006  All indexes considered for table 1.  OPEN C_SALES  FETCH C_SALES INTO : H : H , : H : H , : H : H  SAVEPOINT SVPT_BONUS_INCR ON ROLLBACK RETAIN CURSORS  SET : H : H = : H : H * 1.1    SQL4021  Access plan last saved on 06/10/04 at 23:42:56.    SQL4020  Estimated query run time is 0 seconds.    SQL4027  Access plan was saved with DB2 UDB Symmetric Multiprocessing installed on the system.    SQL4010  Table scan access for table 1.  UPDATE EMPLOYEE SET BONUS = : H : H WHERE EMPNO = : H : H  5722SS1 V5R3M0 040528     Print SQL information          Program DB2ADMIN/BONUS_INCR                     06/11/04 08:52:57   Page    2    SQL4021  Access plan last saved on 06/10/04 at 23:42:56.    SQL4020  Estimated query run time is 0 seconds.    SQL4027  Access plan was saved with DB2 UDB Symmetric Multiprocessing installed on the system.    SQL4008  Index EMPLOYEE used for table 1.    SQL4026  Index only access used on table number 1.    SQL4011  Index scan-key row positioning used on table 1.    SQL4006  All indexes considered for table 1.  SET : H : H = : H : H + : H : H    SQL4021  Access plan last saved on 06/10/04 at 23:42:56.    SQL4020  Estimated query run time is 0 seconds.    SQL4027  Access plan was saved with DB2 UDB Symmetric Multiprocessing installed on the system.    SQL4010  Table scan access for table 1.  FETCH C_SALES INTO : H : H , : H : H , : H : H  COMMIT  ROLLBACK TO SAVEPOINT SVPT_BONUS_INCR  RELEASE SAVEPOINT SVPT_BONUS_INCR  CLOSE C_SALES  SELECT 1 INTO : H FROM QSYS2 . QSQPTABL WHERE ( : H : H = : H : H ) AND ( : H :      H = 0 )    SQL4021  Access plan last saved on 06/10/04 at 23:42:56.    SQL4020  Estimated query run time is 0 seconds.    SQL4027  Access plan was saved with DB2 UDB Symmetric Multiprocessing installed on the system.    SQL4010  Table scan access for table 1.  CLOSE C_SALES

Query Optimizer and Access Plans

The previous section alluded to the concept of a query optimizer. Each time a query is processed (or prepared), an access plan is created. An access plan directs the database manager on how to execute the query. In particular, it defines which indexes would be used, what type of scan operations will occur, if any intermediate sorts are required, the order of the operations, if any operations can execute in parallel, and more.

In order to compile the access plan, a key input to the query optimizer are the statistics on various database objects gathered by the Statistics Manager (see the next section). This information, along with other environment settingssuch as the number of processors, the commitment level (isolation level) under which the query will execute, and the row blocking optionsis used to generate the plan.

To help assess whether the optimizer will choose an efficient access plan, you can use the Visual Explain tool. This tool is available from the Run SQL Scripts window of the IBM iSeries Navigator. It takes as input a query and produces a visual representation of the access plan. In addition to the access plan, for each operation in the access plan very detailed information about the query and the particular operation is provided.

The best way to show the value of the tool is to show an example. Assume that there are no indexes in the database, and you want to see the access plan for the query shown in Figure 12.13.

Figure 12.13. Simple query to be explained.

 SELECT e.firstnme       ,e.midinit       ,e.lastnme       ,e.deptname   FROM employee e       ,department d  WHERE e.workdept = d.deptno

The access plan generated by the Visual Explain is shown in Figure 12.14.

Figure 12.14. Access plan without index.

Looking at the access plan graph, you see that a temporary hash table will be created to obtain the result set. The information on the right displays information about the temporary hash table. If this query was to be used many times, then it may be wise to add a permanent index on the DEPTNO column in DEPARTMENT. To help with determining which indexes to create, the tool also provides an Index Advisor. It can be accessed by clicking the footprints icon in the Visual Explain tool as shown in Figure 12.14. Figure 12.15 shows the recommended indexes based on the Index Advisor.

Figure 12.15. Recommendations from the Index Advisor.

Not surprisingly, the DEPTNO column on the DEPARTMENT table is recommended. This interface can also be used to create the index. Once created, you can re-run your Visual Explain and see the resulting access plan with the unique index, shown in Figure 12.16.

Figure 12.16. Access plan with index.

The revised access plan no longer creates a temporary has table, but uses the permanent index that was created.

The example, though simple, demonstrates the value of such a tool. It would be a good development practice to generate an access plan for every statement that is developed, and validate that an efficient plan is generated. In general, the goal would be to minimize the number of full table scans.

Variable Length Columns and Large Objects

On iSeries, a row has two sections: a fixed length portion and a variable length portion. The variable length columns (VARCHAR, VARGRAPHIC, VARBINARY) and large objects (CLOB, BLOB, DBCLOB) are stored in the variable length portion of the row. Access to these columns typically requires a secondary I/O operation; however, unlike on LUW, this I/O operation is buffered.

Tip

Use the ALLOCATE clause of the column definition in the CREATE TABLE statement to reserve a portion of the variable length columns in the fixed-length portion of the row to avoid a secondary I/O operation.

Database Statistics

Database table, column, and index statistics are vital to generating the most efficient query access plans. The Statistics Manager is tasked with collecting and keeping the statistics up to date. It keeps track of the following information:

Cardinality of column values
Selectivity (distribution of values in a column)
Frequent values (top values in a column and associated frequency)
Metadata information on a table (number of rows and indexes)
Estimate of I/O operation (for access to table or index)

The information is collected automatically, but updates to statistical information are not immediate, with the exception of indexes. Statistics on indexes are updated as changes occur. For other objects such as tables or columns, collection occurs in the background as system resources become available.

Because index statistics are immediately updated, they act as the primary source of statistical information for the DB2 UDB iSeries optimizer. One of the biggest causes of poor performance on iSeries is the absence of the correct indexes.

When the Statistics Manager receives requests from the query optimizer, estimates for column statistics are returned, regardless of whether the statistics have been collected. If a request for a column with no statistical data is received, a default value is returned and the column is marked for collection. The next time collection starts, statistics for that column will be collected.

To determine when to re-calculate statistics after they have been collected, the Statistics Manager relies on the Plan Cache. Recall from Chapter 7, "Working with Dynamic SQL," that the plan cache is an area in memory where generated access plans are stored. Each time a new access plan is created or an existing plan is re-used, the Statistics Manager will validate to see if more than 15 percent of the data in the underlying table(s) has changed. If it has, then the corresponding columns will be marked for collection. Additionally, the access plan will also be marked so the next time it is accessed, a new plan is generated.

The automation of the collection works well; however, there are cases when you may want to force the collection of statisticsfor example, if you've completed some mass updates or added large volumes of data. You have two options available to you for forcing the collection of statistics. You can use the IBM iSeries Navigator to select tables and columns for which to collect statistics, or you can do it programmatically through the use of APIs.

Note

If your system resources are busy for long periods of time, statistics collection will not occur because the Statistics Manager waits for system resources to become available.