SQL Performance Factors | DB2 Developers Guide (5th Edition)

< Day Day Up >

This first chapter discusses SQL basics, but little has been covered pertaining to SQL performance. You need at least a rudimentary knowledge of the factors affecting SQL performance before reading a discussion of the best ways to achieve optimum performance. This section is an introduction to DB2 optimization and some DB2 performance features. These topics are discussed in depth in Part V, "DB2 Performance Tuning."

Introduction to the Optimizer

The DB2 optimizer is integral to the operation of SQL statements. The optimizer, as its name implies, determines the optimal method of satisfying an SQL request. For example, consider the following statement:

 SELECT  EMPNO, WORKDEPT, DEPTNAME FROM    DSN8810.EMP,         DSN8810.DEPT WHERE   DEPTNO = WORKDEPT;

This statement, whether embedded statically in an application program or executed dynamically, must be passed through the DB2 optimizer before execution. The optimizer parses the statement and determines the following:

Which tables must be accessed
Whether the tables are in partitioned table spaces or not (to determine whether or not query I/O, CPU, and Sysplex parallelism is feasible )
Which columns from those tables need to be returned
Which columns participate in the SQL statement's predicates
Whether there are any indexes for this combination of tables and columns
The order in which to evaluate the query's predicates
What statistics are available in the DB2 Catalog

Based on this information (and system information), the optimizer analyzes the possible access paths and chooses the best one for the given query. An access path is the navigation logic used by DB2 to access the requisite data. A "table space scan using sequential prefetch" is an example of a DB2 access path . Access paths are discussed in greater detail in Part V.

The optimizer acts like a complex expert system. Based on models developed by IBM for estimating the cost of CPU and I/O time, the impact of uniform and non-uniform data distribution, evaluation DB2 object statistics, and the state of table spaces and indexes, the optimizer usually arrives at a good estimate of the optimal access path. Remember, though, that it is only a "best guess." Several factors can cause the DB2 optimizer to choose the wrong access path, such as incorrect or outdated statistics in the DB2 Catalog, an improper physical or logical database design, an improper use of SQL (for example, record-at-a-time processing), or bugs in the logic of the optimizer (although this occurs infrequently).

The optimizer usually produces a better access path than a programmer or analyst could develop manually. Sometimes, the user knows more than DB2 about the nature of the data being accessed. If this is the case, there are ways to influence DB2's choice of access path. The best policy is to allow DB2 initially to choose all access paths automatically, then challenge its decision only when performance suffers. Although the DB2 optimizer does a good job for most queries, you might need to periodically examine, modify, or influence the access paths for some SQL statements.

NOTE

As a general rule of thumb, be sure to review and tune all SQL statements prior to migrating the SQL to the production environment.

Influencing the Access Path

DB2's optimizer determines the best access method based on the information discussed previously. However, users can influence the DB2 optimizer to choose a different access path if they know a few tricks.

To influence access path selection, users can tweak the SQL statement being optimized or update the statistics in the DB2 Catalog. Both of these methods are problematic and not recommended, but can be used as a last resort. If an SQL statement is causing severe performance degradation, you could consider using these options.

NOTE

As of DB2 V6, though, there is another option for bypassing the DB2 optimizer's access path choices. IBM calls the feature optimizer "hints." Optimizer hints are covered briefly in the next section, and in more depth in Chapter 28, "Tuning DB2's Components ."

Using "hints" to modify access paths usually is preferable to manually updating DB2 Catalog statistics.

One option is to change the SQL statement. Some SQL statements function more efficiently than others based on the version of DB2. As you learned previously, SQL is flexible; you can write functionally equivalent SQL in many ways. Sometimes, by altering the way in which an SQL statement is written, you can influence DB2 to choose a different access path.

The danger in coding SQL to take advantage of release-dependent features lies in the fact that DB2 continues to be enhanced and upgraded. If a future DB2 release changes the performance feature you took advantage of, your SQL statement may degrade. It usually is unwise to take advantage of a product's undocumented features, unless it is as a last resort. If this is done, be sure to document and retain information about the workaround. At a minimum, keep the following data:

The reason for the workaround (for example, for performance or functionality).
A description of the workaround (what exactly was changed and why).
If SQL is modified, keep a copy of the old SQL statement and a copy of the new SQL statement.
The version and release of DB2 at the time of the workaround.

The second method of influencing DB2's choice of access path is to update the statistics in the DB2 Catalog on which the optimizer relies. DB2 calculates a filter factor for each possible access path based on the values stored in the DB2 Catalog and the type of predicates in the SQL statement to be optimized. Filter factors estimate the number of accesses required to return the desired results. The lower the filter factor, the more rows filtered out by the access path and the more efficient the access path.

There are two methods of modifying DB2 Catalog statistics. The first is with the RUNSTATS utility. RUNSTATS can be executed for each table space that requires updated statistics. This approach is recommended because it populates the DB2 Catalog with accurate statistics based on a sampling of the data currently stored in the table spaces. Sometimes, however, accurate statistics produce an undesirable access path. To get around this, DB2 allows SYSADM users to modify the statistics stored in the DB2 Catalog. Most, but not all, of these statistical columns can be changed using SQL update statements. By changing the statistical information used by the optimization process, you can influence the access path chosen by DB2. This method can be used to

Mimic production volumes in a test system to determine production access paths before migrating a system to production
Favor certain access paths over others by specifying either lower or higher cardinality for specific tables or columns
Favor indexed access by changing index statistics

Examples of this are shown in Chapter 21, along with additional information on access paths and influencing DB2.

Directly updating the DB2 Catalog, however, generally is not recommended. You may get unpredictable results because the values being changed will not accurately reflect the actual table space data. Additionally, if RUNSTATS is executed any time after the DB2 Catalog statistics are updated, the values placed in the DB2 Catalog by SQL update statements are overwritten. It usually is very difficult to maintain accurate statistics for some columns and inaccurate, tweaked values for other columns. To do so, you must reapply the SQL updates to the DB2 Catalog immediately after you run the RUNSTATS utility and before you run any binds or rebinds.

In order to update DB2 Catalog statistics, you must have been granted the authority to update the specific DB2 Catalog tables (or columns) or have SYSADM authority.

As a general rule, updating the DB2 Catalog outside the jurisdiction of RUNSTATS should be considered only as a last resort. If SQL is used to update DB2 Catalog statistics, be sure to record and maintain the following information:

The reason for the DB2 Catalog updates
A description of the updates applied:

Applied once; RUNSTATS never runs again

Applied initially; RUNSTATS runs without reapplying updates

Applied initially; RUNSTATS runs and updates are immediately reapplied
The version and release of DB2 when the updates were first applied
The SQL UPDATE and INSERT statements used to modify the DB2 Catalog
A report of the DB2 Catalog statistics overlaid by the UPDATE statements (must be produced before the initial updates)

DB2 Optimizer "Hints"

It is possible also to use optimizer "hints" to achieve more control over the access paths chosen by DB2. Similar to the techniques just discussed for influencing access paths, optimizer "hints" should be used only as a final approach when more traditional methods do not create optimal access paths. Optimizer "hints" are also useful when you need to temporarily choose an alternate access path, and later revert back to the access path chosen by DB2.

NOTE

IBM uses the term "hints," but I choose to place it in quotes because the technique is not literally a hint; instead it is a directive for DB2 to use a pre-determined specified access path. IBM probably chose the term "hints" because Oracle provides optimizer hints and IBM is competing quite heavily with Oracle these days. In Oracle, a hint is implemented by coding specific comments into SQL statements ”such as USE NLJ to force use of a nested-loop join.

The typical scenario for using an optimizer "hint" follows . Over time, a query that was previously performing well begins to experience severe performance degradation. The performance problem occurs even though the DB2 Catalog statistics are kept up-to-date using RUNSTATS , and the package and/or plan containing the SQL is rebound using the new and accurate statistics. Upon further examination, the performance analyst determines that DB2 has chosen a new access path that does not perform as well as the old access path.

Faced with a choice between poor performance, modifying DB2 Catalogs statistics manually, and optimizer "hints," the performance analyst chooses to use "hints." Querying the PLAN_TABLE that contains the access path information for the offending statement, the analyst finds the older access path that performed well. The analyst then uses BIND to use the "hint" in the PLAN_TABLE , redirecting DB2 to use the old access path instead of calculating a new one. More details on access path "hints" are provided in Chapters 21 and 28.

NOTE

Be sure to thoroughly test and analyze the results of any query using optimizer "hints." If the environment has changed since the optimizer "hint" access path was chosen, the "hint" may be ignored by DB2, or only partially implemented.

DB2 Performance Features

Finally, it is important to understand the performance features that IBM has engineered into DB2. Performance features have been added with each successive release of DB2. This section is a short synopsis of some of the DB2 performance features discussed in depth throughout this book.

Sequential Prefetch

Sequential prefetch is a look-ahead read engine that enables DB2 to read many data pages in large chunks of pages, instead of one page at a time. It usually is invoked when a sequential scan of pages is needed. The overhead associated with I/O can be reduced with sequential prefetch because many pages are read before they must be used. When the pages are needed, they then are available without additional I/O.

Sequential prefetch can be invoked for both table space scans and index scans .

Sequential Detection

DB2 can dynamically detect sequential processing and invoke sequential prefetch even if the optimizer did not specify its use. DB2 can trigger sequential detection for a query that appears at first to be random, but instead begins to process data sequentially.

List Prefetch

When the DB2 optimizer determines that an index will increase the efficiency of access to data in a DB2 table, it may decide also to invoke list prefetch . List prefetch sorts the index entries into order by record identifier ( RID ). This sorting ensures that two index entries that must access the same page will require no more than one I/O because they now are accessed contiguously by record identifier. This reduction in I/O can increase performance.

Index Lookaside

The index lookaside feature is a method employed by DB2 to traverse indexes in an optimal manner. When using an index, DB2 normally traverses the b-tree structure of the index. This can involve significant overhead in checking root and nonleaf index pages when DB2 is looking for the appropriate leaf page for the given data. When using index lookaside, DB2 checks for the RID of the desired row on the current leaf page and the immediately higher nonleaf page. For repetitive index lookups, it is usually more efficient to check recently accessed pages (that are probably still in the bufferpool), than to traverse the b-tree from the root. Index lookaside, therefore, generally reduces the path length of locating rows.

Index Only Access

If all the data being retrieved is located in an index, DB2 can satisfy the query by accessing the index without accessing the table. Because additional reads of table pages are not required, I/O is reduced and performance is increased.

RDS Sorting

DB2 sorting occurs in the Relational Data Services (RDS) component of DB2. (See Part III for in-depth descriptions of DB2's components.) DB2's efficient sort algorithm uses a tournament sort technique. Additionally, with the proper hardware, DB2 can funnel sort requests to routines in microcode that significantly enhance the sort performance.

Operating System Exploitation

DB2 exploits many features of z/OS and OS/390, including cross memory services, efficient virtual storage use, data space usage with DB2 virtual pools, hiperspace usage with DB2 hiperpools, and effective use of expanded storage, enabling the use of very large buffer pool and EDM pool specifications. The dynamic statement cache also can be stored in a data space as of DB2 Version 6.

One of the biggest impacts of DB2 Version 8 is the requirement that you first must be running a zSeries machine and z/OS v1.3. DB2 V8 does not support old hardware, nor will it support OS/390. Owing to these architectural requirements, DB2 will have the ability to support large virtual memory. DB2 V8 surmounts the limitation of 2GB real storage that was imposed due to S/390's 31-bit addressing. Theoretically, with 64-bit addressing DB2 could have up to 16 exabytes of virtual storage addressability to be used by a single DB2 address space. Now there is some room for growth!

CAUTION

Although z/OS V1.3 is the minimal requirement for DB2 V8, some features require z/OS V1.4 and even V1.5.

Stage 1 and Stage 2 Processing

Sometimes referred to as sargable and nonsargable processing, Stage 1 and Stage 2 processing effectively splits the processing of SQL into separate components of DB2. Stage 1 processing is more efficient than Stage 2 processing.

There is no magic regarding which SQL predicates are Stage 1 and which are Stage 2. Indeed, the stage in which a predicate is evaluated can change from version to version of DB2. Usually, IBM pushes predicates from Stage 2 to Stage 1 to make them more efficient. Consult Chapter 2 for more details on Stage 1 and Stage 2 predicates.

Join Methods

When tables must be joined, the DB2 optimizer chooses one of three methods based on many factors, including all the information referred to in the discussion on optimization. The join methods are a merge scan, a nested loop join, and a hybrid join. A merge scan requires reading sorted rows and merging them based on the join criteria. A nested loop join repeatedly reads from one table, matching rows from the other table based on the join criteria. A hybrid join uses list prefetch to create partial rows from one table with RIDs from an index on the other table. The partial rows are sorted, with list prefetch used to complete the partial rows.

Lock Escalation

During application processing, if DB2 determines that performance is suffering because an inordinate number of locks have been taken, the granularity of the lock taken by the application might be escalated. Simply stated, if a program is accessing DB2 tables using page locking, and too many page locks are being used, DB2 might change the locking strategy to table space locking. This reduces the concurrency of access to the tables being manipulated, but significantly reduces overhead and increases performance for the application that was the beneficiary of the lock escalation.

Lock Avoidance

With lock avoidance, DB2 can avoid taking locks under certain circumstances, while still maintaining data integrity. DB2 can test to see if a row or page has committed data on it. If it does then DB2 may not have to obtain a lock on the data at all. Lock avoidance reduces overhead and improves application performance for those programs that can take advantage of it.

Data Compression

DB2 provides Lempel Ziv data compression employing hardware-assist for specific high-end CPU models or software compression for other models. Additionally, data compression can be directly specified in the CREATE TABLESPACE and ALTER TABLESPACE DDL, thereby avoiding the overhead and restrictions of an EDITPROC .

Data Sharing

DB2 provides the ability to couple DB2 subsystems together enabling data to be shared between multiple DB2s. This allows applications running on more than one DB2 subsystem to read from and write to the same DB2 tables simultaneously . This was not possible in prior releases without using DB2's distributed data capabilities. Additionally, data sharing enables nonstop DB2 processing. If one subsystem becomes unavailable, workload can be shifted to other subsystems participating in the data sharing group . Refer to Chapter 19, "Data Sharing," for an in-depth discussion of data sharing.

Query Parallelism

DB2 can utilize multiple read tasks to satisfy a single SQL SELECT statement. By running multiple, simultaneous read engines the overall elapsed time for an individual query can be substantially reduced. This will aid I/O-bound queries.

DB2 V4 improved on query I/O parallelism by enabling queries to utilize CPU in parallel. When CPU parallelism is engaged, each concurrent read engine will utilize its own portion of the central processor. This will aid processor-bound queries.

DB2 V5 improved parallelism even further with Sysplex query parallelism. With Sysplex query parallelism DB2 can spread the work for a single query across multiple DB2 subsystems in a data sharing group. This will further aid intensive , processor-bound queries.

DB2 V6 further improved parallelism by enabling data accessed in a non-partitioned table space to use query parallelism.

Partition Independence

Using resource serialization, DB2 has the ability to process a single partition while permitting concurrent access to independent partitions of the same table space by utilities and SQL. This partition independence enhances overall data availability by enabling users concurrent access to data in separate partitions.

Limited Partition Scanning

When processing against a partitioned table space, DB2 can enhance the performance of table space scans by limiting the partitions that are read. A limited partition table space scan will only read the specific range of partitions required based on the specified predicates in the WHERE clause.

DB2 V5 further modified partition scanning to enable skipping partitions in the middle of a range.

Uncommitted Read, a.k.a. "Dirty" Read

When data integrity is not an issue, DB2 can bypass locking and enable readers to access data regardless of its state. The "UR" isolation level provides a dirty read by allowing a SELECT statement to access data that is locked, in the process of being deleted, inserted but not yet committed, or, indeed in any state. This can greatly enhance performance in certain situations.

CAUTION

Never use DB2's dirty read capability without a complete understanding of its ramifications on data integrity. For more information on uncommitted read processing refer to Chapter 2 for statement level usage; and Chapter 13, "Program Preparation," for plan and package level usage.

Run Time Reoptimization

DB2 can reoptimize static and dynamic SQL statements that rely on input variables in the WHERE clause during processing. This feature enables DB2 to optimize SQL statements after the host variable, parameter marker, and special register values are known. Run time reoptimization can result in better access paths (albeit at a cost).

DB2 V8 adds the capability to reoptimize variables once ”the first time the statement is executed, instead of every time.

Instrumentation Facility Interface (IFI)

DB2 provides the Instrumentation Facility Interface, better known to DB2 professionals as IFI. The IFI is a facility for gathering trace data enabling users to better monitor and tune the DB2 environment. Using the DB2 IFI users can submit DB2 commands, obtain trace information, and pass data to DB2.

Dynamic System Parameters

Changing DB2 system parameters, commonly referred to as DSNZPARM s or simply ZPARMs, required DB2 to be stopped and restarted prior to Version 7. Due to rising availability requirements, in large part spurred by the Internet, many (but not all) ZPARMs can be changed on the fly ”without requiring DB2 to be recycled. This capability offers DBAs and system administrators greater flexibility and adaptability to respond to system and performance problems.

Historical and Real-Time Statistics

AS of V7 DB2 now stores historical statistical details in the DB2 Catalog. Previous DB2 versions simply overlaid old statistics with new whenever RUNSTATS was run. By keeping historical statistics DBAs can now compare current DB2 object characteristics with past statistics. Such comparisons can be helpful to tune queries and to predict performance results based on history.

IBM also added support for real-time statistics as a feature upgrade in between Version 7 and 8. Real-time stats are collected by DB2 during normal operations ”without requiring a separate utility (such as RUNSTATS ) to be run. The additional real-time statistics provide growth and performance information to help DBAs determine when maintenance tasks, such as reorganization, should be scheduled.

Materialized Query Tables

Data warehousing queries regularly involve complex operations on large amounts of data. To reduce the amount of time required to respond to such queries, DB2 Version 8 introduced materialized query tables (or MQTs). Using an MQT, DB2 stores data derived from one or more source tables. This materialized data can be summarized, joined, and combined using SQL operations, but the data is stored so the data warehousing queries operate more efficiently. Consult Chapter 45 for more details on materialized query tables.

Data Partitioned Secondary Indexes

One of the biggest problems faced by DBAs when they are managing large partitioned DB2 table spaces is contending with non-partitioned indexes. DB2 Version 8 helps to alleviate this problem with data partitioned secondary indexes (or DPSIs). A DPSI is basically a partitioned NPI. So, DPSIs are partitioned based on the data rows ”similar to a partitioning index. The number of parts in the index will be equal to the number of parts in the table space ”even though the DPSI is created based on columns different from those used to define the partitioning scheme for the table space. Therefore, partition 1 of the DPSI will be for the same rows as partition 1 of the table space and partition 1 of the partitioning index, and so on.

These changes to DB2 V8 provide many benefits, including

The ability to cluster by a secondary index
The ability to drop and rotate partitions easily
Less data sharing overhead

Be careful though, because converting to a DPSI can require that your queries be rewritten in order to use the DPSI instead of a table space scan.

< Day Day Up >