Optimizing SQL | IBM i5/iSeries Primer(c) Concepts and Techniques for Programmers, Administrators, and Sys[... ]ators

Managing the performance of SQL statements is an important and complex task. This section reviews monitoring and measuring tools, such as Debug and the GUI Visual Explain. The role of indexes within an SQL database is also covered.

Using Debug to Optimize SQL Statements

Debug is the easiest performance monitoring tool to use and the most familiar for developers of traditional applications on the i5. To start Debug, simply initiate a 5250 session and key in the command:

      STRDBG

This activates the interactive Debug program within the session. Additional diagnostic messages will be reported for any SQL statements executed within the 5250 session. To see an example of the diagnostic messages, start the interactive SQL tool using:

      STRSQL

If you have not installed the DB2 SQL development kit, and the interactive SQL tool is not available, RUNSQLSTM can be used instead. The RUNSQLSTM is included in the base operating system in recent releases and allows SQL statements written in text files to be executed. Which command you use to execute the statement is not significant: As the statement executes, diagnostic messages (like the following) are written to the job log:

      **** Starting optimizer debug message for query .      ..... messages .....      **** Ending debug message for query .

If the SQL engine (also sometimes referred to as the query optimizer) determines that creating an index would improve the performance of the SQL statement, it will indicate that with the following message:

      Access path built for file your-file

If you put your cursor on this message and press Help, the second-level help text will include the name of the table that may require an index and the columns and sort sequence for the key to the index. Any time that the system builds an access path during the execution of an SQL statement, you should consider creating an index or a logical file to improve the performance of your application. Several factors are involved in this decision:

How often is the SQL statement run?
How critical is the speed of this statement?
How volatile is the data being indexed?

How often is the SQL statement run? If the statement is run infrequently, then building an index or logical file to improve its performance is not likely necessary. Having a rule of thumb for dealing with this sort of question is helpful. My rule of thumb is discussed here.

How critical is the speed of the SQL statement? If this SQL statement drives a production scheduling application and is critical for the smooth running of your facility, then by all means, build the required logical file or index. Or, if its needed by the president of your company, you may want to improve its performance. On the other hand, if its a report that runs at 3:00 AM, and there is no pressure to speed up its execution, then it might be wise to not build the logical file or index.

How volatile is the data being indexed? The volatility of the data is a critical factor in deciding whether to build an index. The more volatile the data, the more expensive (from a CPU standpoint) it is to maintain an index on the file. Therefore, files that are fairly static, such as history files and some master files, are good candidates for building indexes, whereas more active files, such as transaction and inventory files, are not as good. The basic considerations in deciding whether to build an index are outlined below.

A Bit about Indexes

Before we delve any further into the topic of creating indexes, we should discuss exactly what they are. Just as an index in a book is a short listing of key descriptions that reference more content elsewhere in the document, the database index is a short listing of "keys" that identify specific records within the database. For certain kinds of searches, these indexes greatly improve performance.

The standard index on the i5 and in most databases is a binary radix tree, which somewhat resembles a family tree. This is useful for sorting and selecting data when less than 20 percent of the database is being returned. For statements that return from 20 to 60 percent of the records in a file, a different type of index, the encoded vector index (EVI) might be more helpful.

The EVI is an advanced version of the bitmap index. Bitmap indexes are used in some databases and function quite differently from binary tree indexes. The bitmap index is made up of a table of bits. Figure 25.35 illustrates a simple bit map that has one bit in each row for every record in the file and one row for each distinct value in the database.

image from book
Figure 25.35: Simple bit map.

The more distinct values you have, the more rows in the index. So, an index over the STATE field in a Customer Master File containing 20,000 records will likely have no more than 50 distinct values in that column, thus making it an excellent choice for an EVI. If the index had been over the customer number column, it would be a terrible choice for an EVI, since it would have 20,000 distinct values! The EVI is actually a bit (pun intended) more complicated than the typical bitmap index, but the same rules apply, and that is the part we care about. Well let the engineers at IBM worry about the finer points of how it's implemented.

Indexes are not new with SQL; they have been included in our databases for decades. The logical files within our traditional databases often include indexes, and sometimes even the physical files define indexes. SQL indexes are different, in that they are handled separately from the tables and views. Some performancerelated issues arise when using SQL statements against traditional databases.

Index Overhead

We don't simply create every possible index for several good reasons. The first reason is that each index (or logical file) that we create adds overhead to the database. Each time a record is added or deleted from a physical file (table), every related index and logical file must also be updated. Every time a record in a file is changed, any indexes that use the field in their key must be updated. This "hidden" activity adds to the general overhead on the system and slows down your applications. Creating indexes is always an issue of "pay me now or pay me later." Other issues include determining the appropriate type of index to create and the order in which to create them, both of which are discussed later in the chapter. Every index you create adds to the overhead of the system, but if you use that index often, then creating it may actually reduce the workload on the database.

Binary Radix Tree versus Encoded Vector Indexes

There are two types of SQL indexes to choose from, binary radix tree or EVI. Use the default binary tree whenever there is a high degree of uniqueness to each key or few records with identical values in the key fields. For example, a date stamp within a transaction file is very unique. Most records, though perhaps not all, will have their own value. This makes it a poor choice for EVI and an excellent choice for a binary tree index. On the other hand, the product line column in the same history file might contain only ten different values. Thousands of records could easily share the same value. This is a perfect example of when to use an EVI instead of a binary tree.

Also note that EVIs are designed more for selecting than sorting, so think of them more in terms of improving the performance of the WHERE clause rather than the ORDER BY clause. EVIs are also quite a bit smaller than binary trees. They often take up as little as one-sixteenth the space that the binary tree does. Saving disk, and more important, allowing more of the index to be loaded into a single page of memory, greatly improves performance.

Create Index Syntax

The CREATE INDEX statement is fairly simple. It contains only a few different clauses, each of which are discussed below:

      CREATE INDEX TRANHIST01 ON TRANHST (TIMESTMP)

This example creates a binary radix tree (TRANHIST01) over the transaction history file (TRANHIST). It is built over just one column, the time stamp that indicates when it was posted. If date and time were stored in separate fields within the table, the statement might look like:

      CREATE INDEX TRANHIST02 ON TRANHST (TRANDATE, TRANTIME)

In this example, you can see that when the index is built over multiple key fields, you must include them all within the parentheses after the table name and separate them with commas. This index sorts the columns in ascending sequence by default, but sometimes the data is needed in descending order. The following statement illustrates that syntax:

      CREATE INDEX ARHIST01 ON ARHIST        (POSTDATE DESC, TRANTIME DESC)

The DESC keywords included after the key field names indicate that the index should be optimized for retrieving the data in descending sequence. For applications such as accounts receivable, in which aged statements show the most recent transactions first, this is a useful option.

This index will rarely have two or more records with exactly the same data and time. Sometimes, however, an index is expected to have unique values for each record, and the database is required to enforce this. The following statement illustrates that syntax:

      CREATE UNIQUE INDEX ITEMMAST01 ON ITEMMAST (ITEM)

The Item Master File has only one record for each item, and that rule is critical for keeping the application running smoothly and error free. This UNIQUE keyword in this index prevents the creation of duplicate records, and it is an important part of the database design.

In addition to controlling the behavior of the database, additional keywords can be added to improve performance. For example:

      CREATE INDEX CUSTMAST01 ON CUSTMAST (STATE)        WITH 60 DISTINCT VALUES

Adding the WITH x DISTINCT VALUES clause to a binary radix tree does not change its behavior, but it does pass that information to the SQL engine to assist in optimizing the execution of statements using the index.

If the index were an EVI, this clause would actually change its behavior. With the relatively few distinct values over that column, and the fact that many records will contain the same values in their STATE column, this index would be more effective as an EVI. The syntax for creating it as an EVI is:

      CREATE ENCODED VECTOR INDEX CUSTMAST01 ON      CUSTMAST (STATE) WITH 60 DISTINCT VALUES

The ENCODED VECTOR keywords at the beginning of the statement are all that is required to change the type of index created. The WITH x DISTINCT VALUES clause is not required, and it will default to 256. For an EVI, that value is more than a suggestion to the optimizerit actually determines what size the key values within the index must be. The lower the number of distinct values, the smaller the key fields can be and the better the performance will be. So, make the distinct values as small as possible while remaining accurate. If the actual number of distinct values in the key exceeds this number, the system will automatically adjust the size of the key.

Index Search Order

Anytime an index is needed by the SQL engine, it scans all the available indexes within the database looking for a useful index. The indexes are searched in the reverse order that they are created, so the most recently created indexes are examined first. Therefore, its logical that the most important, most useful indexes should be built last. This is difficult to accomplish in the natural life of your application, because the most common and useful indexes will be built early in the life of the database, while the more esoteric and bizarre indexes will be created later in the life of the database.

To correct this situation, I recommend that every application include a program or procedure that drops (deletes) all the indexes and recreates them in a logical sequence. As a general rule, build the larger, more complex indexes first and the simpler indexes later. This also is important because the query optimizer within the SQL engine only spends a limited amount of time analyzing the available indexes; so, when a large number of indexes are available, it's important that the best candidates be the first ones looked at.

Combining Smaller Indexes

It's tempting to create an index that matches every desired search condition, but that is not very practical and will quite likely add to the overhead on the system and make all your SQL statements run more slowly. Before you run out and build another index, consider that the SQL engine can and will combine different indexes together to get the results you need. For example, if one index exists over the product line column, and another index over the plant number column, and you issue this SQL statement:

      SELECT * FROM ITEMMAST WHERE PLANT IN (01,02,04) AND      PRODLIN = 'A1'

the SQL engine will very likely combine the two indexes together to create a better option, without paying the cost of creating a new index from scratch. Indexes are not the only consideration however; often the way that the statement itself is written causes an unnecessary slow-down in its execution.

The topics covered in this section are just the tip of the iceberg; the most important thing to take with you after reading this material is that SQL performance is not static. How you design your files, write your statements, and manage your system dramatically affects the performance of your application, and you have the ability to monitor and improve the performance of your application.

Indexes are critical to improving the performance of SQL statements. Both the traditional binary radix tree index and the EVI have advantages in certain situations. Its important to know what type of index to build and when to build it. When using logical files over legacy databases, other performance issues, such as access path maintenance, affect the overall performance of an application.