SQL Access Guidelines

< Day Day Up >

The SQL access guidelines will help you develop efficient data retrieval SQL for DB2 applications. Test them to determine their usefulness and effectiveness in your environment.

Pretest All Embedded SQL

Before embedding SQL in an application program, you should test it using SPUFI , QMF , or whatever ad hoc query tool you have available. This way, you can reduce the amount of program testing by ensuring that all SQL code is syntactically correct and efficient. Only after the SQL statements have been thoroughly tested and debugged should they be placed in an application program.

Use `EXPLAIN`

Use the EXPLAIN command to gain further insight into the performance potential for each SQL statement in an application. When EXPLAIN is executed on an SQL statement or application plan, information about the access path chosen by the optimizer is provided. This information is inserted into a DB2 table called the PLAN_TABLE . By querying the PLAN_TABLE , an analyst can determine the potential efficiency of SQL queries. Part V, "DB2 Performance Tuning," provides a complete description of the EXPLAIN command and guidelines for interpreting its output. EXPLAIN also populates the DSN_STATEMNT_TABLE with statement cost information and the DSN_FUNCTION_TABLE table with information about user -defined function usage.

Use EXPLAIN and analyze the results for each SQL statement before it is migrated to the production application. Following this procedure is important not only for SQL statements in application programs, but also for canned QMF queries, and any other, predictable, dynamic SQL queries. For application programs, EXPLAIN can be used with the EXPLAIN option of the BIND command. Specifying EXPLAIN(YES) when you use BIND on an application plan or package provides the access path information necessary to determine the efficiency of the statements in the program. For a QMF (or ad hoc) query, use EXPLAIN on it before allowing the statement to be used in production procedures.

The following is an example of running EXPLAIN for a SELECT statement:

 EXPLAIN PLAN SET QUERYNO = 1 FOR      SELECT  *      FROM    DSN8810.DEPT      WHERE   DEPTNO = 'D21';

EXPLAIN enables a programmer or DBA to analyze the chosen access path by studying the PLAN_TABLE .

Because EXPLAIN provides access path information based on the statistics stored in the DB2 Catalog, you should keep these statistics current and accurate. Sometimes you must "fudge" the DB2 Catalog statistics to produce production access paths in a test environment. (See the section, "Influencing the Access Path," in Chapter 1 for more information.)

Use All `PLAN_TABLE` Columns Available

Each new release or version of DB2 adds new columns to the PLAN_TABLE . These new columns are used to report on new access paths and features. Sometimes shops fail to add the new PLAN_TABLE columns after a new release is installed. Be sure to verify that the PLAN_TABLE actually contains every column that is available for the current DB2 release being run. For more information on the PLAN_TABLE and the columns available for each DB2 release, refer to Chapter 25, "Using EXPLAIN."

For DB2 V7, the following columns were added to the PLAN_TABLE to offer additional information on table functions, temporary intermediate result tables, and work files:

PARENT_QBLOCK ” SMALLINT
TABLE_TYPE ” CHAR(1)

For DB2 V8, the following columns were added to the PLAN_TABLE :

TABLE_ENCODE ” CHAR(1)
TABLE_SCCSID ” SMALLINT
TABLE_MCCSID ” SMALLINT
TABLE_DCCSID ” SMALLINT
ROUTINE_ID ” INTEGER

Use the `DSN_STATEMNT_TABLE`

As of DB2 V6, EXPLAIN also can determine an estimated cost of executing SQL SELECT , INSERT , UPDATE , or DELETE statements. EXPLAIN will populate DSN_STATEMNT_TABLE , also known as the statement table, at the same time it populates the PLAN_TABLE . After running EXPLAIN , the statement table will contain cost estimates, in service units and in milliseconds , for the SQL statements being bound or prepared (both static and dynamic SQL).

The estimates can be used to help determine the cost of running SQL statements. However, keep in mind that the cost numbers are just estimates. Factors that can cause the estimates to be inaccurate include cost adjustments caused by parallel processing, the use of triggers and user-defined functions, and inaccurate statistics.

For more information on statement tables and cost estimates, see Chapter 25.

Use `DSN_FUNCTION_TABLE` to Explain User-Defined Function Usage

If you have implemented user-defined functions (UDFs), be sure to create DSN_FUNCTION_TABLE , also known as the function table. DB2 inserts data into DSN_FUNCTION_TABLE for each function referenced in an SQL statement when EXPLAIN is executed on an SQL statement containing a UDF or when a program bound with EXPLAIN(YES) executes an SQL statement containing a UDF.

The data DB2 inserts to the function table contains information on how DB2 resolves the user-defined function references. This information can be quite useful when tuning or debugging SQL that specifies a UDF.

For more information on using EXPLAIN with the function table, see Chapter 4, "Using DB2 User-Defined Functions and Data Types."

Enable `EXPLAIN` for `Auto Rebind`

EXPLAIN during auto rebind can be enabled if you set an appropriate DSNZPARM . An auto rebind occurs when an authorized user attempts to execute an invalid plan or package. To revalidate the plan or package, DB2 will automatically rebind it. If EXPLAIN during auto rebind is not turned on, then you will not have a record of the new access paths that were created. The DSNZPARM for auto rebind is ABIND (the default is YES ); the DSNZPARM for binding during auto rebind is ABEXP (the default is YES ).

Plans and packages are invalidated when an object that an access path in the plan or package is using is dropped. Be sure that a proper PLAN_TABLE exists before enabling the EXPLAIN during auto rebind option.

When Data Sharing Specify `COEXIST` for `Auto Rebind`

If you are running in a data sharing environment, consider specifying COEXIST for ABIND . Doing so allows automatic rebind operations to be performed in a DB2 data sharing coexistence environment only when the plan or package is invalid, or when it was last bound on a new subsystem other than the one on which it is running. In this way, DB2 can keep the new access path even when a query is run in an older subsystem. Of course, DB2 will not magically use V7 features on a V6 subsystem (for example).

NOTE

When all members of a data sharing group have been migrated to the same DB2 version, ABIND COEXIST will be interpreted the same as ABIND YES .

Utilize Visual Explain and Query Analysis Tools

Visual Explain is a tool provided by IBM as a free feature of DB2 for z/OS. Visual Explain will display graphical representations of the DB2 access paths and advice on how to improve SQL performance. The display can be for access paths stored in a PLAN_TABLE or for EXPLAIN output from dynamic SQL statements.

One of the nice features of Visual Explain is its ability to display pertinent DB2 Catalog statistics for objects referenced in an access path. It is much easier to understand access paths from the visual representations of Visual Explain, than it is to interpret PLAN_TABLE output. Refer to Figure 2.1 for a sample Visual Explain screen shot.

Figure 2.1. Visual Explain graphically depicts an `EXPLAIN` .

You must run Visual Explain from a client workstation (there is no TSO or ISPF interface).

Significant improvements were made to the Visual Explain tool for DB2 Version 8. More detailed information is provided about access paths and more statistical details are available for each node in the query access graph. Furthermore, Visual Explain works with XML to document query access paths.

Even with all of the new functionality of Visual Explain, you may need to augment this tool with more in-depth SQL tuning options. To isolate potential performance problems in application plans or single SQL statements, utilize all available analysis tools, such as BMC Software's SQL Explorer or Computer Associates' Plan Analyzer. These products analyze the SQL code, provide a clear, textual description of the access path selected by the DB2 optimizer, and recommend alternative methods of coding your queries. They are similar in function to Visual Explain, but provide an ISPF interface and more in-depth tuning recommendations.

Avoid `SELECT *`

As a general rule, a query should never ask DB2 for anything more than is required to satisfy the desired task. Each query should access only the columns needed for the function to be performed. Following this dictum results in maximum flexibility and efficiency.

NOTE

Another cardinal rule of database performance is "never say always or never ." Well, perhaps this rule is better stated as " almost never say always or never." There are very few times in this text where I will say "always" or "never," but there are a few, such as never asking DB2 for anything more than is required.

The gain in flexibility is the result of decreased maintenance on application programs. Consider a table in which columns are modified, deleted, or added. Only programs that access the affected columns need to be changed. When a program uses SELECT * , however, every column in the table is accessed. The program must be modified when any of the columns change, even if the program doesn't use the changed columns. This use complicates the maintenance process.

For example, consider a program that contains the following statement:

 EXEC SQL      SELECT  *      INTO    :DEPTREC      FROM    DSN8810.DEPT      WHERE   DEPTNO = :HV-DEPT END-EXEC.

Suppose that the program is developed, tested, and migrated to the production environment. You then add a column to the DEPT table. The program then fails to execute the preceding statement because the DEPTREC layout does not contain the new column. (This program was compiled with the old DCLGEN .) The program must be recompiled with the new DCLGEN , a step that is not required when the program asks for only the columns it needs.

Additionally, by limiting your query to only those columns necessary

The programmer does not need extra time to code for the extraneous columns.
You avoid the DB2 overhead required to retrieve the extraneous columns. Overhead will be incurred because there is work required to move each column.
DB2 might be able to use an index-only access path that is unavailable for SELECT * .

Limit the Data Selected

Return the minimum number of columns and rows needed by your application program. Do not code generic queries (such as SELECT statements without a WHERE clause) that return more rows than necessary, and then filter the unnecessary rows with the application program. Doing so wastes disk I/O by retrieving useless data and wastes CPU and elapsed time returning the additional, unneeded rows to your program.

Allowing DB2 to use WHERE clauses to limit the data to be returned is more efficient than filtering data programmatically after it has been returned.

Do Not Ask for What You Already Know

This might sound obvious, but most programmers violate this rule at one time or another. For a typical example, consider what's wrong with this SQL statement:

 SELECT   EMPNO, LASTNAME, SALARY FROM     DSN8810.EMP WHERE    EMPNO = '000010';

Give up? The problem is that EMPNO is included in the SELECT -list. You already know that EMPNO will be equal to the value '000010' because that is what the WHERE clause tells DB2 to do. But with EMPNO listed in the WHERE clause, DB2 will dutifully retrieve that column, too. This incurs additional overhead for each qualifying row, thereby degrading performance. So, this statement would be better coded this way:

 SELECT   LASTNAME, SALARY FROM     DSN8810.EMP WHERE    EMPNO = '000010';

Singleton `SELECT` Versus the Cursor

To return a single row, an application program can use a cursor or a singleton SELECT . A cursor requires an OPEN , FETCH , and CLOSE to retrieve one row, whereas a singleton SELECT requires only SELECT...INTO . Usually, the singleton SELECT outperforms the cursor.

This is especially true for DB2 V7 and later releases because the FETCH FIRST 1 ROW ONLY clause can be added to any SELECT statement (both singleton SELECT s and cursors ) to ensure that only one row is ever returned. Historically, the biggest problem associated with using a singleton SELECT is managing what happens when more than one row can be returned. DB2 will return a SQLCODE of “811 if more than one row results from a singleton SELECT . By adding FETCH FIRST 1 ROW ONLY to the singleton SELECT , though, DB2 will never return more than one row.

Avoid Singleton `SELECT` When Modifying Data

When developing a program to retrieve data that must be subsequently modified, avoid using a singleton SELECT . When the selected row must be updated after it is retrieved, using a cursor with the FOR UPDATE OF clause is recommended over a singleton SELECT . The FOR UPDATE OF clause ensures the integrity of the data in the row because it causes DB2 to hold an X lock on the page containing the row to be updated. If you use a singleton SELECT , the row can be updated by someone else after the singleton SELECT but before the subsequent UPDATE , thereby causing the intermediate modification to be lost.

Use `FOR READ ONLY`

When a SELECT statement is used only for retrieval, code the FOR READ ONLY clause. This clause enables DB2 to use block fetch , which returns fetched rows more efficiently for distributed DB2 requests . Efficient row fetches are important for dynamic SQL in an application program or SPUFI . Furthermore, the FOR READ ONLY clause can be used to encourage DB2 to use lock avoidance techniques and parallelism.

QMF automatically appends FOR READ ONLY to SELECT statements. Static SQL embedded in an application program automatically uses block fetch if the BIND process determines it to be feasible .

Allowing block fetch is important in a distributed DB2 environment. If data is blocked, less overhead is required as data is sent over the communication lines.

NOTE

The FOR FETCH ONLY clause provides the same function as FOR READ ONLY , but FOR READ ONLY is preferable because it is ODBC-compliant.

Control Distributed Query Blocks Using `OPTIMIZE FOR` `n` `ROWS`

Additionally, consider using the OPTIMIZE FOR n ROWS clause to optimize network traffic. If your application opens a cursor and downloads a great amount of data, specifying a large value for n increases the number of DRDA query blocks that a DB2 server returns in each network transmission for a non-scrollable cursor. If n is greater than the number of rows that fit in a DRDA query block, OPTIMIZE FOR n ROWS lets the DRDA client request multiple blocks of query data on each network transmission instead of requesting a new block when the first block is full.

Use `DISTINCT` with Care

The DISTINCT verb removes duplicate rows from an answer set. If duplicates will not cause a problem, do not code DISTINCT because it might add to overhead if it must invoke a sort to remove the duplicates.

However, do not avoid DISTINCT for performance reasons if you must remove duplicates from your result set. It is better for DB2 to remove the duplicates than for the results to be passed to the program and then having the duplicates removed by application logic. One major benefit is that DB2 will not make any mistakes, but the application logic could contain bugs .

For example, the following SQL will return a list of all departments to which an employee has been assigned with no duplicate WORKDEPT values returned:

 SELECT  DISTINCT WORKDEPT FROM    DSN8810.EMP;

Consider Using Multiple `DISTINCT` Clauses

The DISTINCT keyword can be used at the statement level or at the column level. When used at the statement level, for example

 SELECT  DISTINCT LASTNAME, WORKDEPT FROM    DSN8810.EMP;

duplicate rows are removed from the result set. So, only one LASTNAME “ WORKDEPT combination will be returned even if multiple employees with the same last name work in the same department. When DISTINCT is used at the column level, for example

 SELECT  AVG(SALARY), COUNT(DISTINCT EMPNO) FROM    DSN8810.EMP;

duplicate values are removed for the column on which the DISTINCT keyword is used. Prior to DB2 V7 only one DISTINCT clause can be specified in a SELECT list.

With DB2 V7 you can specify multiple DISTINCT clauses, but on the same column only. For example

 SELECT  SUM(DISTINCT SALARY), COUNT(DISTINCT SALARY) FROM    DSN8810.EMP;

However, for DB2 V7 and earlier releases, you cannot specify multiple DISTINCT clauses on different columns, or you get SQLCODE “127 .

DB2 Version 8 extends the functionality of the DISTINCT clause by enabling you to code multiple DISTINCT s in the SELECT or HAVING clause of SQL statements. The retriction limiting multiple DISTINCT s to the same column is lifted with DB2 V8. This enhancement is accomplished by performing multiple sorts on multiple distinct columns. For example, the following SQL statement uses two DISTINCT clauses on two different columns and is legal as of Version 8:

 SELECT  COUNT(DISTINCT(ACTNO)),         SUM(DISTINCT(ACSTAFF)) FROM    DSN8810.PROJACT GROUP BY PROJNO;

You could even add a HAVING clause with a DISTINCT to this query as of V8, for example

 HAVING AVG(DISTINCT ACSTAFF) < 2.0;

Code Predicates on Indexed Columns

DB2 usually performs more efficiently when it can satisfy a request using an existing index rather than no index. However, indexed access is not always the most efficient access method. For example, when you request most of the rows in a table or access by a non-clustered index, indexed access can result in a poorer performing query than non-indexed access. This is so because the number of I/Os is increased to access index pages. As a general rule of thumb, keep in mind that an index enhances performance when the total I/O required to access the index pages and the specific table space pages is less than simply accessing all of the table space pages.

You can find comprehensive guidelines for the efficient creation of DB2 indexes in Chapter 6, "DB2 Indexes."

Use `ORDER BY` When the Sequence Is Important

You cannot guarantee the order of the rows returned from a SELECT statement without an ORDER BY clause. At times SQL developers get confused when DB2 uses an index to satisfy a query and the results are returned in the desired order even without the ORDER BY clause. But, due to the nature of the DB2 optimizer, the access path by which the data is retrieved might change from execution to execution of an application program. If the access path changes (or parallelism kicks in), and ORDER BY is not specified, the results can be returned in a different (non-desired) order. For this reason, always code the ORDER BY clause when the sequence of rows being returned is important.

Limit the Columns Specified in `ORDER BY`

When you use ORDER BY to sequence retrieved data, DB2 ensures that the data is sorted in order by the specified columns. Doing so usually involves the invocation of a sort (unless an appropriate index is available). The more columns that are sorted, the less efficient the query will be. Therefore, use ORDER BY on only those columns that are absolutely necessary.

Also, keep in mind that as of DB2 V6 you can ORDER BY columns that are not included in the SELECT list. And, as of Version 7, this feature is extended even if the item is an expression. So, be sure to prune the SELECT list to only the data required for the display.

Favor Stage 1 and Indexable Predicate

For SQL statements, you must consider at which stage the predicate is applied: Stage 1 or Stage 2.

NOTE

Stage 1 predicates were previously known as sargable predicates. Sargable is an IBM-defined term that stands for s earch arg u able . The term simply defines in which portion of DB2 a predicate can be satisfied. The term sargable is ostensibly obsolete and has been replaced in the IBM literature by the term Stage 1 processing .

A predicate that can be satisfied by Stage 1 processing can be evaluated by the Data Manager portion of DB2, not the Relational Data System. The Data Manager component of DB2 is at a level closer to the data than the Relational Data System. You can find a more complete description of the components of DB2 in Chapter 20, "DB2 Behind the Scenes."

Because a Stage 1 predicate can be evaluated at an earlier Stage of data retrieval, you avoid the overhead of passing data from component to component of DB2. Try to use Stage 1 predicates rather than Stage 2 predicates because Stage 1 predicates are more efficient.

Additionally, a query that can use an index has more access path options, so it can be more efficient than a query that cannot use an index. The DB2 optimizer can use an index or indexes in a variety of ways to speed the retrieval of data from DB2 tables. For this reason, try to use indexable predicates rather than those that are not.

Of course, this raises the questions: "Which predicates are Stage 1 and which are Stage 2?" and "How do I know if a predicate is indexable or not?" This information is consolidated for you in Table 2.1.

Table 2.1. Predicate Stage and Indexability

Predicate Type	Stage	Indexable
`COL = value`	1	YES
`COL = noncol expr`	1	YES
`COL IS NULL`	1	YES
`COL IS NOT NULL`	1	YES
`COL op value`	1	YES
`COL op noncol expr`	1	YES
`COL BETWEEN value1 AND value 2`	1	YES
`COL BETWEEN noncol expr1 AND noncol expr2`	1	YES
`COL BETWEEN expr1 and expr2`	1	YES
`COL LIKE 'pattern'`	1	YES
`COL IN (list)`	1	YES
`COL LIKE :HV`	1	YES
`T1.COL = T2.COL`	1	YES
`T1.COL op T2.COL`	1	YES
`COL = (non subq)`	1	YES
`COL op (non subq)`	1	YES
`COL op ANY (non subq)`	1	YES
`COL op ALL (non subq)`	1	YES
`COL IN (non subq)`	1	YES
`COL = expression`	1	YES
`(COL1, COLn) IN (non subq)`	1	YES
`(COL1, COLn) = (val1,3 valn)`	1	YES
`T1.COL = T2.col expr`	1	YES
`COL <> value`	1	NO
`COL <> noncol expr`	1	NO
`COL NOT BETWEEN value1 AND value2`	1	NO
`COL NOT BETWEEN noncol expr1 AND noncol expr2`	1	NO
`COL NOT IN (list)`	1	NO
`COL NOT LIKE ' char'`	1	NO
`COL NOT LIKE '%char'`	1	NO
`COL NOT LIKE '_char'`	1	NO
`T1.COL <> T2.COL`	1	NO
`T1.COL1 = T1.COL2`	1	NO
`COL <> (non subq)`	1	NO

If the predicate type is not listed in Table 2.1, then it is Stage 2 and non-indexable. Note that you can replace op with <= , >= , < , > , or <> . A noncol expr is a noncolumn expression; it refers to any expression in which a column of a table is not specified. Examples of such expressions include

 CURRENT TIMESTAMP - 10 DAYS :HOST-VARIABLE + 20 FLOAT(8.5)

Stage 1 predicates combined with AND , combined with OR , or preceded by NOT are also Stage 1. All others are Stage 2. Additionally, please note that a LIKE predicate ceases to be Stage 1 if the column is defined using a field procedure. Indexable predicates combined with AND or OR are also indexable. However, note that predicates preceded by NOT are not indexable.

NOTE

All indexable predicates are also Stage 1. The reverse, however, is not true: All Stage 1 predicates are not necessarily indexable.

Keep in mind, too, that a predicate defined such that it conforms to the syntax specified for Stage 1 might in fact be changed to Stage 2 by DB2. Prior to V8, this can occur because the predicate contains constants whose data type or length does not match. As of Version 8, though, many Stage 1 predicates will remain Stage 1 even if the data types are not perfect.

Additionally, Stage 1 processing and indexability are only two aspects of efficient query writing and, as such, do not guarantee the most effective way to code your query. Follow the rest of the advice in this chapter to formulate efficient SQL code.

Finally, do not read more into this guideline than is intended. I am not saying you should never use Stage 2 predicates. Feel free to code Stage 2 predicates when necessary based on your application requirements. Using Stage 2 predicates is much preferable to returning the data to the program and filtering it there.

CAUTION

This information is accurate as of DB2 Version 8. Determine which predicates are Stage 1 and indexable with care because IBM tends to change certain predicates' stage and indexability with each release of DB2.

Reformulate SQL to Achieve Indexability

Remember that SQL is flexible and often the same results can be achieved using different SQL formulations. Sometimes one SQL statement will dramatically outperform a functionally equivalent SQL statement just because it is indexable and the other is not. For example, consider this SQL statement:

 SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME FROM   DSN8810.EMP WHERE  MIDINIT NOT BETWEEN 'A' AND 'G';

It is not indexable because it uses the NOT BETWEEN predicate. However, if we understand the data in the table and the desired results, perhaps we can reformulate the SQL to use indexable predicates, such as

 SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME FROM   DSN8810.EMP WHERE  MIDINIT >= 'H';

Or we could code MIDINIT BETWEEN 'H' AND 'Z' in place of MIDINIT >= 'H' . Of course, for either of these solutions to work correctly we would need to know that MIDINIT never contained values that collate lower than the value 'A' .

Try to Avoid Using `NOT` (Except with `EXISTS` )

In older versions of DB2, predicates using NOT were non-indexable and Stage 2. As of DB2 V4, predicates formed using NOT are evaluated at Stage 1, but they are still non-indexable. Therefore, whenever possible, you should recode queries to avoid the use of NOT ( <> ). Take advantage of your understanding of the data being accessed. For example, if you know that no values are less than the value that you are testing for inequality, you could recode

 COLUMN1  <>  value

 COLUMN1  >=  value

See the section on complex SQL guidelines for guidance in the use of the EXISTS predicate.

Use Equivalent Data Types

Use the same data types and lengths when comparing column values to host variables or literals. This way, you can eliminate the need for data conversion. Because the data type or length does not match, DB2 evaluates the predicate as Stage 2 (even if the predicate could be Stage 1 if the data type and length matched).

For example, comparing a column defined as INTEGER to another column defined as INTEGER is more efficient than comparing an INTEGER column to a column defined as DECIMAL(5,2) . When DB2 must convert data, available indexes are not used.

DB2 also does not use an index if the host variable or literal is longer than the column being compared, or if the host variable has a greater precision or a different data type than the column being compared. This situation adversely affects performance and should be avoided.

As of Version 7, you are allowed to CAST numeric equi-join columns to match in order to preserve Stage 1. But, if you can wait for Version 8, all you will have to do is REBIND because V8 resolves many of the data type mismatch problems. Until then, though, use CAST .

Use a CAST function to resolve data type mismatches for date and numeric values to avoid demotion to Stage 2. For example, if you need to compare a DECIMAL(9,2) column to a SMALLINT , cast the integer value to a decimal value as follows :

 WHERE  DECIMAL(SMALLINTCOL, 9, 2) = DECIMALCOL

Be sure to CAST the column belonging to the larger result set if both columns are indexed. That is, the column that can take on more distinct values should be the one cast. However, if only one column is indexed, CAST the one that is not indexed. You will need to rebind in order to receive the promotion to Stage 1.

As of DB2 V6, and via a retrofit APAR to V5, DB2 partially alleviated the data type and length mismatch performance problem, but only for character data. When two character columns are specified in an equi-join predicate, they no longer need to be of the same length to be considered Stage 1 and indexable. Please note that this applies only to columns, not host variables or string literals. Also, note that the two columns being compared must be of CHAR or VARCHAR data type. For example, you cannot join an INTEGER column to a SMALLINT column and expect it to be Stage 1 or indexable (for DB2 V7 or earlier).

DB2 V8 provides even more relief for this problem, but only within the data type family. This means that numbers can match numbers, characters match characters , and so on, without having to be an exact type and length match. In other words, as long as you are close to matching DB2 will evaluate the predicate in Stage 1 (if you are running DB2 Version 8 or greater). Even though this is now the case, it is still wise to exactly match up the data type and length of all column values, host variables, and literals within your predicates.

Consider `BETWEEN` Instead of `<=` and `>=`

The BETWEEN predicate is easier to understand and code than the equivalent combination of the less than or equal to predicate ( <= ) and the greater than or equal to predicate ( >= ). In past releases it was also more efficient, but now the optimizer recognizes the two formulations as equivalent and there usually is no performance benefit one way or the other. Performance reasons aside, one BETWEEN predicate is much easier to understand and maintain than multiple <= and >= predicates. For this reason, favor using BETWEEN .

However, there is one particular instance where this guidelines does not apply ”when comparing a host variable to two columns. Usually BETWEEN is used to compare one column to two values, here shown using host variables:

 WHERE COLUMN1 BETWEEN :HOST-VAR1 AND :HOST-VAR2

However, it is possible to use BETWEEN to compare one value to two columns, as shown:

 WHERE :HOST-VAR BETWEEN COLUMN1 AND COLUMN2

This statement should be changed to

 WHERE :HOST_VAR >= COLUMN1 and :HOST-VAR <= COLUMN2

The reason for this exception is that a BETWEEN formulation comparing a host variable to two columns is a Stage 2 predicate, whereas the preferred formulation is Stage 1.

Consider `IN` Instead of `LIKE`

Whenever feasible, use IN or BETWEEN instead of LIKE in the WHERE clause of a SELECT . If you know that only a certain number of occurrences exist, using IN with the specific list usually is more efficient than using LIKE . For example, use

 IN ('VALUE1', 'VALUE2', 'VALUE3')

instead of

 LIKE 'VALUE_'

The functionality of LIKE can be imitated using a range of values. For example, if you want a query to retrieve all employees with a last name beginning with K , you know that last names between KAAAAAAAAAAA and KZZZZZZZZZZZZ also satisfy the request. To optimize performance, favor using

 BETWEEN :VALUE_LO AND :VALUE_HI

instead of

 LIKE 'VALUE%'

Formulate `LIKE` Predicates with Care

Avoid using the LIKE predicate when the percentage sign ( % ) or the underscore ( _ ) appears at the beginning of the comparison string because they prevent DB2 from using a matching index. The LIKE predicate can produce efficient results, however, when you use the percentage sign or underscore at the end or in the middle of the comparison string.

Not Okay	Okay
`LIKE %NAME`	`LIKE NAME%`
`LIKE _NAME`	`LIKE NA_ME`

DB2 does not use direct index lookup when a wildcard character is supplied as the first character of a LIKE predicate. DB2 can determine when a host variable contains a wildcard character as the first character of a LIKE predicate. The optimizer therefore does not assume that an index cannot be used; rather, it indicates that an index might be used. At runtime, DB2 determines whether the index will be used based on the value supplied to the host variable. When a wildcard character is specified for the first character of a LIKE predicate, DB2 uses a non-matching index scan or a table space scan to satisfy the search.

Specify Appropriate Host Variable Values with `LIKE`

The LIKE predicate offers a great deal of flexibility and power to your SQL statements. Using LIKE you can quickly retrieve data based on patterns and wildcards. However, some uses of LIKE can be confusing to implement appropriately ”especially when LIKE is used with host variables.

Let's assume that you need to create an application that retrieves employees by last name, but the supplied value for the last name can be either the entire name or just the first few bytes of that name. In that case, the following query can suffice:

 SELECT EMPNO, FIRSTNME, LASTNAME FROM   DSN8810.EMP WHERE  LASTNAME LIKE :host_variable;

In order for this to work, when you enter the value for host_variable always append percent signs ( % ) to the end of the value. The percent sign specifies that DB2 should accept as a match any number of characters (including 0). This must be done programmatically. So, if the value entered is SM , the host_variable should contain SM%%%%%%%% and if the value entered is SMITH , the host_variable should contain SMITH%%%%% . Append as many percent signs as required to fill up the entire length of the host variable. Failure to do this will result in DB2 searching for blank spaces. Think about it ”if you assign SMITH% to a 10-byte host variable, that host variable will think it should search for SMITH% , that is SMITH at the beginning, four blanks at the end, and anything in the middle.

So, for SMITH%%%%% , SMITH will be returned, but so will SMITHLY (or any name beginning with SMITH ). There is no way to magically determine if what was entered is a complete name or just a portion thereof. If this is not acceptable, then a single query will not likely be feasible. Instead, you would have to ask the user to enter whether a full name or just a portion is being entered.

Code Most Restrictive Predicate First (Within Predicate Type)

DB2 uses a predefined method for evaluating SQL predicates. The sequence in which predicates are evaluated is dependent upon four different factors:

The indexes being used
Whether the predicate is Stage 1 or Stage 2
The type of predicate (for example, = , > , < , BETWEEN , and so on)
The sequence in which the predicates are physically coded in the SQL statement

First, DB2 will apply the predicates that match the indexes selected in the access path. The sequence in which these predicates are applied is based on the order of the column in the index. So, you must design efficient indexes to impact performance (see Chapter 5, "Data Definition Guidelines," for more information on efficient index design).

After applying matching index predicates, DB2 then applies

Stage 1 predicates that were not chosen as matching predicates but still refer to index columns, followed by
Stage 1 predicates in columns that were not in the indexes being used, and then
any Stage 2 predicates

Within each of these three groups, the sequence in which predicates are evaluated is based on the predicate type and the sequence in which the predicate appears in the SQL statement.

Predicate types are applied in the following sequence:

All equality predicates (including column IN -list, where list has only one element)
All range predicates and predicates specifying column IS NOT NULL
All other predicate types, but non-correlated subqueries are processed before correlated subqueries

Due to the preceding set of rules, when you code predicates in your SELECT statement, place the predicate that will eliminate the greatest number of rows first (within predicate type). For example, consider the following statement:

 SELECT  EMPNO, FIRSTNME, LASTNAME FROM    DSN8810.EMP WHERE   WORKDEPT = 'D21' AND     SEX = 'F';

Suppose that the WORKDEPT has 10 distinct values. The SEX column obviously has only 2 distinct values. Because both are equality predicates, the predicate for the WORKDEPT column should be coded first (as shown) because it eliminates more rows than the predicate for the SEX column. The performance gain from predicate placement is usually minimal, but sometimes every little performance gain is significant.

CAUTION

Remember, this guideline is true only for like predicate types. If the predicates are not of the same type, the guideline is not applicable .

Use Predicates Wisely

By reducing the number of predicates on your SQL statements, you might be able to achieve better performance in two ways:

Reduced BIND time due to fewer options that must be examined by the DB2 optimizer.
Reduced execution time due to a smaller path length caused by the removal of redundant search criteria from the optimized access path. DB2 processes each predicate coded for the SQL statement. Removing predicates removes work, and less work equals less time to process the SQL.

However, if you remove predicates from SQL statements, you run the risk of changing the data access logic. So, remove predicates only when you're sure that their removal will not have an impact on the query results. For example, consider the following query:

 SELECT  FIRSTNME, LASTNAME FROM    DSN8810.EMP WHERE   JOB = 'DESIGNER' AND     EDLEVEL >= 16;

This statement retrieves all rows for designers who are at an education level of 16 or above. But what if you know that the starting education level for all designers in an organization is 16? No one with a lower education level can be hired as a designer. In this case, the second predicate is redundant. Removing this predicate does not logically change the results, but it might enhance performance.

On the other hand, performance possibly can degrade when you remove predicates. The DB2 optimizer analyzes correlation statistics when calculating filter factors. Examples of correlated columns include CITY and STATE ( Chicago and Illinois are likely to occur together); FIRST_NAME and GENDER ( Robert and male are likely to occur together).

Because the filter factor might change when a predicate is changed or removed, a different access path can be chosen. That access path might be more (or less) efficient than the one it replaces . The basic rule is to test the SQL both ways to determine which will perform better for each specific statement.

Truly "knowing your data," however, is imperative. For example, it is not sufficient to merely note that for current rows in the EMP table no designers are at an EDLEVEL below 16. This may just be a data coincidence . Do not base your knowledge of your data on the current state of the data, but on business requirements. You must truly know that a correlation between two columns (such as between JOB and EDLEVEL ) actually exists before you modify your SQL to take advantage of this fact.

In any case, whenever you make changes to SQL statements based on your knowledge of the data, be sure to document the reason for the change in the actual SQL statement using SQL comments. Good documentation practices make future tuning, maintenance, and debugging easier.

Be Careful with Arithmetic Precision

When you select columns using arithmetic expressions, be careful to ensure that the result of the expression has the correct precision. When an arithmetic expression operates on a column, DB2 determines the data type of the numbers in the expression and decides the correct data type for the result. Remember the following rules for performing arithmetic with DB2 columns:

DB2 supports addition, subtraction, multiplication, and division.
DATE , TIME , and TIMESTAMP columns can be operated on only by means of addition and subtraction. (See the section "Use Date and Time Arithmetic with Care" later in this chapter.)
Floating-point numbers are displayed in scientific notation. Avoid using floating-point numbers because scientific notation is difficult for some users to comprehend. DECIMAL columns can contain as many as 31 bytes of precision, which is adequate for most users.
When an arithmetic expression operates on two numbers of different data types, DB2 returns the result using the data type with the highest precision. The only exception to this rule is that an expression involving two SMALLINT columns is returned as an INTEGER result.

The last rule may require additional clarification . When DB2 operates on two numbers, the result of the operation must be returned as a valid DB2 data type. Consult the following chart to determine the result data type for operations on any two numbers in DB2:

Statement	Yields
`SMALLINT` `operator` `SMALLINT`	INTEGER
`SMALLINT` `operator` `INTEGER`	INTEGER
`SMALLINT` `operator` `DECIMAL`	DECIMAL
`SMALLINT` `operator` `FLOAT`	FLOAT
`INTEGER` `operator` `SMALLINT`	INTEGER
`INTEGER` `operator` `INTEGER`	INTEGER
`INTEGER` `operator` `DECIMAL`	DECIMAL
`INTEGER` `operator` `FLOAT`	FLOAT
`DECIMAL` `operator` `SMALLINT`	DECIMAL
`DECIMAL` `operator` `INTEGER`	DECIMAL
`DECIMAL` `operator` `DECIMAL`	DECIMAL
`DECIMAL` `operator` `FLOAT`	FLOAT
`FLOAT` `operator` `ANY DATA TYPE`	`FLOAT`

For example, consider the following SELECT :

 SELECT  EMPNO, EDLEVEL/2, SALARY/2 FROM    DSN8810.EMP WHERE   EMPNO BETWEEN '000250' AND '000290';

This statement returns the following results:

  EMPNO             COL1             COL2  000250             7              9590.00000000 000260             8              8625.00000000 000270             7             13690.00000000 000280             8             13125.00000000 000290             6              7670.00000000 DSNE610I NUMBER OF ROWS DISPLAYED IS 5

Because EDLEVEL is an INTEGER and 2 is specified as an INTEGER , the result in COL1 is truncated and specified as an INTEGER . Because SALARY is a DECIMAL column and 2 is specified as an INTEGER , the result is a DECIMAL . If you must return a more precise number for COL1, consider specifying EDLEVEL/2.0 . The result is a DECIMAL because 2.0 is specified as a DECIMAL .

As of DB2 V7, you can use the MULTIPLY_ALT function to assure precision. MULTIPLY_ALT is preferable to the multiplication operator when performing decimal arithmetic where a scale of at least 3 is desired and the sum of the precisions exceeds 31. With MULTIPLY_ALT , DB2 performs the internal computation avoiding overflows. For example, the result of the following expression is the value of COL1 multiplied by the value of :HV2 :

 MULTIPLY_ALT(COL1, :HV2)

The precision of the result will be the precision of each argument added together (unless that is greater than 31, in which case, the precision will be 31). The scale of the result is determined as follows:

If the scale of both arguments is 0, the scale of the result will be 0.
If the sum of the two precisions added together is less than or equal to 31, the scale of the result will be either the two scales added together or 31, whichever is smaller.
If the sum of the two precisions added together is greater than 31, the scale of the result will be determined using the following formula:
```
 
```
```
 MAX( MIN(3, s1+s2), 31-(p1-s1+p2-s2) ) 
```
where s1 is the scale of the first argument, and so on.

Use Column Renaming with Arithmetic Expressions and Functions

You can use the AS clause to give arithmetic expressions a column name, as follows:

 SELECT  EMPNO, EDLEVEL/2 AS HALF_EDLEVEL, SALARY/2 AS HALF_SALARY FROM    DSN8810.EMP WHERE   EMPNO BETWEEN '000250' AND '000290';

If you give expressions a descriptive name, SQL becomes easier to understand and maintain. Likewise, when specifying functions in the SELECT list, use the AS clause to give the new column a name.

Decimal Precision and Scale

The precision of a decimal number is the total number of digits in the number (do not count the decimal point). For example, the number 983.201 has a precision of 6. The scale of a decimal number is equal to the number of digits to the right of the decimal point. In the previous example, the scale is 3.

Avoid Arithmetic in Column Expressions

An index is not used for a column when the column participates in an arithmetic expression. For example, the predicate in the following statement is non-indexable:

 SELECT  PROJNO FROM    DSN8810.PROJ WHERE   PRSTDATE - 10 DAYS = :HV-DATE;

You have two options to make the predicate indexable. You can switch the arithmetic to the non-column side of the predicate. For example

 SELECT  PROJNO FROM    DSN8810.PROJ WHERE   PRSTDATE = DATE(:HV-DATE) + 10 DAYS;

It makes no logical difference whether you subtract 10 from the column on the left side of the predicate, or add 10 to the host variable on the right side of the predicate. However, it makes a big performance difference because DB2 can use an index to evaluate non-column arithmetic expressions.

Alternatively, you can perform calculations before the SQL statement and then use the result in the query. For example, you could recode the previous SQL statement as this sequence of COBOL and SQL:

 ADD +10 TO HV-DATE.  COBOL  SELECT  PROJNO  SQL  FROM    DSN8810.PROJ WHERE   PRSTDATE = :HV-DATE;

In general, though, it is wise to avoid arithmetic in predicates altogether, if possible. In this case, however, we are dealing with date arithmetic, which can be difficult to emulate in a program.

The fewer arithmetic expressions in the SQL statement, the easier it is to understand the SQL. Furthermore, if arithmetic is avoided in SQL, you do not need to remember the exact formulations which are indexable and Stage 1. For these reasons, favor performing arithmetic outside of the SQL when possible.

Use the Dummy Table to Select Data not in a DB2 Table

Sometimes you will need to use DB2 facilities to retrieve data that is not stored in a DB2 table. This can be a challenge until you learn about the dummy table, SYSIBM.SYSDUMMY1 .

Why would you want to SELECT data that is not stored in a DB2 table? Well, perhaps you need to use a function that does not require DB2 table data. One such function is RAND, which is used to return a random number. To use this function, you can select it from the dummy table as follows:

 SELECT RAND(:HOSTVAR) FROM   SYSIBM.SYSDUMMY1;

The dummy table is part of the DB2 Catalog and is available to all DB2 installations.

NOTE

Take care when using the RAND function to generate a random value. To get a random value every time, use RAND() , without the host variable. If using a host variable, supplying it with the same value will cause RAND always to return the same random value. Of course, this can be useful if you want consistent random values to be generated.

Use Date and Time Arithmetic with Care

DB2 enables you to add and subtract DATE , TIME , and TIMESTAMP columns. In addition, you can add date and time durations to or subtract them from these columns.

Use date and time arithmetic with care. If users understand the capabilities and features of date and time arithmetic, they should have few problems implementing it. Keep the following rules in mind:

When you issue date arithmetic statements using durations, do not try to establish a common conversion factor between durations of different types. For example, the date arithmetic statement
```
 
```
```
 DATE('2004/04/03') - 1 MONTH 
```
is not equivalent to the statement
```
 
```
```
 DATE('2004/04/03') - 30 DAYS 
```
April has 30 days, so the normal response would be to subtract 30 days to subtract one month. The result of the first statement is 2004/03/03, but the result of the second statement is 2004/03/04. In general, use like durations (for example, use months or use days, but not both) when you issue date arithmetic.
If one operand is a date, the other operand must be a date or a date duration. If one operand is a time, the other operand must be a time or a time duration. You cannot mix durations and data types with date and time arithmetic.
If one operand is a timestamp, the other operand can be a time, a date, a time duration, or a date duration. The second operand cannot be a timestamp. You can mix date and time durations with timestamp data types.
Date durations are expressed as a DECIMAL(8,0) number. The valid date durations are

DAY

DAYS

MONTH

MONTHS

YEAR

YEARS
Time durations are expressed as a DECIMAL(6,0) number. The valid time durations are

HOUR

HOURS

MINUTE

MINUTES

SECOND

SECONDS

MICROSECOND

MICROSECONDS

Additional guidelines on handling date and time data in DB2 are provided in the "Date and Time Guidelines" section later in this chapter.

Use Built-in Functions Where Available

DB2 comes with more than 120 built-in functions that can be used in SQL statements to transform data from one state to another. Use the built-in functions instead of performing similar functionality in your application programs.

Prior to Version 6, DB2 provided only a minimal set of built-in functions. As such, developers needed to write their own work-arounds to achieve certain functionality. For example, previous editions of this book recommended using the following logic to return a day of the week

 DAYS(CURRENT DATE) - (DAYS(CURRENT DATE)/7) * 7

However, DB2 now provides a DAYOFWEEK function that is easier to use and understand than this expression. I do not recommend going back to your old programs and retrofitting them to use the new functions because the manpower required would be excessive and the return would be marginal. However, for all new and future SQL, use the built-in functions. For more information on the built-in functions available to DB2 consult Chapter 3, "Using DB2 Functions."

NOTE

As of V7, DB2 also provides the DAYOFWEEK_ISO function. This function is similar to the DAYOFWEEK function, but results in different numbers to represent the days of the week. For DAYOFWEEK_ISO , the result is a number where 1 represents Monday, 2 Tuesday, 3 Wednesday, 4 Thursday, 5 Friday, 6 Saturday, and 7 Sunday. For DAYOFWEEK , the resulting value is a number where 1 represents Sunday, 2 Monday, 3 Tuesday, 4 Wednesday, 5 Thursday, 6 Friday, and 7 Saturday.

Unfortunately, neither of the results matches the results of the old formula ( for Sunday, 1 for Monday and so on), as returned by

 DAYS(CURRENT DATE) - (DAYS(CURRENT DATE)/7) * 7)

Limit the Use of Scalar Functions in `WHERE` Clauses

For performance reasons, you can try to avoid using scalar functions referencing columns in WHERE clauses, but do not read too much into this recommendation. It is still wise to use scalar functions to offload work from the application to DB2. But remember that an index is not used for columns to which scalar functions are applied. Scalar functions typically can be used in the SELECT list of SQL statements with no performance degradation.

Specify the Number of Rows to Be Returned

When you code a cursor to fetch a predictable number of rows, consider specifying the number of rows to be retrieved in the OPTIMIZE FOR n ROWS clause of the CURSOR . This way, DB2 can select the optimal access path for the statement based on actual use.

Coding the OPTIMIZE FOR n ROWS clause of the CURSOR does not limit your program from fetching more than the specified number of rows.

This statement can cause your program to be inefficient, however, when many more rows or many fewer rows than specified are retrieved. So be sure you specify a reasonable estimate for the number of rows to be returned if you code this clause.

Disable List Prefetch Using `OPTIMIZE FOR 1 ROW`

If a particular query experiences sub-optimal performance due to list prefetch, consider specifying OPTIMIZE FOR 1 ROW . Doing so makes it less likely that DB2 will choose an access path that uses list prefetch. This capability might be of particular use in an online environment in which data is displayed to the end user a screen at a time.

NOTE

Keep in mind that there is a difference between OPTIMIZE FOR 1 ROW and OPTIMIZE FOR n ROWS (were n is greater than 1). OPTIMIZE FOR 1 ROW tries to avoid sorts; OPTIMIZE FOR n ROWS will try to use the least cost access path for n .

Disable Index Access

During the tuning process, you can append OR 0 = 1 to a predicate to eliminate index access. For example, consider a query against the EMP table on which two indexes exist: one on EMPNO and one on WORKDEPT .

 SELECT  EMPNO, WORKDEPT, EDLEVEL, SALARY FROM    DSN8810.EMP WHERE   EMPNO BETWEEN '000020' AND '000350' AND     (WORKDEPT > 'A01' OR 0 = 1);

In this case, the 0 = 1 prohibits DB2 from choosing the WORKDEPT index by making the predicate Stage 2. This forces DB2 to use either the index on EMPNO or a table space scan. Similar techniques include adding 0 to a numeric column or appending a null string to a character column to avoid indexed access. The latter is preferred because it disables matching index access but leaves the predicate Stage 1. For example:

 SELECT  EMPNO, WORKDEPT, EDLEVEL, SALARY FROM    DSN8810.EMP WHERE   EMPNO BETWEEN '000020' AND '000350' AND     WORKDEPT > 'A01' CONCAT '';

Consider Other Forms of Query Tweaking

Both OPTIMIZE FOR 1 ROW and using OR 0=1 are valid query tweaks for specific types of tuning. The following techniques can be used to tweak queries to try to encourage DB2 to use different access paths:

OPTIMIZE FOR n ROWS : Note that the n can be any value.

FETCH FIRST n ROWS ONLY : Again, where n can be any value.

No Operation ( +0 , -0 , /1 , *1 , CONCAT '' ) : Adding or subtracting zero, dividing or multiplying by 1, or concatenating an empty string will not change the results of a query but might change the optimizer's decision.

These techniques can cause DB2 to choose a different access path. Consider using them when you are in a jam and need to try different types of access. Compare and contrast the results and costs of each scenario to determine which might be most useful to your particular situation.

Although non-column expressions are indexable (at least as of DB2 V5), IBM has made an exception for the "no operation" expressions because they are used as tricks to fool the optimizer. IBM did not include these expressions because these tricks were deployed by DB2 developers to avoid indexed access for more than a decade . An example SQL statement using one of these tricks follows:

 SELECT  EMPNO, WORKDEPT, EDLEVEL, SALARY FROM    DSN8810.EMP WHERE   EMPNO < :HOST-VAR CONCAT '';

In this case, a table space scan is used because an empty string is concatenated to the host variable in the predicate and no other predicates are available for indexed access. However, the predicate remains Stage 1.

Consider Using `REOPT` to Change Access Paths

When SQL is bound into a plan or package you can specify whether to have DB2 determine an access path at runtime using values for host variables, parameter markers, and special registers. This is achieved using the REOPT (or NOREOPT ) parameter.

Specifying NOREOPT(VARS) will cause DB2 to determine access paths at BIND time, and not at runtime. Instead, specifying REOPT(VARS) will cause DB2 to redetermine the access path at runtime. Consider choosing this option if performance fluctuates based on the values supplied to host variables and parameter markers when your program is run.

DB2 V8 introduces another useful REOPT option, REOPT(ONCE) . Consider choosing REOPT(ONCE) instead of REOPT(VARS) when you wish to avoid constantly recalculating access paths at a PREPARE . With REOPT(ONCE) , access path selection will be deferred until the cursor is opened. The host variable and parameter marker values at cursor OPEN time will be used to determine the access path. The resultant access path will be cached in the global prepare cache.

Be Aware of Table Space Partitioning Key Ranges

When you access data in partitioned table spaces, be aware of the values used for the partitioning scheme. Prior to V4, DB2 scanned the entire table in a table space scan of a partitioned table. As of DB2 V4, you can limit a table space scan to accessing a subset of the partitions if the predicates of the WHERE clause can be used to limit the key ranges that need to be scanned. As of DB2 V5, the key ranges do not have to be contiguous.

For this technique to work with host variables you must BIND using the REOPT(VARS) parameter.

Specify Isolation Level for Individual SQL Statements

You can use the WITH clause to specify an explicit isolation level at the SQL statement level. Four options are available:

WITH RR : Repeatable Read

WITH RS : Read Stability

WITH CS : Cursor Stability

WITH UR : Uncommitted Read (can be specified only if the result table is read-only)

Sometimes it makes sense to change the isolation level of an SQL statement within a program, without changing the isolation level of the other SQL statements in the program. For example, one query might be able to tolerate a dirty read because the data is being aggregated and only an estimated result is required. In this case, that query can be specified as WITH UR , even though the package for the program is bound as ISOLATION(CS) .

Use the WITH clause when you need to change the isolation level for specific SQL statements within a package or plan. More information on isolation levels is provided in Chapter 13, "Program Preparation."

CAUTION

Do not confuse the use of WITH to specify isolation levels with the use of WITH to specify common table expressions. Common table expressions were introduced with DB2 Version 8 and are placed at the beginning of SQL statements. SQL statement isolation levels are placed at the end of SQL statements.

An example of a common table expression is given later in this chapter in the section titled "Common Table Expressions and Recursion."

Consider `KEEP UPDATE LOCKS` to Serialize Updates

The KEEP UPDATE LOCKS clause can be specified for RR and RS isolation levels. With KEEP UPDATE LOCKS , DB2 acquires X locks instead of U or S locks on all qualified rows or pages. Use this option to serialize updates when concurrency is not an issue.

Use SQL Assist to Help Build SQL Statements

SQL Assist is a new feature of DB2 Version 7 that can greatly assist SQL developers. The SQL Assist feature is a GUI-driven tool to help you build SQL statements like SELECT , INSERT , UPDATE , and DELETE . Simply by picking and clicking SQL Assist will build syntactically correct SQL statements. Refer to Figure 2.2 for an example of SQL Assist. It is accessible from the following "products":

Control Center
DB2 Development Center (a.k.a Stored Procedure Builder)
Data Warehouse Center

Figure 2.2. SQL Assist graphically guides SQL creation.

SQL Assist can be used to help developers build correct SQL statements, but it does not provide SQL performance advice. Using SQL Assist, however, can help to promote using a standard format for SQL statements within your organization.

< Day Day Up >

Pretest All Embedded SQL

Use EXPLAIN

Use All PLAN_TABLE Columns Available

Use the DSN_STATEMNT_TABLE

Use DSN_FUNCTION_TABLE to Explain User-Defined Function Usage

Enable EXPLAIN for Auto Rebind

When Data Sharing Specify COEXIST for Auto Rebind

Utilize Visual Explain and Query Analysis Tools

Figure 2.1. Visual Explain graphically depicts an EXPLAIN .

Avoid SELECT *

Limit the Data Selected

Do Not Ask for What You Already Know

Singleton SELECT Versus the Cursor

Avoid Singleton SELECT When Modifying Data

Use FOR READ ONLY

Control Distributed Query Blocks Using OPTIMIZE FOR n ROWS

Use DISTINCT with Care

Consider Using Multiple DISTINCT Clauses

Code Predicates on Indexed Columns

Use ORDER BY When the Sequence Is Important

Limit the Columns Specified in ORDER BY

Favor Stage 1 and Indexable Predicate

Table 2.1. Predicate Stage and Indexability

Reformulate SQL to Achieve Indexability

Try to Avoid Using NOT (Except with EXISTS )

Use Equivalent Data Types

Consider BETWEEN Instead of <= and >=

Consider IN Instead of LIKE

Formulate LIKE Predicates with Care

Specify Appropriate Host Variable Values with LIKE

Code Most Restrictive Predicate First (Within Predicate Type)

Use Predicates Wisely

Be Careful with Arithmetic Precision

Use Column Renaming with Arithmetic Expressions and Functions

Decimal Precision and Scale

Avoid Arithmetic in Column Expressions

Use the Dummy Table to Select Data not in a DB2 Table

Use Date and Time Arithmetic with Care

Use Built-in Functions Where Available

Limit the Use of Scalar Functions in WHERE Clauses

Specify the Number of Rows to Be Returned

Disable List Prefetch Using OPTIMIZE FOR 1 ROW

Disable Index Access

Consider Other Forms of Query Tweaking

Consider Using REOPT to Change Access Paths

Be Aware of Table Space Partitioning Key Ranges

Specify Isolation Level for Individual SQL Statements

Consider KEEP UPDATE LOCKS to Serialize Updates

Use SQL Assist to Help Build SQL Statements

Figure 2.2. SQL Assist graphically guides SQL creation.

Use `EXPLAIN`

Use All `PLAN_TABLE` Columns Available

Use the `DSN_STATEMNT_TABLE`

Use `DSN_FUNCTION_TABLE` to Explain User-Defined Function Usage

Enable `EXPLAIN` for `Auto Rebind`

When Data Sharing Specify `COEXIST` for `Auto Rebind`

Figure 2.1. Visual Explain graphically depicts an `EXPLAIN` .

Avoid `SELECT *`

Singleton `SELECT` Versus the Cursor

Avoid Singleton `SELECT` When Modifying Data

Use `FOR READ ONLY`

Control Distributed Query Blocks Using `OPTIMIZE FOR` `n` `ROWS`

Use `DISTINCT` with Care

Consider Using Multiple `DISTINCT` Clauses

Use `ORDER BY` When the Sequence Is Important

Limit the Columns Specified in `ORDER BY`

Try to Avoid Using `NOT` (Except with `EXISTS` )

Consider `BETWEEN` Instead of `<=` and `>=`

Consider `IN` Instead of `LIKE`

Formulate `LIKE` Predicates with Care

Specify Appropriate Host Variable Values with `LIKE`

Limit the Use of Scalar Functions in `WHERE` Clauses

Disable List Prefetch Using `OPTIMIZE FOR 1 ROW`

Consider Using `REOPT` to Change Access Paths

Consider `KEEP UPDATE LOCKS` to Serialize Updates