Query Prepping | SQL Performance Tuning

In this part of the chapter, we will discuss how to "prep" a querythat is, how to pave the way, create the right environment, and optimize the conditions so that the query will run well. We'll follow a query through all its phases: making the statement, getting metadata, setting result set flags, fetching, and using the getXXX methods .

We know this is a lot of trouble for one little SELECT. For many situations you'd be better off following this adviceJust do it! Our painstaking approach is an illustration of all the possible optimizations at each stage, rather than an order that you must actually follow for your own queries. Once you've scanned through our illustration, though, you'll find many occasions to employ a subset of these steps.

Query Statement

We begin our illustration with the step of writing an SQL statement. You can make a SELECT statement with Statement and execute it with executeQuery . Here's an example:

 Statement stmt1 = con.CreateStatement(); ResultSet rs1 = stmt1.executeQuery(                 "SELECT * FROM Table1"); ... Statement stmt2 = con.CreateStatement(); ResultSet rs2 = stmt2.executeQuery(                 "SELECT * FROM Table2");

Stop. Why did we make two stmt s? Are they independent?

If stmt2 doesn't have to follow stmt1 , then you can set up a second thread and run it in parallel. The idea of "Use two threads" is applicable in practically any case; we'll just mention it once and leave repetitions to your imagination from now on. Note, however, that not all ODBC drivers are reentrant"Use two threads" is inappropriate if you're using a JDBC-ODBC bridge.

If stmt2 does have to follow stmt1 , then you should not make stmt2 at all. Instead, just reuse stmt1 , like this:

 Statement stmt1 = con.CreateStatement(); ResultSet rs1 = stmt1.executeQuery(                 "SELECT * FROM Table1"); ... ResultSet rs2 = stmt1.executeQuery(                 "SELECT * FROM Table2");

We're more concerned about reusing objects than a Java programmer normally is for two reasons:

Garbage collection won't do a complete cleanup because createStatement causes objects to be made outside the program, in the driver and in the DBMS.
By re-using stmt1 , we've caused an automatic close of the existing result set. A close must happen; otherwise , memory leaks are possible.

Query Syntax

Use consistent, standard, trick-free SQL syntax. You can, of course, use {escape} additions for portability. Some drivers try to cache recently used SQL statements, so it's handy if they can find the times that the syntax is duplicated .

getBestRowIdentifier

Several methods in the DatabaseMetaData class can be called "catalog functions" because each one has an exact analog among the ODBC catalog functions. The two most interesting methods are getIndexInfo , which is equivalent to the ODBC function SQLStatistics , and getBestRowIdentifier , which is equivalent to ODBC's SQLSpecialColumns .

We discussed the importance of SQLSpecialColumns in Chapter 12, "ODBC," so we'll just add two remarks about the getBestRowIdentifier method here.

It's possible to over-depend on getBestRowIdentifier . The best search will depend not only on the identified column, but also on the desired value. For example, sometimes the optimizer will decide that an index called Index1 is the best indexbut if you search for a very common value, then the optimizer may decide not to use the index. The getBestRowIdentifier data doesn't reflect such transient considerations.
getBestRowIdentifier is expensive because all catalog functions are expensive.

The general tips for all catalog functions are:

Avoid search patterns or blank string arguments.
Cache the results.
Make sure you need to know the information before using a catalog function.

If you use getBestRowIdentifier in order to give the user a choice of columns to use for searching, you will probably have to call getIndexInfo and getColumns as well. But you can get most of the data using a single call if you know how the DBMS stores catalog metadata.

Remember that the best row identifier is often the DBMS-specific ROWID pseudocolumn. This will not appear in a list created by getColumns .

ResultSetMetaData

You should retrieve information about a result set before you get the result set itself. You can do this with a prepared statement. Here's an example:

 PreparedStatement pstmt = con.preparedStatement(                          "SELECT * FROM Table1"); ResultSetMetaData rsmd = pstmt.getMetaData(); int i = rsmd.getColumnCount();

This example would be a failure if "preparing" and "executing" took the same amount of time. Luckily, most DBMSs will prepare only when you ask them to prepare. So this goes quickly.

Once you have an rsmd and a column count, you can check each individual column to see whether there should be some change in the strategy. The important methods for this purpose are:

getColumnType

Retrieves a number indicating the data type. For example, if the column is VARCHAR, the return is 12 .
getPrecision

For character and binary columns, this call retrieves the column size .
isNullable

Returns true if the column can contain NULL.

The decisions you can make with the type, precision, and nullable information include:

Is this column really necessary during the initial display?

This criterion is important especially if the column is a BLOB or other large object data type. Remember that even a small BLOB requires extra access time if it's not stored with the rest of the row. Some Web applications will retrieve all the non-BLOBs, then go back and get the BLOBs later if the job is idle or the user clicks for more.
Is the entire value of the column needed?

This criterion is important especially if the column is long. If your first display is on an 80-column-wide screen and you see that column1 is defined as VARCHAR(2000), it's best to retrieve something like:
```
 SELECT SUBSTRING(column1 FROM 1 FOR 80)    FROM Table1 
```
rather than:
```
 SELECT column1    FROM Table1 
```
If the column is a CHAR but still looks a bit large, a TRIM function is appropriate because DBMSs like Oracle will otherwise return the entire columnincluding trailing spaces. If you don't want to redo the query but you perceive that lengths are too great, you can limit the returnable size for all character or binary columns with:
```
 pstmt.setMaxFieldSize(512); 
```
Do this before attaching any parameters to the pstmt .
Is it necessary to test for NULLs?

The isNullable method will tell you whether you have to test for NULLs every time you use a getXXX method later, or whether you can skip that step because the column is non-nullable anyway. This is the only valid test for nullable columns. Do not depend on information returned by the getColumns catalog functiona column's defined nullability is irrelevant if the query contains an outer join. Besides, any call to a catalog function is going to be more expensive than a call to ResultSetMetaData .

Bottom line: There is still time to abort or revise the query at this stage. All that's happened so far is a prepare. If you decide to continue, keep the rsmd . There are other rsmd methods that are useful for displays.

Dummy Queries

Some DBMSs cannot do a "prepare" alone. They have to go through an "execute" phase as well in order to get result set metadata. That would wreck our planwe're not ready to look at a result set yet. The traditional work around for this problem is to execute a dummy query. A dummy query is a SELECT statement that you run solely in order to retrieve the ResultSetMetaData information. It is easy to spot because it contains an always- false WHERE clause, for example:

 SELECT column1, column2    FROM Table1    WHERE 0 = 1

It might help to replace the literal with an indexed column, like this:

 SELECT column1, column2    FROM Table1    WHERE column1 < <some impossible value>

But the best thing you can do is replace dummy queries with real ones. With any of the Big Eight, you can skip the dummy-query phase and prepare the SQL statement that you really intend to execute. (Unfortunately we can't say the same for drivers. We know of one commercial driver that has trouble using a prepared Statement for getting ResultSetMetaData .)

Query Settings

Some settings apply to stmt or pstmt objects. These settings affect the way that result sets work. In this section, we'll look at three of them: Scroll Type , Concurrency Type , and FetchSize .

Scroll Type

The Scroll Type setting has three options:

 con.createStatement("SELECT * FROM Table1",                     ResultSet.TYPE_FORWARD_ONLY); con.createStatement("SELECT * FROM Table1",                     ResultSet.TYPE_SCROLL_INSENSITIVE); con.createStatement("SELECT * FROM Table1",                     ResultSet.TYPE_SCROLL_SENSITIVE);

The default scroll type is TYPE_FORWARD_ONLY, and that's definitely the way to leave it if you can tolerate going through the result set from start to end, without skipping or retreating. The other two scroll types are scroll cursors and are both expensive and hard to implement. You will find that scroll cursors are troublesome with some DBMSs.

INSENSITIVE cursors are for high concurrency situations, and their performance is good if the transactions are read-only. What happens behind the scenes is that somebody (probably the JDBC driver) either creates a temporary local table on the server, or fetches the entire result set into a local cache off the server. The effect is that the initial query execution, or possibly the first fetch after the query execution, will be slow. On the other hand, all subsequent fetches will be blazingly fast because it's all coming out of a nearby cache. Don't use an insensitive cursor for a single-row query, because the driver will waste time copying to a cache when it could send directly to you. And don't use an insensitive cursor if the query contains LOBs, because caches lose their value if they overflow the available memory.

SENSITIVE cursors are for low concurrency situations when you want to see whether any other user changes a row while you have it. What happens behind the scenes is that the driver retrieves a list of all the best row identifier values (probably ROWIDs or primary keys), and stores them in cache. When the application asks for a fetch of, for example, Row #14, the driver looks up the value associated with that rowsay it's ROWID #123456. Then the driver selects a single row from the server with:

 ... WHERE ROWID = 123456

The server returns that row, and the driver passes it on to the application. Thus you can watch a screen change dynamically as other users change the rows you selected. You should not use a sensitive cursor if a call to getBestRowIdentifier tells you that there is no fast, persistent, unique column lookup possible for the table from which you're selecting.

Concurrency type

The Concurrency Type setting has two options:

 con.createStatement("SELECT * FROM Table1",                     ResultSet.CONCUR_READ_ONLY); con.createStatement("SELECT * FROM Table1",                     ResultSet.CONCUR_UPDATABLE);

The default concurrency type is CONCUR_READ_ONLY. The CONCUR_UPDATABLE concurrency type should be used only if you intend to allow updates of rows that have been fetched , because indicating that a result set is "updatable" will affect the DBMS's locking strategy. Specifically, a DBMS that supports update locks will use them; see Chapter 15, "Locks."

FetchSize

The FetchSize setting has one option:

 stmt.setFetchSize(n);

The overhead for a network message is high, so there's a gain if somebody fetches, say, 25 rows at a time, rather than 1 row at a time. The "somebody" in this case has to be the driver and the DBMS, acting in concert. The way to change the number of rows fetched at once is:

 boolean b = stmt.setFetchSize(25);   // or rs.setFetchSize(25)

There are two fetch sizes: the default FetchSize that the DBMS starts off with, and the DBMS's maximum FetchSize . If you ask for more than the maximum, then boolean b becomes false . The default FetchSize for each of the Big Eight is one row; the maximum FetchSize is 999 rows.

The standard advice is that you should not call setFetchSize because the DBMS must have a reason for setting the FetchSize default where it is. And that is true. However, at the time that the DBMS is installed, there is no way of knowing how large your rows are. The important factor that affects speed is the number of bytes retrieved, not the number of rows. So if your result set has very small or very large rows, then you have a reason to change the number of rows per fetch.

A block fetch is a fetch that gets multiple rows at once. When using block fetches, keep this in mind: If you fetch 25 rows at a time and some other user changes row #24 while you're working, row #24 is out of date by the time your application sees it. You can use the refreshRow method to compensate for this.

The Bottom Line: Query Prepping

You can make a SELECT statement with Statement and execute it with execute Query . Using two threads is applicable in practically all cases. But not all ODBC drivers are reentrantusing two threads is inappropriate when you're using a JDBC-ODBC bridge.

If a statement stmt2 has to follow another statement stmt1 then don't make stmt2 at all. Just reuse stmt1 .

It's important to reuse objects because (a) garbage collection won't do a complete cleanup and (b) reusing an object causes an automatic close of the existing result set. A close must happen, otherwise, memory leaks are possible.

Use consistent, standard, trick-free SQL syntax. Some drivers try to cache recently used SQL statements, so it's handy if they can find the times that the syntax is duplicated.

The general tips for all catalog functions are (a) avoid search patterns or blank string arguments, (b) cache the results, and (c) make sure you need to know the information before using a catalog function.

Remember that the best row identifier is often the DBMS-specific ROWID pseudocolumn. This will not appear in a list created by getColumns .

Get information about a result set before you get the result set itself, using ResultSetMetaData .

The default scroll type is TYPE_FORWARD_ONLY. Leave the setting there unless you can't tolerate going through the result set from start to end, without skipping or retreating.

Scroll cursors are both expensive and hard to implement. You will find that scroll cursors are troublesome with some DBMSs.

Use insensitive cursors for high concurrency situations. Their performance is especially good for read-only transactions.

Don't use an insensitive cursor for a single-row query.

Don't use an insensitive cursor if the query contains LOBs.

Use sensitive cursors for low concurrency situations when you want to see whether any other user changes a row while you have it.

Don't use a sensitive cursor if a call to getBestRowIdentifier tells you that there is no fast, persistent, unique column lookup possible for the table from which you're selecting.

The default concurrency type is CONCUR_READ_ONLY. Leave the setting there unless you intend to allow updates of rows that have been fetched.

Saying that a result set is updatable will affect the DBMS's locking strategy. Specifically, a DBMS that supports update locks will use them.

Fetch multiple rows at once (a block fetch) whenever possible.

If your result set has very small or very large rows, you have a reason to change the number of rows per block fetch.

If you fetch 25 rows at a time and some other user changes row #24 while you're working, row #24 is out of date by the time your application sees it. Use the refreshRow method to compensate for this.