What Should the Client Do? | SQL Performance Tuning

"The processing power of computer hardware grows exponentially with the price."

Grosch's Law

A client is a terrible thing to waste. There are so many things that a 1GHz computer can do besides accepting user input or formatting pretty screens. Unfortunately the logic is made of iron.

Thesis: Clients can process data in parallel with the server.
Antithesis: But in order to get the data, they must ask for more than the user actually needs.
Synthesis: Therefore the time saved in computing is wasted on communicating.

For example, suppose you decide to implement the DISTINCT operation on the client. That is, instead of sending Query #1 to the server, you send Query #2.

 Query #1: SELECT DISTINCT column1, column2   FROM Table1   ORDER BY column2 Query #2: SELECT column1, column2   FROM Table1   ORDER BY column2

Then, for each row that gets fetched, you compare the row with all the rows already fetched , and if the new row is equal, you discard it. This has an advantage: the server doesn't have to sort by column1 to find the duplicates. Unfortunately, even if only 10% of the rows are duplicates, this scheme turns out to be slower than passing SELECT DISTINCT to the server (GAIN: 7/8 if 10% or more duplicates).

That's not to say that you can't do any result-set processing on the client. There are all sorts of things you can do with result sets: capitalize, count, duplicate, group , sort, store, and so on. What you shouldn't do is filter.

Client Model

For a model, consider what Microsoft Access, a popular front end, does.

First, Access accepts user input. The front end is smart enough to understand SQL-like queries, so it knows quickly whether a table can be accessed from a local (i.e., on the client machine) database, or whether the query must be passed on to the server. Access also knows how to reformat the query so that the server will understand it. For example, the query

 SELECT Date() FROM [Table1]       -- Access syntax

will be transformed to

 SELECT Getdate() FROM Table1      -- SQL Server syntax

In any case, anything passed to the server will be canonical; that is, it will be shorn of any stylistic or personal quirks that cause SQL statements to look different when they're logically the same.

At this point, Access offers a choice of query methods . The obvious one is the Pass Through Query (PTQ). With PTQs, the plan is to SELECT and fetch all the rows immediately, then close the cursor on the server side. The client can then take all the time in the world to play with the fetched results, because the server isn't tied up. On the downside, a PTQ result set must be read-only, because the transaction's over as far as the server is concerned .

The trickier query method is the locally processed query. With locally processed queries, the plan is to SELECT and then fetch all the row identifiers (primary keys or RIDs), but not the actual data. This can save space compared to a PTQ, because a list of row identifiers is almost certain to be smaller than a list of the row contents themselves . Whenever fetches are necessaryfor example, because the user scrolls the screen displaya new SELECT statement is sent to the server.

 SELECT * FROM Table1   WHERE <row identifier = x>      OR <row identifier = y>

Because the row identifiers are known, data changes are possible, too.

 UPDATE Table1 SET column1 ...   WHERE <row identifier = x>

However, some care must be employed in case another user is updating the same rows. This becomes a matter for locking; see Chapter 15, "Locks." The important thing here is that Access is able to decide when a transaction can be terminated on the server side. Usually it can terminate the transaction very quickly without tying up fetched rows, as an unsophisticated front end would do.

Sometimes this sophistication goes too far. For example, suppose you use a function in your query that Access understands, but the server doesn't. In that case, Access will bring in all the rows so that it can process them locally. Moral: Never use a non-standard scalar function!

Finally, Access will handle all on-screen formatting of the result set: the scroll bars, the pictures, the beeps, the image transformations for BLOB images, and all the other details that go into a GUI application. Once again, this activity takes place after Access has signalled the server that it is finished with the query.

This has been an idealized picture of what a front-end program running on a client should do. The logic is not that you should go out and buy Access (Microsoft doesn't pay us for testimonials), but that you should study how Access works and emulate its activities as a good client.

The Bottom Line: Client Tips

Seventeen ways to make your application a better client (with apologies to Paul Simon):

There are all sorts of things you can do with result sets on the client: capitalize, count, duplicate, group, sort, store, and so on. What you shouldn't do is filter.

Clients should transform input and transform output, but not try to take over basic server functions.

The best clients will close result sets early so that the server can release locks. The most realistic objective is not to shove tasks from the server to the client, but to make the client smart enough that it can figure out the quickest way to deal with the server.

Use client threads. Of course any client can fork (i.e., start a separate thread) and do low-priority INSERTs on the other thread. And any client can save up its UPDATEs and let a timer send the requests later on the same thread. The trouble is (because it's not the server doing the accumulations), a connection failure can lose the transaction permanently.

Canonize SQL statements. At least remove trail spaces or multiple spaces (provided they're not inside quotes!), and change capitalization to a consistent scheme. (Do this only if middleware is absent. Generally, formatting statements for the server is a job for the middle tier .)

Use compound statements. By packing several SQL statements into one message, you reduce the network traffic.

Fetch several rows at once. The usual recommendation is that database rows should be short; they are therefore probably shorter than the packet size, or the most efficient size for network transfer. To fetch several rows at once you must persuade the DBMS to increase the rowset size , and you must use a special variant of fetch. These possibilities are available in both ODBC and JDBC.

Use stored queries. This is important enough that there's a chapter on the subject; see Chapter 11, "Stored Procedures."

Set a limit to the number of rows you want to retrieve. There are several ways to set limits. In ODBC you use an SQLSetStmtAttr function, in JDBC you use setFetchSize , in SQL statements you can use a non-standard SQL-extension like Microsoft's SELECT TOP or Informix's SELECT FIRST (see Chapter 6, "Subqueries").

Reject any user queries that begin with SELECT * or at least use the DBMS's catalog/metadata to discover whether " * " implies that BLOBs will be fetched. If so, eliminate them.

Choose servers. For example, you can tell what the probable home branch is from the IP or the client-ID. Then you know which server to prefer.

Retrieve timestamps when you fetch. A timestamp will tell you when the row was inserted or last updated. If your query is read-only, and you retrieved the timestamp and all columns except the BLOB column, and the timestamp is from long ago, you can fetch the BLOB from a replicated server instead of going back to the original one.

Cache information in the client memory. This is particularly good for metadata information, because catalog functions are quite time consuming. It's not perfect, because metadata information is slightly volatile, but you're safe if some other user deletes a table; you'll just get an error message when you try to use it.

Run an error test on a non-urgent server. If there's a location where you can store a duplication of the catalog information (without the data), then put an alternate server at that location so you can parse statements on it. Again, if the duplicated catalog becomes out of date, you're only risking an error message.

Keep statistics. If there are several queries that are similar but not quite the same, there is a candidate for parameterization or reuse.

Trim. Most DBMSs are going to truncate the spaces in a VARCHAR anyway, so don't bother sending them.

Metamorphose. There are standard ways to retrieve information about the DBMS, including its product identification and version number. Adjust the syntax of your statement to match what that DBMS does best.