Query Optimization

The SQL Server Query Optimizer is the database engine component of SQL Server 2000. As the database engine, it oversees all data-related interaction. It is responsible for generating the execution plans for any SQL operation. In diagnosing a query, the optimizer must decide on the most efficient means of executing the query and interacting with the database objects.

SQL Server has a cost-based optimizer that can be extremely sensitive to the information provided by statistics. Without accurate and up-to-date statistical information, SQL Server has a great deal of difficulty in determining the best execution plan for a particular query.

SQL Server goes through a considerable process when it chooses one execution plan out of several possible methods of executing a given operation. This optimization is one of the most important components of a SQL Server database system. Although some overhead is incurred by the optimizer's analysis process, this overhead is saved in execution.

The optimizer uses a cost-based analysis procedure. Each possible method of execution has an associated cost. This cost is determined in terms of the approximated amount of computing resources used in execution. The Query Optimizer must analyze the possible plans and choose the one that has the lowest estimated cost.

It is not uncommon for some complex SELECT statements to have thousands of possible plans. Of course, in this case the optimizer does not analyze every possible combination. It uses a complex series of processes to find a plan that has a cost reasonably close to the minimuma minimum that is only theoretical and unlikely to be achieved.

The Query Optimizer relies on up-to-date statistical information that is maintained within the metadata of a database system. This information is collected and updated based on changes to the index and data structure. Proper maintenance should ensure that the statistical data is maintained and accurately reflects the current database environment.

The primary tool available for the interaction with the Query Optimizer is the SQL Query Analyzer, which is an all-in-one T-SQL editor, debugging environment, and object viewer.

The SQL Query Analyzer

At this juncture in the book, and in your own experience with SQL Server, you should have begun to master using this tool to enter SQL. The object browser and templates, color-coded entry environment, variety of display formats, and powerful toolset all make the tool a mandatory element for use by an administrator or a programmer.

Now that you are familiar with the tool, it is time to turn it into one of the most important elements of the application diagnostic and performance-tuning framework. Capable of reaching into individual commands and objects, this tool provides for the finest level of granularity and deepest interaction with the SQL Server architecture.

When viewing an execution plan, you have the capability of representing the activity that the optimizer needs to perform with a graphic set of icons. Accompanying the graphic is a set of statistics that goes along with that portion of the operation.

Query Execution Plans

The execution plan options graphically display the data retrieval methods chosen by the Query Optimizer. The graphical execution plan uses icons to represent the execution of specific statements and queries in SQL Server, rather than the tabular text representation produced by the SET SHOWPLAN_ALL or SET SHOWPLAN_TEXT statements. Analysis of a query plan is a very important process that helps you understand the performance characteristics of a query.

An execution plan is particularly useful in determining the steps a complex query needs to take. See Figure 7.1 for a sample query and execution plan displays for invoice information. The query combines data from six separate tables. It is possible to see the work involved by looking at the plan SQL Server uses.

Figure 7.1. Execution plan for an invoice query.

The graphical plan is read from right to left and from top to bottom. By placing your mouse over any of the icons in the plan, you can acquire information about the processes being performed. Each query in the batch that is analyzed is displayed, including the cost of each query as a percentage of the total cost of the batch.

Reading Execution Plans

There are actually two techniques for generating an execution plan. The graphical execution represents an execution plan with icons, whereas the SET SHOWPLAN_ALL or SET SHOWPLAN_TEXT statements use a tabular representation. Each of these mechanisms enables you to look under the hood of the server to see what is actually occurring. The graphical plan is much easier to use and is what you will be expected to master.

The exam will expect you to be able to look at any of these outputs and decipher what is happening. You will also have to provide recommendations for improvement(s). Being able to see behind the scenes is very useful for understanding the performance characteristics of a query.

The graphical output illustrates all the processes within the engine using pictorial representations. These icons illustrating the many potential processes are shown in Figure 7.2. Some of these processes are discussed in full in this chapter because they pertain to material you are likely to find on the exam. For information on other processes, you should refer to SQL Server Books Online.

Figure 7.2. The graphical showplan icons.

Each of the icons specifies a logical and physical operation. An execution plan will have any number of icons depending on the number of operations needed to perform the query. A single query, even a simple one, will have a number of icons, each used to represent a part of the query or statement.

Each node is related to a parent node. All nodes with the same parent are drawn in the same column. Rules with arrowheads connect each node to its parent. Recursive operations are shown with an iteration symbol, and operators are shown as symbols related to a specific parent. When the query contains multiple statements, multiple query execution plans are drawn.

It is easy to see that a simple change to the database design can make a drastic difference in the execution plan. The simple addition of an index can reduce reads and greatly improve performance. The next two graphics illustrate the difference made in the plan when an index is added. Notice in Figure 7.3 that the rightmost icon is a scan procedure that sequentially reads the data.

Figure 7.3. The execution plan before index placement.

After a nonclustered index is created against the Country field, the server can more efficiently answer the query through the use of an index seek that reads only the data needed to resolve the conditions of the query, as shown in Figure 7.4.

Figure 7.4. The execution plan after an index was placed on the Country column.

Execution Plan Caching and Sharing

The memory used by SQL Server is divided among a number of processes. A pool of this memory is divided between the storage of execution plans and allocation of data buffers. The actual division between the two varies significantly depending on the system use. The portion of this memory pool that is used to store execution plans is referred to as the procedure cache. The data buffers are similarly called the data cache. In any given execution plan there are two main components: the plan itself and the context on which the query was executed.

Most of the plan is a read-only structure that can be used by any number of applications, connections, and users. No actual user context is stored within the plan itself. The execution context is a data structure stored for each user currently running the query associated with the plan. This area stores any data specific to a single use, such as parameter values.

When a query is executed, the server first looks through the procedure cache to verify that a plan already exists. If present, the plan is reused. This reuse of execution plans saves on the overhead associated with recompiling the query. If no plan exists, the server generates a new one. SQL Server 2000 has integrated logic that enables a plan to be shared between connections without requiring that an application prepare the statement.

In rare instances you may be able to apply index and other optimizer hints to change the way a query is performed. The optimizer in most cases produces the best plan, and attempts at bettering the plan don't usually provide any gains.

Optimizer Hints

A "hint" is a method of providing information in the SQL syntax that tells the Query Optimizer to use a specific plan rather than the one it may normally choose. Index, join, locking, query, table, and views all can be supplied with hint information to be passed to the Query Analyzer. Table 7.1 lists the options available for join, query, and table hints.

Table 7.1. Optimizer Hints in SQL Server
Type	Option	Description
Join	LOOP, HASH, MERGE, REMOTE	Indicates how to join the rows of two tables.
Query	HASH, ORDER, GROUP	Indicates hashing or ordering in GROUP BY and COMPUTE aggregations.
Query	MERGE, HASH, CONCAT, UNION	Indicates how UNION operations are performed.
Query	FAST integer	Optimizes for retrieval of the specified number of rows.
Query	FORCE ORDER	Performs joins in the order in which the tables appear.
Query	ROBUST PLAN	Accommodates maximum potential row size.
Table	FASTFIRSTROW	Has the same effect as specifying FAST 1.
Table	INDEX =	Instructs SQL Server to use the specified index(es).
Table	HOLDLOCK, SERIALIZABLE, REPEATABLEREAD, READCOMMITTED, READUNCOMMITTED, NOLOCK	Specifies the isolation level.
Table	ROWLOCK, PAGLOCK, TABLOCK, TABLOCKX, NOLOCK	Specifies the locking granularity.
Table	READPAST	Skips locked rows.
Table	UPDLOCK	Takes update locks instead of shared locks. Cannot be used with NOLOCK or XLOCK.
Table	XLOCK	Takes an exclusive lock. Cannot be used with NOLOCK or UPDLOCK.

Hints can provide additional control over the individual operations performed by the optimizer in the act of retrieving data with a query. Though not necessarily used to improve performance, hints are used for changing the action that would normally be taken. Hints can provide additional functionality that enables system-level applications to be designed that may need access to data beyond the capabilities of a standard data read. In most cases, the use of hints is not recommended because the optimizer usually selects the best mechanism to retrieve the data from a given query.

Join Hints

Join hints are not frequently used. The SQL Server Optimizer rarely makes a poor choice in how to join sets of records together. If you run a few queries and analyze the execution plans, you will see subqueries that are converted to joins and other join selections based on the specifics of the query.

In some rare instances you may be able to improve the performance of a complex query by altering the way the join is made. If you want to get this granular with a particular operation, run tests with different internal approaches to the join and measure their execution. This will seldom improve performance and you should concentrate optimization efforts in other areas.

Query Hints

Query hints are also rarely used and seldom yield any gain in functionality of performance. A Query hint can be used within the OPTION clause of a query, but in most cases will not provide for any performance gain. As with Join hints, if you want to perform some experiments, you may find rare instances in which hints will produce better performance.

Table Hints

Table hints are more frequently used because they can provide necessary functionality needed in some specialized circumstances. Table hints, if used, must be specified following the FROM clause using a WITH clause and must be enclosed in parentheses. Any of the Table hints can be specified using the SELECT, INSERT, UPDATE, and DELETE statements to instruct the server as to the type of locks to use.

Table hints are used when you need to have a greater degree of control over the types of locks acquired on an object. These locking hints, when used, will override the current transaction isolation level for the session. Table 7.2 illustrates these lock hints and provides a short description of the functionality.

Table 7.2. Locking Hints Used Within SQL Server
Locking Hint	Description
HOLDLOCK	Will hold a shared lock until completion of the transaction. HOLDLOCK is equivalent to SERIALIZABLE.
NOLOCK	Do not issue shared locks and do not honor exclusive locks. Dirty reads of uncommitted transactions are possible.
PAGLOCK	Use page locks where a single table lock would be taken.
READCOMMITTED	(Default) Running at the READ COMMITTED isolation level.
READPAST	Causes a transaction to skip rows locked by other transactions that would ordinarily appear rather than block the transaction.
READUNCOMMITTED	Equivalent to NOLOCK.
REPEATABLEREAD	Perform a scan with the same locking semantics as a transaction running at REPEATABLE READ isolation level.
ROWLOCK	Use row-level locks instead of page- and table-level locks.
SERIALIZABLE	Perform a scan with the same locking semantics as a transaction running at the SERIALIZABLE isolation level.
TABLOCK	Use a table lock instead of the row- or page-level locks. SQL Server holds this lock until the end of the statement. However, if you also specify HOLDLOCK, the lock is held until the end of the transaction.
TABLOCKX	Use an exclusive lock on a table. This lock prevents others from reading or updating the table.
UPDLOCK	Use update locks instead of shared locks while reading a table, and hold locks. UPDLOCK has the advantage of allowing you to read data without blocking other readers and update it later with the assurance that the data has not changed.
XLOCK	Use an exclusive lock that will be held until the end of the transaction.

You need to exercise caution with the use of any of these hints because you will be changing the default behavior and your procedure should document the behavior chosen and reasons for the implementation choice.