Chapter 14: Performance Tuning T-SQL


Performance tuning T-SQL is very interesting, but also quite frequently frustrating. It is interesting because there is so much involved: knowledge of the product's architecture and internals plays a very large role in doing it well. Of course, knowledge alone is not sufficient without the right tools, which you will learn about in this chapter. When you tune a query and reduce its runtime, you may jump for joy; when you lose sleep for many nights and still cannot achieve that result, you get frustrated. In this chapter you will learn how to gather the data for query tuning, the tools for query tuning, the stages a query goes through before execution, and how to analyze the execution plan. Because it is very important to understand which stages a query passes through before the execution engine actually runs it, we'll start with physical query processing.

Physical Query Processing

SQL Server performs two main steps when a query is submitted to produce the desired result. As you would guess, the first step is query compilation, which generates the query plan; the second step is the execution of that plan. The compilation phase in SQL Server 2005 goes through three steps: parsing, algebrization, and optimization. In SQL Server 2000, there was a normalization phase, which has been replaced with the algebrization piece in SQL Server 2005. The SQL Server team has spent much effort to re-architect and rewrite several parts of SQL Server, with the goal of redesigning logic to serve current and future expansions of SQL Server functionality. After the three steps just mentioned are completed, the compiler stores the optimized query plan in the plan cache. The execution engine then takes over, copies the plan into its executable form, and executes the steps in the query plan to produce the desired result. If the same query or stored procedure is executed again, the compilation phase is skipped and the execution engine reuses the cached plan to start the execution.

You can see this in action with an example. The goal is to determine whether the stored procedure plan is reused or not. We will also use SQL Profiler and the DMV sys.dm_exec_cached_plans to examine some interesting details. In order to determine whether a compiled plan is reused or not, you have to monitor the events SP:CacheMiss, SP:CacheHit, and SP:CacheInsert under the Stored Procedures event class. Figure 14-1 shows these stored procedure plan compilation events in SQL Profiler.

Figure 14-1

Download the sample code for this chapter from this book's web page at www.wrox.com. Open the sample solution QueryPlanReUse and then open the script ExecuteSP.sql. Connect to the SQL Server on which you have the AdventureWorks database. Please don't do this exercise on a production server! Compile the stored procedure TestCacheReUse in AdventureWorks. The code is as follows:

 USE AdventureWorks
 GO
 IF OBJECT_ID('dbo.TestCacheReUse') IS NOT NULL
    DROP PROC dbo.TestCacheReUse
 GO
 CREATE PROC dbo.TestCacheReUse
 AS
 SELECT EmployeeID, LoginID, Title
 FROM HumanResources.Employee
 WHERE EmployeeID = 109
 GO

Connect the Profiler to your designated machine and start the SQL Profiler after selecting the events, as shown in Figure 14-1. Now execute the stored procedure TestCacheReUse as follows:

 USE AdventureWorks
 GO
 EXEC dbo.TestCacheReUse

Note that in SQL Server Profiler you will find the SP:CacheMiss and SP:CacheInsert events, as shown in Figure 14-2.

Figure 14-2

As you can see in Figure 14-2, the SP:CacheMiss event indicates that the compiled plan is not found in the plan cache. The stored procedure plan is compiled and inserted into the plan cache indicated by SP:CacheInsert, and then the procedure TestCacheReUse is executed.

Execute the same procedure again. This time SQL Server finds the query plan in the plan cache, as shown in Figure 14-3.

Figure 14-3

As you can see in Figure 14-3 the plan for the stored procedure TestCacheReUse was found in the plan cache, which is why you see the event SP:CacheHit.
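The cache-miss/insert/hit sequence you just observed can be modeled as a simple dictionary lookup. The following Python sketch is purely illustrative: the compile_plan helper and the event strings (mirroring the Profiler events above) are our own invention, and SQL Server's actual plan cache is far more sophisticated.

```python
# Toy simulation of the plan-cache events described above
# (SP:CacheMiss, SP:CacheInsert, SP:CacheHit). Purely illustrative.
def compile_plan(name):
    return f"compiled plan for {name}"

def execute_proc(name, plan_cache, events):
    if name not in plan_cache:
        events.append("SP:CacheMiss")          # plan not found in cache
        plan_cache[name] = compile_plan(name)
        events.append("SP:CacheInsert")        # compiled plan is cached
    else:
        events.append("SP:CacheHit")           # cached plan is reused
    return plan_cache[name]  # execution engine uses the cached plan

cache, events = {}, []
execute_proc("TestCacheReUse", cache, events)  # first call: miss + insert
execute_proc("TestCacheReUse", cache, events)  # second call: hit
print(events)  # ['SP:CacheMiss', 'SP:CacheInsert', 'SP:CacheHit']
```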

The DMV sys.dm_exec_cached_plans also provides information about the plans that are currently cached, along with some other information. Open the script DMV_cachePlanInfo.sql from the solution, shown here (note that the syntax in the following query is valid only when the database is in 90 compatibility mode):

 SELECT bucketid,
        (SELECT Text FROM sys.dm_exec_sql_text(plan_handle)) AS SQLStatement,
        usecounts, size_in_bytes, refcounts
 FROM sys.dm_exec_cached_plans
 WHERE cacheobjtype = 'Compiled Plan'
   AND objtype = 'proc'

Run this script; you should see output similar to what is shown in Figure 14-4.

Figure 14-4

In this script, we used the DMF sys.dm_exec_sql_text to get the SQL text from the plan_handle. In Figure 14-4, you can see that the compiled plan for the stored procedure TestCacheReUse that you executed earlier is cached. The usecounts column in the output shows how many times this plan has been used since it was first created; the plan was first inserted into the cache when the SP:CacheInsert event occurred, as shown in Figure 14-2. You can also see the number of bytes consumed by the cache object in Figure 14-4. In this case, the cached plan for the stored procedure TestCacheReUse has consumed 40KB in the plan cache. If you run DBCC FREEPROCCACHE now, and then run the query in the DMV_cachePlanInfo.sql script again, you will see zero rows returned, because DBCC FREEPROCCACHE cleared the procedure cache.

If you add WITH RECOMPILE to the stored procedure TestCacheReUse (CREATE PROC TestCacheReUse WITH RECOMPILE AS ...) and run the stored procedure, you will notice that the plan is not cached in the procedure cache, because the WITH RECOMPILE option tells SQL Server not to cache a plan for the stored procedure and to recompile it every time you execute it. Use this option wisely: compiling a stored procedure on every execution can be very costly, as compilation consumes many CPU cycles.

Compilation

As we have discussed, before a query, batch, stored procedure, trigger, or dynamic SQL statement begins execution on a SQL Server, the batch is compiled into a plan. The plan is then executed for its effects or to produce results. The flowchart in Figure 14-5 displays the steps in the compilation process in SQL Server 2005.

Figure 14-5

When a batch starts, the compilation process will try to find the cached plan in the plan cache. If it finds a match, the execution engine will take over. If a match is not found, the parser starts parsing (explained later in this section). The new plan guide match feature in SQL Server 2005 determines whether an existing plan guide for a particular statement exists (you will learn how to create the plan guide later). If it exists, it uses the plan guide for that statement for execution. (The concepts of forced autoparam and simple autoparam are described later in the chapter.) If a match is found, the parse tree is sent to the algebrizer (explained later in this section), which creates a logical tree for input to the optimizer (also explained later in this section). The plan is then cached in the plan cache. Of course, not all the plans are cached; for example, when you create a stored procedure with WITH RECOMPILE, the plan will not be cached.

Often, the cache plan needs to be recompiled because it is not valid for some reason. Suppose that a batch has been compiled into a collection of one or more query plans. SQL Server checks for validity (correctness) and optimality of that query plan before it begins executing any of the individual query plans. If one of the checks fails, the statement corresponding to the query plan or the entire batch is compiled again, and a possibly different query plan is produced. Such compilations are known as recompilations.

Note that when a batch is recompiled in SQL Server 2000, all of the statements in the batch are recompiled, not just the one that triggered the recompilation. In SQL Server 2005, if a statement in a batch causes the recompilation, then only that statement will be recompiled, not the whole batch. This statement-level recompilation has some advantages. In particular, SQL Server 2005 spends less CPU time and memory during batch recompilations, and obtains fewer compile locks. In addition, if you have a long stored procedure, then you do not need to break it into small chunks just to reduce compile time (as you would probably do in SQL Server 2000).

The reasons for recompilation can be broadly classified in two categories:

Correctness-Related Reasons

If the query processor decides that the cache plan would produce incorrect results, it will recompile that statement or batch. There are two possible reasons why the plan would produce incorrect results.

Schemas of Objects

Your query batch could be referencing many objects such as tables, views, UDFs, or indexes; and if the schemas of any of the objects referenced in the query have changed since your batch was last compiled, your batch must be recompiled for statement-correctness reasons. Schema changes could include many things - for example, adding an index on a table or in an indexed view, or adding or dropping a column in a table or view.

In SQL Server 2005, manually dropping or creating a statistic on a table causes recompilation. Manually updating statistics does not change the schema version. This means that queries that reference the table but do not use the updated statistics are not recompiled. Only queries that use the updated statistics are recompiled.

Batches with unqualified object names result in non-reuse of query plans. For example, in "SELECT * FROM MyTable", MyTable may legitimately resolve to Aish.MyTable if Aish issues this query and she owns a table with that name. Similarly, MyTable may resolve to Joe.MyTable. In such cases, SQL Server does not reuse query plans. If, however, Aish issues "SELECT * FROM dbo.MyTable", there is no ambiguity because the object is uniquely identified, and query plan reuse can happen. (See the uid column in sys.syscacheobjects. It indicates the user ID for the connection in which the plan was generated. Only query plans with the same user ID are candidates for reuse. When uid is set to -2, it means that the query does not depend on implicit name resolution, and the plan can be shared among different user IDs.)
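To make this reuse rule concrete, here is a small, hypothetical Python model of a cache key that incorporates the user ID, with -2 marking a shareable plan as described above. The cache_key function and its parameters are invented for illustration; they are not SQL Server internals.

```python
# Sketch of how unqualified object names block plan reuse: the cache key
# includes the user ID unless the plan is shareable (uid = -2).
def cache_key(query_text, uid, qualified):
    # A fully qualified name ("dbo.MyTable") does not depend on
    # implicit name resolution, so the plan can be shared (uid -2).
    return (query_text, -2 if qualified else uid)

aish = cache_key("SELECT * FROM MyTable", uid=5, qualified=False)
joe  = cache_key("SELECT * FROM MyTable", uid=7, qualified=False)
print(aish == joe)   # False: different uids, so no plan reuse

aish_q = cache_key("SELECT * FROM dbo.MyTable", uid=5, qualified=True)
joe_q  = cache_key("SELECT * FROM dbo.MyTable", uid=7, qualified=True)
print(aish_q == joe_q)  # True: one shared plan for both users
```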

SET Options

Some SET options affect query results. If the setting of a SET option that affects plan reuse is changed inside a batch, a recompilation happens. The SET options that affect plan reusability are ANSI_NULLS, ANSI_NULL_DFLT_ON, ANSI_PADDING, ANSI_WARNINGS, CURSOR_CLOSE_ON_COMMIT, IMPLICIT_TRANSACTIONS, and QUOTED_IDENTIFIER.

These SET options affect plan reuse because SQL Server 2000 and SQL Server 2005 perform constant folding, evaluating a constant expression at compile time to enable some optimizations, and because the settings of these options affect the results of such expressions.

Note

To avoid SET option-related recompilations, establish SET options at connection time, and ensure that they do not change for the duration of the connection.

The following example demonstrates how SET options cause recompilation:

 USE AdventureWorks
 GO
 CREATE PROC RecompileSetOption
 AS
 SET ANSI_NULLS OFF
 SELECT s.CustomerID, COUNT(s.SalesOrderID)
 FROM Sales.SalesOrderHeader s
 GROUP BY s.CustomerID
 HAVING COUNT(s.SalesOrderID) > 8
 GO
 EXEC RecompileSetOption    -- causes a recompilation
 GO
 EXEC RecompileSetOption    -- does not cause a recompilation
 GO

By default, the SET ANSI_NULLS option is ON, so when you compile this stored procedure it will compile with SET ANSI_NULLS ON. Inside the stored procedure, you have set ANSI_NULLS to OFF, so when you begin executing this stored procedure the compiled plan is not valid, and will recompile with SET ANSI_NULLS OFF. The second execution does not cause a recompilation because the cached plan is compiled with "ansi_nulls" set to OFF.
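The recompile-once behavior of this example can be sketched as follows. This is a toy Python model (the exec_proc function and its fields are hypothetical) showing why only the first execution triggers a recompilation: after that, the cached plan already matches the option set inside the body.

```python
# Toy model: a plan is compiled under the SET options in effect at compile
# time; if the batch body changes them, the cached plan is invalid and a
# recompile produces a plan matching the new setting.
def exec_proc(cache, session_ansi_nulls=True, body_ansi_nulls=False):
    recompiles = 0
    plan = cache.get("RecompileSetOption")
    if plan is None:
        # first compilation uses the session's default (ANSI_NULLS ON)
        plan = {"ansi_nulls": session_ansi_nulls}
        cache["RecompileSetOption"] = plan
    # inside the body, SET ANSI_NULLS OFF takes effect:
    if plan["ansi_nulls"] != body_ansi_nulls:
        plan["ansi_nulls"] = body_ansi_nulls  # recompile with new setting
        recompiles += 1
    return recompiles

cache = {}
print(exec_proc(cache))  # 1  (first execution causes a recompilation)
print(exec_proc(cache))  # 0  (cached plan already has ANSI_NULLS OFF)
```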

Plan Optimality-Related Reasons

SQL Server is designed to generate the optimal query execution plan as data changes in your database. As you know, data distributions are tracked with statistics (histograms) in the SQL Server query processor. The table content changes because of INSERT, UPDATE, and DELETE operations. Table contents are tracked directly using the number of rows in the table (cardinality), and indirectly using statistics (histograms) on table columns (we explain this in detail later in the chapter). The query processor checks the threshold to determine whether it should recompile the query plan or not, using the following formula:

 Colmodctr (current) - Colmodctr (snapshot) >= RT

In SQL Server 2005, table modifications are tracked using Colmodctr (explained shortly). In SQL Server 2000, the counter was Rowmodctr. The Colmodctr is stored for each column, so changes to a table can be tracked with a finer granularity in SQL Server 2005. Note that Rowmodctr is still available in SQL Server 2005, but its value is always 0.

In this formula, colmodctr (snapshot) is the value stored at the time the query plan was generated, and colmodctr (current) is the current value. If the difference between these counters as shown in the formula is greater than or equal to RT (the Recompilation Threshold), then recompilation happens for that statement. RT is calculated as follows for permanent and temporary tables. Note that n refers to a table's cardinality (the number of rows in the table) when a query plan is compiled.

For a permanent table, the formula is as follows:

 If n <= 500, RT = 500
 If n > 500,  RT = 500 + 0.20 * n

For a temporary table, the formula is as follows:

 If n < 6,         RT = 6
 If 6 <= n <= 500, RT = 500
 If n > 500,       RT = 500 + 0.20 * n

For a table variable, RT does not exist; therefore, recompilations do not happen because of changes in the cardinality of table variables. Note that if you add a large number of rows to a table variable, the cached query plan may no longer be optimal, because it was compiled before those rows were added.
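The RT rules above are easy to capture in code. The following Python helper is a direct transcription of the formulas; the function names are ours, not SQL Server's.

```python
# Recompilation threshold (RT) rules from the text. n is the table's
# cardinality at the time the query plan was compiled.
def recompilation_threshold(n, table_kind):
    if table_kind == "table_variable":
        return None  # no RT: cardinality changes never trigger recompile
    if table_kind == "temporary":
        if n < 6:
            return 6
        if n <= 500:
            return 500
        return 500 + 0.20 * n
    # permanent table
    if n <= 500:
        return 500
    return 500 + 0.20 * n

def needs_recompile(colmodctr_current, colmodctr_snapshot, rt):
    # the formula: colmodctr(current) - colmodctr(snapshot) >= RT
    return rt is not None and (colmodctr_current - colmodctr_snapshot) >= rt

print(recompilation_threshold(1000, "permanent"))   # 700.0
print(recompilation_threshold(3, "temporary"))      # 6
print(needs_recompile(1200, 400, 700.0))            # True
```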

The following table shows how the Colmodctr is modified in SQL Server 2005 because of different DML statements.

 Statement         Effect on colmodctr
 ----------------  ------------------------------------------------------------
 INSERT            All colmodctr += 1 (colmodctr is incremented by 1 for each
                   column in the table for each inserted row.)
 DELETE            All colmodctr += 1
 UPDATE            If the update is to non-key columns: colmodctr += 1 for each
                   of the updated columns. If the update is to key columns:
                   colmodctr += 2 for all of the columns.
 Bulk Insert       Like n INSERTs: all colmodctr += n. (n is the number of rows
                   bulk inserted.)
 Table truncation  Like n DELETEs: all colmodctr += n. (n is the table's
                   cardinality.)
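The bookkeeping in this table can be sketched as follows. This hypothetical Python model applies the rules above to a toy three-column table; the apply_dml function and its arguments are invented for illustration.

```python
# Colmodctr bookkeeping from the table above, for a hypothetical table
# with known columns and key columns. Illustrative only.
def apply_dml(colmodctr, statement, columns, key_columns=(), n=1,
              updated_columns=()):
    if statement in ("INSERT", "DELETE"):
        for c in columns:
            colmodctr[c] += 1            # every column's counter moves by 1
    elif statement in ("BULK INSERT", "TRUNCATE"):
        for c in columns:
            colmodctr[c] += n            # like n INSERTs / n DELETEs
    elif statement == "UPDATE":
        if any(c in key_columns for c in updated_columns):
            for c in columns:
                colmodctr[c] += 2        # key update: all columns += 2
        else:
            for c in updated_columns:
                colmodctr[c] += 1        # non-key update: updated cols += 1
    return colmodctr

ctr = {"id": 0, "name": 0, "city": 0}
apply_dml(ctr, "INSERT", list(ctr))
apply_dml(ctr, "UPDATE", list(ctr), key_columns=("id",),
          updated_columns=("name",))    # non-key update: only name moves
print(ctr)  # {'id': 1, 'name': 2, 'city': 1}
```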

Tools and Commands for Recompilation Scenarios

You can use the following tools to observe or debug the recompilation-related events.

SQL Profiler

Capture the following events under the event classes Stored Procedure and TSQL to see the recompilation events. Be sure to select the column EventSubClass to view what caused the recompilation:

  • SP:Starting

  • SP:StmtCompleted

  • SP:Recompile

  • SP:Completed

  • SP:CacheInsert

  • SP:CacheHit

  • SP:CacheMiss

You can also select the AutoStats event under the Performance event class to detect recompilations related to statistics updates.

Sys.syscacheobjects Virtual Table

Although this virtual table exists in the resource database, you can access it from any database. Note that the resource database is a new system database introduced in SQL Server 2005. The resource database is a read-only database that contains all the system objects included with SQL Server 2005. SQL Server system objects, such as sys.objects, are physically persisted in the resource database, but they logically appear in the sys schema of every database. The resource database does not contain user data or user metadata.

The cacheobjtype column of this virtual table is particularly interesting. When cacheobjtype = "Compiled Plan", the row refers to a query plan. When cacheobjtype = "Executable Plan", the row refers to an execution context. Note that each execution context must have its associated query plan, but not vice versa.

The objtype column indicates the type of object whose plan is cached (for example, "proc" or "Adhoc"). The setopts column encodes a bitmap indicating the SET options that were in effect when the plan was compiled. Sometimes, multiple copies of the same compiled plan (that differ only in their setopts columns) are cached in the plan cache. This indicates that different connections are using different sets of SET options (an undesirable situation). The usecounts column stores the number of times a cached object has been reused since the time the object was cached.

DBCC FREEPROCCACHE

This command clears the cached query plans and execution contexts. We recommend using it only in a development or test environment; avoid running it in a production environment.

DBCC FLUSHPROCINDB (db_id)

This command is the same as DBCC FREEPROCCACHE except that it clears the cached plans only for a given database. The usage recommendation is the same as for DBCC FREEPROCCACHE.

Parser and Algebrizer

Parsing is the process of checking the syntax and transforming the SQL batch into a parse tree. Parsing includes checks such as whether a nondelimited column name starts with a digit. Parsing does not check whether the columns you have listed in a WHERE clause actually exist in any of the tables listed in the FROM clause; that is taken care of by the binding process (the algebrizer). Parsing turns the SQL text into parse trees, one per query.

The algebrizer is a new component in SQL Server 2005; it replaces the normalizer in SQL Server 2000. The output of the parser - a parse tree - is the input to the algebrizer. The major function of the algebrizer is binding, so the whole algebrizer process is sometimes referred to as binding. The binding process checks whether the semantics are correct. For example, if you try to JOIN table A with trigger T, the binding process will raise an error even though the statement parses successfully. Other tasks performed by the algebrizer are covered in the following sections.

Name Resolution

The algebrizer performs the task of checking whether every object name in the query (the parse tree) actually refers to a valid table or column that exists in the system catalog, and whether it is visible in the query scope.

Type Derivation

The algebrizer determines the data type for each node in the parse tree. For example, if you issue a UNION query, the algebrizer derives the final data type of each result column. (The columns' data types could differ between the queries being unioned.)

Aggregate Binding

The algebrizer binds the aggregate to the host query and it makes its decisions based on query syntax. Consider the following query:

 SELECT s.CustomerID
 FROM Sales.SalesOrderHeader s
 GROUP BY s.CustomerID
 HAVING EXISTS(SELECT * FROM Sales.Customer c
               WHERE c.TerritoryID > COUNT(s.ContactID))

In this query, although the aggregate COUNT(s.ContactID) is written in the inner query, the aggregation is actually performed in the outer query. See the query plan in Figure 14-6.

Figure 14-6

You can see in the query plan shown in Figure 14-6 that the aggregation is done on the result from the SalesOrderHeader table, even though it is written in the inner query. The outer query is converted to something like this:

 SELECT s.CustomerID, COUNT(s.ContactID)
 FROM Sales.SalesOrderHeader s
 GROUP BY s.CustomerID

Grouping Binding

This is probably the most obvious one. Consider this query:

 SELECT s.CustomerID, SalesPersonID, COUNT(s.SalesOrderID)
 FROM Sales.SalesOrderHeader s
 GROUP BY s.CustomerID, s.SalesPersonID

If you do not include the CustomerID and SalesPersonID columns in the GROUP BY list, the query will error out. Grouped queries have different semantics than nongrouped queries: all non-aggregated columns or expressions in the SELECT list of a query with GROUP BY must have a direct match in the GROUP BY list. The process of verifying this in the algebrizer is known as grouping binding.
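The grouping-binding check can be sketched as a small validator. This hypothetical Python function mirrors the rule above (non-aggregated SELECT columns must appear in the GROUP BY list); it is not how the algebrizer is actually implemented.

```python
# Simplified grouping-binding check: every non-aggregated column in the
# SELECT list must be contained in the GROUP BY list.
def check_grouping(select_items, group_by):
    errors = []
    for col, aggregated in select_items:
        if not aggregated and col not in group_by:
            errors.append(f"Column '{col}' is invalid in the select list "
                          "because it is not contained in the GROUP BY clause")
    return errors

# The query above: CustomerID and SalesPersonID are plain columns,
# SalesOrderID appears only inside COUNT(...).
select_items = [("CustomerID", False), ("SalesPersonID", False),
                ("SalesOrderID", True)]
print(check_grouping(select_items, {"CustomerID", "SalesPersonID"}))  # []
print(check_grouping(select_items, {"CustomerID"}))  # one error: SalesPersonID
```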

Optimization

Optimization is probably the most complex and important step in processing your queries. The logical tree created by the parser and algebrizer is the input to the optimizer. The optimizer also needs metadata about the objects involved in the query, such as columns, indexes, statistics, and constraints, as well as hardware information. The optimizer uses this information to create the compiled plan, which is made of physical operators. Note that the logical tree includes logical operators that describe what to do, such as "read table" or "join." The physical operators produced by the optimizer specify algorithms that describe how to do it, such as "index seek," "index scan," or "hash join." The optimizer tells SQL Server exactly how to carry out the steps in order to get the results efficiently. Its job is to produce an efficient execution plan for each query in the batch or stored procedure. Figure 14-7 shows this process graphically.

Figure 14-7

As you can see in Figure 14-7, parsing and the algebrizer describe "what to do," and the optimizer describes "how to do it." SQL Server's optimizer is a cost-based optimizer, which means it will come up with the plan that has the least estimated cost. Complex queries may have millions of possible execution plans, so the optimizer does not explore them all; it tries to find an execution plan with a cost reasonably close to the theoretical minimum, because it has to come up with a good plan in a reasonable amount of time. Keep in mind that the lowest estimated cost doesn't mean the lowest resource cost: the optimizer chooses the plan that gets results to the users quickly. For example, a parallel plan that uses multiple CPUs typically consumes more resources than a serial plan, but returns results faster. Of course, the optimizer cannot always come up with the best plan, and that's why we have a job: query tuning.

Optimization Flow

The flowchart in Figure 14-8 explains the steps involved in optimizing a query. These steps are simplified for explanation purposes. We don't mean to oversimplify the state-of-the-art optimization engine written by the SQL Server development team.

Figure 14-8

As you can see in Figure 14-8, the input to the optimizer is the logical tree produced by the algebrizer. The query optimizer is a transformation-based engine: transformations are applied to fragments of the query tree. Three kinds of transformation rules are applied, discussed in the following sections.

Simplification

The simplification process creates an output tree that is better than the input tree. For example, it might push a filter down the tree, reduce the number of GROUP BY columns, or perform other transformations. Figure 14-9 shows an example of a simplification transformation (filter pushing).

Figure 14-9

In Figure 14-9, you can see that the logical tree on the left applies the filter after the join. The optimizer pushes the filter further down the tree to filter the data out of the Orders table with a predicate on O_OrderPriority. This optimizes the query by performing the filtering early in the execution.

Figure 14-10 is another example of the simplification transformation (aggregate reduction), which reduces the number of GROUP BY columns in the execution plan.

Figure 14-10

Figure 14-10 shows that C_Name is removed from the Group By list because the list also contains C_Custkey (refer to Figure 14-7 for the T-SQL statement), which is unique in the Customer table, so there is no need to also group by the C_Name column. That's exactly what the optimizer does.
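The idea behind this reduction can be sketched in a few lines. In this hypothetical Python model, grouping on a unique key already makes every group a single row, so other grouped columns from the same table can be dropped; the function is purely illustrative.

```python
# Aggregate-reduction sketch: if the GROUP BY list contains a column known
# to be unique (here C_Custkey), the other grouped columns are redundant,
# which is why the optimizer drops C_Name.
def reduce_group_by(group_by, unique_columns):
    for u in unique_columns:
        if u in group_by:
            # grouping on a unique key makes each group a single row,
            # so no other grouping column can split the groups further
            return [u]
    return group_by  # no unique key present: nothing to reduce

print(reduce_group_by(["C_Custkey", "C_Name"], {"C_Custkey"}))  # ['C_Custkey']
print(reduce_group_by(["C_Name", "C_City"], set()))  # ['C_Name', 'C_City']
```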

Exploration

As mentioned previously, SQL Server uses a cost-based optimizer implementation. Therefore, the optimizer will look at alternative options to come up with the cheapest plan. It makes a global choice using the estimated cost (see Figure 14-11).

Figure 14-11

The optimizer explores options such as which table should be used as the inner versus the outer input of a join. This is not a simple determination, because it depends on many things, such as the size of the tables, the available indexes, and operators higher in the tree. It is not a clear-cut choice like the examples you saw in the simplification transformation.

Implementation

The third transformation is implementation. Refer to Figure 14-12 for an example.

Figure 14-12

As explained earlier, the implementation transformation converts the "what to do" part into the "how to do it" part. In this example, the JOIN logical operation is transformed into a HASH JOIN physical operation. The query cost is derived from the physical operators based on model-of-execution algorithms (I/O and CPU) and estimations of data distribution, data properties, and size. Needless to say, the cost also depends on hardware, such as the number of CPUs and the amount of memory available at the time of optimization.

Refer back to Figure 14-8. If the optimizer compared the cost of every valid plan and chose the least costly one, the optimization process could take a very long time, because the number of valid plans can be huge. Therefore, the optimization process is broken up into three search phases. As discussed earlier, a set of transformation rules is associated with each phase. After each phase, SQL Server evaluates the cost of the cheapest query plan found so far. If the plan is cheap enough, the optimizer stops there and chooses that query plan. If not, the optimizer runs through the next phase, which has an additional set of rules that are more complex.

The first phase of cost-based optimization, Phase 0, contains a limited set of rules. These rules are applied only to queries with at least four tables. Because JOIN reordering alone generates many valid plans, the optimizer uses a limited number of join orders in Phase 0, and it considers only hash and loop joins in this phase. If this phase finds a plan with an estimated cost below 0.2 (an internal cost unit), the optimization ends there.

The second phase, Phase 1, uses more transformation rules and different join orders. A best plan that costs less than 1.0 causes optimization to stop in this phase. Note that through Phase 1, the plans considered are nonparallel (serial query plans). What if you have more than one CPU in your system? In that case, if the cost of the plan produced in Phase 1 is more than the cost threshold for parallelism (see sp_configure for this parameter; the default value is 5), Phase 1 is repeated to find the best parallel plan. The cost of the serial plan produced earlier is then compared with the new parallel plan, and the next phase, Phase 2 (the full optimization phase), is executed for the cheaper of the two.
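The phased search can be summarized with a toy cost-threshold driver. This Python sketch ignores many details (for example, that Phase 0 applies only to queries with at least four tables), and the phase costs passed in are made up; only the thresholds 0.2, 1.0, and the default cost threshold for parallelism of 5 come from the text above.

```python
# Toy driver for the phased optimization search described above.
# Costs are in the optimizer's internal cost units.
def optimize(phase_costs, cost_threshold_for_parallelism=5):
    best = phase_costs["phase0"]
    if best < 0.2:                      # cheap enough: stop after Phase 0
        return ("phase0", best)
    best = min(best, phase_costs["phase1"])
    if best < 1.0:                      # cheap enough: stop after Phase 1
        return ("phase1", best)
    if best > cost_threshold_for_parallelism:
        # repeat Phase 1 looking for the best parallel plan, then keep
        # the cheaper of the serial and parallel candidates
        best = min(best, phase_costs["phase1_parallel"])
    # Phase 2: full optimization on the cheaper candidate
    return ("phase2", min(best, phase_costs["phase2"]))

print(optimize({"phase0": 0.1}))                 # ('phase0', 0.1)
print(optimize({"phase0": 3.0, "phase1": 0.6}))  # ('phase1', 0.6)
```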



Professional SQL Server 2005 Administration (Wrox Professional Guides)
ISBN: 0470055200
Year: 2004
Pages: 193
