Query Optimization and Execution | Microsoft SQL Server 2005: Changing the Paradigm (SQL Server 2005 Public Beta Edition)

Query processing, which involves parsing, compiling, optimizing, and execution, is one of the most critical functions of a relational database engine. SQL Server 2005 continues to build on the award-winning cost-based optimizer of SQL Server 2000, and it introduces several new features to improve the overall query performance. The following sections discuss the new enhancements made to SQL Server's query optimization and execution.

Statistics Management in SQL Server 2005

SQL Server automatically creates and updates statistics. The query optimizer uses these statistics during query execution to determine things such as whether using an index would speed up a query and to develop an optimal query execution plan. The statistics include information such as cardinality (number of unique values), density (uniqueness of values within a dataset), selectivity (number of rows that will be returned by a particular query criterion), and so on, for one or more columns. In SQL Server 2000, the statistics were saved in the statblob column in the sysindexes system table. In SQL Server 2005, these statistics are now saved in an internal system table, sys.sysobjvalues. Therefore, the statblob column in the sys.sysindexes backward-compatibility view is now always returned as NULL.

Note

You cannot access system internal tables in multiuser mode. You have to start SQL Server in single-user mode, make the Resource database (discussed later) the current database, and then use the three-part naming convention (for example, AdventureWorks.sys.sysobjvalues) to view the system table contents.

Missing or out-of-date statistics can have a negative impact on the query optimizer's ability to choose the most efficient plan. Out-of-date or missing statistics are indicated as warnings (with the table name in red text) when the execution plan of a query is graphically displayed in SQL Server Management Studio.

SQL Server 2005 introduces several enhancements that improve the management of statistics and also help the query optimizer choose the most efficient query plan for a broader range of queries. Here are some of these enhancements:

Column-level change tracking In SQL Server 2000, statistics update was determined by the number of row changes. SQL Server 2005, on the other hand, tracks the data changes at the column level and avoids the automatic update of statistics on columns that have not changed enough to warrant a statistics update. This reduces the instances of automatic update statistics and hence may improve query performance.
The AUTO_UPDATE_STATISTICS_ASYNC database option In SQL Server 2000, if a query caused an automatic statistics update, the query was blocked until the statistics were refreshed. This sometimes caused query timeout errors and unpredictable query response times. For such scenarios, SQL Server 2005 provides a databasewide option called AUTO_UPDATE_STATISTICS_ASYNC that, when turned on, results in statistics being updated in the background, without blocking the query. The query that caused the statistics update proceeds with the old statistics, while SQL Server starts a new thread to update the statistics. This provides more predictable query response time for some workloads. The AUTO_UPDATE_STATISTICS_ASYNC database option is turned off by default. It is recommended that you perform enough analysis and testing before turning this option on to ensure that it does not adversely affect the query performance or any other part of the engine. You can use the is_auto_update_stats_async_on column in the sys.databases catalog view to determine whether the AUTO_UPDATE_STATISTICS_ASYNC database option is turned on or off.
Parallelism SQL Server 2005 supports creating full scan statistics in parallel for both partitioned and non-partitioned tables. This can potentially lead to much faster statistics creation times.
String statistics For character columns, SQL Server now gathers information about the frequency distribution of substrings. In other words, SQL Server 2005 gathers a new kind of statistics called "tries" that aid the optimizer in better estimating the selectivity of conditions that use the LIKE operator. In addition, SQL Server now allows columns of large object types such as text, ntext, image, nvarchar(max), varchar(max), and varbinary(max) to be specified as statistics columns.
Computed columns SQL Server 2000 only partially supported (and did not document) creating and updating statistics on computed columns. SQL Server 2005 now fully supports and documents computed column statistics.
A change in the sampling size formula As in earlier releases, to minimize the cost of automaticstatistical update, SQL Server samples the data instead of analyzing all the data. For tables that have more than 1,024 pages (that is, tables over 8MB), SQL Server 2005 now ensures that a minimum of 8MB of data is sampled during statistics gathering.
Statistics on up to 32 columns The limit on the number of columns in a statistics object has been increased to 32, from 16.
Enhanced DBCC SHOW_STATISTICS results The DBCC SHOW_STATISTICS statement provides the option to restrict the output to header, density vector, histogram, or a combination of these. If no options are specified, all three result sets are returned. In addition, the output has been enhanced to display the name of statistics object being displayed and whether it is a string index.

Statement-Level Recompilation

Before a query, batch, stored procedure, trigger, prepared statement, or dynamic SQL statement (henceforth, "batch") begins execution on a SQL Server database, the batch is compiled into a plan. The plan is then executed for its effects or to produce results. Such compiled plans are stored in a part of SQL Server's memory called the plan cache. If SQL Server is later able to reuse a compiled plan from the plan cache, it avoids the compilation costs, which improves the overall performance.

Note

The area of memory where compiled plans are cached is sometimes referred as the procedure cache. However, the term plan cache is more accurate because this cache stores plans of other queries in addition to those of stored procedures.

When a cacheable batch is submitted to SQL Server 2005 for execution, it is compiled, and a query plan for it is put in the plan cache. A query plan is a read-only reentrant structure that is shared by multiple users. There are at most two instances of a query plan at any time in the plan cache: One for all the serial executions and one for all the parallel executions. Each user concurrently executing a batch has an execution context that holds data (such as parameter values) specific to his or her execution. Although execution contexts are reused, they are not reentrant (that is, they are single threaded). That is, at any point of time, an execution context can be executing only one batch submitted by a session, and while the execution is happening, the context is not given to any other session or user. A query plan and multiple associated execution contexts can coexist in the plan cache. However, just an execution context (without an associated query plan) cannot exist in the plan cache. Whenever a query plan is removed from the plan cache, all the associated execution contexts are also removed along with it. Also, query plan reuse does not necessarily imply execution context reuse. Execution contexts for parallel plans are not cached. An execution plan is a combination of a query plan and an execution context. You can query the sys.dm_exec_cached_plans DMV or sys.syscacheobjects backward-compatibility view to obtain information about the query execution plans that are cached by SQL server.

Before executing a query plan, SQL Server checks the correctness and optimality of that query plan. If one of the checks fails, the statement corresponding to the query plan or the entire batch is compiled again, and possibly a different query plan is produced. This process is known as recompilation. Recompilations are often performed for good reasons. However, sometimes excessive recompilations can degrade performance. In such cases, it becomes necessary to analyze the reasons for recompilations and try to reduce the occurrences of recompilations.

When a batch was recompiled in SQL Server 2000, all the statements in the batch were recompilednot just the one that triggered the recompilation. SQL Server 2005 improves on this behavior by compiling only the statement that caused the recompilationnot the entire batch. This statement-level recompilation feature improves SQL Server 2005's recompilation behavior when compared to that of SQL Server 2000. In particular, SQL Server 2005 spends less CPU time and memory during batch recompilations, and it obtains fewer compile locks. You can use a new Profiler trace event called SQL:StmtRecompile under the T-SQL event class to trace statement-level recompilations.

Calculating Query Plan Cost

In addition to stored procedures and triggers, SQL Server can cache the query plans for dynamic SQL, prepared queries, sp_executesql queries, ad hoc queries, auto-parameterized queries, and other batches. Every query plan and execution context has a cost associated with it. When the cost reaches 0, the plan or context becomes a candidate for deletion from the plan cache. In other words, the cost partially controls how long the plan or context lives in the plan cache. These costs are calculated and manipulated differently in SQL Server 2005 than they were in SQL Server 2000.

In SQL Server 2000, costs were calculated and manipulated as follows:

Query plan cost c = f(cpuTime, pagesRead, pagesWritten) / pagesUsedInMem

In SQL Server 2000, the query plan cost was a mathematical function that used four factors: CPU time spent generating the plan, number of pages read from disk, number of pages written to disk, and number of memory pages occupied by the query plan of the batch. In SQL Server 2000, the lazy writer thread occasionally swept through the plan cache and decremented costs by dividing them by 4. In case of memory pressure, query plans and execution contexts with costs of 0 were deleted from the plan cache. When a query plan or an execution context was reused, its cost was reset back to its compilation (or execution context generation) cost.

In SQL Server 2005, costs are calculated and manipulated as follows:

Query plan cost c = I/O cost + context switch cost + memory cost

In SQL Server 2005, the query plan cost is calculated in terms of the number of ticks, with a maximum of 31 ticks. It is a sum of the following:

I/O cost Two I/Os cost 1 tick, with a maximum of 19 ticks
Context switch cost Two context switches cost 1 tick, with a maximum of 8 ticks
Memory cost Sixteen memory pages (128KB) cost 1 tick, with a maximum of 4 ticks

In SQL Server 2005, the lazy writer does not decrement costs. Instead, as soon as the size of the plan cache reaches 50% of the BPool size, the next plan cache access decrements the ticks of all the plans by 1 each. If the sum of the sizes of all the caches in SQL Server 2005 reaches or exceeds 75% of the BPool size, a dedicated resource monitor thread gets activated, and it decrements the tick counts of all the objects in all the caches. As in SQL Server 2000, query plan reuse causes the query plan cost to be reset to its initial value.

Parallel Query Processing

Depending on factors such as available memory, type and estimated cost of query, number of schedulers on the server, load on the schedulers, and so on, SQL Server might decide to execute a query in parallel. You can use the max degree of parallelism sp_configure option to limit the number of processors to use in parallel plan execution. The default value of 0 uses all available processors. In addition to queries, SQL Server 2005 considers parallel execution plans for index DDL operations and static and keyset-driven cursor population. You can override the max degree of parallelism value in queries by specifying the MAXDOP query hint in the query statement.

The relational engine has been enhanced to perform more types of tasks in parallel. As mentioned earlier, SQL Server can now update statistics in parallel. Another example of improved support for parallelism is parallel index operations. In SQL Server 2005 Enterprise Edition, indexes can be created or rebuilt in parallel. DDL statements such as CREATE/ALTER/DROP INDEX and ALTER TABLE now accept a MAXDOP setting that you can provide to override the max degree of parallelism value.

SQL Server 2005 avoids the intra-query deadlock bug (see Microsoft Knowledge Base article 837983) present in SQL Server 2000. SQL Server 2000 Service Pack 3 first fixed this problem (see Microsoft Knowledge Base article 315662) by detecting a deadlock that involves the threads that are used by the same parallel query and returning error message 8650. SQL Server 2005 is designed so that intra-query deadlock will not happen.