Query processing, which involves parsing, compiling, optimizing, and execution, is one of the most critical functions of a relational database engine. SQL Server 2005 continues to build on the award-winning cost-based optimizer of SQL Server 2000, and it introduces several new features to improve the overall query performance. The following sections discuss the new enhancements made to SQL Server's query optimization and execution.
Statistics Management in SQL Server 2005
SQL Server automatically creates and updates statistics. The query optimizer uses these statistics during query execution to determine things such as whether using an index would speed up a query and to develop an optimal query execution plan. The statistics include information such as cardinality (number of unique values), density (uniqueness of values within a dataset), selectivity (number of rows that will be returned by a particular query criterion), and so on, for one or more columns. In SQL Server 2000, the statistics were saved in the statblob column in the sysindexes system table. In SQL Server 2005, these statistics are now saved in an internal system table, sys.sysobjvalues. Therefore, the statblob column in the sys.sysindexes backward-compatibility view is now always returned as NULL.
You cannot access system internal tables in multiuser mode. You have to start SQL Server in single-user mode, make the Resource database (discussed later) the current database, and then use the three-part naming convention (for example, AdventureWorks.sys.sysobjvalues) to view the system table contents.
Missing or out-of-date statistics can have a negative impact on the query optimizer's ability to choose the most efficient plan. Out-of-date or missing statistics are indicated as warnings (with the table name in red text) when the execution plan of a query is graphically displayed in SQL Server Management Studio.
SQL Server 2005 introduces several enhancements that improve the management of statistics and also help the query optimizer choose the most efficient query plan for a broader range of queries. Here are some of these enhancements:
Before a query, batch, stored procedure, trigger, prepared statement, or dynamic SQL statement (henceforth, "batch") begins execution on a SQL Server database, the batch is compiled into a plan. The plan is then executed for its effects or to produce results. Such compiled plans are stored in a part of SQL Server's memory called the plan cache. If SQL Server is later able to reuse a compiled plan from the plan cache, it avoids the compilation costs, which improves the overall performance.
The area of memory where compiled plans are cached is sometimes referred as the procedure cache. However, the term plan cache is more accurate because this cache stores plans of other queries in addition to those of stored procedures.
When a cacheable batch is submitted to SQL Server 2005 for execution, it is compiled, and a query plan for it is put in the plan cache. A query plan is a read-only reentrant structure that is shared by multiple users. There are at most two instances of a query plan at any time in the plan cache: One for all the serial executions and one for all the parallel executions. Each user concurrently executing a batch has an execution context that holds data (such as parameter values) specific to his or her execution. Although execution contexts are reused, they are not reentrant (that is, they are single threaded). That is, at any point of time, an execution context can be executing only one batch submitted by a session, and while the execution is happening, the context is not given to any other session or user. A query plan and multiple associated execution contexts can coexist in the plan cache. However, just an execution context (without an associated query plan) cannot exist in the plan cache. Whenever a query plan is removed from the plan cache, all the associated execution contexts are also removed along with it. Also, query plan reuse does not necessarily imply execution context reuse. Execution contexts for parallel plans are not cached. An execution plan is a combination of a query plan and an execution context. You can query the sys.dm_exec_cached_plans DMV or sys.syscacheobjects backward-compatibility view to obtain information about the query execution plans that are cached by SQL server.
Before executing a query plan, SQL Server checks the correctness and optimality of that query plan. If one of the checks fails, the statement corresponding to the query plan or the entire batch is compiled again, and possibly a different query plan is produced. This process is known as recompilation. Recompilations are often performed for good reasons. However, sometimes excessive recompilations can degrade performance. In such cases, it becomes necessary to analyze the reasons for recompilations and try to reduce the occurrences of recompilations.
When a batch was recompiled in SQL Server 2000, all the statements in the batch were recompilednot just the one that triggered the recompilation. SQL Server 2005 improves on this behavior by compiling only the statement that caused the recompilationnot the entire batch. This statement-level recompilation feature improves SQL Server 2005's recompilation behavior when compared to that of SQL Server 2000. In particular, SQL Server 2005 spends less CPU time and memory during batch recompilations, and it obtains fewer compile locks. You can use a new Profiler trace event called SQL:StmtRecompile under the T-SQL event class to trace statement-level recompilations.
Calculating Query Plan Cost
In addition to stored procedures and triggers, SQL Server can cache the query plans for dynamic SQL, prepared queries, sp_executesql queries, ad hoc queries, auto-parameterized queries, and other batches. Every query plan and execution context has a cost associated with it. When the cost reaches 0, the plan or context becomes a candidate for deletion from the plan cache. In other words, the cost partially controls how long the plan or context lives in the plan cache. These costs are calculated and manipulated differently in SQL Server 2005 than they were in SQL Server 2000.
In SQL Server 2000, costs were calculated and manipulated as follows:
Query plan cost c = f(cpuTime, pagesRead, pagesWritten) / pagesUsedInMem
In SQL Server 2000, the query plan cost was a mathematical function that used four factors: CPU time spent generating the plan, number of pages read from disk, number of pages written to disk, and number of memory pages occupied by the query plan of the batch. In SQL Server 2000, the lazy writer thread occasionally swept through the plan cache and decremented costs by dividing them by 4. In case of memory pressure, query plans and execution contexts with costs of 0 were deleted from the plan cache. When a query plan or an execution context was reused, its cost was reset back to its compilation (or execution context generation) cost.
In SQL Server 2005, costs are calculated and manipulated as follows:
Query plan cost c = I/O cost + context switch cost + memory cost
In SQL Server 2005, the query plan cost is calculated in terms of the number of ticks, with a maximum of 31 ticks. It is a sum of the following:
In SQL Server 2005, the lazy writer does not decrement costs. Instead, as soon as the size of the plan cache reaches 50% of the BPool size, the next plan cache access decrements the ticks of all the plans by 1 each. If the sum of the sizes of all the caches in SQL Server 2005 reaches or exceeds 75% of the BPool size, a dedicated resource monitor thread gets activated, and it decrements the tick counts of all the objects in all the caches. As in SQL Server 2000, query plan reuse causes the query plan cost to be reset to its initial value.
Parallel Query Processing
Depending on factors such as available memory, type and estimated cost of query, number of schedulers on the server, load on the schedulers, and so on, SQL Server might decide to execute a query in parallel. You can use the max degree of parallelism sp_configure option to limit the number of processors to use in parallel plan execution. The default value of 0 uses all available processors. In addition to queries, SQL Server 2005 considers parallel execution plans for index DDL operations and static and keyset-driven cursor population. You can override the max degree of parallelism value in queries by specifying the MAXDOP query hint in the query statement.
The relational engine has been enhanced to perform more types of tasks in parallel. As mentioned earlier, SQL Server can now update statistics in parallel. Another example of improved support for parallelism is parallel index operations. In SQL Server 2005 Enterprise Edition, indexes can be created or rebuilt in parallel. DDL statements such as CREATE/ALTER/DROP INDEX and ALTER TABLE now accept a MAXDOP setting that you can provide to override the max degree of parallelism value.
SQL Server 2005 avoids the intra-query deadlock bug (see Microsoft Knowledge Base article 837983) present in SQL Server 2000. SQL Server 2000 Service Pack 3 first fixed this problem (see Microsoft Knowledge Base article 315662) by detecting a deadlock that involves the threads that are used by the same parallel query and returning error message 8650. SQL Server 2005 is designed so that intra-query deadlock will not happen.