Troubleshooting and Maintaining Query Performance


In any large SQL Server 2005–based database solution, there are typically a large number of queries potentially executing at any given point in time. Likewise, there might be hundreds of tables per database. Consequently, it can be difficult to troubleshoot query performance.

Again, some structured approach or methodology is required when troubleshooting your query performance. Of course, a lack of resources, such as processors or memory, can be the cause of poor query performance, but we covered this in Chapter 1. What we are interested in talking about here are the causes for poor query performance internal to the SQL Server 2005 database engine. So, let’s go through a list of the possible causes of poor query performance that you need to check:

  • T-SQL code   Poor query performance can obviously be attributed to the T-SQL code that has been written. In certain cases, such as whether to use the IN versus the OR construct, there is no difference; in other cases, such as using set-based queries versus cursors, there is a substantial one. Believe it or not, database developers should write efficient T-SQL code!

    Note 

    Inefficient T-SQL code is commonly detected by looking at the query plan.

  • Indexes   Perhaps the main cause of poor query performance is a lack of indexes or an inappropriate indexing strategy. This invariably means that the query optimizer will be generating query plans that needlessly perform expensive operations such as table scans, hash joins, excessive bookmark lookups, and/or sorts.

    Note 

    As with inefficient T-SQL code, inappropriate indexing is usually determined by examining the query plan.

  • Database options   This represents a subtler category, but you should not forget to check the database options as part of your methodology for troubleshooting query performance. Certain database options such as AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS will have a direct impact on query performance within your database solution. If either of these options has been turned off, the query optimizer might not be generating optimal query plans. Other database options, such as READ_ONLY or SINGLE_USER, can improve performance because the lock manager does not have to maintain locks for that particular database.

  • Statistics   Inaccurate or out-of-date statistics can dramatically affect the performance of queries in SQL Server 2005. Without accurate statistics, the query optimizer may not generate optimal query plans for your queries. Consequently, it is important to check whether the statistics are up-to-date as part of your query troubleshooting methodology; a quick check of both the database options and the statistics age is sketched after this list.
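As a minimal sketch of that check, the following queries sys.databases for the statistics-related options and uses STATS_DATE to show when each statistic on a table was last updated. The AdventureWorks database and the Sales.SalesOrderHeader table are assumed here purely for illustration:

 -- Are the automatic statistics options turned on for the database?
 SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
 FROM sys.databases
 WHERE name = 'AdventureWorks' ;

 -- When was each statistic on the table last updated?
 SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
 FROM sys.stats
 WHERE object_id = OBJECT_ID('Sales.SalesOrderHeader') ;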

Note 

For completeness’ sake, we should add that you can also potentially improve query performance by taking advantage of persisted computed columns and indexed views. Obviously, there are other techniques involving redesigning the database’s schema; however, these fall under the auspices of changing the database design, not optimizing query performance, for the purposes of this chapter.

The first step, of course, is to identify potentially poorly performing queries. Microsoft has certainly made that process a lot easier in SQL Server 2005, as we will discuss in a moment.

Database Engine Tuning Advisor

Why do things the hard way? The Database Engine Tuning Advisor (DTA) is a replacement for the Index Tuning Wizard that you might have experience with from SQL Server 2000. The DTA can analyze SQL Server Profiler traces or a T-SQL workload script against your database solution and can recommend various performance tuning enhancements:

  • Creating additional indexes

  • Dropping existing indexes

  • Implementing indexed views

  • Implementing a partitioning strategy

The DTA in SQL Server 2005 can recommend performance enhancements across multiple databases simultaneously from a single trace/workload file. You can limit the DTA in a number of ways:

  • Time spent tuning your database solution

  • Partitioning strategy to use

    • Aligned partitions

    • Full partitions

  • Physical design structures to use in the database

    • Indexes

    • Indexed views

    • Nonclustered indexes

  • Physical design structures to keep in the database

The DTA generates a list of recommendations that can be converted into an XML script or T-SQL scripts. It also produces a number of reports that summarize different aspects of its analysis of your database solution. You can evaluate these recommendations and decide whether you want to implement them.

Simple!


In fact, you’ll now go through the DTA and see how it works in Exercise 2.1.

Exercise 2.1: Using the Database Engine Tuning Advisor


The DTA provides a quick and easy way to tune your SQL Server 2005 database solution. Don’t be under the illusion that you can always do better than the DTA. Although that might be the case, depending on your skill and knowledge of the environment, we still recommend running the DTA to see whether its recommendations mirror yours.

This exercise assumes you have captured a SQL Server Profiler/SQL Trace workload file, as discussed in Chapter 1.

  1. Use the Windows Start menu, and choose All Programs → Microsoft SQL Server 2005 → Performance Tools → Database Engine Tuning Advisor.

  2. Connect to your SQL Server 2005 instance using Windows authentication.

  3. Type in a session name and the location of the previously captured workload file. Click the database that you want to tune, and filter out any tables as appropriate.


  4. Click the Tuning Options tab. The Tuning Options tab allows you to further refine what you want the DTA to do during the tuning session. Review and configure the DTA tuning options appropriately. If in doubt, leave the default settings.


  5. Click the Advanced Options button. The Advanced Tuning Options dialog box allows you to further refine the tuning options. Review and configure the DTA advanced tuning options appropriately. If in doubt, leave the default settings. Click the OK button when you have finished.


  6. Click the Start Analysis button on the toolbar. The DTA should start the tuning process.


  7. After the DTA has completed, it will generate a Recommendations tab where you can analyze the indexing and partitioning recommendations it has made. Notice the DTA will also generate an estimated improvement in performance. You can ignore recommendations by deselecting the appropriate check box.


    Note: Remember that the quality of the DTA’s recommendations is based on the “quality” of the workload file you have captured. So, ensure that the workload file is a correct representation of the database activity expected.

  8. Click the recommendations hyperlink, located at the far right of the Recommendations tab, to see the T-SQL script that would implement the recommendation. Click the Close button when you have finished.


  9. Click the Reports tab. The Reports tab will show you a summary of this DTA session. Click the Select Report drop-down list to see what tuning reports are available.


  10. Review the various tuning reports available.

  11. You can save the results of the DTA’s session through the File menu for later analysis as required.

  12. When you have finished, you can exit the DTA.


Identifying Poorly Performing Queries

Traditionally, DBAs would have to rely on user feedback to identify poorly performing queries. They would wait until users of a database solution complained about a specific performance problem and then investigate whether it was a true performance issue or a contention issue.

Not anymore! SQL Server 2005 offers some innovative new techniques that allow you to identify poorly performing queries that are being run in your environment. This allows you to tune these poorly performing queries without having to rely on user complaints. An additional benefit is that, once tuned, these queries typically have less of an impact on the other queries being executed, so the overall database solution performs better.

Otherwise, you still can use the more “traditional” tools to identify poorly performing queries in SQL Server 2005. We’ll run through a summary of these techniques in the following sections.

Query Plan

Once a query has been identified for potential performance tuning, be it through user experience or its relative importance in the SQL Server 2005 database solution, the main technique you would use to analyze its performance is to analyze its query plan.

There are also a few other metrics to consider:

  • STATISTICS IO

  • STATISTICS TIME

  • STATISTICS PROFILE

A new feature of SSMS allows you to capture and display execution metrics for your queries. These metrics can be compared across multiple executions and averaged, as shown in Figure 2.1, which can be useful for excluding environmental anomalies when testing.

Figure 2.1: Client statistics

To generate these client statistics, you need to click the Include Client Statistics button on the Standard toolbar before executing the query.

Note 

These client statistics are not automatically reset for a given user connection in SSMS. If you change your query, don’t forget to reset the client statistics via the Reset Client Statistics option in the Query menu.

But as we indicated, analyzing the query plan will allow you to determine whether the query is performing optimally.

We will be discussing query plans and their analysis in more detail in a moment.

SQL Server Traces

In Chapter 1, we covered how you can capture a trace of the queries being executed on your SQL Server 2005 instance either through SQL traces or through SQL Server Profiler.

The captured trace information obviously provides a means of identifying potentially poorly performing queries. When analyzing your SQL traces, look for events that indicate the following:

  • Excessive recompilations/compilations

  • Expensive operations:

    • Table scans

    • Hash operations

    • Join operations

    • Sort operations

You can capture quite a number of SQL Server Profiler trace events for your analysis:

  • Errors and Warnings event classes

    • Execution Warnings

    • Hash Warning

    • Missing Column Statistics

    • Missing Join Predicate

    • Sort Warnings

  • Stored Procedures event classes

    • SP:Recompile

  • TSQL event classes

    • SQL:StmtRecompile

Note 

Of course, you can also create a trace based on the TSQL and Stored Procedure event classes. You can then analyze your trace based on the longest Duration values: the slowest queries or, rather, the queries that are taking the longest to run. But in our book, time is always an imprecise measurement. For obvious reasons, the queries that are taking the longest to run might not necessarily be inefficient. But they can be. So we suppose it’s a worthwhile method to adopt.

Dynamic Management Views

SQL Server 2005 offers a number of DMVs that you can use to identify poorly performing queries. We will go through them in more detail as required, but it is worth listing them all here.

A query will not be performing optimally if the indexes that it uses are heavily fragmented or otherwise inefficient. To monitor index fragmentation, usage, overhead, and hotspots, you can take advantage of the following DMVs:

  • sys.dm_db_index_usage_stats

  • sys.dm_db_index_operational_stats

  • sys.dm_db_index_physical_stats

We will cover the sys.dm_db_index_physical_stats DMV and go through an example in more detail later in this chapter.

A particularly exciting set of DMVs in SQL Server 2005 provides information about potentially missing indexes that could enhance query performance:

  • sys.dm_db_missing_index_group_stats

  • sys.dm_db_missing_index_groups

  • sys.dm_db_missing_index_details

  • sys.dm_db_missing_index_columns

The sys.dm_db_missing_index_details DMV returns detailed information about missing indexes, whereas the sys.dm_db_missing_index_columns(index_handle) DMV returns information about database table columns that are missing an index. Potentially missing indexes are reported for the following kinds of query elements:

  • Equality predicates

  • Inequality predicates (which represent any operator other than an equality)

  • Included columns (for covering queries)

These missing index DMVs are not intended as fine-tuning mechanisms, and they don’t provide an order for the columns of an index. They are intended to be used as guidelines for the indexes that the SQL Server 2005 database engine considers you need for a database. Be particularly careful with recommendations based on inequality predicates.

Note 

Because DMVs are memory-only structures, the missing index information is deleted upon the SQL Server 2005 instance being shut down. Consequently, you should export the missing index information if required before shutting down your SQL Server 2005 instance.

The following example queries the missing index DMVs to determine whether there are any missing indexes that would result in poorly performing queries:

 SELECT *
 FROM   sys.dm_db_missing_index_details AS mid
 CROSS APPLY
        sys.dm_db_missing_index_columns (mid.index_handle) AS mic
 JOIN   sys.dm_db_missing_index_groups AS mig
 ON     mig.index_handle = mid.index_handle ;

The following set of execution-related DMVs can also be queried to help you identify potentially poorly performing queries:

  • sys.dm_exec_query_stats

    • sys.dm_exec_query_plan

    • sys.dm_exec_sql_text

  • sys.dm_exec_cached_plans

The sys.dm_exec_query_stats DMV is particularly useful because it returns aggregate performance metrics about the cached execution plans. This allows you to quite easily query your SQL Server 2005 instance for the queries that have taken up the most resources, be they logical reads, physical writes, CLR time, and so forth.

For example, you might decide to investigate the most processor-intensive queries that have been executed on your SQL Server 2005 instance. In this case, you could query the sys.dm_exec_query_stats DMV:

 SELECT *
 FROM   sys.dm_exec_query_stats AS eqs
 CROSS APPLY
        sys.dm_exec_query_plan(eqs.plan_handle)
 ORDER BY
        total_worker_time DESC ;

Alternatively, you might decide to look for queries that use processor-intensive operators such as hash matches and sorts. In this case, you could query the sys.dm_exec_cached_plans DMV, filtering for either 'Hash Match' or 'Sort' on the query_plan column from the following base query:

 SELECT *
 FROM   sys.dm_exec_cached_plans AS ecp
 CROSS APPLY
        sys.dm_exec_query_plan(ecp.plan_handle) ;

The sys.dm_os_wait_stats DMV returns aggregate information about the waits encountered by executing threads and thus can be queried to help diagnose potential query performance problems. We will cover the sys.dm_os_wait_stats DMV in more detail later in this chapter.

The sys.dm_tran_locks DMV returns the database resources that are currently locked, along with requests for those same resources. Likewise, we will cover the sys.dm_tran_locks DMV in more detail later.
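Both DMVs are covered in more detail later in this chapter; as a quick first look, the following sketch shows a typical starting query against each (the column choices are illustrative only):

 -- Cumulative waits since the instance last started, largest first
 SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
 FROM sys.dm_os_wait_stats
 ORDER BY wait_time_ms DESC ;

 -- Lock requests currently granted or being waited on
 SELECT resource_type, request_mode, request_status, request_session_id
 FROM sys.dm_tran_locks ;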

There are other DMVs of course, so make sure you are comfortable with them. Welcome to the brave new world!

Analyzing a Query Plan to Detect Inefficiencies in Query Logic

A query plan basically describes how SQL Server executed a T-SQL statement within a batch. The query plan shows the different types of operations that needed to be performed and the order in which they were performed. It also shows the data access method used to retrieve data from the tables, be it an index scan, index seek, or table scan. It shows which steps consumed the most resources and/or time within both the T-SQL statement and the batch.

Note 

The query plan is more commonly referred to as the execution plan.

The SQL Server Management Studio (SSMS) environment has the ability to display the execution plan that SQL Server 2005’s query optimizer used in the execution of your T-SQL batch. This ability can help you determine whether your queries are executing efficiently.

If you decide they are not efficient, you can take corrective measures, such as rewriting the query or redesigning your indexing strategy. Sometimes you will have to override the optimizer through query/table hints.

Generating Query Plans

One of the features we really like in SQL Server 2005 is its ability to easily generate query plans. Understanding them might be another matter, of course! There is no longer any mystery as to how the database engine is executing a particular query, unlike many other relational database products. Of course, the amount of information and metrics that query plans return can be a bit daunting at first.

You can generate a query plan in two basic ways in SQL Server 2005. The first way, through T-SQL, which will generate a text version of the query plan, is to use one of the following SET options:

  • SHOWPLAN_TEXT   The SET SHOWPLAN_TEXT ON statement will show the execution plan for the T-SQL statements executed. Note that the T-SQL will not actually be executed.

  • SHOWPLAN_ALL   The SET SHOWPLAN_ALL ON statement will return the same information that the SHOWPLAN_TEXT option does with additional information about estimated resource utilization. Again, the T-SQL will not actually be executed.

  • SHOWPLAN_XML   The SET SHOWPLAN_XML ON statement will return the same information that the SHOWPLAN_TEXT option does, but in an XML format. Again, the T-SQL will not actually be executed.

    Note 

    The query_plan column of the sys.dm_exec_query_plan DMV returns the same information as the SHOWPLAN_XML option.

  • STATISTICS PROFILE   The SET STATISTICS PROFILE ON statement will return the same information that the SET SHOWPLAN_ALL ON does as well as execution statistics. The difference is that the T-SQL will be executed.

  • STATISTICS XML   The SET STATISTICS XML ON statement will return the same information that the STATISTICS PROFILE option does, but in an XML format. Again, the T-SQL will be executed.

These SET options are designed mainly for database developers and/or DBAs who want to send an execution plan in a text format to someone else, as in the case of an email.
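As a minimal sketch of how these options are used (reusing the AdventureWorks query from Exercise 2.2 later in this chapter), remember that each SET statement must run in its own batch:

 SET SHOWPLAN_TEXT ON
 GO
 SELECT MIN(SalesOrderID), MAX(SalesOrderID)
 FROM Sales.SalesOrderHeader
 GO
 SET SHOWPLAN_TEXT OFF
 GO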

The second and more popular way, as shown in Figure 2.2, is graphically through the SSMS environment. SSMS has the ability to display both the actual and estimated execution plans. With the estimated execution plan, your T-SQL script is only parsed and an execution plan estimated based on the best efforts of the query optimizer. The actual execution plan, on the other hand, can be generated only when your T-SQL script is actually executed.

Figure 2.2: Query plan

Warning 

Be careful of drawing conclusions from the estimated execution plan because SQL Server 2005 does not guarantee that it will be the same as the actual execution plan at runtime. Database developers typically use the estimated execution plan as an indication of how their T-SQL query is going to perform without consuming the resources of the SQL Server instance, which could have a dramatic impact on performance in a production environment.

To get more information about a particular operation of the query plan, you can move your mouse over that operation’s icon until a ToolTip appears, as shown in Figure 2.3.

Note 

The graphical execution plan contains a lot of information that is often overlooked. For example, the width of the arrows linking the nodes indicates the amount of data that was passed across. You can determine the number of rows passed across by holding the mouse over the arrow.

Figure 2.3: Additional query plan operation information

Tip 

For even more detailed information about a particular operation of the execution plan, you can view the Properties window by pressing F4.

You’ll now look at the different elements that make up the graphical execution plan. Note that not all of these elements will be visible for every operation in an execution plan.

  • Physical Operation   The physical operation performed by the query, such as a bookmark lookup, hash join, nested loop, and so on. Physical operators correspond to an execution algorithm and have costs associated with them.

    Tip 

    You should watch out for physical operators in the query execution plan that are displayed in red, because they typically indicate some sort of a problem, such as missing statistics.

  • Logical Operation   The relational algebraic operation that matches the physical operation; typically, a given logical operation can be implemented by several different physical operators.

  • Actual Number of Rows   The actual number of rows returned by this operation.

  • Estimated I/O Cost   The estimated cost of all I/O resources for this operation.

    Tip 

    The estimated I/O cost should be as low as possible.

  • Estimated CPU Cost   The estimated cost of all CPU resources for this operation.

  • Estimated Operator Cost   The estimated cost of performing this operation.

    Note 

    The estimated operator cost is also represented as a percentage of the overall cost of the query in parentheses.

  • Estimated Subtree Cost   The estimated cost of performing this operation and all preceding operations in its subtree.

  • Estimated Number of Rows   The estimated number of rows returned by the operation.

    Tip 

    You should watch out for a large discrepancy between the Estimated Number of Rows value and the Actual Number of Rows value.

  • Estimated Row Size   The estimated size of the rows, in bytes, retrieved by the operation.

  • Actual Rebinds/Actual Rewinds   The number of times the physical operator needed to initialize itself and set up any internal data structures. A rebind indicates that the input parameters changed and a reevaluation was done; a rewind indicates that existing structures were reused.

  • Ordered   Whether the rows returned by this operation are ordered.

  • Node ID   A unique identifier for the node.

  • Object/Remote Object   The database object that this operation accessed.

  • Output List   The list of outputs for this particular operation.

Note 

For more information about the different types of logical operators, search for the topic “Graphical Execution Plan Icons (SQL Server Management Studio)” in SQL Server 2005 Books Online.

Understanding execution plans comes with experience, so start using them as soon as you can. We always advise students to turn on the option to see them in SSMS and to observe them as they develop queries. It’s all about exposure to them.

You’ll examine the different ways in which you can generate an execution plan in Exercise 2.2.

Exercise 2.2: Generating Query Plans


In this exercise, you will examine the different ways in which you can generate an execution plan for analysis.

  1. Open SQL Server Management Studio, and connect to your SQL Server 2005 instance using Windows authentication.

  2. Click the New Query button, connect to your SQL Server 2005 instance, and choose the AdventureWorks database.

  3. Type the following query into the query pane:

     SELECT MIN(SalesOrderID), MAX(SalesOrderID) FROM Sales.SalesOrderHeader

  4. Click the Include Actual Execution Plan button on the toolbar.

  5. Execute the query.

  6. Click the Execution Plan tab in the bottom pane. Examine the various components of the execution plan.


  7. When you have finished, deselect the Include Actual Execution Plan button on the toolbar.

  8. Modify the query as follows:

     SET SHOWPLAN_ALL ON
     GO
     SELECT MIN(SalesOrderID), MAX(SalesOrderID)
     FROM Sales.SalesOrderHeader

  9. Execute the query.

  10. Examine the various columns of information in the execution plan.

  11. Exit SQL Server Management Studio when you have finished.


Analyzing Query Plans

It’s all about picking your battles! In other words, you could almost argue that every query will have a bottleneck or some component that is the slowest. Consequently, you need to know where to look and what to look for. You should start by analyzing the query plans of the most resource-expensive and/or the slowest queries in your SQL Server 2005 database solution.

So, you should follow this methodology:

  1. Identify the most expensive batches (or queries) that are running on your SQL Server 2005 instance, as well as the batches (or queries) that are running the slowest.

  2. Using the query plan, identify which queries are taking up the highest percentage of the resources of the batches identified previously.

  3. Identify the needless or expensive operations within the query plan of the queries identified previously.

When analyzing the query plans, which can for obvious reasons be quite complex, you can watch out for the following potentially expensive and/or needless operations:

  • Table scans   Table scans on large tables can be a particularly expensive operation in a SQL Server 2005 instance, especially where memory is a limited resource. Don’t forget that if you can avoid using table scans, you free up the buffer pool’s memory for more “useful” data.

    Tip 

You should not bother eliminating table scans on smaller tables. In fact, in certain cases table scans can be more efficient than index seeks. So, do not get carried away with eliminating table scans, especially if you are confident that you have an appropriate indexing strategy on that table.

  • Hash operations   In certain cases, typically where no useful indexes exist, SQL Server 2005 will execute an operation such as a JOIN or a GROUP BY by using a hash operation. Hash operations are particularly compute-intensive. They can also consume a lot of memory for the internal structures generated. Hash recursion or hash bailout will further reduce performance.

    Tip 

    Hash operations typically suggest the need for an index.

  • Bookmark lookups   Excessive bookmark lookups can potentially contribute to poor query performance. A bookmark lookup occurs when SQL Server 2005, in executing a query, uses a bookmark (row ID or clustering key) from a nonclustered index to look up the corresponding data row in the table (heap or clustered index). If you have a lot of bookmark lookups for a query, this jumping between the leaf level of the nonclustered index and the table can be inefficient.

    Tip 

    Excessive bookmark lookups typically suggest the need for a covering index.

    Note 

    As an aside, bookmark lookups are no longer technically called that in SQL Server 2005. They are now called either clustered index seeks or RID lookups. To confuse you further, in SQL Server 2005 Service Pack 2, the key lookup operator replaces the clustered index seek.

  • Sorts   Sorting operations can likewise needlessly waste your server resources. SQL Server 2005 has to stream the query’s result set into the tempdb system database, sort it, and then stream it back to the query.

  • Filtering   As with sorting, filtering operations can needlessly consume server resources.

Once you have analyzed the query plan and identified potentially inefficient operations, you will need to resolve these potential performance issues.

Resolving Query Performance Issues

Resolving query performance issues can be particularly challenging, especially for the DBA. Not only do you have to know SQL Server 2005’s architecture and how it works, but you also need to know two crucial points:

  • How your database users are using the database solution

  • The nature of the data being queried:

    • Statistics

    • Selectivity

    • Density

These two points are critical to know in SQL Server 2005 because it uses a cost-based optimization model. Unfortunately, the user and data patterns are typically more part of the developer’s (or business analyst’s) domain.

In any case, you should consider the following techniques of resolving your query performance issues.

Run the Database Tuning Advisor

This should be obvious by now.

Implement Appropriate Indexes

Perhaps the most common reason for poor query performance is a poor indexing strategy, or a complete lack of one, as we have witnessed so many times. (We just saw it again at a television channel where one of us was performing a SQL Server health check last week.)

As a DBA, you will most likely have to liaise with database developers and database users so as to be able to implement an appropriate indexing strategy.

For completeness’ sake, we should also mention that you should check to see whether indexes are disabled (a new feature in SQL Server 2005), in which case you will have to enable them.

Rewrite the Query

Another obvious technique is to rewrite the query. This generally falls on the database developer, though, so go and harass him or her! As a DBA, it is usually sufficient to identify the poorly performing queries.

You can employ all sorts of considerations and techniques when writing T-SQL queries: use joins versus subqueries, appropriately use user-defined functions, avoid cursors, and so on.

Update Statistics

Another frequent cause of poor query performance that is commonly overlooked is inaccurate or out-of-date statistics. Accurate statistical information about your data is almost as important as an appropriate indexing strategy. Without accurate statistics, the query optimizer can choose suboptimal query plans.

You can run UPDATE STATISTICS on the appropriate table, index, or view. Otherwise, you can execute the sp_updatestats system stored procedure, which will update statistics for all the appropriate objects in the database.
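As a minimal sketch against the AdventureWorks database (the table choice and the FULLSCAN option are illustrative), you could refresh the statistics of a single table and then everything else in the database:

 USE AdventureWorks ;
 GO
 -- Update all statistics on one table, sampling every row
 UPDATE STATISTICS Sales.SalesOrderHeader WITH FULLSCAN ;
 GO
 -- Update any out-of-date statistics in the current database
 EXEC sp_updatestats ;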

Note 

You can use the sp_autostats system stored procedure to display or change the automatic updating of statistics for tables and indexes.

In particular, do not forget to check the database options as part of your troubleshooting methodology:

  • AUTO_CREATE_STATISTICS

  • AUTO_UPDATE_STATISTICS

We will cover statistics in more detail when we look at maintaining and optimizing indexes later in this chapter.

Use the RECOMPILE Option

In certain cases, SQL Server 2005’s ability to cache and reuse execution plans leads to suboptimal performance. Your queries might be using atypical or temporary values for their parameters for which the cached execution plan is suboptimal.

In this case, you can use the RECOMPILE option. You can force SQL Server 2005 to recompile the query when it is next executed at a number of different levels:

  • As part of the stored procedure’s definition so that the stored procedure’s cached execution plan is never reused

  • As part of the EXECUTE statement

  • As an optimizer hint (which we will cover shortly)

  • Through the sp_recompile system stored procedure

Note 

Don’t forget to document your reasons for adding the RECOMPILE option.

The following example shows the [uspGetBillOfMaterials] stored procedure being altered in the AdventureWorks database so that its query plan will never be reused:

 USE AdventureWorks ;
 GO
 ALTER PROCEDURE [dbo].[uspGetBillOfMaterials]
     @StartProductID [int],
     @CheckDate [datetime]
 WITH RECOMPILE
 AS
 BEGIN
     SET NOCOUNT ON;
 ...
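The other levels look something like the following sketch; the parameter values here are purely illustrative:

 -- At execution time, for a single atypical call
 EXEC [dbo].[uspGetBillOfMaterials]
     @StartProductID = 800,
     @CheckDate = '20040101'
 WITH RECOMPILE ;

 -- Flag the stored procedure so that it is recompiled on its next execution
 EXEC sp_recompile N'dbo.uspGetBillOfMaterials' ;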

Statement-Level Recompilation

Statement-level recompilation is a new feature in SQL Server 2005. When SQL Server 2005 recompiles a stored procedure, it recompiles only the statement that caused the recompilation and not the entire stored procedure (or batch).

The obvious benefit is that SQL Server 2005 has less work to do in the case of a recompilation being triggered. It is important to realize that you cannot compare recompilation counts between SQL Server 2000 and SQL Server 2005.

In SQL Server 2000, the SP:Recompile event class indicated that a stored procedure, trigger, or user-defined function had been recompiled, so it was at the batch level. In SQL Server 2005, the same SP:Recompile event class indicates a recompilation within the batch at the statement level.

Because of this change, the SP:Recompile event class is being deprecated. You should use the SQL:StmtRecompile event class instead.


Use Query Hints

When all else fails, you should consider using query hints. Query hints basically allow you to override the query optimizer by telling it what strategy it should use in a component of the execution plan.

We cannot stress enough that using query hints is considered a last resort, so you should investigate why SQL Server 2005 is executing a query in a particular way that you consider to be incorrect. With all due respect, the chances are that you, not the query optimizer, will be wrong. So often in the field we have encountered scenarios where people have utilized query hints that were not optimal, primarily because the developers did not understand indexes and how SQL Server 2005 really works internally. Having said that, there are always exceptions.

Note 

Don’t forget to document your reasons for adding the query hint.

You can direct a number of query hints at the query optimizer that control how the query optimizer will process the query:

 <query_hint> ::=
 { { HASH | ORDER } GROUP
   | { CONCAT | HASH | MERGE } UNION
   | { LOOP | MERGE | HASH } JOIN
   | FAST number_rows
   | FORCE ORDER
   | MAXDOP number_of_processors
   | OPTIMIZE FOR ( @variable_name = literal_constant [ , ...n ] )
   | PARAMETERIZATION { SIMPLE | FORCED }
   | RECOMPILE
   | ROBUST PLAN
   | KEEP PLAN
   | KEEPFIXED PLAN
   | EXPAND VIEWS
   | MAXRECURSION number
   | USE PLAN N'xml_plan'
 }

We’ll go through some of the more commonly used query hints:

  • The { HASH | ORDER } GROUP query hint indicates that aggregations generated by operations such as GROUP BY and DISTINCT should use hashing or ordering.

  • The { CONCAT | HASH | MERGE } UNION query hint indicates that UNION operations should either concatenate, hash, or merge the UNION sets.

  • The { LOOP | MERGE | HASH } JOIN query hint indicates the type of join to be performed by the query.

  • The FAST number_rows query hint indicates that the query should be optimized for the first number_rows rows (a usage sketch follows this list).

    Note 

    The FASTFIRSTROW table hint is equivalent to OPTION (FAST 1).

  • The FORCE ORDER query hint indicates that the tables should be joined in the order that they appear in the query.

  • The MAXDOP query hint indicates the maximum number of processors to use when executing the query.

  • The OPTIMIZE FOR ( @variable_name = literal_constant [ , ...n ] ) query hint indicates that the query should be optimized for the specific parameter value specified.

  • The EXPAND VIEWS query hint indicates that the indexed view should not be considered when executing the query.

  • The RECOMPILE query hint indicates that any cached execution plans should be ignored so that a new and (ideally) optimal query plan can be generated.

    Tip 

    The RECOMPILE query hint is particularly useful for queries (or stored procedures) that execute with greatly different parameter values.

    Warning 

    Be careful with using the RECOMPILE query hint because recompilations consume processor resources.

  • The USE PLAN query hint indicates that a particular query plan (specified in XML) should be used when executing the query.
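As a usage sketch, query hints are appended to the query in an OPTION clause; the query and the hint combination shown here are illustrative only:

 USE AdventureWorks ;
 GO
 SELECT ProductID, Name, ListPrice
 FROM Production.Product
 ORDER BY ListPrice DESC
 OPTION (MAXDOP 1, FAST 10) ;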

You can also tell the query optimizer to use a particular index. The INDEX ( index_val [ ,... n ] ) table hint indicates that a particular index should be used based on its name or index_id value (which can be determined from sys.indexes).

Note 

A table hint of INDEX(0) indicates that the query should use a table scan.

Note 

A table hint of INDEX(1) indicates that the query should use the clustered index.
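As a minimal sketch against the AdventureWorks database, the following forces the query to use the [AK_Product_Name] index; compare the plan with and without the hint before keeping it:

 USE AdventureWorks ;
 GO
 SELECT ProductID, Name
 FROM Production.Product WITH (INDEX (AK_Product_Name))
 WHERE Name LIKE 'B%' ;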

We will cover the rest of the table hints in more detail in the “Troubleshooting Concurrency Issues” section later in this chapter, because the topic is more appropriate there.

Tip 

Don’t forget that poor query performance might also be attributed to inappropriate optimizer hints. So, you might have to rewrite the queries and remove the query hints.

Use Plan Guides

What about the scenario where you cannot change the query in your SQL Server 2005 database solution because it is a third-party solution and you do not have access to the source code? More and more independent software vendors are using SQL Server as the database engine for their software solutions, unfortunately with little knowledge of SQL Server itself. So, in a lot of cases, these third-party solutions are giving SQL Server 2005 a bad reputation!

In these circumstances when you cannot or do not want to change the query code, you can take advantage of plan guides in SQL Server 2005.

Note 

You can take advantage of plan guides in SQL Server 2005 in other circumstances as well, such as when you want a query to behave in a consistent manner, as is the case for development or benchmarking purposes.

A plan guide is a mechanism that allows you to attach query hints to queries as SQL Server 2005 is executing them. So when a query is executed, if SQL Server 2005 cannot find a cached execution plan, it will generate a new one, taking into account the query hints provided by the plan guide.

Note 

Plan guides are available in the Developer, Standard, and Enterprise Editions of SQL Server 2005.

There are three different scopes for plan guides in SQL Server 2005:

  • OBJECT   An OBJECT plan guide is used for queries that execute as T-SQL stored procedures, scalar functions, multistatement table-valued functions, or T-SQL DML triggers.

  • SQL   A SQL plan guide is used for ad hoc T-SQL statements or batches.

  • TEMPLATE   A TEMPLATE plan guide is used for stand-alone parameterized queries.

SQL Server 2005 plan guides support the following query hints:

  • {HASH | ORDER} GROUP

  • {CONCAT | HASH | MERGE} UNION

  • {LOOP | MERGE | HASH} JOIN

  • FAST number_rows

  • FORCE ORDER

  • MAXDOP number_of_processors

  • OPTIMIZE FOR ( @variable_name = literal_constant ) [ ,...n ]

  • RECOMPILE

  • ROBUST PLAN

  • KEEP PLAN

  • KEEPFIXED PLAN

  • EXPAND VIEWS

  • MAXRECURSION number

  • USE PLAN <xmlplan>

Plan guides have the following restrictions in SQL Server 2005:

  • Plan guides cannot be created against encrypted objects.

  • Plan guides cannot be created against DDL triggers.

  • Plan guides apply only within the database in which they were created.

  • The query text matching used by the SQL and TEMPLATE scopes has to be exact, including comments and whitespace.

SQL Server 2005 provides two stored procedures to create and manage plan guides:

  • sp_create_plan_guide   The sp_create_plan_guide system stored procedure is used to create a plan guide.

  • sp_control_plan_guide   The sp_control_plan_guide system stored procedure is used to enable, disable, or drop a plan guide.

The following example shows a SQL plan guide being created for the SELECT TOP 10 PERCENT * FROM Production.Product ORDER BY ListPrice DESC T-SQL statement. In this case, you want to ensure that the T-SQL statement will never use more than one processor because it is not an important query:

 EXEC sp_create_plan_guide
     @name = 'PlatypusPlanGuide',
     @stmt = 'SELECT TOP 10 PERCENT *
              FROM Production.Product
              ORDER BY ListPrice DESC',
     @type = 'SQL',
     @module_or_batch = NULL,
     @params = NULL,
     @hints = 'OPTION (MAXDOP 1)' ;

SQL Server 2005 provides the sys.plan_guides database catalog view to show what plan guides exist in a database.
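As a minimal sketch, the following disables the plan guide created in the previous example and then lists the plan guides in the current database:

 -- Disable (or 'ENABLE', or 'DROP') the plan guide by name
 EXEC sp_control_plan_guide N'DISABLE', N'PlatypusPlanGuide' ;

 -- List the plan guides defined in the current database
 SELECT name, scope_type_desc, is_disabled
 FROM sys.plan_guides ;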

Note 

For more information about plan guides, search for the “Optimizing Queries in Deployed Applications by Using Plan Guides” topic in SQL Server 2005 Books Online.

Maintaining and Optimizing Indexes

Another SQL Server book, another section about indexes. The fundamentals of indexes have not changed since SQL Server 7.0 (since SQL Server 6.0 really, although that can be debated because there have been some minor architectural changes). But you really should focus on the physical side of indexing because you are predominantly concerned with the maintenance and optimization of the index structures.

Note 

Of course, SQL Server 2005 has new XML indexes, improved full-text indexes, and enhancements to nonclustered indexes (included columns), but they are beyond the scope of this book.

Index Architecture

Indexes are fundamentally structures internal to SQL Server, called balanced trees (B-trees), which predominantly improve performance for data access because they allow the database engine to reach the data more quickly by traversing the B-tree. They are also the mechanism that SQL Server uses to enforce uniqueness. It’s as simple as that!

It helps to understand what a B-tree is, so Figure 2.4 shows an index on the [CustomerID] field of a [Customers] table. The top of the B-tree is known as the root, and the bottom-most level is known as the leaf level.

Figure 2.4: SQL Server index (B-tree)

The disadvantage of indexes is that they consume space and can potentially slow down the performance of your Data Manipulation Language (DML) operations, especially if you create too many of them, because SQL Server has to “maintain” the B-tree structures in real time.

So, optimizing indexes is about finding a balance between the performance gains and the overhead on the database engine. Ultimately, and not knowing this is a common mistake, the most important factors are your users and the data usage patterns. We know you didn’t want to hear this, but the point is, you might have created what you think is the world’s best indexing strategy, but it’s based on assumptions about what data you think your users will be accessing.

Note 

Generally, you should not be overly concerned about indexes during your database design. You really should be determining your indexing strategy after determining your users’ data usage patterns.

In the real world this rarely happens, so a basic indexing strategy is usually incorporated into the initial database design.

The syntax for creating an index is as follows:

 CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
     ON <object> ( column [ ASC | DESC ] [ ,...n ] )
     [ INCLUDE ( column_name [ ,...n ] ) ]
     [ WITH ( <relational_index_option> [ ,...n ] ) ]
     [ ON { partition_scheme_name ( column_name )
          | filegroup_name
          | default
          }
     ]
 [ ; ]

 <object> ::=
 {
     [ database_name. [ schema_name ] . | schema_name. ]
         table_or_view_name
 }

 <relational_index_option> ::=
 {
     PAD_INDEX  = { ON | OFF }
   | FILLFACTOR = fillfactor
   | SORT_IN_TEMPDB = { ON | OFF }
   | IGNORE_DUP_KEY = { ON | OFF }
   | STATISTICS_NORECOMPUTE = { ON | OFF }
   | DROP_EXISTING = { ON | OFF }
   | ONLINE = { ON | OFF }
   | ALLOW_ROW_LOCKS = { ON | OFF }
   | ALLOW_PAGE_LOCKS = { ON | OFF }
   | MAXDOP = max_degree_of_parallelism
 }

Note 

You can create an index incorporating multiple columns, but you still have the 16-column limit. This has not changed from previous versions.

For completeness’ sake, the syntax for creating an XML index is as follows:

 CREATE [ PRIMARY ] XML INDEX index_name
     ON <object> ( xml_column_name )
     [ USING XML INDEX xml_index_name
         [ FOR { VALUE | PATH | PROPERTY } ] ]
     [ WITH ( <xml_index_option> [ ,...n ] ) ]
 [ ; ]

 <object> ::=
 {
     [ database_name. [ schema_name ] . | schema_name. ]
         table_name
 }

 <xml_index_option> ::=
 {
     PAD_INDEX  = { ON | OFF }
   | FILLFACTOR = fillfactor
   | SORT_IN_TEMPDB = { ON | OFF }
   | STATISTICS_NORECOMPUTE = { ON | OFF }
   | DROP_EXISTING = { ON | OFF }
   | ALLOW_ROW_LOCKS = { ON | OFF }
   | ALLOW_PAGE_LOCKS = { ON | OFF }
   | MAXDOP = max_degree_of_parallelism
 }

The main option to be aware of is the type of index you’re creating, clustered or nonclustered; you will look at this shortly.

Realistically, you’ll probably never have to implement this, but you should be familiar with what the FILLFACTOR option does. Basically, the FILLFACTOR option stipulates how full SQL Server makes the 8KB pages of the B-tree when it initially creates the index.

Note 

To get a deeper understanding of where to use the FILLFACTOR option (and page splits), search for the “Fill Factor” topic in SQL Server Books Online.

The reason not to use all of the available space in a page during the initial index creation is to lessen the impact on the performance of future DML operations. So, you would implement a FILLFACTOR setting only for indexes on volatile tables where you anticipate or are experiencing heavy OLTP activity and suspect that it is the cause of performance problems.

Note 

A FILLFACTOR setting of 0 is identical to a setting of 100.

The PAD_INDEX option applies to the FILLFACTOR option and basically dictates whether the fill factor setting should also be applied to the nonleaf levels of the B-tree.
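As a minimal sketch, both options are specified in the WITH clause of the CREATE INDEX statement; the [Customer] table, the index name, and the fill factor value of 80 are illustrative assumptions:

 -- Leave 20 percent free space in the leaf-level pages,
 -- and apply the same fill factor to the nonleaf levels
 CREATE NONCLUSTERED INDEX [NCI_LastName]
 ON Customer(LastName)
 WITH (FILLFACTOR = 80, PAD_INDEX = ON) ;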

SQL Server 2005 includes a new option when creating indexes to improve performance. We will discuss this INCLUDE option shortly when we cover nonclustered indexes.

Note 

Don’t forget that a fill factor will internally fragment your data. We discuss that in more detail in Chapter 5, “Designing a Strategy to Maintain a Database Solution.”

Clustered Indexes

Creating a clustered index on a table has the effect of rearranging the data in your table so that the index order and the physical order are one and the same. Structurally this is because the leaf level of your clustered index is the actual data rows that make up the table.

Note 

A table without a clustered index on it is known as a heap.

So if you create a clustered index based on the [LastName] field on your [Customer] table, your table will be physically stored in order of the customers’ surnames. Alternatively, if you create a clustered index on the [FirstName] field, all the customers will be stored in order of the customers’ first names. Simple!

Because the clustered index determines the physical order in which the table is stored, you can have only one clustered index per table. After all, the clustered index is the table.

Clustered indexes work well where the data is highly selective but also where the data is dense.

Note 

Think of selectivity as the degree of uniqueness of your data. So, the [LastName] field should have a higher selectivity when compared to the [FirstName] field.

Note 

Density refers more to the number of duplicates in your data. So, the “Smith” value probably will be denser than the “Isakov” value in the [LastName] field.

This functionally translates into the fact that clustered indexes work well for point queries, range queries, join operations (which are really either a point or range query), or pretty much most queries.

The following example shows a clustered index being created on the [LastName] field of the [Customer] table:

 CREATE CLUSTERED INDEX [CLI_LastName] ON Customer(LastName)

Nonclustered Indexes

Nonclustered indexes are the default in SQL Server 2005 when you are creating indexes. They are separate B-tree structures from the table. Consequently, you can have more than one nonclustered index. In fact, you should be able to create 249 nonclustered indexes on a table (though we’ve never tried it).

Tip 

Don’t create 249 nonclustered indexes on a table!

Nonclustered indexes work well where the data is highly selective, but they have limited value where the data is dense. So, an inappropriate nonclustered index might have all of the overhead with no performance benefits.

Functionally, nonclustered indexes work well in point queries but have limited value for range queries. Why? Well, simply put, the data is not “clustered” together but all over the place, unlike with clustered indexes.

The following example shows a nonclustered index being created on the [FirstName] field of the [Customer] table:

 CREATE NONCLUSTERED INDEX [NCI_FirstName] ON Customer(FirstName)

You also have the capability of creating nonclustered indexes on multiple columns. They are commonly referred to as composite or compound indexes. There are two primary reasons for doing this. The first is to facilitate queries that have search arguments based on multiple columns from the same table. The second is to reduce the overall number of indexes that SQL Server 2005 has to maintain.

So, instead of creating separate nonclustered indexes on the [LastName] and [FirstName] fields, because they are frequently searched together, you should consider creating a composite index on the [LastName, FirstName] combination. The following example shows a nonclustered index being created on multiple fields of the [Customer] table:

 CREATE NONCLUSTERED INDEX [NCI_CompoundIndex] ON Customer(LastName, FirstName)

Tip 

When creating nonclustered indexes, remember that the more columns you add, the less likely the index will be used because it is getting too wide. Keep the index-to-table-size ratio in mind. Of course, there are exceptions, called covering indexes, and you will examine them shortly.

SQL Server 2005 has a new index option that allows you to improve nonclustered indexes by including nonkey columns in the index DDL. The columns that are included via the INCLUDE clause are stored only in the leaf level of the index and consequently are not subject to the 16-column limit. This allows for the creation of larger covering indexes.

Here are the other restrictions:

  • One key column must be defined.

  • A maximum of 1,023 columns can be included.

  • Included columns cannot be repeated in the INCLUDE list.

  • Columns cannot be defined in both the nonclustered index and the INCLUDE list.

The following example shows a nonclustered index being created on the [Country] field of the [Customer] table, which includes three more fields:

 CREATE NONCLUSTERED INDEX [NCI_Country]
 ON Customer(Country)
 INCLUDE (Address, Region, Postcode)

Did we mention covering indexes earlier? They represent the real art of indexing in SQL Server. The idea behind a covering index is to create an index that can cover important queries, thereby avoiding the need for the queries to go to the underlying table.

Real World Scenario: Covering Indexes

To illustrate the concept of covering indexes, let’s assume you have some sort of a [Customer] table that has a large number of columns, say more than 500, that contain demographical and other statistical information. So, the partial table definition would be as follows:

 CREATE TABLE [Customer] (
     [CustomerID]      INT             NOT NULL,
     [Name]            VARCHAR(20)     NULL,
     [SurName]         VARCHAR(50)     NULL,
     [PhoneNumber]     VARCHAR(20)     NULL,
     [FaxNumber]       VARCHAR(20)     NULL,
     [PassportNumber]  CHAR(8)         NULL,
     [Address]         VARCHAR(50)     NULL,
     [Region]          VARCHAR(20)     NULL,
     [PostCode]        VARCHAR(10)     NULL,
     [Country]         VARCHAR(20)     NULL,
     [Sex]             CHAR(1)         NULL,
     [MarriageStatus]  CHAR(1)         NULL,
     [Salary]          MONEY           NULL,
     [Dogs]            TINYINT         NULL,
     [Cats]            TINYINT         NULL,
     [Platypus]        BIGINT          NULL,
     [Kids]            TINYINT         NULL,
     ...
 CONSTRAINT [PK_CustomerNumber] PRIMARY KEY CLUSTERED ([CustomerID])
 ) ;
 GO

This table has more than 20,000,000 records, so as you would expect, queries against this table are going to be slow. Call-center personnel frequently want to call up a particular customer, so they run the following queries:

 SELECT * FROM [Customer] WHERE CustomerID = @ID ;
 SELECT * FROM [Customer] WHERE Surname = @Surname ;
 SELECT * FROM [Customer] WHERE Surname = @Surname AND Name = @Name ;

You want to improve performance, so you could create two separate indexes on the [Surname] and [Name] fields. To reduce the number of indexes, a better choice might be to create a single composite index on the [Surname] and [Name] fields that also includes the frequently requested [PhoneNumber] field:

 CREATE INDEX [NCI_Surname_Name]
 ON Customer(Surname, Name)
 INCLUDE (PhoneNumber) ;
 GO

This will most likely improve performance for the original set of queries. But you can substantially improve performance further by reworking the queries as follows:

 SELECT CustomerID, Name, SurName, PhoneNumber
 FROM [Customer] WHERE CustomerID = @ID ;

 SELECT CustomerID, Name, SurName, PhoneNumber
 FROM [Customer] WHERE SurName = @Surname ;

 SELECT CustomerID, Name, SurName, PhoneNumber
 FROM [Customer] WHERE SurName = @Surname AND Name = @Name ;

Why? Because you have now created a covering index!

In other words, all the “unknowns” requested by the query, in this case the telephone number, are located within the B-tree of the nonclustered index. There is no need for SQL Server to go through the additional step of traversing the table.

Considering the underlying size of the table, you have substantially improved performance because your queries can be serviced by a much smaller B-tree.


So, that’s SQL Server 2005 indexes in a nutshell. It’s quite a large nutshell! Now we’ll cover the maintenance and optimization issue.

Index Optimization

More than 10 years ago, there was a common misconception that you needed to reindex your indexes because they were not up-to-date. Nothing could be further from the truth, because SQL Server always keeps the indexes up-to-date in real time. (That’s why you can degrade the performance of your online transaction processing [OLTP] databases if you overindex them.) The real reason has to do with fragmentation and its impact on performance.

Table/Index Fragmentation

Fragmentation basically exists when the logical ordering of pages in the leaf level of the B-tree does not match the physical ordering of the pages in the database data file. The leaf pages of the index have a pointer to the next and the previous page in their headers (as do intermediate pages). Figure 2.5 shows this doubly linked list, which is the logical order that we are talking about.

Figure 2.5: Doubly linked list at the leaf level of the index (B-tree)

As data in your tables is modified through DML operations, the data pages of the index will have to be split to accommodate new records or records that have increased in size. This process of page splitting involves SQL Server’s storage engine allocating a new page to this particular index, which might not be located on a contiguous extent, let alone a contiguous page. So after a period of time, the physical order of the doubly linked list from Figure 2.5 might look something like Figure 2.6.

Figure 2.6: Physical location of pages that make up the doubly linked list in Figure 2.5

Because of this fragmentation, queries that invoke table scans or partial scans can be substantially slower. This is because the physical order of the doubly linked list does not match the logical order, so disk throughput will degrade because the disk drive heads must jump back and forth reading the various extents that the next page is located on, instead of being able to sequentially read all of the pages if they were contiguous.

Note 

This external fragmentation affects disk I/O performance only. It does not adversely affect queries whose data pages reside in the SQL Server’s buffer pool. We discuss the difference between external and internal fragmentation, and how to determine it, in Chapter 5.

When you rebuild the index, SQL Server 2005 basically creates a new B-tree before wiping out the old version of it, so effectively everything becomes contiguous again. Figure 2.7 shows the result of such an operation.

Figure 2.7: Physical location of pages after the index has been rebuilt

Detecting Fragmentation

As we indicated earlier, you should no longer be using the DBCC SHOWCONTIG command to determine the level of fragmentation, because the command is being deprecated. Instead, you should use the sys.dm_db_index_physical_stats DMV to determine the level of fragmentation for an index or table, as the case might be.

Table 2.1 describes the important columns of the sys.dm_db_index_physical_stats DMV to watch.

Table 2.1: Description of sys.dm_db_index_physical_stats Columns

  • avg_fragmentation_in_percent   Percent of logical fragmentation (pages out of order).

  • fragment_count   The number of fragments in the index. (A fragment is a set of physically consecutive leaf pages.)

  • avg_fragment_size_in_pages   Average number of pages in one fragment in an index.

The following example shows the sys.dm_db_index_physical_stats DMV being queried for the [Production].[Product] table in the AdventureWorks database:

 USE AdventureWorks ;
 GO
 SELECT i.index_id,
        name,
        avg_fragmentation_in_percent,
        avg_fragment_size_in_pages,
        fragment_count
 FROM   sys.dm_db_index_physical_stats
        (DB_ID(),
         OBJECT_ID('Production.Product'),
         NULL,
         NULL,
         NULL) AS dmv
 JOIN   sys.indexes AS i
 ON     dmv.object_id = i.object_id
 AND    dmv.index_id = i.index_id ;
 GO

It has the following output:

 index_id name                  avg_fragmentation_in_percent
 -------- --------------------- ----------------------------
 1        PK_Product_ProductID  23.0769230769231
 2        AK_Product_ProductNum 50.0
 3        AK_Product_Name       66.6666666666667
 4        AK_Product_rowguid    50.0

 (4 row(s) affected)

Of course, the million-dollar question is, What constitutes a bad level of fragmentation? Well, that’s a bit of a tough one, because it depends.

However, Microsoft recommends the following guidelines:

  • If the avg_fragmentation_in_percent value is greater than 30 percent, rebuild the index.

  • If the avg_fragmentation_in_percent value is between 5 percent and 30 percent, reorganize the index.

  • If the avg_fragmentation_in_percent value is less than 5 percent, don’t bother.
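
To make these guidelines concrete, here is a minimal sketch that turns the DMV output for the [Production].[Product] table into suggested maintenance commands. The thresholds are simply the guidelines above hard-coded; adapt them to your own environment:

 USE AdventureWorks ;
 GO
 -- Suggest REBUILD above 30 percent fragmentation, REORGANIZE between 5 and 30
 SELECT 'ALTER INDEX [' + i.name + '] ON [Production].[Product] '
        + CASE
              WHEN dmv.avg_fragmentation_in_percent > 30 THEN 'REBUILD ;'
              ELSE 'REORGANIZE ;'
          END AS suggested_command
 FROM   sys.dm_db_index_physical_stats
        (DB_ID(), OBJECT_ID('Production.Product'), NULL, NULL, NULL) AS dmv
 JOIN   sys.indexes AS i
 ON     dmv.object_id = i.object_id
 AND    dmv.index_id = i.index_id
 WHERE  dmv.avg_fragmentation_in_percent > 5 ;
 GO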

You’ll now look at the difference between reorganizing and rebuilding the index.

Maintaining Indexes

Basically, you can employ three techniques to solve your index fragmentation problem:

  • Reorganize the index.

  • Rebuild the index.

  • Drop and re-create the index.

Tip 

To defragment a heap, you have to create a clustered index on the heap and then drop the clustered index.
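
A minimal sketch of that technique, assuming a hypothetical dbo.MyHeap table with a SomeColumn column (building the clustered index rewrites the heap’s pages in order; dropping it afterward leaves the compacted data behind as a heap again):

 -- dbo.MyHeap and SomeColumn are hypothetical names for this illustration
 CREATE CLUSTERED INDEX [CIX_MyHeap_Defrag] ON [dbo].[MyHeap](SomeColumn) ;
 GO
 DROP INDEX [CIX_MyHeap_Defrag] ON [dbo].[MyHeap] ;
 GO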

In SQL Server 2005, the statement you generally use now to perform the majority of these actions (well, two of them anyway) is the ALTER INDEX statement. The syntax for altering the index is as follows:

 ALTER INDEX { index_name | ALL }
     ON <object>
     { REBUILD
         [ [ WITH ( <rebuild_index_option> [ ,...n ] ) ]
           | [ PARTITION = partition_number
               [ WITH ( <single_partition_rebuild_index_option> [ ,...n ] ) ]
             ]
         ]
     | DISABLE
     | REORGANIZE
         [ PARTITION = partition_number ]
         [ WITH ( LOB_COMPACTION = { ON | OFF } ) ]
     | SET ( <set_index_option> [ ,...n ] )
     }
 [ ; ]

 <object> ::=
 {
     [ database_name. [ schema_name ] . | schema_name. ]
         table_or_view_name
 }

 <rebuild_index_option> ::=
 {
     PAD_INDEX = { ON | OFF }
   | FILLFACTOR = fillfactor
   | SORT_IN_TEMPDB = { ON | OFF }
   | IGNORE_DUP_KEY = { ON | OFF }
   | STATISTICS_NORECOMPUTE = { ON | OFF }
   | ONLINE = { ON | OFF }
   | ALLOW_ROW_LOCKS = { ON | OFF }
   | ALLOW_PAGE_LOCKS = { ON | OFF }
   | MAXDOP = max_degree_of_parallelism
 }

 <single_partition_rebuild_index_option> ::=
 {
     SORT_IN_TEMPDB = { ON | OFF }
   | MAXDOP = max_degree_of_parallelism
 }

 <set_index_option> ::=
 {
     ALLOW_ROW_LOCKS = { ON | OFF }
   | ALLOW_PAGE_LOCKS = { ON | OFF }
   | IGNORE_DUP_KEY = { ON | OFF }
   | STATISTICS_NORECOMPUTE = { ON | OFF }
 }

REORGANIZING THE INDEX

Reorganizing an index basically defragments the leaf level of the index by physically reordering the leaf-level pages to match the logical order of the leaf-level pages. Reorganizing also compacts the index pages based on the fill factor setting.

Reorganizing an index is an online operation and uses the least system resources. It does not hold long-term blocking locks, so it doesn’t block running queries or updates.

The REORGANIZE clause is used in the ALTER INDEX statement to reorganize an index.

Warning 

You should no longer use the DBCC INDEXDEFRAG statement because it is being deprecated by Microsoft.

The following example shows the [AK_Product_Name] index in the AdventureWorks database being reorganized:

 USE AdventureWorks ;
 GO
 ALTER INDEX [AK_Product_Name] ON [Production].[Product]
 REORGANIZE ;
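
If the index contains large object (LOB) data, you can compact that data during the same operation through the LOB_COMPACTION option from the syntax shown earlier. Here it is on the same index purely for illustration (this particular index holds no LOB data, so the option has no effect in this case):

 USE AdventureWorks ;
 GO
 ALTER INDEX [AK_Product_Name] ON [Production].[Product]
 REORGANIZE WITH (LOB_COMPACTION = ON) ;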

REBUILDING THE INDEX

Rebuilding an index basically drops the index and creates a new index in its place. The new index will initially be completely contiguous, which, as discussed, will translate to good disk performance.

The REBUILD clause is used in the ALTER INDEX statement to rebuild an index.

Warning 

You should no longer use the DBCC DBREINDEX statement because it is being deprecated by Microsoft.

The following example shows the [AK_Product_Name] index in the AdventureWorks database being rebuilt using the REBUILD clause with a new fill factor setting of 69:

 USE AdventureWorks ;
 GO
 ALTER INDEX [AK_Product_Name] ON [Production].[Product]
 REBUILD WITH (FILLFACTOR = 69) ;
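
Rebuilding normally takes the index offline for the duration of the operation. On editions that support it (Enterprise Edition in SQL Server 2005), you can keep the index available during the rebuild via the ONLINE option; a quick sketch:

 USE AdventureWorks ;
 GO
 ALTER INDEX [AK_Product_Name] ON [Production].[Product]
 REBUILD WITH (ONLINE = ON) ;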

Alternatively, you can rebuild an index by using the CREATE INDEX statement with the DROP_EXISTING clause. The differences between this technique and the previous one include the following:

  • The CREATE INDEX ... WITH DROP_EXISTING statement allows you to change the index definition:

    • Add key columns.

    • Remove key columns.

    • Change the column order.

    • Change the column sort order.

  • The CREATE INDEX ... WITH DROP_EXISTING statement allows you to move the index to another file group.

  • The CREATE INDEX ... WITH DROP_EXISTING statement allows you to repartition a partitioned index.

  • The ALTER INDEX ... REBUILD statement allows you to rebuild more than one index in a single transaction (see the sketch after this list).

  • The ALTER INDEX ... REBUILD statement allows you to rebuild a single index partition.
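
As a quick illustration of rebuilding more than one index at once, the following sketch rebuilds every index on the [Production].[Product] table in a single statement (the fill factor value here is arbitrary):

 USE AdventureWorks ;
 GO
 ALTER INDEX ALL ON [Production].[Product]
 REBUILD WITH (FILLFACTOR = 80) ;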

The following example shows the [AK_Product_Name] index in the AdventureWorks database being rebuilt using the DROP_EXISTING clause with a new fill factor setting of 69:

 USE AdventureWorks ;
 GO
 CREATE UNIQUE NONCLUSTERED INDEX [AK_Product_Name]
 ON [Production].[Product](Name)
 WITH (FILLFACTOR = 69, DROP_EXISTING = ON) ;

DROPPING AND RE-CREATING THE INDEX

For completeness sake, you can also drop an index using the DROP INDEX statement and re-create it with the CREATE INDEX statement. Microsoft does not recommend this technique for a number of reasons, including the resources required and locks acquired.

The following example shows the [AK_Product_Name] index in the AdventureWorks database being re-created using this nonrecommended technique:

 USE AdventureWorks ;
 GO
 DROP INDEX [AK_Product_Name] ON [Production].[Product] ;
 GO
 CREATE UNIQUE NONCLUSTERED INDEX [AK_Product_Name]
 ON [Production].[Product](Name) ;
 GO

It’s important for you to thoroughly understand the various index maintenance commands, so you’ll examine them in more detail through Exercise 2.3.

Exercise 2.3: Maintaining Indexes

As we’ve said, it is important to get some exposure to what the various index maintenance commands do. The first step in this exercise is to look at the properties of the index and its potential level of fragmentation.

  1. Open SQL Server Management Studio, and connect to your SQL Server 2005 instance using Windows authentication.

  2. Expand the Databases, AdventureWorks, Tables, Production.Product, and Indexes folders in Object Explorer. Right-click the AK_Product_ProductNumber index, and choose Properties. Click the Options page. Examine and record the fill factor setting.

  3. Click the Fragmentation page to examine the level of fragmentation. Examine and record the number of pages taken up by the [AK_Product_ProductNumber] nonclustered index. Close the Index Properties window.

  4. Back in SQL Server Management Studio, click the New Query toolbar button to open a new query window.

  5. Let’s first rebuild the index using the ALTER INDEX statement. Use a new fill factor setting of 30 percent. Type the following T-SQL code, execute it, and observe the results:

     USE AdventureWorks ;
     GO
     ALTER INDEX [AK_Product_ProductNumber] ON [Production].[Product]
     REBUILD WITH (FILLFACTOR = 30) ;

  6. Examine the Fragmentation page again by following steps 2 and 3. Examine the new value for the number of pages taken up by the [AK_Product_ProductNumber] nonclustered index. You should see that the nonclustered index now consumes more pages because of the 30 percent fill factor setting. Close the Index Properties window.

  7. Let’s now rebuild the index using the CREATE INDEX statement. You will use the original fill factor setting of 100 percent. Type the following T-SQL code, execute it, and observe the results:

     USE AdventureWorks ;
     GO
     CREATE UNIQUE NONCLUSTERED INDEX [AK_Product_ProductNumber]
     ON [Production].[Product](ProductNumber)
     WITH (FILLFACTOR = 100, DROP_EXISTING = ON) ;

  8. If you want to check how many pages the index now takes up, you know what to do.

  9. Now, you’ll update the statistics for the [Product] table using a full scan. Type the following T-SQL code, execute it, and observe the results:

     USE AdventureWorks ;
     GO
     UPDATE STATISTICS [Production].[Product] WITH FULLSCAN ;

  10. Finally, you’ll examine the statistics for the [AK_Product_ProductNumber] nonclustered index. Type the following T-SQL code, execute it, and observe the results:

     USE AdventureWorks ;
     GO
     DBCC SHOW_STATISTICS ('[Production].[Product]', [AK_Product_ProductNumber]) ;

  11. As you would expect, you have high selectivity/low density for the [AK_Product_ProductNumber] nonclustered index. After all, it is unique! You should also see the Updated date reflecting when you updated the statistics a couple of seconds ago.

Optimizing Statistics

Another potentially important component of your index maintenance strategy is to ensure that statistics are up-to-date. We are certainly not going to discuss statistics in detail here, but suffice it to say that accurate statistics are critical to query performance in SQL Server 2005. Up-to-date statistics allow SQL Server’s query optimizer to accurately determine a high-quality execution plan.

You can view the current distribution statistics with the DBCC SHOW_STATISTICS command. Good luck with understanding it all, though!

The syntax for the DBCC SHOW_STATISTICS command is as follows:

 DBCC SHOW_STATISTICS ( 'table_name' | 'view_name' , target )
 [ WITH [ NO_INFOMSGS ] < option > [ , n ] ]

 < option > ::=
     STAT_HEADER | DENSITY_VECTOR | HISTOGRAM

The following example shows the statistics for the [AK_Product_Name] unique nonclustered index in the [Production].[Product] table of the AdventureWorks database:

 USE AdventureWorks ;
 GO
 DBCC SHOW_STATISTICS ('Production.Product', 'AK_Product_Name') ;
 GO

The output consists of a statistics header (including the date the statistics were last updated and the number of rows sampled), a density vector, and the histogram itself.

Note 

Eric Hanson has an excellent paper on statistics titled “Statistics Used by the Query Optimizer in Microsoft SQL Server 2005” that is located at http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx.

In most cases, you will not have to worry about updating statistics, because they are usually updated automatically as required by the query optimizer. However, in certain cases, you might decide to turn off this behavior through the database option:

 ALTER DATABASE database_name SET AUTO_UPDATE_STATISTICS OFF

Why would you do this? Well, in some rare cases, you would rather SQL Server did not update statistics in the middle of your work hours, because doing so can potentially impact performance. Alternatively, you might want to fine-tune the sampling rate that is used. In either case, you would then schedule the updating of statistics yourself to ensure optimal performance. We have, for example, seen some environments where statistics are updated nightly and indexes are rebuilt on the weekend, when the maintenance window is longer.
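
For instance, a nightly job might refresh statistics with an explicit sampling rate. Here is a minimal sketch (the 50 percent sampling rate is purely illustrative; tune it to your own data):

 USE AdventureWorks ;
 GO
 UPDATE STATISTICS [Production].[Product] WITH SAMPLE 50 PERCENT ;
 GO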

Tip 

Don’t make the mistake of scheduling a reindex and an update of statistics on the same index at the same time. Rebuilding an index by its nature updates that index’s statistics (with the equivalent of a full scan), so updating them again afterward is redundant at best. You will not believe how many times we have seen this in the field!

The syntax for updating statistics is as follows:

 UPDATE STATISTICS table | view
     [
         {
             { index | statistics_name }
           | ( { index | statistics_name } [ ,...n ] )
         }
     ]
     [ WITH
         [
             FULLSCAN
           | SAMPLE number { PERCENT | ROWS }
           | RESAMPLE
           | <update_stats_stream_option> [ ,...n ]
         ]
         [ [ , ] [ ALL | COLUMNS | INDEX ] ]
         [ [ , ] NORECOMPUTE ]
     ] ;

 <update_stats_stream_option> ::=
     [ STATS_STREAM = stats_stream ]
     [ ROWCOUNT = numeric_constant ]
     [ PAGECOUNT = numeric_constant ]

Here’s how the syntax breaks down:

  • The FULLSCAN option indicates that all rows in the table should be read to gather statistics.

  • The SAMPLE option indicates that a percentage of the table or a fixed number of rows should be read to gather statistics.

    Note 

    The default behavior for SQL Server 2005 is to take a sample based on the size of the table.

  • The NORECOMPUTE option indicates that statistics that become out-of-date should not be automatically updated (recomputed).

The following example shows all statistics being updated for the [Production].[Product] table by scanning all rows:

 USE AdventureWorks ;
 GO
 UPDATE STATISTICS [Production].[Product] WITH FULLSCAN ;
 GO
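
To confirm that the statistics were refreshed, you can then check the Updated column of the statistics header; a quick sketch using the STAT_HEADER option from the syntax shown earlier:

 USE AdventureWorks ;
 GO
 DBCC SHOW_STATISTICS ('Production.Product', 'AK_Product_Name') WITH STAT_HEADER ;
 GO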

AUTO_UPDATE_STATISTICS_ASYNC DATABASE OPTION

SQL Server 2005 supports a new AUTO_UPDATE_STATISTICS_ASYNC database option that controls how statistics are automatically updated. Usually when a query triggers an automatic updating of statistics, the query has to wait until the statistics are updated before continuing. In other words, it is a synchronous process.

You can use the AUTO_UPDATE_STATISTICS_ASYNC database option to remove this wait, so that the query does not wait until the statistics are updated before continuing with its execution.

However, that query will be compiled with the out-of-date statistics and consequently might not use an optimal execution plan; subsequent queries will benefit from the freshly updated statistics.

Note 

The AUTO_UPDATE_STATISTICS_ASYNC database option has no effect if the AUTO_UPDATE_STATISTICS database option is off.
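
A minimal sketch of enabling the asynchronous behavior for the AdventureWorks sample database (both options are shown together, since the asynchronous option only takes effect when automatic updating itself is on):

 ALTER DATABASE AdventureWorks SET AUTO_UPDATE_STATISTICS ON ;
 ALTER DATABASE AdventureWorks SET AUTO_UPDATE_STATISTICS_ASYNC ON ;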

Enforcing Appropriate Stored Procedure Logging and Output

We’re a bit confused about what the “Enforce appropriate stored procedure logging and output” exam objective is getting at. Is it talking about the logging of a stored procedure execution, in which case you can look at the relevant SQL trace event classes? Or is it talking about the logging of stored procedures internally, in which case this is done programmatically? In any case, enforcing appropriate stored procedure logging, whatever that might be, is another matter altogether.

Cristian Lefter’s suggestion of the new TRY/CATCH construct in SQL Server 2005 is worth investigating. So, let’s get at it: SQL Server 2005 has a new structured exception handling technique through the TRY/CATCH construct. The BEGIN TRY/END TRY block contains the T-SQL code where an error might be generated. The BEGIN CATCH/END CATCH block contains the exception handler.

Note 

The CATCH block must immediately follow the TRY block.

The following errors cannot be handled by the TRY/CATCH construct:

  • Compile errors

  • Statement-level recompilation errors

The following functions are available in the CATCH block:

  • ERROR_NUMBER()

  • ERROR_MESSAGE()

  • ERROR_SEVERITY()

  • ERROR_STATE()

  • ERROR_LINE()

  • ERROR_PROCEDURE()

The following pseudocode shows an example of how to use the TRY/CATCH construct:

 ...
 BEGIN TRY
     ...
     -- T-SQL code representing what you are trying to do
     ...
 END TRY
 BEGIN CATCH
     IF (ERROR_NUMBER() = 1)
     BEGIN
         -- Perform appropriate action
     END
     ELSE IF (ERROR_NUMBER() = 2)
     BEGIN
         -- Perform appropriate action
     END
     ELSE IF (ERROR_NUMBER() = 3)
     BEGIN
         -- Perform appropriate action
     END
     ELSE
     BEGIN
         -- Perform appropriate action
     END
 END CATCH
 ...

So, of course your CATCH block could log certain exceptions and write them to an error log or send some sort of notification. Enforcing this, as we discussed, is another matter.
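
As a minimal sketch of that idea (the dbo.ErrorLog table and its columns are hypothetical, invented purely for this illustration), the following logs an exception’s details to a table:

 -- Hypothetical logging table for this illustration
 CREATE TABLE dbo.ErrorLog
 (
     ErrorLogID     int IDENTITY(1,1) PRIMARY KEY,
     ErrorNumber    int,
     ErrorSeverity  int,
     ErrorProcedure nvarchar(128),
     ErrorLine      int,
     ErrorMessage   nvarchar(4000),
     LoggedAt       datetime NOT NULL DEFAULT GETDATE()
 ) ;
 GO
 BEGIN TRY
     -- Simulate a failure (division by zero) so the CATCH block fires
     SELECT 1 / 0 ;
 END TRY
 BEGIN CATCH
     -- Record the exception's details for later analysis
     INSERT INTO dbo.ErrorLog
         (ErrorNumber, ErrorSeverity, ErrorProcedure, ErrorLine, ErrorMessage)
     VALUES
         (ERROR_NUMBER(), ERROR_SEVERITY(), ERROR_PROCEDURE(),
          ERROR_LINE(), ERROR_MESSAGE()) ;
 END CATCH
 GO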

An Iterative Process

We’d like to finish this section of the chapter by saying that troubleshooting and maintaining query performance are iterative processes. Obviously through a database solution’s life cycle, the tables and/or queries might change, which will necessitate reinvestigation.

But don’t forget that SQL Server 2005’s query optimizer uses a cost-based model, which means that over a period of time, as the data in your database solution changes and grows, your previous indexing strategy, optimizer hints, and other optimization techniques might no longer be applicable. They could in fact be causing performance problems.

This should also highlight the need to document what you have done previously and maintain a change management log.

But, again, an iterative process is required.
