Common Query Optimization Problems

So you've written the query and examined the query plan, and performance isn't what you expected. It might appear that SQL Server isn't choosing the appropriate query plan that you expect. Is something wrong with the query or with the optimizer? Before we delve into a detailed discussion about how to debug and analyze query plans (covered in detail in the next chapter), this section takes a look at some of the more common problems and SQL-coding issues that can lead to poor query plan selection.

Out-of-Date or Unavailable Statistics

Admittedly, this is not as big a problem as it was in SQL Server releases prior to 7.0. Back in those days, the first question asked when someone was complaining of poor performance was, "When did you last update statistics?" If the answer was "Huh?" we usually found the culprit.

With the Statistics Auto-Update and Auto-Create features in SQL Server 2000, this problem is not as prevalent as it used to be. If a query detects that statistics are out of date or missing, it will cause them to be updated or created and then optimize the query plan based on the new statistics.

NOTE

If statistics are missing or out of date, the first query to run that detects this condition might run a bit more slowly as it updates or creates the statistics first, especially if the table is relatively large, and also if it has been configured for FULLSCAN when indexes are updated.

However, do not assume that you always have up-to-date statistics. One suggestion floating around the newsgroups is that having these options on degrades performance in online transaction processing (OLTP) environments, and many DBAs are disabling these options and running update statistics periodically via scheduled jobs (or at least you hope they are). I don't recommend turning these options off arbitrarily; use SQL Profiler or the trace flags described in Chapter 34 to confirm that these options are causing a performance problem before turning them off and doing performance testing to see if performance improves . AutoStats was implemented to ensure that you have up-to-date statistics all the time and does serve a useful purpose.

Another case where you could have bad statistics is if the sample used when the statistics were generated wasn't large enough. Depending on the nature of your data and the size of the table, the statistics might not accurately reflect the actual data distribution and cardinality. If you suspect this is the case, update statistics specifying the FULLSCAN option or a larger sample size so it examines more records to derive the statistics.

Poor Index Design

Poor index design is another reason ”often a primary reason ”why your queries might not optimize as you expect them to. If no supporting indexes exist for the query, or the query contains non-optimizable SARGs that can't use any available indexes, SQL Server will end up either performing a table scan, an index scan, or another hash or merge join strategy that is less efficient. If this appears to be the problem, you need to re-evaluate your indexing decisions or rewrite the query so it can take advantage of an available index. For more information on designing useful indexes, see Chapter 34.

Search Argument Problems

It's the curse of SQL that there are a number of different ways to write a query and get the same resultset. Some queries, however, might not be as efficient as others. A good understanding of the query optimizer can help you avoid writing search arguments that SQL Server can't optimize effectively. This section highlights some of the common "gotchas" encountered in SQL Server SARGs that can lead to poor or unexpected query performance.

Using NonSARGable Expressions

As mentioned previously in the section "Identifying Search Arguments," the query optimizer uses search arguments to help it narrow down the set of rows to evaluate. The search argument is in the form of a WHERE clause that equates a column to a constant, as shown in the following example:

 SELECT column1     FROM table1    WHERE column1 = 123

The query optimizer can use the search argument to examine an index to find the qualifying rows, avoiding having to perform a full table scan. The following is an example of a nonSARGable expression to watch out for, as it cannot be optimized with an index:

 SELECT column1     FROM table1    WHERE column1 / 10 = 123

No SARGs

Watch out for queries in which the SARG might have been left out inadvertently, such as this:

 select title_id from titles

A SQL query with no search argument (that is, no WHERE clause) always performs a table scan unless a nonclustered index can be used to cover the query (see Chapter 34 for a discussion of index covering). If you don't want the query to affect the entire table, be sure to specify a valid SARG that matches an index on the table to avoid table scans .

Negative Logic

Any negative logic (for example, != , <> , not in ) nearly always results in a table scan being performed, unless index covering can be applied (that is, all columns and information referenced in the query can be found within the leaf level of a nonclustered index).

Operations on a Column in `WHERE` Clauses

Any operation on the column side of a WHERE clause causes it to be treated as a nonSARGable expression by SQL Server. Therefore, the optimizer can't match the SARG with an index; typically, a table scan will be performed to satisfy the query unless an index covers the query.

Unknown Values in `WHERE` Clauses

Watch out for expressions where the value cannot be known until runtime. These are expressions that contain local variables or subqueries that can be materialized to a single value. The SQL Server treats these expressions as SARGs but can't use the statistics histogram to estimate the number of matching rows because it doesn't have a value to compare against the histogram values during query optimization. The values for the expressions won't be known until the query is actually executed. What the optimizer does in this situation is use the index density information. The optimizer generally is able to better estimate the number of rows affected by a query when it can compare a known value against the statistics histogram than when it has to use the index density to estimate the average number of rows that match an unknown value. This is especially true if the data in a table isn't distributed evenly. When you can, try to avoid using constant expressions that can't be evaluated until runtime, so that the statistics histogram can be used rather than the density value.

To avoid using constant expressions that can't be evaluated until runtime in WHERE clauses, consider putting the queries into stored procedures and passing in the constant expression as a parameter. Because the optimizer evaluates the value of a parameter prior to optimization, SQL Server evaluates the expression prior to optimizing the stored procedure.

So for best results when writing queries inside stored procedures, use stored procedure parameters rather than local variables in your SARGs whenever possible. This strategy allows the optimizer to optimize the query by using the statistics histogram, comparing the parameter value against the statistics histogram to estimate the number of matching rows. If you use local variables as SARGs in stored procedures, the optimizer is restricted to using index density, even if the local variable is assigned the value of a parameter.

Datatype Mismatches

Another common problem is datatype mismatches. If you attempt to join tables on columns of different datatypes, the query optimizer might not be able to effectively use indexes to compute the join. This can result in a less-efficient join strategy because the SQL Server will have to convert all values first before it can process the query. Avoid this situation by maintaining datatype consistency across your join key columns in your database.

Triggers

If you are using triggers on INSERT , UPDATE , or DELETE , it is possible that your triggers can cause performance problems. You might think that the INSERT , UPDATE , or DELETE is performing poorly when actually it is the trigger that needs to be tuned . Additionally, you might have triggers that fire other triggers. If you suspect you are having performance problems with the triggers, you can monitor the SQL they are executing and the response time, as well as execution plans generated for statements within triggers using SQL Profiler. For more information on monitoring performance with SQL Profiler, see Chapter 7, "Using the SQL Server Profiler."