Step 1: Query Analysis

The first step in query optimization is to analyze each table in the query to identify all search arguments (SARGs), OR clauses, and join clauses. The SARGs, OR clauses, and join clauses will be used in the second step, index selection, to select useful indexes to satisfy a query.

Identifying Search Arguments

A search argument (SARG) is defined as a WHERE clause comparing a column to a constant. The format of a SARG is as follows :

  Column operator constant_expression  [and...]

SARGs provide a way for the optimizer to limit the rows searched to satisfy a query. The general goal is to match a SARG with an index to avoid a table scan. Valid operators for a SARG are any one of = , > , < , >= , and <= , BETWEEN , and sometimes LIKE . Multiple SARGs can be combined with the AND clause (a single index might match some or all of the SARGs AND ed together). Following are examples of optimizable search arguments:

flag = 7
salary > 100000
city = 'Saratoga' and state = 'NY'
price between $10 and $20 (is the same as price > = $10 and price <= $20 )
100 between lo_val and hi_val (is the same as lo_val <= 100 and hi_val >= 100 )
au_lname like 'Sm%' (is the same as au_lname >= 'Sm' and au_lname < 'Sn' )

In some cases, the column in a SARG might be compared with a constant expression rather than a single constant value. The constant expression can be an arithmetic operation, a built-in function, a string concatenation, a local variable, or a subquery result. As long as the left side of the SARG contains a column alone, it's still a SARGable expression.

The LIKE clause will be treated as a SARG only if the first character in the string is a constant. The following statement wouldn't be treated as a SARG:

 au_lname like '%son'

A LIKE clause with a wildcard as the first character is not considered a SARGable expression because it doesn't limit the search. In other words, every row would have to be examined to determine if it were a match. The inequality operator ( != or <> ) isn't a valid operator for a SARG for this same reason ”an index can only help you find matches for a specific value, not everything that doesn't match. Additionally, if any operation is performed on the column, such as a function, it's not considered SARGable, either. Some examples of nonSARGable expressions are as follows:

gender != 'M'
lname = fname (comparison against a column, not a constant expression)
upper(city) = 'POULSBO' (function performed on the column)

TIP

If you have a search clause with a nonSARGable expression, try to rewrite it as a SARG so that it can be optimized by the query optimizer. For example, consider the following query:

 select title from titles where price != 0

If a business rule is enforced on the table to prevent any rows from having a price less than zero, the query could be rewritten as follows:

 select title from titles where price > 0

and still return the same result set. The difference is that the second version contains a SARG that the optimizer will recognize and consider for matching with an index to resolve the query. Although it still might result in a table scan, it at least gives the optimizer the opportunity to consider using an index; it wouldn't have done so with the inequality operator.

Another example of a nonSARG that can be rewritten as a SARGs is as follows:

 where substring(name, 1, 1) = 'A'

can be rewritten as

 where name like 'A%'

TIP

If you cannot avoid using a function on a column in the search expression, consider creating a computed column on the table and indexing it. This will materialize the function result into an additional column on the table, which can be indexed for faster searching because the index statistics can be used on the computed column. An example of using this approach would be for a query that has to find the number of orders placed in a certain month, regardless of the year. The following is a possible solution:

 select count(*)     from sales    where datepart(month, ord_date) = 6

This query gets the right resultset but ends up having to do so with a table scan because the function on the ord_date column prevents it from using any index that might exist on the ord_date column.

If this query is used frequently in the system and quick response time is critical, you could create a computed column on the function and index it as follows:

 alter table sales add ord_month as datepart(month, ord_date)  create index nc_sales_ordmonth on sales(ord_month)

Now, when you run the query on the table again, specify the computed column in the WHERE clause and it can use the index on the computed column to find the rows and avoid a table scan.

 select count(*)     from sales    where ord_month = 6

Even if the query still ends up using a table scan, it will now at least have statistics available to know how many rows it can expect to match where the month matches the value specified.

Some expressions involving computations on a column might be treated as SARGs during optimization if SQL Server can simplify the expression into a SARG. For example, the SARG

 ytd_sales/12 = 1000

can be simplified to

 ytd_sales = 12000.

The simplified expression is used only during optimization to determine an estimate of the number of matching rows and the usefulness of the index. During actual execution, the conversion is not done while traversing the index tree as it won't be able to do the repeated division by 12 for each row while searching through the tree. However, doing the conversion during optimization and getting a row estimate from the statistics helps the optimizer decide on other strategies to consider, such as index scanning versus table scanning, or it might help to decide an optimal join order if it's a multitable query.

When tuning performance of your system, keep an eye out for expressions that cannot be treated as SARGs. They're a common cause of poor performance because they prevent an index from being used to resolve the query. Many times, queries containing nonoptimizable SARGs can be rewritten with optimizable SARGs that will return the same resultset faster.

NOTE

The presence of a nonSARGable expression in a query does not guarantee that a table scan will be performed. Other SARGs might be present that will result in an index being used; or, on occasion, an index will cover the query if the index contains all needed columns and the nonSARGable expression can be evaluated by scanning the index instead of the entire table. Index covering is discussed in more detail in Chapter 34.

Identifying `OR` Clauses

The next statements the optimizer looks for in the query are OR clauses. OR clauses are SARGable expressions combined with an OR statement rather than an AND statement and are treated differently than a standard SARG. The format of an OR clause is as follows:

  SARG  or  SARG  [or ...]

with all columns involved in the OR belonging to the same table.

The following IN statement

 column in (  constant1, constant2,  ...)

is also treated as an OR clause, becoming

 column =  constant1  or column =  constant2  or ...

Some examples of OR clauses are as follows:

 where au_lname = 'Smith'  or au_fname = 'Fred'  where (type = 'business' and price > ) or pub_id = "1234" where au_lname in ('Smith', 'Jones', 'N/A')

An OR clause is a disjunction; all rows matching either of the two criteria appear in the resultset. Any row matching both criteria should appear only once.

The main issue is that an OR clause cannot be satisfied by a single index. Consider the first example just presented:

 where au_lname = 'Smith'  or au_fname = 'Fred'

An index on au_lname and au_fname will help us find all the rows where au_lanme = 'Smith' AND au_fname = 'Fred' , but searching the index tree will not help us find all the rows where au_fname = 'Fred' , but the last name is something other than 'Smith' . Unless an index on au_fname exists as well, the only way to find all rows with au_fname = 'Fred' is to search every row in the table.

An OR clause can be resolved by either a table scan or by using the OR strategy. Using a table scan, SQL Server reads every row in the table and applies each OR criteria to each row. Any row that matches any one of the OR criteria is put into the resultset.

A table scan is an expensive way to process a query, so the optimizer looks for an alternative for resolving an OR . If an index can be matched against all SARGs involved in the OR clause, SQL Server evaluates the possibility of applying the index intersection strategy described later in this chapter in the "Using Multiple Indexes" section.

Identifying Join Clauses

The last type of statement for which the query optimizer looks during the query analysis phase is the join clause. A join condition is specified in the FROM clause using the JOIN keyword as follows:

 FROM table1 JOIN table2  on table1.column = table2.column

Alternatively, join conditions can also be specified in the WHERE clause using the old-style join syntax, as shown in the following example:

  Table1.Column Operator Table2.Column

A join clause always involves two tables, except in the case of a self-join, but even in a self-join , you must specify the table twice in the query:

 select employee = e.LastName + ', ' + e.FirstName,         manager = m.LastName + ', ' + m.FirstName     from Northwind..Employees e left outer join Northwind..Employees m     on e.ReportsTo = m.EmployeeID    order by 2, 1

SQL Server will treat a self-join just like a normal join between two different tables.

Subquery Processing

Depending on how a subquery is written, SQL Server will optimize them differently. For example, SQL Server will attempt to flatten certain subqueries into joins when possible, to allow the optimizer to select the optimal join order rather than be forced to process the query inside out. This section examines the different types of subqueries and how they are optimized by SQL Server.

`IN` , `ANY` , or `EXISTS` Subqueries

In SQL Server, any query containing a subquery introduced with an IN , = ANY , or EXISTS predicate is flattened into an existence join unless the outer query also contains an OR clause or unless the subquery is correlated or contains one or more aggregates.

An existence join is optimized the same way as a regular join, with one exception. With an existence join, as soon as a matching row is found in the inner table, the value TRUE is returned and SQL Server stops looking for further matches for that row in the outer table and moves on to the next row. A normal join would continue processing to find all matching rows. The following query is an example of a subquery that would be converted to an existence join:

 select pub_name from publishers     where pub_id in (select pub_id from titles where type = "business')

Materialized Subqueries

If the outer query is comparing a column against the result of a subquery using any of the comparison operators ( = , > , < , >= , <= , != ), and the subquery is not correlated, the results of the subquery must be resolved ”that is, materialized ”before comparison against the outer table column. For these types of queries, the optimizer must process them inside out.

An example of this type of query is as follows:

 select title from titles     where total_sales = (select max(total_sales) from titles)

The subquery must be resolved first to find the value to compare against total_sales in the outer query. Although this is an optimizable SARG, the value to be compared with the total_sales column will not be known until the query actually runs.

Correlated Subqueries

A correlated subquery contains a reference to an outer table in a join clause in the subquery. The following is an example of a correlated subquery:

 SELECT au_lname, au_fname  FROM authors WHERE 100 IN     (SELECT royaltyper     FROM titleauthor     WHERE titleauthor.au_ID = authors.au_id)

Because correlated subqueries depend on values from the outer query for resolution, they cannot be processed independently. Instead, SQL Server processes correlated subqueries repeatedly, once for each qualifying outer row.