The first step in query optimization is to analyze each table in the query to identify all search arguments (SARGs), OR clauses, and join clauses. The SARGs, OR clauses, and join clauses will be used in the second step, index selection, to select useful indexes to satisfy a query.
Identifying Search Arguments
A search argument (SARG) is defined as a WHERE clause comparing a column to a constant. The format of a SARG is as follows :
Column operator constant_expression [and...]
SARGs provide a way for the optimizer to limit the rows searched to satisfy a query. The general goal is to match a SARG with an index to avoid a table scan. Valid operators for a SARG are any one of = , > , < , >= , and <= , BETWEEN , and sometimes LIKE . Multiple SARGs can be combined with the AND clause (a single index might match some or all of the SARGs AND ed together). Following are examples of optimizable search arguments:
In some cases, the column in a SARG might be compared with a constant expression rather than a single constant value. The constant expression can be an arithmetic operation, a built-in function, a string concatenation, a local variable, or a subquery result. As long as the left side of the SARG contains a column alone, it's still a SARGable expression.
The LIKE clause will be treated as a SARG only if the first character in the string is a constant. The following statement wouldn't be treated as a SARG:
au_lname like '%son'
A LIKE clause with a wildcard as the first character is not considered a SARGable expression because it doesn't limit the search. In other words, every row would have to be examined to determine if it were a match. The inequality operator ( != or <> ) isn't a valid operator for a SARG for this same reason ”an index can only help you find matches for a specific value, not everything that doesn't match. Additionally, if any operation is performed on the column, such as a function, it's not considered SARGable, either. Some examples of nonSARGable expressions are as follows:
Some expressions involving computations on a column might be treated as SARGs during optimization if SQL Server can simplify the expression into a SARG. For example, the SARG
ytd_sales/12 = 1000
can be simplified to
ytd_sales = 12000.
The simplified expression is used only during optimization to determine an estimate of the number of matching rows and the usefulness of the index. During actual execution, the conversion is not done while traversing the index tree as it won't be able to do the repeated division by 12 for each row while searching through the tree. However, doing the conversion during optimization and getting a row estimate from the statistics helps the optimizer decide on other strategies to consider, such as index scanning versus table scanning, or it might help to decide an optimal join order if it's a multitable query.
When tuning performance of your system, keep an eye out for expressions that cannot be treated as SARGs. They're a common cause of poor performance because they prevent an index from being used to resolve the query. Many times, queries containing nonoptimizable SARGs can be rewritten with optimizable SARGs that will return the same resultset faster.
Identifying OR Clauses
The next statements the optimizer looks for in the query are OR clauses. OR clauses are SARGable expressions combined with an OR statement rather than an AND statement and are treated differently than a standard SARG. The format of an OR clause is as follows:
SARG or SARG [or ...]
with all columns involved in the OR belonging to the same table.
The following IN statement
column in ( constant1, constant2, ...)
is also treated as an OR clause, becoming
column = constant1 or column = constant2 or ...
Some examples of OR clauses are as follows:
where au_lname = 'Smith' or au_fname = 'Fred' where (type = 'business' and price > ) or pub_id = "1234" where au_lname in ('Smith', 'Jones', 'N/A')
An OR clause is a disjunction; all rows matching either of the two criteria appear in the resultset. Any row matching both criteria should appear only once.
The main issue is that an OR clause cannot be satisfied by a single index. Consider the first example just presented:
where au_lname = 'Smith' or au_fname = 'Fred'
An index on au_lname and au_fname will help us find all the rows where au_lanme = 'Smith' AND au_fname = 'Fred' , but searching the index tree will not help us find all the rows where au_fname = 'Fred' , but the last name is something other than 'Smith' . Unless an index on au_fname exists as well, the only way to find all rows with au_fname = 'Fred' is to search every row in the table.
An OR clause can be resolved by either a table scan or by using the OR strategy. Using a table scan, SQL Server reads every row in the table and applies each OR criteria to each row. Any row that matches any one of the OR criteria is put into the resultset.
A table scan is an expensive way to process a query, so the optimizer looks for an alternative for resolving an OR . If an index can be matched against all SARGs involved in the OR clause, SQL Server evaluates the possibility of applying the index intersection strategy described later in this chapter in the "Using Multiple Indexes" section.
Identifying Join Clauses
The last type of statement for which the query optimizer looks during the query analysis phase is the join clause. A join condition is specified in the FROM clause using the JOIN keyword as follows:
FROM table1 JOIN table2 on table1.column = table2.column
Alternatively, join conditions can also be specified in the WHERE clause using the old-style join syntax, as shown in the following example:
Table1.Column Operator Table2.Column
A join clause always involves two tables, except in the case of a self-join, but even in a self-join , you must specify the table twice in the query:
select employee = e.LastName + ', ' + e.FirstName, manager = m.LastName + ', ' + m.FirstName from Northwind..Employees e left outer join Northwind..Employees m on e.ReportsTo = m.EmployeeID order by 2, 1
SQL Server will treat a self-join just like a normal join between two different tables.
Depending on how a subquery is written, SQL Server will optimize them differently. For example, SQL Server will attempt to flatten certain subqueries into joins when possible, to allow the optimizer to select the optimal join order rather than be forced to process the query inside out. This section examines the different types of subqueries and how they are optimized by SQL Server.
IN , ANY , or EXISTS Subqueries
In SQL Server, any query containing a subquery introduced with an IN , = ANY , or EXISTS predicate is flattened into an existence join unless the outer query also contains an OR clause or unless the subquery is correlated or contains one or more aggregates.
An existence join is optimized the same way as a regular join, with one exception. With an existence join, as soon as a matching row is found in the inner table, the value TRUE is returned and SQL Server stops looking for further matches for that row in the outer table and moves on to the next row. A normal join would continue processing to find all matching rows. The following query is an example of a subquery that would be converted to an existence join:
select pub_name from publishers where pub_id in (select pub_id from titles where type = "business')
If the outer query is comparing a column against the result of a subquery using any of the comparison operators ( = , > , < , >= , <= , != ), and the subquery is not correlated, the results of the subquery must be resolved ”that is, materialized ”before comparison against the outer table column. For these types of queries, the optimizer must process them inside out.
An example of this type of query is as follows:
select title from titles where total_sales = (select max(total_sales) from titles)
The subquery must be resolved first to find the value to compare against total_sales in the outer query. Although this is an optimizable SARG, the value to be compared with the total_sales column will not be known until the query actually runs.
A correlated subquery contains a reference to an outer table in a join clause in the subquery. The following is an example of a correlated subquery:
SELECT au_lname, au_fname FROM authors WHERE 100 IN (SELECT royaltyper FROM titleauthor WHERE titleauthor.au_ID = authors.au_id)
Because correlated subqueries depend on values from the outer query for resolution, they cannot be processed independently. Instead, SQL Server processes correlated subqueries repeatedly, once for each qualifying outer row.