Managing the Optimizer

Because the optimizer might sometimes make poor decisions as to how to best process a query, you need to know how and when to override the optimizer and force SQL Server to process a query in a specific manner.

How often does SQL Server require manual intervention to execute a query optimally? Considering the overwhelming number of query types and circumstances in which those queries are run, SQL Server does a surprisingly effective job of query optimization in most instances. For all but the most grueling, complex query operations, my own testing and experience has shown that SQL Server's optimizer is quite clever ”and very, very good at wringing the best performance out of any hardware platform. For this reason, you should treat the material covered in this chapter as a collection of techniques to be used only where other methods of getting optimal query performance have already failed.

Before indiscriminately applying the techniques discussed in this section, remember one very important point: Use of these features can effectively hide serious fundamental design or coding flaws in your database, application, or queries. In fact, if you're tempted to use these features (with a few more moderate exceptions), it should serve as an indicator that problems might lie elsewhere in your application.

If you are satisfied that no such flaws exist and that SQL Server is choosing the wrong plan to optimize your query, you can use the methods discussed in this section to override two of the three most important decisions the optimizer makes:

Choosing which index, if any, to resolve the query
Choosing the join strategy to apply in a multitable query

The other decision made by the optimizer is the locking strategy to apply. Using table hints to override locking strategies is discussed in Chapter 38, "Locking and Performance."

Throughout this section, one point must remain clear in the reader's mind: These options should be used only in exception cases to cope with specific optimization problems in specific queries in specific applications. As such, there are no standard or global rules to follow because the application of these features by definition means that normal SQL Server behavior isn't taking place.

The practical result of this idea is that you should test every option in your environment, with your data and your queries, and use the techniques and methods discussed in this chapter and the other performance- related chapters to optimize and fine-tune the performance of your queries. The fastest -performing query wins, so don't be afraid to experiment with different options ”but don't think that these statements and features are globally applicable or fit general categories of problems, either! There are, in fact, only three rules: test, test, and test !

TIP

Personally, I make it a rule to use optimizer and table hints only as a last resort when all other methods to get the optimizer to generate a more efficient query plan have failed. I always try to find other ways to rewrite the queries to encourage the optimizer to choose a better plan. This includes adding additional SARGs, substituting unknown values for known values in SARGS or trying to replace unknown values with known values, breaking up queries, converting subqueries to joins or joins to subqueries, and so on. Essentially , I'll try other variations on the query to get the same result in a different way and try to see if one of the variations ends up using the more efficient query plan that I expect it to.

In reality, about the only time I use these hints is when I'm testing the performance of a query and I want to see if the optimizer is actually choosing the best execution plan. You can enable the various query analysis options, such as STATISTICS PROFILE and STATISTICS IO , and then see how the query plan and statistics change as you apply various hints to the query. Examine the output to determine whether the IO cost improves or gets worse if you force one index over another, or if you force a specific join strategy or join order.

The problem with hard-coding table and optimizer hints into your application queries is that the hints prevent the optimizer from modifying the query plan as the data in your tables changes over time. Also, if subsequent service packs or releases of SQL Server incorporate improved optimization algorithms or strategies, the queries with hard-coded hints will not be able to take advantage of them.

If you find that you must incorporate any of these hints to solve query performance problems, be sure to document which queries and stored procedures contain optimizer and table hints. Periodically, go back and test the queries to determine whether the hints are still appropriate. You might find that over time, as the data values in the table have changed, the query plan generated because of the hints is no longer the most efficient query plan, and the optimizer now generates a more efficient query plan on its own.

Optimizer Hints

You can specify three types of hints in a query to override the decisions made by the optimizer:

Table hints
Join hints
Query hints

The remainder of this section will examine and describe each type of table hints.

Forcing Index Selection with Table Hints

In addition to locking hints that can be specified for each table in a query, SQL Server 2000 allows you to provide table-level hints that enable you to specify the index SQL Server should use for accessing the table. The syntax for specifying an index hint is as follows :

 SELECT  column_list  FROM  talename  WITH (INDEX (  indid   index_name  [, ...]) )

The old style syntax INDEX = index_name is still supported for backward compatibility only.

The new syntax, introduced in SQL Server 7.0, allows you to specify multiple indexes. You can specify the index by name or by ID. It is recommended that you specify indexes by name as the IDs for nonclustered indexes could change if they are dropped and re-created in a different order than they were originally. Specify an index ID of 0 to force a table scan.

When you specify multiple indexes in the hint list, all the indexes listed are used to retrieve the rows from the table, forcing an index intersection or index covering via an index join. If the collection of indexes listed does not cover the query, a regular row fetch is performed after retrieving all the indexed columns .

To get a list of indexes on a table, you can use sp_helpindex . However, the stored procedure doesn't display the index ID. To get a list of all user -defined tables and the names of the indexes defined on them, you can execute a query against the sysindexes table similar to the following, which was run in the pubs database:

 select 'Table name' = object_name(id), 'Index name' = name, 'Index ID' = indid     from sysindexes where id > 99 /* only system tables have id less than 99 */    and indid between 1 and 254   /* do not include rows for text columns                                  or tables without a clustered index*/     /* do not include auto statistics */     and indexproperty(id, name, 'IsAutoStatistics') = 0     order by 1, 3 Table name                     Index name                     Index ID ------------------------------ ------------------------------ -------- authors                        UPKCL_auidind                         1 authors                        aunmind                               2 Clustered_Dupes                Cl_dupes_col1                         1 clustered_nodupes              idxCL                                 1 dtproperties                   pk_dtproperties                       1 employee                       employee_ind                          1 employee                       PK_emp_id                             2 employee                       emp_tel_idx                           3 jobs                           PK__jobs__117F9D94                    1 nc_heap_nodupes                idxNC_heap                            2 pub_info                       UPKCL_pubinfo                         1 publishers                     UPKCL_pubind                          1 recomp_tab                     idx1                                  2 recomp_tab                     idx2                                  3 roysched                       titleidind                            2 sales                          UPKCL_sales                           1 sales                          titleidind                            2 stores                         UPK_storeid                           1 titleauthor                    UPKCL_taind                           1 titleauthor                    auidind                               2 titleauthor                    titleidind                            3 titles                         UPKCL_titleidind                      1 titles                         titleind                              2

An index ID of 1 is for the clustered index, and index IDs 2 “254 are the nonclustered indexes. An index ID of 0 indicates a table with no clustered index, and an index ID of 255 is used if the table has any text or image columns. Remember that every table will have either a 0 or a 1 ”but not both. After you have the index names and IDs, you can use them to specify the index to be used by the query.

SQL Server also supports, for backward compatibility, the FASTFIRSTROW option as a table hint. This has been replaced with the FAST n query processing hint, described in the "Specifying Query Processing Hints" section later in this chapter.

Forcing Join Strategies with Join Hints

Join hints let you force the type of join that should be used between two tables. The join hints correspond with the three types of join strategies, as follows:

LOOP
MERGE
HASH

Join hints can be specified only when you use the ANSI-style join syntax ”that is, when you actually use the keyword JOIN in the query. The hint is specified between the type of join and the keyword JOIN , which means you can't leave out the keyword INNER for an inner join (if you are doing an outer join, the OUTER keyword always has to be specified). Thus, the syntax for the FROM clause when using join hints is as follows:

 FROM  table1  {INNER  OUTER} [LOOP  MERGE  HASH} JOIN  table2

The following is an example of forcing SQL Server to use a hash join:

 select st.stor_name, ord_date, qty     from stores st INNER HASH JOIN sales s on st.stor_id = s.stor_id    where st.stor_id between 'B100' and 'B599'

Specifying Query Processing Hints

SQL Server 2000 enables you to specify additional query hints to control how your queries are optimized and processed . Query hints are specified at the very end of your query using the OPTION keyword. There can be only one OPTION clause per query, but you can specify multiple hints in an OPTION clause, as shown in the following syntax:

 OPTION (  hint1  [, ...  hintn  ])

Query hints are grouped into three categories: GROUP BY , UNION , and miscellaneous.

`GROUP BY` Hints

The GROUP BY hints specify how GROUP BY or COMPUTE operations should be performed. The GROUP BY hints that can be specified are as follows:

HASH GROUP ” This option forces the optimizer to use a hashing function to perform the GROUP BY operation.
ORDER GROUP ” This option forces the optimizer to use a sorting operation to perform the GROUP BY operation.

Only one GROUP BY hint can be specified at a time.

`UNION` Hints

The UNION hints specify how UNION operations should be performed. The UNION hints that can be specified are as follows:

MERGE UNION ” This option forces the optimizer to use a merge operation to perform the UNION operation.
HASH UNION ” This option forces the optimizer to use a hash operation to perform the UNION operation.
CONCAT UNION ” This option forces the optimizer to use the concatenation method to perform the UNION operation.

Only one UNION hint can be specified at a time and must come after the last query in the UNION . The following is an example of forcing concatention for a UNION :

 select stor_id from sales where stor_id like 'B19%'  UNION select title_id from titles where title_id like 'C19%' OPTION (CONCAT UNION)

Miscellaneous Hints

The following miscellaneous hints can be used to override various query operations:

FORCE ORDER ” This option tells the optimizer to join the tables in the order they are listed in the FROM clause and not to determine the optimal join order. This option replaces the SET FORCEPLAN option in SQL Server 6.5 and earlier.
ROBUST PLAN ” This option forces the query optimizer to attempt a plan that works for the maximum potential row size, even if it means degrading performance. If you have very wide VARCHAR columns, some types of query plans might create intermediate tables, and if any of the internal operations need to store and process rows in these intermediate tables, some rows might exceed SQL Server's row size limit. If this happens, SQL Server generates an error during query execution. When the ROBUST PLAN hint is specified, the query optimizer will not consider any plans that might encounter this problem.
MAXDOP number ” This hint overrides the server level configuration setting for max degree of parallelism for the current query in which the hint is specified.
KEEP PLAN ” When this hint is specified, it forces the query optimizer to relax the estimated recompile threshold for a query. In other words, the query is not recompiled as frequently when there are multiple updates to a table. This option is useful primarily for queries whose execution plan stays in memory, such as for stored procedures. An example of when you might want to specify this option is for a stored procedure that does a lot of work with temporary tables, which can lead to frequent recompilations of the execution plan for the stored procedure.
KEEPFIXED PLAN ” This query hint tells the optimizer not to recompile the query plan when there are changes in statistics or modifications to indexed columns used by the query via updates, deletes, or inserts . When this option is specified, the query will be recompiled only if the schema of the underlying tables is changed or sp_recompile is executed against those tables.
EXPAND VIEWS ” The hint tells the query optimizer not to consider any indexed view as a substitute for any part of the query and to force the view to be expanded into its underlying query. This hint essentially prevents direct use of indexed views in the query plan.
FAST n ” This hint instructs SQL Server to optimize the query to return the first rows as quickly as possible, even if the overall throughput is reduced. In other words, it improves response time at the expense of the total query execution time. This option will generally influence the query optimizer to retrieve data using a nonclustered index that matches the ORDER BY clause of a query instead of using a different access method that would require a sort operation first to return rows in the specified order. Unlike the FASTFIRSTROW option, which this option replaces, after n number of rows have been returned, the query continues execution normally to produce its full resultset.

NOTE

Optimizer hints are not always executed. For example, the optimizer will probably ignore a HASH UNION hint for a query using the UNION ALL statement. Because a UNION ALL means to return all rows whether there are duplicates or not, you don't need to hash these values to determine uniqueness and remove duplicates, so the normal concatenation will probably still take place.

Limiting Query Plan Execution with the Query Governor

Another interesting tool available in SQL Server 2000 is the query governor. Because SQL Server uses a cost-based optimizer, the cost of executing a given query is always estimated before the query is actually executed. The query governor enables you to set a cost threshold to prevent certain long-running queries from being executed. This is not so much a tuning tool as it is a performance problem prevention tool.

For example, if you have an application with an English Query front end, you have no way of controlling what the user is going to request from the database and the type of query generated. The query governor will allow you to prevent a runaway query from executing and avoid using up valuable CPU time and memory by processing a poorly formed query.

You can set the query governor cost limit for the current user session by setting the session level property, QUERY_GOVERNOR_COST_LIMIT :

 SET QUERY_GOVERNOR_COST_LIMIT  value

The value specified is the maximum length of time, in seconds, a query is allowed to run. If the optimizer estimates the query would take longer than the specified value, SQL Server will not execute it.

Although the option is specified in seconds, it is a relative value corresponding to the TotalSubtreeCost estimated by the query optimizer. In other words, if you set the query governor cost limit to 100, it will prevent the execution of any queries whose estimated TotalSubtreeCost is greater than 100 seconds. The TotalSubtreeCost time is based on a generic algorithm in SQL Server and might not map exactly to how long the query takes to run on your own system. The actual runtime depends on a number of factors ”CPU speed, IO speed, network speed, the number of rows returned over the network, and so on. You will need to correlate the optimizer runtime estimate to how long the query actually takes to run on your system to set the query governor cost limit to a value related to actual query runtime.

The best way to figure this out is to run your queries with the STATISTICS PROFILE and STATISTICS TIME session settings enabled (these settings are discussed in more detail in the next chapter, "Query Analysis"). Compare the values in the TotalSubtreeCost column for the first row of the STATISTICS PROFILE output with the elapsed time displayed by STATISTICS TIME for your query. Do this for a number of your queries and you might be able to come up with an average correlation of the actual runtimes with the optimizers' estimated query cost. For example, if the average cost estimate is 30 seconds and the actual runtimes are 15 seconds, you would need to double the setting for query governor cost limit to correspond to the actual execution time threshold ”in other words, if you want the threshold to be 60 seconds for this example, you would need to set the query governor threshold to 120.

To configure a query governor threshold for all user connections, you can also set it at the server level. Open Enterprise Manager. Right-click the server and choose Properties from the menu. Next, select the Server Settings tab. In the Server Behavior group box, check the Use Query Governor option and specify a cost threshold (see Figure 35.20). The cost threshold is given in the same TotalSubtreeCost units as specified for the QUERY_GOVERNOR_COST_LIMIT session setting.

Figure 35.20. Configuring the Query Governor settings in the SQL Server Properties dialog box.

graphics/35fig20.jpg

Alternatively, you can configure the serverwide setting using sp_configure:

 sp_configure query governor cost limit, 100

TIP