Lesson 1: Optimizing and Tuning Queries


Lesson 1: Optimizing and Tuning Queries

image from book

Estimated lesson time: 60 minutes

image from book

Optimizing and tuning queries is an important task for a database developer. Throughout this lesson, we will discuss several options for optimizing queries.

Evaluating Query Performance

One of the most important aspects when measuring performance is what to actually measure, that is, what metric to use. In SQL Server, there are three main metrics to consider: query cost, page reads, and query execution time.

Exam Tip 

In the exam, always consider the performance metric against which your solution to a specific problem is measured.

Query Cost

The query cost is typically the best metric to use when comparing query performance. It is an internal metric used in SQL Server that takes into account both CPU and input/output (IO) resources used by the query. The lower the query cost, the better the query performance. You can find it in the graphical execution plan of the query by moving the mouse pointer over the last (the leftmost) operation in the query and examining the value Estimated subtree cost. The query cost is not affected by things such as locking or resource contention. The cost is typically a good metric, but when certain items are used in a query, such as scalar user-defined functions and CLR routines, the cost for these items is not calculated, which renders the query cost lower than the accurate assessment. This is why it is called estimated query cost. The problem with query cost is discussed in greater detail later in this lesson.

Page Reads

Page reads represents the number of 8-kilobyte data pages accessed by the SQL Server storage engine while executing a query. You can retrieve this metric by executing SET STATISTICS IO ON. This will cause each query execution to output something similar to the following in the Messages tab of the query window:

 Table 'Customer'. Scan count 2, logical reads 136, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'SalesOrderHeader'. Scan count 121, logical reads 822, physical reads 5, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

The total page reads of this output is 136 + 822, which is the sum of the values labeled "logical reads." Logical reads are the number of pages read from memory. The logical reads represent the total number of data pages read from any index on the SalesOrderHeader table. The other items tell you how many of the logical reads were read from the hard drive (physical and read-ahead reads), the number of passes through an index or heap it took to respond to the query (scan count), and how many of the page reads were used to retrieve Large OBject (LOB) data (data stored outside of the row for the data types VARCHAR(max), NVARCHAR(max), VARBINARY(max), TEXT, NTEXT, IMAGE, and XML). The metric page reads do not take into account the amount CPU resources used when executing the query. This is why page reads are typically not as accurate a metric as the query cost. This metric also has the same problem with scalar used-defined functions and CLR routines as the query cost, which is that page reads caused by these routines are not included in the output of STATISTICS IO.

Query Execution Time

The execution time of the query is the most volatile metric. It is affected by blocking as well as resource contention on the server. That said, it is particularly important always to include the query execution time metric in performance comparisons because it can help you spot problems with the other performance metrics (page reads and query cost). By executing SET STATISTICS TIME ON, SQL Server will return the execution time in milliseconds for each query execution.

Examining the Theoretical Query Execution Order

It is vital to have a basic understanding of the theoretical execution order of a SELECT when working with query optimization. This helps in understanding what SQL Server actually needs to do to produce the query results.

The theoretical execution order is referred to as such because the query optimizer might change the execution order to optimize performance. This is only done if the query results would be the same were the theoretical execution order used.

The execution order needs to be split into two branches because it differs if the UNION clause is included in the query. A simplified version of these two branches is described in Table 7-1.

Table 7-1: Theoretical Execution Order - Excluding the UNION Clause
Open table as spreadsheet
 

Commands

Results

1.

FROM, JOIN, APPLY and ON

Join execution and first query filter (the ON clause)

2.

WHERE

Second query filter; does not support aggregate functions

3.

GROUP BY and aggregate functions (such as SUM, AVG etc.) that are included in the query

Grouping and aggregation calculations

4.

HAVING

Third query filter; only supports filtering on the results of aggregate functions

5.

SELECT

Determines which columns should be returned by the query

6.

ORDER BY

Result sorting

7.

TOP

Fourth (and last) query filter; causes the query to return only the first X rows from the previous clause

8.

FOR XML

Converts the tabular result returned by the SELECT statement to XML

The execution order shown in Table 7-1 holds true for all queries except those that contain the UNION clause. These queries will use the theoretical execution order shown in Table 7-2.

Table 7-2: Theoretical Execution Order - Including the UNION Clause
Open table as spreadsheet
 

Commands

Results

1.

FROM, JOIN, APPLY and ON

Join execution and first query filter (the ON clause).

2.

WHERE

Second query filter; does not support aggregate functions.

3.

GROUP BY and aggregate functions (such as SUM, AVG etc.) that are included in the query

Grouping and aggregation calculations.

4.

HAVING

Third query filter; only supports filtering on the results of aggregate functions.

5.

TOP

Fourth (and last) query filter; causes the query to return only the first X rows from the previous clause. (Note that, in this case, the TOP clause is executed before the ORDER BY clause.)

6.

UNION and SELECT

Concatenates the result of each SELECT statement included in the query; determines which columns should be returned by the query

7.

ORDER BY

Sorts the result of the UNION clause

8.

FOR XML

Converts the tabular result returned by the UNION/SELECT clauses to XML

The cause of the difference in the execution order is the introduction of the TOP clause (in SQL Server 7.0), which is not part of the ANSI/ISO SQL standard. This makes the standard-compliant behavior of the UNION clause (only allowing for one ORDER BY clause, which must be placed in the last SELECT statement of the query and must sort the entire query result) cause problems with the TOP clause. This isn't a huge issue, but it is important to be aware of. As an example, compare the result of the following two queries. Both queries are supposed to return the two most expensive "red" products and the two most expensive "black" products. The first query will produce an incorrect result, as follows:

 USE AdventureWorks; SELECT TOP(2) ProductID, Name, Color, ListPrice FROM Production.Product WHERE Color = 'Black' UNION SELECT TOP(2) ProductID, Name, Color, ListPrice FROM Production.Product WHERE Color = 'Red' ORDER BY ListPrice DESC; Results: ProductID   Name                      Color ListPrice ----------- ------------------------- ----- ---------- 706         HL Road Frame - Red, 58   Red   1431,50 707         Sport-100 Helmet, Red     Red   34,99 317         LL Crankarm               Black 0,00 318         ML Crankarm               Black 0,00 

The second query will produce the correct result, as follows:

 USE AdventureWorks; WITH a AS (     SELECT TOP(2) ProductID, Name, Color, ListPrice     FROM Production.Product     WHERE Color = 'Black'     ORDER BY ListPrice DESC ), b AS (     SELECT TOP(2) ProductID, Name, Color, ListPrice     FROM Production.Product     WHERE Color = 'Red'     ORDER BY ListPrice DESC ) SELECT * FROM a UNION ALL SELECT * FROM b; Results: ProductID   Name                      Color ListPrice ----------- ------------------------- ----- ---------- 775         Mountain-100 Black, 38    Black 3374,99 776         Mountain-100 Black, 42    Black 3374,99 749         Road-150 Red, 62          Red   3578,27 750         Road-150 Red, 44          Red   3578,27 

As you can see, the result of the first query does not return the correct values because the ORDER BY clause is executed after the TOP clause.

Optimizing Query Performance

There are several ways to optimize queries. Optimization consists of tasks such as rewriting the query, de-normalizing or normalizing tables, adding indexes, removing indexes, or a combination of these tasks.

The Graphical Execution Plan

The graphical execution plan is a great tool to use when optimizing queries. This chapter discusses several execution plans. Some of the items that you should look for in the execution plan are shown in Table 7-3.

Table 7-3: Items from the Graphical Execution Plan
Open table as spreadsheet

"Thick arrows"

image from book

A "thick arrow" represents a large number of rows moving from one operation in the execution plan to another. The greater the number of rows transferred from one operation to another, the thicker the arrow.

Hash operations

image from book

If a hash operation is used to handle clauses such as GROUP BY and JOIN, it often means that an appropriate index did not exist to optimize the query.

Sorts

image from book

A sort isn't necessarily bad, but if it is a high percentage of the query cost, you should consider whether an index can be built to remove the need for the sort operation.

"Large plans"

The plan with fewer operations is typically the better-optimized plan.

Table scans or clustered index scans

image from book

image from book

A clustered index scan and a table scan indicate that no appropriate index can be used to optimize the query.

Using Search Arguments

A search argument (SARG) is a filter expression that is used to limit the number of rows returned by a query and that can utilize an index seek operation which will substantially increase the performance of the query. Typically, a filter expression is not a SARG if the column from the table is changed in any way (such as LEFT(Name, 1) = 'A'). If the filter is not a SARG and no other SARGs exist in the query, this will result in an index or table scan. A scan refers to the scan of the entire table or index. Instead of a scan, you want a seek to be performed. A seek implies the use of the index's balanced tree to find the values for which the query searched. For example, in the following query, the OrderDateIndex index will be scanned (rather than seeked). The execution plan for the following query is shown in Figure 7-1.

 USE AdventureWorks; CREATE NONCLUSTERED INDEX OrderDateIndex     ON Sales.SalesOrderHeader (OrderDate); SELECT COUNT(*) FROM Sales.SalesOrderHeader     WHERE YEAR(OrderDate) = 2004; 

image from book
Figure 7-1: An execution plan from SQL Server Management Studio showing an Index Scan operation

If the query is instead rewritten so that the OrderDate column is not changed, an index seek operation will be used instead of a scan. The execution plan for the following query is shown in Figure 7-2.

 SELECT COUNT(*) FROM Sales.SalesOrderHeader     WHERE OrderDate >= '20040101' AND OrderDate < '20050101'; 

image from book
Figure 7-2: An execution plan from SQL Server Management Studio showing an Index Seek operation

Note that the use of the COLLATE operator in a filter expression also invalidates the use of an index on that column. We cover this in more detail in the next lesson.

Using Joins

To optimize queries, one of the first basic strategies is to minimize the number of join clauses used. Another consideration is that outer joins typically incur more cost than inner joins because of the extra work needed to find the unmatched rows. If only inner joins are used in a query, the behavior of the ON and WHERE clauses is the same; it does not matter if you put an expression in the ON or WHERE clause. Compare the following two queries; they both return the same results and will use equal execution plans.

 -- Query #1 SELECT p.ProductID, p.Name, sod.SalesOrderID FROM Production.Product AS p INNER JOIN Sales.SalesOrderDetail AS sod     ON sod.ProductID = p.ProductID WHERE p.Color = 'Black'; -- Query #2 SELECT p.ProductID, p.Name, sod.SalesOrderID FROM Production.Product AS p INNER JOIN Sales.SalesOrderDetail AS sod     ON sod.ProductID = p.ProductID        AND p.Color = 'Black'; 

If these queries had been written with an outer join, they would not be syntactically equal and could have substantially different performances.

Subqueries Without Correlation to the Outer Query

An uncorrelated subquery is executed only once per query execution and returns only one value. These queries typically incur very little overhead. Note that this type of subquery cannot have any reference (correlation) to the outer query. The following example uses a subquery to return all products that are cheaper than the average product price. The subquery calculating the average product price is executed first (only once), and then the value returned by the subquery is used as a parameter in the outer query.

 USE AdventureWorks; SELECT     p.ProductID     ,p.Name     ,p.ListPrice FROM Production.Product AS p WHERE p.ListPrice > (     SELECT AVG(p2.ListPrice)     FROM Production.Product AS p2 ); 

Correlated Subqueries

Correlated subqueries include a reference to the outer query. Typically, this reference is used to filter the correlated subquery. A correlated subquery is typically good for performance when used in combination with the EXISTS operator to filter the outer query. The following example query uses the EXISTS operator to return only products that have been sold:

 USE AdventureWorks; SELECT p.ProductID, p.Name FROM Production.Product AS p WHERE EXISTS (     SELECT * FROM Sales.SalesOrderDetail AS sod     WHERE sod.ProductID = p.ProductID) 

While this type of correlated subquery is typically a good implementation, the use of correlated subqueries in the SELECT clause typically has a negative effect on performance. This, of course, depends on the number of rows returned by the query; if a large number of rows is returned, each query in the SELECT clause would be executed for each row. The following query returns 6,224 rows and includes two correlated subqueries. Each of these queries is executed once per row, resulting in a total of 12,448 subquery executions.

 USE AdventureWorks; SELECT     soh.SalesOrderID     ,soh.OrderDate     ,(SELECT TOP(1) sod1.UnitPrice FROM Sales.SalesOrderDetail AS sod1       WHERE sod1.SalesOrderID = soh.SalesOrderID       ORDER BY sod1.OrderQty DESC) AS UnitPrice     ,(SELECT TOP(1) sod2.OrderQty FROM Sales.SalesOrderDetail AS sod2       WHERE sod2.SalesOrderID = soh.SalesOrderID       ORDER BY sod2.OrderQty DESC) AS OrderQty FROM Sales.SalesOrderHeader AS soh WHERE soh.TerritoryID = 4; 

There is also a potential bug in this query. Because each subquery is executed separately, they might end up using different indexes. This means that these queries might not return values from the same row (which they are probably intended to do) if the same value for OrderQty exists for multiple sales order details in any sales order.

There are several ways to rewrite this query; the most common one in SQL Server 2005 is probably to use the new APPLY clause. If the subquery is used in either one of the FROM, JOIN, or APPLY clauses, it might also be referred to as a derived table. The APPLY clause basically gives you the opportunity to combine the two subqueries in the previous query into one, splitting the number of subquery executions in half. For the new query to return the same results as the previous query, you must use an OUTER APPLY. (An OUTER APPLY works similarly to a left outer join, and its counterpart, the CROSS APPLY clause, behaves like an inner join.) This is because, in the previous query, the outer query will return a row even if the subqueries return nothing. The new query could be written as follows:

 USE AdventureWorks; SELECT     soh.SalesOrderID     ,soh.OrderDate     ,a.* FROM Sales.SalesOrderHeader AS soh OUTER APPLY (       SELECT TOP(1) sod.UnitPrice, sod.OrderQty       FROM Sales.SalesOrderDetail AS sod       WHERE sod.SalesOrderID = soh.SalesOrderID       ORDER BY sod.OrderQty DESC ) AS a WHERE soh.TerritoryID = 4; 

This query has a cost of roughly 73, while the first query's cost was double that, at about 146.

Another solution to this type of problem is to make use of the ROW_NUMBER function instead of a correlated subquery. By using the ROW_NUMBER function, you can find the specific number of rows that you need by filtering on the row number rather than using the TOP clause. To be able to filter on the result of the ROW_NUMBER function, the query needs to be placed inside a derived table or a common table expression (CTE). The larger the result set, the better this approach performs compared to the previous queries. The cost for the following query drops from 73 to about 2.88!

 WITH a AS ( -- Common table expression.     SELECT         soh.SalesOrderID         ,soh.OrderDate         ,sod.UnitPrice         ,sod.OrderQty         ,ROW_NUMBER() OVER (PARTITION BY soh.SalesOrderID             ORDER BY sod.OrderQty DESC) AS RowNo     FROM Sales.SalesOrderDetail AS sod     INNER JOIN Sales.SalesOrderHeader AS soh          ON sod.SalesOrderID = soh.SalesOrderID     WHERE soh.TerritoryID = 4 ) SELECT     a.SalesOrderID     ,a.OrderDate     ,a.UnitPrice     ,a.OrderQty FROM a WHERE a.RowNo = 1; 

Scalar User-Defined Functions

A scalar user-defined function (UDF) is a function that returns a single value (not a result set). This type of function is frequently used in queries and can significantly degrade performance. The reason for this is that these functions are not expanded and optimized into the main query plan by the query optimizer, but they are rather just called from the execution plan (without any optimization based on the context into which it is inserted in the plan). This also means that the cost of whatever is done inside the function is not included in the cost estimates found in the graphical execution plan for the query. This same problem occurs for the output of the SET STATISTICS IO ON statement, which will contain no references to what is done inside the UDF. The following performance comparison is between a query using a UDF and another query using a correlated subquery. Because of the problem with the query cost, for these queries, the performance metric will be the execution time, which is returned using the SET STATISTICS TIME ON statement.

Best Practices 

Query execution time

When using query execution times as a performance metric, it is typically a good idea to execute each query a few times and use either the lowest or the last execution time as the metric.

The following is a query using a UDF. The execution plan produced for the query is shown in Figure 7-3.

 USE AdventureWorks; GO CREATE FUNCTION dbo.fnGetCustomerAccountNumber(@CustomerID INT) RETURNS VARCHAR(10) AS BEGIN     RETURN ISNULL(        (         SELECT             AccountNumber         FROM Sales.Customer         WHERE CustomerID = @CustomerID     ), 'NOT FOUND'); END GO SET STATISTICS IO ON; SET STATISTICS TIME ON; SELECT     soh.SalesOrderID     ,soh.OrderDate     ,dbo.fnGetCustomerAccountNumber(soh.CustomerID) FROM Sales.SalesOrderHeader AS soh; 

image from book
Figure 7-3: An execution plan from SQL Server Management Studio

The cost of this query is 0.56 and the number of page reads is 703 (neither metric being useful for comparison), while the execution time on the test machine was 50 seconds. Examine the graphical execution plan and note that it contains no reference to the Sales.Customer table.

The following is an example query that uses a correlated subquery. The query's execution plan is shown in Figure 7-4.

 USE AdventureWorks; SET STATISTICS IO ON; SET STATISTICS TIME ON; SELECT     soh.SalesOrderID     ,soh.OrderDate     ,ISNULL(        (         SELECT             AccountNumber         FROM Sales.Customer         WHERE CustomerID = soh.CustomerID     ), 'NOT FOUND') FROM Sales.SalesOrderHeader AS soh; 

image from book
Figure 7-4: An execution plan from SQL Server Management Studio

The cost of the query without the UDF goes up to 5.99 and the number of page reads to 65,608 (both of which are "real" values). At the same time, the execution time drops to seven seconds. As you can see, the first query using the UDF is over 700 percent slower than the latter query because of the UDF use.

The use of inline table-valued user-defined functions or views does not incur the same performance cost as the use of scalar user-defined functions. Both inline table-valued user-defined functions and views are optimized (expanded) into the query plan.

Table-Valued User-Defined Functions

There are three different types of table-valued functions; two of them can be developed in T-SQL and the other one can be developed in a CLR language such as C# .NET or Visual Basic .NET. They are as follows:

  • T-SQL inline table-valued function

  • T-SQL multi-statement table-valued function

  • CLR table-valued function

These different types of functions behave differently. A T-SQL inline table-valued function is actually just a view that can accept parameters. It is optimized in the same way as a view or any select statement would be. An advantage of using inline table-valued functions instead of views is that you can require the user (for example, a database developer) that uses the function to supply parameters. In this way, you can make sure that a filter is always used for the query inside the function (based on the parameters provided).

T-SQL multi-statement table-valued functions, on the other hand, can be considered to work like a stored procedure that populates a temporary table that is used by an outer stored procedure. If you include a multi-statement table-valued function in a query (for example, in a join), the function has to be fully executed (that is, finish execution) before the query can make use of its results. This means that if a multi-statement table-valued function needs to return 1,000,000 rows, all rows must be processed by the function before the function's results can be used by the query.

CLR table-valued functions, on the other hand, stream their results. This means that while the CLR table-valued function is executing, its results become available to the calling query. This difference can help performance because the outer query does not have to wait for the entire result from the function to be available before it can start processing the returned rows. A CLR table-valued function consists of two CLR methods: one method that manages the overall execution of the function and one method that is called for every row that is returned by the function. The method that is run for each row returned by the function is not first run until the method that manages the function execution first starts executing yield return commands. This is important to remember because any processing before the start of the yield return commands will have to be finished before any rows are returned from the the function. CLR table-valued functions are typically useful for querying objects other than tables, such as strings (by using regular expressions) or, for example, the file system. Note that the processing done by a CLR function is not accurately included in the cost or page reads of the query.

Cursors

You should generally avoid using cursors because of their negative effect on performance. They have such an effect partly because each execution of a FETCH statement in a cursor loop is similar in performance cost to executing a SELECT statement that returns one row. Another problem is that a DML statement is optimized as a single unit, while a cursor loop cannot be optimized in the same way. Instead, each item in the loop will be optimized and executed separately for each iteration of the loop.

You should try to rewrite cursor logic into one or more set-based statements (SELECT, INSERT, UPDATE, DELETE). If you must use cursors, consider implementing the logic using a CLR stored procedure or a table-valued user-defined function instead (depending on the functionality you need).

Lab: Comparing Query Performance

In this lab, you will test the query performance of three different queries that should produce the same result set. The query will return all customers in a specific territory and the last order received for those customers. If the customer does not have any orders, it must still be returned.

The completed lab is available in the \Labs\Chapter 07\Lab1 folder on the companion CD.

Important 

Lab requirements

You will need to have SQL Server 2005 and the AdventureWorks database installed before you can complete this lab. Refer to the Introduction for setup instructions.

Exercise 1: Test Using a Small Result Set

image from book

In this exercise, you will execute the three queries mentioned in the lab preface and record each query's cost. In this case, the parameter supplied to all three queries (TerritoryID) will yield a small result set of 64 rows.

  1. Open SQL Server Management Studio and connect to an instance of SQL Server 2005.

  2. In a new query window, type and execute the following SQL statements to create the TestDB database, the Test schema, and the two tables that will be used in this exercise:

     CREATE DATABASE TestDB; GO USE TestDB; GO CREATE SCHEMA Test; GO SELECT * INTO Test.SalesOrderHeader FROM AdventureWorks.Sales.SalesOrderHeader; GO SELECT * INTO Test.Customer FROM AdventureWorks.Sales.Customer; GO ALTER TABLE Test.SalesOrderHeader     ADD CONSTRAINT PKSalesOrderHeader     PRIMARY KEY(SalesOrderID); GO ALTER TABLE Test.Customer     ADD CONSTRAINT PKCustomer     PRIMARY KEY(CustomerID); 

  3. Turn on the Actual Execution Plan in SQL Server Management Studio by pressing Ctrl+M, or by selecting Include Actual Execution Plan from the Query menu.

  4. Type and execute Query #1 to test its performance. (Because of the use of two separate correlated subqueries in this query, it is not guaranteed that both these subqueries return data from the same row in the Test.Customer table.)

     -- Query #1 SELECT     c.CustomerID     ,c.AccountNumber     ,(         SELECT TOP(1) soh.SalesOrderID         FROM Test.SalesOrderHeader AS soh         WHERE soh.CustomerID = c.CustomerID         ORDER BY OrderDate DESC     ) AS SalesOrderID     ,(         SELECT TOP(1) soh.OrderDate         FROM Test.SalesOrderHeader AS soh         WHERE soh.CustomerID = c.CustomerID         ORDER BY OrderDate DESC     ) AS OrderDate FROM Test.Customer AS c WHERE c.TerritoryID = 2; 

    What was the total cost of Query #1?

    (You can find the value in the Execution Plan tab by moving the mouse pointer over the SELECT operator and locating the value named Estimated Subtree Cost.)

  5. Type and execute Query #2 to test its performance.

     -- Query #2 SELECT     c.CustomerID     ,c.AccountNumber     ,o.* FROM Test.Customer AS c OUTER APPLY (     SELECT TOP(1) soh.SalesOrderID, soh.OrderDate     FROM Test.SalesOrderHeader AS soh     WHERE soh.CustomerID = c.CustomerID     ORDER BY OrderDate DESC ) AS o WHERE c.TerritoryID = 2; 

    What was the total cost of Query #2?

  6. Type and execute Query #3 to test its performance.

     -- Query #3 WITH a AS (     SELECT         c.CustomerID         ,c.AccountNumber         ,c.TerritoryID         ,soh.SalesOrderID         ,soh.OrderDate         ,ROW_NUMBER() OVER (PARTITION BY c.CustomerID                 ORDER BY soh.OrderDate DESC) AS RowNo     FROM Test.Customer AS c     LEFT OUTER JOIN Test.SalesOrderHeader AS soh         ON soh.CustomerID = c.CustomerID ) SELECT     a.CustomerID     ,a.AccountNumber     ,a.SalesOrderID     ,a.OrderDate FROM a WHERE a.RowNo = 1 AND a.TerritoryID = 2; 

    What was the total cost of Query #3?

  7. To clean up after this exercise, close all open query windows in SQL Server Management Studio, open a new query window, and execute the following SQL statements:

     USE master; DROP DATABASE TestDB; 

image from book

Exercise 2: Test Using a Large Result Set

image from book

In this exercise, you will execute the three queries mentioned in the lab preface and record each query's cost. In this case, the parameter supplied to all three queries (TerritoryID) will yield a larger result set of 3,433 rows (compared to 64 rows in the previous exercise).

  1. Open SQL Server Management Studio and connect to an instance of SQL Server 2005.

  2. In a new query window, type and execute the following SQL statements to create the TestDB database, the Test schema, and the two tables that will be used in this exercise:

     CREATE DATABASE TestDB; GO USE TestDB; GO CREATE SCHEMA Test; GO SELECT * INTO Test.SalesOrderHeader FROM AdventureWorks.Sales.SalesOrderHeader; GO SELECT * INTO Test.Customer FROM AdventureWorks.Sales.Customer; GO ALTER TABLE Test.SalesOrderHeader     ADD CONSTRAINT PKSalesOrderHeader     PRIMARY KEY(SalesOrderID); GO ALTER TABLE Test.Customer     ADD CONSTRAINT PKCustomer     PRIMARY KEY(CustomerID); 

  3. Turn on the Actual Execution Plan in SQL Server Management Studio by pressing Ctrl+M, or by selecting Include Actual Execution Plan from the Query menu.

  4. Type and execute Query #1 to test its performance.

    (Because of the use of two separate correlated subqueries in this query, it is not guaranteed that both these subqueries return data from the same row in the Test.Customer table.)

     -- Query #1 SELECT     c.CustomerID     ,c.AccountNumber     ,(         SELECT TOP(1) soh.SalesOrderID         FROM Test.SalesOrderHeader AS soh         WHERE soh.CustomerID = c.CustomerID         ORDER BY OrderDate DESC     ) AS SalesOrderID     ,(         SELECT TOP(1) soh.OrderDate         FROM Test.SalesOrderHeader AS soh         WHERE soh.CustomerID = c.CustomerID         ORDER BY OrderDate DESC     ) AS OrderDate FROM Test.Customer AS c WHERE c.TerritoryID = 1; 

    What was the total cost of Query #1?

    (You can find the value in the Execution Plan tab by moving the mouse pointer over the SELECT operator and locating the value named Estimated Subtree Cost.)

  5. Type and execute Query #2 to test its performance.

     -- Query #2 SELECT     c.CustomerID     ,c.AccountNumber     ,o.* FROM Test.Customer AS c OUTER APPLY (     SELECT TOP(1) soh.SalesOrderID, soh.OrderDate     FROM Test.SalesOrderHeader AS soh     WHERE soh.CustomerID = c.CustomerID     ORDER BY OrderDate DESC ) AS o WHERE c.TerritoryID = 1; 

    What was the total cost of Query #2?

  6. Type and execute Query #3 to test its performance.

     -- Query #3 WITH a AS (     SELECT         c.CustomerID         ,c.AccountNumber         ,c.TerritoryID         ,soh.SalesOrderID         ,soh.OrderDate         ,ROW_NUMBER() OVER (PARTITION BY c.CustomerID                 ORDER BY soh.OrderDate DESC) AS RowNo     FROM Test.Customer AS c     LEFT OUTER JOIN Test.SalesOrderHeader AS soh         ON soh.CustomerID = c.CustomerID ) SELECT     a.CustomerID     ,a.AccountNumber     ,a.SalesOrderID     ,a.OrderDate FROM a WHERE a.RowNo = 1 AND a.TerritoryID = 1; 

    What was the total cost of Query #3?

image from book

Exercise 3: Optimize Query #3

image from book

In this exercise, you will make a small change to Query #3 to optimize it.

  1. Open SQL Server Management Studio and connect to an instance of SQL Server 2005.

  2. Turn on the Actual Execution Plan in SQL Server Management Studio by pressing Ctrl+M.

  3. Type and execute the new version of Query #3 with the small result set (TerritoryID = 2). (The difference from the previous version of the query appears in bold.)

     USE TestDB; GO WITH a AS (     SELECT         c.CustomerID         ,c.AccountNumber         ,c.TerritoryID         ,soh.SalesOrderID         ,soh.OrderDate         ,ROW_NUMBER() OVER (PARTITION BY c.CustomerID                       ORDER BY soh.OrderDate DESC) AS RowNo     FROM Test.Customer AS c     LEFT OUTER JOIN Test.SalesOrderHeader AS soh         ON soh.CustomerID = c.CustomerID     WHERE c.TerritoryID = 2 ) SELECT     a.CustomerID     ,a.AccountNumber     ,a.SalesOrderID     ,a.OrderDate FROM a     WHERE a.RowNo = 1; 

    What was the total cost of this version of Query #3 for the small result set?

  4. Type and execute the new version of Query #3 with the larger result set (TerritoryID = 1). (The difference from the previous version of the query appears in bold.)

     USE TestDB; GO WITH a AS (     SELECT         c.CustomerID         ,c.AccountNumber         ,c.TerritoryID         ,soh.SalesOrderID         ,soh.OrderDate         ,ROW_NUMBER() OVER (PARTITION BY c.CustomerID                       ORDER BY soh.OrderDate DESC) AS RowNo     FROM Test.Customer AS c     LEFT OUTER JOIN Test.SalesOrderHeader AS soh         ON soh.CustomerID = c.CustomerID     WHERE c.TerritoryID = 2 ) SELECT     a.CustomerID     ,a.AccountNumber     ,a.SalesOrderID     ,a.OrderDate FROM a WHERE a.RowNo = 1; 

    What was the total cost of this version of Query #3 for the larger result set?

  5. To clean up after this lab, close all open query windows in SQL Server Management Studio, open a new query window, and execute the following SQL statements:

     USE master; DROP DATABASE TestDB; 

image from book




MCITP Self-Paced Training Kit Exam 70-442  .Designing and Optimizing Data Access by Using Microsoft SQL Server 2005
MCITP Self-Paced Training Kit Exam 70-442 .Designing and Optimizing Data Access by Using Microsoft SQL Server 2005
ISBN: 073562383X
EAN: N/A
Year: 2007
Pages: 162

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net