The Execution Plan


Whenever a T-SQL statement is executed, SQL Server either generates a new execution plan or reuses an existing one. For a stored procedure, the T-SQL statements it contains are compiled into a single plan, which is cached and reused as appropriate.

For example, take the following T-SQL statement:

     SELECT Sum(od.UnitPrice)
     FROM Orders o
     INNER JOIN [Order Details] od ON o.OrderID = od.OrderID
     WHERE o.Freight > 1000

Conceptually, the execution plan that SQL Server needs to follow to satisfy this request is:

  1. Retrieve all rows from the Orders table where Freight is greater than 1000

  2. Retrieve all rows from Order Details that have an OrderID in the set identified above

  3. Join the two sets above together based on the OrderID column

  4. Total the UnitPrice column for this set

  5. Present the result

Why Bother About the Execution Plan?

As a developer, why should you care what goes on behind the scenes in SQL Server? Why not simply execute your stored procedures and let SQL Server worry about how to carry out the request? The answer is performance: being able to read an execution plan and spot obvious performance problems is a valuable skill.

For any given T-SQL statement there isn't just one potential execution plan; there may be many alternatives, and it is the job of the query optimizer to choose the most efficient one. That choice is based on a number of factors, including available indexes, data distribution, table sizes, and so on.

If you can view and understand the execution plan, you can see where a query consumes the most time and resources. This knowledge helps you improve performance, either by rewriting the query to work more efficiently or by creating appropriate indexes to reduce excessive resource usage.

Viewing the Execution Plan

There are two views of the execution plan available from Query Analyzer: a graphical view and a textual view.

Graphical View of the Execution Plan

Because the graphical execution plan is easy to read, most people prefer this option unless they specifically need to capture the plan as text, for scripting purposes for example. The graphical plan can be enabled from the Query menu of Query Analyzer, or by clicking the appropriate button on the toolbar.

If you select the Show Execution Plan option from the Query menu, the execution plan is displayed only after the stored procedure has been executed. If you don't want to execute the stored procedure, you can choose the Display Estimated Execution Plan option instead. With this option, SQL Server goes through the motions of deciding on the execution plan it would use without actually executing the T-SQL statements. This is a handy way of viewing the execution plan while developing a procedure, as it doesn't make any changes to your database.

To view the estimated execution plan, click the following button within Query Analyzer, shown circled in the figure below:

[Figure: the Display Estimated Execution Plan button, circled on the Query Analyzer toolbar]

If we examine the Execution Plan tab, which is included with the usual Grids and Messages tabs along the bottom of our results window, we will see that it contains a graphical representation of our query execution plan:

[Figure: the graphical execution plan shown on the Execution Plan tab]

Important

This graphical information is not produced directly by SQL Server but is instead created by Query Analyzer from the information SQL Server provides about the execution plan it has chosen to use. Therefore, the execution plan cannot be viewed graphically unless we are using the Query Analyzer tool.

Textual View of the Execution Plan

The execution plan can also be displayed in text format by using the following command:

     SET SHOWPLAN_TEXT {ON | OFF} 

As we mentioned earlier, most people find the graphical execution plan easier to understand. However, if you need to retain the execution plan for an automated task or for future analysis, the text option is more suitable, as there is currently no built-in method for saving the graphical execution plan.

To view the execution plan as text, SET SHOWPLAN_TEXT ON must first be executed as the only statement in its batch.
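In Query Analyzer this means separating the SET statement from the query with GO, roughly as follows (a minimal sketch; remember to set SHOWPLAN_TEXT back to OFF afterwards, because while it is ON subsequent statements are not actually executed):

     SET SHOWPLAN_TEXT ON
     GO

     SELECT Sum(od.UnitPrice)
     FROM Orders o
     INNER JOIN [Order Details] od ON o.OrderID = od.OrderID
     WHERE o.Freight > 1000
     GO

     SET SHOWPLAN_TEXT OFF
     GO

When the query batch is run, we receive a textual representation of the execution plan in the results pane: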

[Figure: the textual execution plan returned in the results pane]

If we widen the results to make them a little easier to read, we see that the first rowset contains a copy of our original query, while the second rowset contains the execution plan that SQL Server chose for the query:

     |--Compute Scalar(DEFINE:([Expr1002]=If ([Expr1009]=0) then NULL else [Expr1010]))
          |--Stream Aggregate(DEFINE:([Expr1009]=Count(*), [Expr1010]=SUM([od].[UnitPrice])))
               |--Nested Loops(Inner Join, OUTER REFERENCES:([o].[OrderID]))
                    |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o]), WHERE:([o].[Freight]>1000.00))
                    |--Clustered Index Seek(OBJECT:([Northwind].[dbo].[Order Details].[PK_Order_Details] AS [od]), SEEK:([od].[OrderID]=[o].[OrderID]) ORDERED FORWARD)

Although not immediately obvious, this is the same execution plan that was represented visually in our previous example (the one relating to the graphical execution plan).

The Execution Plan and Stored Procedures

For the sake of simplicity, we have so far demonstrated execution plans using standalone T-SQL statements. However, since this book is about stored procedures, we should address what happens to the execution plan when the statements are contained within a stored procedure. The answer is straightforward: the basic steps needed to resolve the query remain the same.

For example, let's convert the above example into a stored procedure:

     CREATE PROCEDURE SumOfPriceForHighFreight
     AS
       SELECT Sum(od.UnitPrice)
       FROM Orders o
       INNER JOIN [Order Details] od ON o.OrderID = od.OrderID
       WHERE o.Freight > 1000
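The procedure is then executed like any other T-SQL statement, for example:

     EXEC SumOfPriceForHighFreight

(The SET SHOWPLAN_TEXT technique described earlier should work for this EXEC statement too, if you want the procedure's plan as text.)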

On executing this stored procedure in the Query Analyzer (with Show Execution Plan turned on), we can see that the resulting execution plan remains the same as in the previous example:

[Figure: the graphical execution plan for the SumOfPriceForHighFreight procedure]

When optimizing a stored procedure, examining the execution plan is indispensable. It shows us the resources (such as time, CPU, and disk) that the stored procedure as a whole consumes, apportioned across the individual SQL statements within it. This allows us to focus our attention on the SQL statements where tuning will have the greatest impact.

If you optimize stored procedures frequently, you should learn to make the most of the execution plan. Quite often, developers attempt to optimize a stored procedure without it and end up misguided: they spend a lot of time tuning a SQL statement that consumes only a small percentage of the procedure's total resources, and miss the offending statements completely.

Multiple Statements within the Stored Procedure

When multiple statements are executed within a batch or a stored procedure, an execution path is created for each query; however, all of these paths are contained within a single execution plan. For example, consider the following stored procedure:

     CREATE PROCEDURE SomeOrderInformation
     AS
       SELECT Count(*)
       FROM Orders o
       INNER JOIN [Order Details] od ON o.OrderID=od.OrderID
       WHERE od.UnitPrice > 50

       SELECT COUNT(*)
       FROM Orders o
       WHERE o.Freight < 1000

When we execute this stored procedure in Query Analyzer (remember to turn on the display of the execution plan), we can see that an execution path is returned for each statement. However, because the statements within the stored procedure are treated as a single unit, these execution paths are compiled into one execution plan that is retained for the stored procedure:

[Figure: the execution plan showing one execution path per statement in SomeOrderInformation]
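If you want to confirm that the procedure really is cached as a single plan, SQL Server 2000 exposes the plan cache through the syscacheobjects system table. The following is only a quick sketch; the exact rows returned depend on what is currently in your server's cache:

     -- Run the procedure once so that a plan is cached
     EXEC SomeOrderInformation

     -- Look for the cached plan entries for this procedure;
     -- usecounts should rise each time the procedure is executed
     SELECT cacheobjtype, objtype, usecounts, sql
     FROM master.dbo.syscacheobjects
     WHERE objtype = 'Proc'
       AND sql LIKE '%SomeOrderInformation%'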

Understanding the Execution Plan

Execution plans can be complicated, and you could spend a long time learning the function of every operator they contain. Once you have mastered them, you will understand how SQL Server decides to execute a query. Reading the execution plan and then diagnosing performance issues is a scientific way to optimize queries, unlike the common approach of making a change and seeing if it runs quicker.

Due to the sheer number of operators (around 50), it's not possible for us to explain each one individually. Instead, we will offer pointers to help you get started; for more detailed information, refer to SQL Server Books Online: select the Index tab and enter operators (symbols) as the keyword.

Cost

Understanding cost is the first step in understanding the execution plan from a performance perspective. To interpret the execution plan of a stored procedure, there are two costs that we need to be aware of:

  • Query cost (relative to batch)

  • Operator cost (relative to query)

Each application of cost is explained next; however, the definition of cost is the same in both cases. Cost is a metric used by SQL Server to determine the impact of one operation in relation to another. In other words, it is a measure of computing resources, taking into account factors such as CPU, I/O, execution time, and so on.

Query Cost (relative to the batch)

When multiple SQL statements are contained within a stored procedure, the cost relative to the batch is expressed as a percentage. This percentage shows the cost of each individual SQL statement as a proportion of the combined cost of all the SQL statements within the stored procedure:

[Figure: query cost (relative to the batch) displayed above each statement's plan]

If we return to our previous stored procedure and examine the cost information within the execution plan, we can see that the first statement accounts for 46.72% of the total cost and the second statement for 53.28%. In other words, the second statement is slightly more resource-intensive than the first.
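The percentage is simply each statement's estimated cost divided by the combined estimated cost of the whole batch. The absolute cost units below are hypothetical, chosen only to illustrate the arithmetic behind the percentages above:

     Statement 1:  0.4672 units  ->  0.4672 / (0.4672 + 0.5328) = 46.72%
     Statement 2:  0.5328 units  ->  0.5328 / (0.4672 + 0.5328) = 53.28%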

Important

The actual cost figures that you see when executing these examples may vary, as they depend on the resources (CPU, I/O, memory, and so on) available on your system.

Operator Cost (relative to query)

The operator cost, on the other hand, is local in scope to a given statement. It expresses the cost (resource usage) of a single operator as a percentage of the entire resource usage of that SQL statement:

[Figure: operator cost (relative to the query) displayed beneath each operator]

If we examine the operator cost of the first SQL statement of our SomeOrderInformation example, we will see that each operator has an associated cost. For this example, we can see that 99% of the cost of the first statement is consumed by the operator on the right (which is, in fact, the operation of pulling the rows out of the table). For more information on operators, please refer to SQL Server Books Online.

The Thick Arrows

Before we get into this section, we need to create the following tables and populate them with the data needed for this example. To do this, run the following script:

     SET NOCOUNT ON

     CREATE TABLE ReallyReallyBig
       (
         IDCol INT NOT NULL,
         VarCharCol VARCHAR(255) NOT NULL
       )

     CREATE TABLE ReallyReallyLittle
       (
         IDCol INT NOT NULL,
         VarCharCol VARCHAR(255) NOT NULL
       )

     DECLARE @Count INT
     SELECT @Count = 1

     WHILE @Count <= 10000
     BEGIN
       -- Insert a row with some nonsensical character data
       INSERT ReallyReallyBig(IDCol, VarCharCol)
       VALUES(@Count, REPLICATE(CHAR((@Count % 26) + 64), @Count % 255))

       SELECT @Count = @Count + 1
     END

     INSERT ReallyReallyLittle(IDCol, VarCharCol)
     VALUES(9999, 'ABC')
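If you want to make sure the script ran as expected, a quick check of the row counts will do:

     SELECT COUNT(*) AS BigRows    FROM ReallyReallyBig      -- should return 10000
     SELECT COUNT(*) AS LittleRows FROM ReallyReallyLittle   -- should return 1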

The arrows that connect operators within the execution plan represent the passing of rowsets from one operator to another. Although users of SQL Server don't see this happening, conceptually this is what happens within SQL Server during the execution of the query.

In addition to showing the passing of rowsets, the thickness of the line is also a quick indicator of the number of rows passed from one operator to another. This is another easy way of identifying potential performance issues. For example, consider the following query:

     SELECT *
     FROM ReallyReallyBig rrb
     INNER JOIN ReallyReallyLittle rrl ON rrb.IDCol = rrl.IDCol

The ReallyReallyBig table now contains 10,000 rows, while the ReallyReallyLittle table contains only a single row. Neither table has any indexes yet, so when we run this query it takes a while. Admittedly, in this simple example we could probably guess that creating an index on the ReallyReallyBig.IDCol column would improve performance.

Now let's look at the execution plan for this example:

[Figure: the execution plan showing table scans and a thick arrow carrying 10,000 rows into the hash match operator]

We can see a couple of things happening. As there are no indexes, table scans are used to retrieve the rows from our tables. The top table scan (the one with a cost of 13%) takes place on the ReallyReallyLittle table, but this is not a problem in itself, as it produces only one row. However, the bottom table scan takes place on the ReallyReallyBig table, and we can tell by the width of the connecting line that a large number of rows, in this case 10,000, are being passed from the table scan operator to the hash match operator. Within the hash match operator, both rowsets are joined, producing a single row as output. So, while our query only produces a single row as output, internally SQL Server has to pass around and join together 10,000 rows, which is what is taking all the time.

Armed with this knowledge, we can mitigate this situation by creating a single index on our ReallyReallyBig table:

     CREATE NONCLUSTERED INDEX ix_ReallyReallyBig_idcol
     ON ReallyReallyBig(IDCol)

When we reissue the query, we can see that the table scan on our ReallyReallyBig table has been replaced with a more efficient Index Seek operator, and hence SQL Server is no longer passing 10,000 rows around internally. Instead, each of the retrieval operators produces only a single row, making the whole query significantly more efficient:

[Figure: the execution plan after indexing, with an Index Seek on ReallyReallyBig and thin connecting arrows]

So, what caused such a major saving of resources? When the ReallyReallyBig table had no index, every row in it had to be pulled off the disk in the form of a table scan, because SQL Server did not know in advance which rows would join to our ReallyReallyLittle table. These rows (10,000 of them) then had to be passed to the hash match operator, which compared the values of the join column in the ReallyReallyBig table with the values of the join column in the ReallyReallyLittle table, keeping the matching rows and discarding the rest. In this example only one row matched, while the other 9,999 rows from our ReallyReallyBig table were thrown away.
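If you want to quantify the saving rather than rely on the thickness of the arrows, the SET STATISTICS IO and SET STATISTICS TIME options report the reads and elapsed time for each statement. A quick sketch; the exact figures you see will depend on your system:

     SET STATISTICS IO ON
     SET STATISTICS TIME ON

     -- Compare the logical reads reported for this query before
     -- and after creating ix_ReallyReallyBig_idcol
     SELECT *
     FROM ReallyReallyBig rrb
     INNER JOIN ReallyReallyLittle rrl ON rrb.IDCol = rrl.IDCol

     SET STATISTICS IO OFF
     SET STATISTICS TIME OFF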

Benefits of Understanding Cost

Once you understand the application of cost, optimization becomes a more scientific process. For example, if you have a complex stored procedure that contains many SQL statements, the obvious approach is to:

  1. Execute the stored procedure with the Show Execution Plan option turned on

  2. Examine the query cost (relative to the batch) figures to see which queries are taking most of the resources

  3. Examine the operator cost to see which operator(s) within that query are consuming the most resources

  4. Optimize the query to reduce the operator cost

  5. Repeat steps 1 through 4 for the other queries having high costs

This simple understanding of cost can save us many hours that we might otherwise waste attempting to optimize low-cost SQL statements unnecessarily.



