Reusing Query Plans | Designing for Scalability with Microsoft Windows DNA (DV-MPS Designing)

[Previous] [Next]

One of the ways to improve the performance of SQL Server or any relational DBMS is to give it less work to do when it executes a query. The work that any DBMS has to do to execute a query can be cut into two slices:

It has to make up a plan for how to execute the query. This involves several steps, including a check that the syntax is correct; a check that all the tables, views, columns, and other database objects referenced in the query really exist; and an investigation into whether any useful resources, such as indexes, exist and might help increase execution speed. This part of the work also involves setting up one or more alternative execution plans and comparing them with each other to find the one that will perform the job fastest. A relational engine, often called the brain of the system, is responsible for this part of the work.

When the relational engine has made up an execution plan, it asks the storage engine to perform the job. The storage engine, the workhorse of the system, physically stores data in files. It finds and manipulates this data for the relational engine.

If you can set up your queries in such a way that SQL Server can skip the first of these two steps for most executions, SQL Server can respond by giving you better performance. SQL Server offers two ways of skipping the first step in reusing the execution plan of stored procedures and in reusing the execution plan of carefully designed SQL queries, sent as text from the client to the database. Much of this chapter is about such reuse of existing execution plans.

Stored Procedures for Caching Plans

Better performance has always been a great argument for using stored procedures in SQL Server and similar products. It has been claimed that you should normally expect a 40 percent increase in performance when using stored procedures rather than sending SQL statements from the client to the DBMS. Every time we've tested this expectation, we've come up with figures that support such claims. Stored procedures are much faster than ordinary SQL statements that are sent from the client, and this improvement comes from reuse of execution plans.

In SQL Server, a stored procedure in its simplest form is a SQL statement cached in SQL Server's cache memory. All SQL Server stored procedures aren't as simple as that: some stored procedures contain several SQL statements. A stored procedure can also contain commands such as IF and WHILE for controlling the logical flow of procedure execution. But let's forget that for a while and just look at the simplest possible stored procedures, those that contain only one SQL statement, and see how they can render improved performance.

The source code for any SQL Server stored procedure is stored in a SQL Server file. When a user calls the procedure, and the procedure isn't already in cache memory, SQL Server puts it there and then keeps it there as long as it doesn't conflict with other, more urgent, needs. For instance, if the cache is full when another procedure needs to be cached and executed, and if "our" procedure hasn't been called again since that first time, SQL Server throws it out of the cache to make room for the other procedure. Next time a user calls the removed procedure, SQL Server loads it into the cache again.¹

The following code creates a simple stored procedure:

 CREATE PROCEDURE GetTrainersHorses (@TrainerId Int) AS SELECT HorseId, HorseName FROM Horses WHERE Trainer = @TrainerId ORDER BY HorseName

The stored procedure created by the preceding code performs much better than the corresponding SQL code sent from the client because its execution plan is set up for its first client only. When another client requests the use of this procedure, SQL Server connects the client to the execution plan established for the first client. Therefore, SQL Server doesn't have to check its syntax and objects again, nor does SQL Server have to look for indexes or calculate the performance of several alternative execution plans. All this work has already been done for the first client and doesn't have to be performed again. The only thing SQL Server must do before executing the existing procedure is to make sure that the user logged in has the right to execute it.

Caching and Reusing SQL Statements

Earlier versions of SQL Server would cache the execution plan of stored procedures only. SQL Server 7.0, in contrast, also caches plans for ordinary SQL statements, sent from the client. When having similar statements sent to it, SQL Server 7.0 tries to match these statements to cached execution plans. Finding a match, SQL Server 7.0 reuses the cached plan, thus saving time and increasing performance.

Even though this is good news, you should understand that it doesn't take away all the advantages of stored procedures. Most of the advantages are still prominent. It's true that the possibility of reusing ad hoc statements reduces the performance advantage of stored procedures, but all the other advantages of stored procedures are still there. For a discussion of this issue, we recommend Ron Soukup and Kalen Delaney's book Inside SQL Server 7.0, in our view the best reference to SQL Server 7.0 you can have. There are plenty of other good books on SQL Server too: Deploying Microsoft SQL Server 7.0: Notes from the Field, SQL Server 7 Developer's Guide, and SQL Server 7: The Complete Reference being three examples.

Anyway, sometimes SQL Server easily finds and recognizes the match and reuses an existing plan; sometimes, however, it can't. If you really want to make the most of SQL Server's ability to reuse execution plans not only for stored procedures but also for ordinary SQL statements, you must follow certain rules that will help SQL Server find the match between the query text and the cached execution plan. The first of these rules is mandatory:

You must qualify every object reference in your SQL statements!

The execution plan of an ad hoc query like the following will never be reused because the table name isn't qualified:

 SELECT * FROM Horses ORDER BY HorseName

If you want reuse, you must qualify the table name by prefixing it with the name of the database. Because the name of the owner comes syntactically between the database name and the table name, you must include the user name of the owner too, as the following example shows:

 SELECT * FROM MSPress_ADBRacing.DBO.Horses ORDER BY HorseName

If, as in the preceding example, the table owner is also the database owner (DBO), which is the default owner of any database object, you could just add an extra dot between the database name and the table name, as in the following example:

 SELECT * FROM MSPress_ADBRacing..Horses ORDER BY HorseName

Parameterized statements

Now, with the table name qualified, there's a much better chance that SQL Server will be able to reuse an existing query plan, especially because the query is so simple; it will be dead easy for SQL Server to see the similarities. But if you add a WHERE clause, filtering the horses to sift through those trained by trainer Michael Kahn only, for example, it will be less easy. As you can assume from the following example, the value of Michael Kahn's trainer ID is 16:

 SELECT * FROM MSPress_ADBRacing..Horses WHERE Trainer = 16 ORDER BY HorseName

There's still a good chance that SQL Server will reuse the same execution plan when the user later asks for the horses that trainer 80, Tommy Gustafsson, trains. The following example shows that the query at large remains the same. The only difference is the value in the Trainer column.

 SELECT * FROM MSPress_ADBRacing..Horses WHERE Trainer = 80 ORDER BY HorseName

It's because the query is so simple, and because only one parameter is different between the calls, that chances for reuse are good in this case. There's a way, however, to make it even easier for SQL Server to discover the similarities between two queries, thus increasing the chance for reuse even more. Here's the second rule, which in contrast with the first rule isn't mandatory:

You can parameterize the query and use the new sp_executesql stored procedure to have it executed.

It's not difficult at all to follow the second rule for reuse, and the following example proves it. This code is a new version of the last of the preceding queries, the one asking for Tommy Gustafsson's horses, as an example:

 EXEC sp_executesql N'SELECT * FROM MSPress_ADBRacing..Horses WHERE Trainer = @TrainerId ORDER BY HorseName', N'@TrainerId int', 80

This statement, calling the sp_executesql procedure, has three parameters. Let's investigate each of them, one at a time:

The first argument to sp_executesql is the SQL statement to execute. In the example, the statement is a SELECT statement with qualified object names. This argument must be a Unicode string, which is the reason for the N prefix. Such a prefix converts the string constant coming after the prefix to a Unicode string.

The second argument is another Unicode string, consisting of a comma-separated list of parameter declarations. In our case, there's only one such parameter, @TrainerId, and it's declared as an integer.

The third and last argument is a list of parameter values, one for each of the parameters declared in the second argument. In the preceding example, you see only one such parameter value, which represents the identification code of the trainer specified.

ADO Command objects generate calls to sp_executesql

Our data access objects, as we have written them so far, don't use or generate calls to the sp_executesql stored procedure; they send ordinary SQL statements to the database, just as we have saved them in our strSQL variables. In this chapter's sample code, we don't call SetAbort or SetComplete because we want to focus on the use of ADO and of SQL, as our first example, taken from our sample application's HorseFetcher class, shows:

 Public Function GetListForNamePattern _ (ByVal strNamePattern As String) As Recordset Dim rs As ADODB.Recordset, strSQL As String Const strConn = _ "Provider=SQLOLEDB.1;Server=STEN_ARMADA7800; " & _ "User ID=sa;Initial Catalog=MSPress_ADBRacing" strSQL = _ "SELECT HorseId, HorseName " & vbCrLf & _ "FROM Horses " & vbCrLf & _ "WHERE HorseName LIKE '" & strNamePattern & _ "' " & vbCrLf & _ "ORDER BY HorseName" Set rs = CreateObject("ADOR.Recordset") rs.ActiveConnection = strConn rs.Source = strSQL rs.LockType = adLockReadOnly rs.CursorLocation = adUseClient rs.CursorType = adOpenStatic rs.Open Set rs.ActiveConnection = Nothing Set GetListForNamePattern = rs End Function

As you can see from the preceding code, the GetListForNamePattern method uses one kind of ADO object only: a recordset object. To see what kind of SQL this method would send to SQL Server, we called the method with 'So%' as the strNamePattern argument, expecting the method to return a list of ID and name values for horses with names starting with So. Then we used SQL Server 7.0's Profiler tool to see exactly what SQL statement ADO and OLE DB would send to SQL Server for execution. Here's what the profiler said was executed:

 SELECT HorseId, HorseName FROM Horses WHERE HorseName LIKE 'So%' ORDER BY HorseName

This is exactly what you would expect. And it's not the very best way to help SQL Server reuse an existing plan, earlier established to fetch data on horses with other name beginnings. The query isn't parameterized, and the table name in the FROM clause isn't qualified. There's no way SQL Server can reuse the execution plan it parses, checks, optimizes, and compiles for this query, and there's no way SQL Server can match this query to an existing query plan.

To improve our chances for plan reuse, we changed the method's Visual Basic code. We added an ADO Command object because a parameterized command object calls the sp_executesql procedure in SQL Server 7.0 rather than simply sending the SQL text. We also wanted to change the command text to qualify the table name. The following code is the new version of the GetListForNamePattern method, and it highlights all the changes we made:

Public Function GetListForNamePattern _ (ByVal strNamePattern As String) As Recordset Dim cmd As Command, rs As Recordset, strSQL As String Const strConn = _ "Provider=SQLOLEDB.1;Server=STEN_ARMADA7800; " & _ "User ID=sa;Initial Catalog=MSPress_ADBRacing On Error GoTo GetListForNamePatternErr strSQL = _ "SELECT HorseId, HorseName " & vbCrLf & _ "FROM MSPress_ADBRacing..Horses " & vbCrLf & _ "WHERE HorseName LIKE ? "  & _ "ORDER BY HorseName"  Set cmd = CreateObject("ADODB.Command") cmd.ActiveConnection = strConn cmd.CommandType = adCmdText cmd.CommandText = strSQL cmd.Parameters(0) = strNamePattern Set rs = CreateObject("ADODB.Recordset")  Set rs.Source = cmd rs.LockType = adLockReadOnly rs.CursorLocation = adUseClient rs.CursorType = adOpenStatic rs.Open Set rs.ActiveConnection = Nothing Set GetListForNamePattern = rs  Set cmd.ActiveConnection = Nothing Set cmd = Nothing Exit Function GetListForNamePatternErr: Err.Raise Number:=Err.Number, Source:=Err.Source, _ Description:=Err.Description End Function

After changing the method's code, we called it again, using the same parameter value as before. The following code snippet, taken from SQL Server 7.0's Profiler, shows what the ADO Command object sent to SQL Server this time:

 sp_executesql N'SELECT HorseId, HorseName FROM HorseDatabase..Horses WHERE HorseName LIKE @P1 ORDER BY HorseName', N'@P1 varchar(20)', 'So%'

Now the SQL statement that ADO generated and executed is optimized for reuse. The query is parameterized, and the SQL object names are qualified. Chances for reuse of execution plans are now high, thanks to rather minor changes in our Visual Basic code.

Lesson to Learn

If you want to increase the scalability and performance of your application, you'll do well to make sure SQL Server (or whichever DBMS you use) reuses similar SQL execution plans as much as possible. The best way to assure reuse is to use stored procedures. If you can't or won't use stored procedures, you should try your very best to help SQL Server reuse execution plans for the ordinary SQL statements you send to the database. You can do two things to make it easier for SQL Server to match two similar queries to each other:

You can qualify the name of the table or view with the name of the database and an implicit (HorseDatabase..Horses) or explicit (HorseDatabase.DBO.Horses) reference to the owner of the object. This is mandatory. If you don't do this, you won't get reuse.

You can parameterize your query, thereby making it easier for SQL Server to discern the similarity between two queries. A good way to do that is to use parameterized ADO Command objects since they generate calls to the sp_executesql stored procedure and since that stored procedure is there to help SQL Server reuse execution plans. You don't have to parameterize your queries, and for simple queries it probably doesn't matter whether or not you do. But for more complicated queries, using multiple parameters can make quite a difference.