Using Stored Procedures and Caching Mechanisms | Inside Microsoft SQL Server 2000

Stored procedures are the most cost-effective mechanism for executing SQL commands from your applications, and I recommend that you use them whenever possible rather than passing ad hoc SQL statements from your applications. Chapters 3 and 11 discussed the efficiencies that stored procedures bring in terms of not requiring compilation of an execution plan for each execution. Plans can be reused and stay cached and available for subsequent use. Reusing a saved plan can provide a major performance advantage. The query optimizer might evaluate thousands of possible plans, so compilation and optimization time can end up being a more significant percentage of total execution time. It is critical that you understand when recompilation takes place and try to avoid it if at all possible.

SQL Server can save on recompilation using four mechanisms to make plan caching accessible in a wider set of scenarios than just using stored procedures:

Ad hoc caching
Autoparameterization
The sp_executesql procedure
The prepare and execute method

NOTE
The information here on caching query plans was adapted from a preliminary copy of a whitepaper by Peter Scharlock.

Ad Hoc Caching

SQL Server caches the plans from ad hoc queries, and if a subsequent batch matches exactly, the cached plan is used. This feature requires no extra work to use, but it is limited to exact textual matches. For example, if the following three queries are submitted in the Northwind database, the first and third queries will use the same plan but the second one will need to generate a new plan:

 SELECT * FROM orders WHERE customerID = 'HANAR' SELECT * FROM orders WHERE customerID = 'CHOPS' SELECT * FROM orders WHERE customerID = 'HANAR'

Autoparameterization

For simple queries, SQL Server guesses which constants might really be parameters and attempts to treat them as parameters. If this is successful, subsequent queries that follow the same basic template can use the same plan. The following four templates can be used for autoparameterization. Note that key-expression is an expression involving only column names, constants, AND operators, and comparison operators

(<, >, =, <=, >=, and <>).

 INSERT table VALUES ({constant | NULL | DEFAULT}, ...) DELETE table WHERE key-expression UPDATE table SET colname = constant WHERE key-expression SELECT {* | column-list} FROM table WHERE key-expression ORDER BY column-list

For example, these two queries will use the same plan:

 SELECT firstname, lastname, title FROM employees WHERE employeeID = 6 SELECT firstname, lastname, title FROM employees WHERE employeeID = 2

Internally, SQL Server parameterizes these queries as follows:

 SELECT firstname, lastname, title FROM employees WHERE employeeID = @p

SQL Server can allow other queries of the same template to use the same plan only if the template is safe. A template is safe if the plan selected will not change even if the actual parameters change. This ensures that autoparameterization won't degrade a query's performance.

The SQL Server query processor is much more conservative about deciding whether a template is safe than an application can be. SQL Server guesses which values are really parameters, whereas your application should actually know. Rather than rely on autoparameterization, you should use one of the following two mechanisms to mark parameters when they are known.

The sp_executesql Procedure

The stored procedure sp_executesql is halfway between ad hoc caching and stored procedures. Using sp_executesql requires that you identify the parameters but doesn't require all the persistent object management needed for stored procedures.

Here's the general syntax for the procedure:

 sp_executesql @batch_text, @batch_parameter_definitions,  param1,...paramN

Repeated calls with the same batch_text use the cached plan, with the new parameter values specified. The same cached plan is used in all the following cases:

 EXEC sp_executesql N'SELECT firstname, lastname, title FROM employees WHERE employeeID = @p', N'@p tinyint', 6 EXEC sp_executesql N'SELECT firstname, lastname, title FROM employees WHERE employeeID = @p', N'@p tinyint', 2 EXEC sp_executesql N'SELECT firstname, lastname, title FROM employees WHERE employeeID = @p', N'@p tinyint', 6

ODBC and OLE DB expose this functionality via SQLExecDirect and ICommandWithParameters. The ODBC and OLE DB documentation provide more details.

The Prepare and Execute Method

This last mechanism is like sp_executesql in that parameters to the batch are identified by the application, but there are some key differences. The prepare and execute method does not require the full text of the batch to be sent at each execution. Rather, the full text is sent once at prepare time; a handle that can be used to invoke the batch at execute time is returned. ODBC and OLE DB expose this functionality via SQLPrepare/SQLExecute and ICommandPrepare. You can also use this mechanism via ODBC and OLE DB when cursors are involved. When you use these functions, SQL Server is informed that this batch is meant to be used repeatedly.

Sharing Cached Plans

To allow users to share plans and thus maximize the effectiveness of the caching mechanisms, all users should execute in the same environment. Don't change SET options or database settings in the middle of an application or connection.

For all of the caching mechanisms, reusing a cached plan avoids recompilation and optimization. This saves compilation time, but it means that the same plan is used regardless of the particular parameter values passed in. If the optimal plan for a given parameter value is not the same as the cached plan, the optimal execution time will not be achieved. For this reason, SQL Server is very conservative about autoparameterization. When an application uses sp_executesql, prepare and execute, or stored procedures, the application developer is responsible for determining what should be parameterized. You should parameterize only constants whose range of values does not drastically affect the optimization choices.

The SQL Server Performance Monitor includes a counter called SQL Server:SQL Statistics that has several counters dealing with autoparameterization. You can monitor these counters to determine whether there are many unsafe or failed autoparameterization attempts. If these numbers are high, you can inspect your applications for situations in which the application can take responsibility for explicitly marking the parameters.

Examining the Plan Cache

The system table syscacheobjects keeps track of the compiled objects in cache at any time. Note that this table exists only in the master database and is really a pseudotable. It takes no space on disk and is materialized only when someone writes a query to access it. Table 15-2 lists some of the more useful columns in the syscacheobjects table.

Table 15-2. Useful columns in the syscacheobjects table.

Column Name	Description
bucketid	The bucket ID for this plan in an internal hash table; the bucket ID helps SQL Server locate the plan more quickly. Two rows with the same bucket ID refer to the same object (for example, the same procedure or trigger).
cacheobjtype	Type of object in the cache.
objtype	Type of object.
objid	One of the main keys used for looking up an object in the cache. This is the object ID stored in sysobjects for database objects (procedures, views, triggers, and so on). For cache objects such as ad hoc or prepared SQL, objid is an internally generated value.
dbid	Database ID in which the cache object was compiled.
uid	The creator of the plan (for ad hoc query plans and prepared plans).
refcounts	Number of other cache objects that reference this cache object.
usecounts	Number of times this cache object has been used since inception.
pagesused	Number of memory pages consumed by the cache object.
setopts	SET option settings that affect a compiled plan. Changes to values in this column indicate that users have modified SET options.
langid	Language ID of the connection that created the cache object.
dateformat	Date format of the connection that created the cache object.
sql	Procedure name or first 128 characters of the batch submitted.

Cacheobjtype refers to the type of plan. It can have one of the following values:

Compiled Plan The actual plan generated during compilation and optimization. A compiled plan can be shared by sessions each running the same procedure or query.
Executable Plan The execution environment that runs the compiled plan. Each concurrent invocation of the same compiled plan will have its own executable plan. Caching this environment information saves setup costs, which can be significant for a frequently run query. All executable plans must have an associated compiled plan with the same bucketid, but not all compiled plans will have an associated executable plan saved.
Parse Tree The internal form of a query prior to compilation and optimization.
Cursor Parse Tree The parse tree for a cursor's query.

Objtype refers to the type of object whose plan is cached. It can have one of the following values:

Proc For both stored procedures and inline functions.
Prepared For queries submitted using sp_executesql and using the prepare and execute method and for queries that SQL Server has decided to autoparamaterize.
Ad hoc query For queries that don't fall into any other category (as discussed above).
ReplProc For replication agents.
Trigger A type whose plans are very similar to procedure or inline function plans.
View A type for which you typically see only a parse tree, not a plan. You see this type not only when a view is accessed, but also when a non-inline function is encountered. The function does not have its own separate plan, but is expanded as part of whatever query it is part of.
Table A type for which a user or system table will generate a parse tree object if it has computed columns.
Default, Check, Rule Types that are expanded into parse trees and compiled as part of the query in which they are activated.

Two other columns, refcount and usecount, indicate how the plans are being used. The refcount value indicates how many times the plan has been referenced. An executable plan in cache will always have a refcount of 1. An executable plan might actually have a refcount greater than 1 if the plan is currently being executed, but that will not show up in the syscacheobjects table. The refcount for compiled plans is 1 + the number of execution plans, either in cache or in use. The usecount value refers to the number of times a cached object is used. Each time the plan is looked up and found, the usecount is incremented. If you see a value of 2 for usecount after the first execution of a procedure or query, that is because the plan is accessed once when it is created and stored in cache, and once again when it is retrieved from cache and executed.

Multiple Plans in Cache

SQL Server will try to limit the number of plans for a query or a procedure. Because plans are reentrant, this is easy to accomplish. You should be aware of some situations that will cause multiple plans for the same procedure to be saved in cache. The most likely situation is a difference in certain SET options, database options, or configuration options. For example, a stored procedure that concatenates strings might compile the concatenation differently depending on whether the option CONCAT_NULL_YIELDS_NULL is on or off or whether the corresponding database option is true or false. If a user executes the procedure with the option on, that person will use a different plan than if the option is off. The column setopts in the syscacheobjects table is a bitmap indicating which SET options were in effect when the plan was placed in cache. If you start a different session with different SET options enabled, SQL Server will always generate a new plan. The options included in the setopts field include:

 ANSI_PADDING FORCEPLAN CONCAT_NULL_YIELDS_NULL ANSI_WARNINGS ANSI_NULLS QUOTED_IDENTIFIER ANSI_NULL_DFLT_ON ANSI_NULL_DFLT_OFF

Two sessions that have set different languages or different dateformat values will also always generate separate plans. SQL Server must then also keep the langid value and dateformat value in syscacheobjects so that these values can be compared with future attempts to execute the same plan.

One other connection issue can affect whether a plan can be reused. If an owner name must be resolved implicitly, a plan cannot be reused. For example, suppose user sue issues the following SELECT statement:

 SELECT * FROM mytable

SQL Server will first try to resolve the object by looking for an object called mytable owned by sue, and if no such object can be found, it will look for an object called mytable owned by the DBO. If user dan executes the exact same query, the object can be resolved in a completely different way (to a table owned by dan), so sue and dan could not share the plan generated for this query. However, the situation is different if sue issues this command:

 SELECT * FROM dbo.mytable

Now there's no ambiguity. Anyone executing this exact query will always reference the same object. In the syscacheobjects table, the column uid indicates the user ID for the connection in which the plan was generated. Only another connection with the same user ID value can use the same plan. The one exception is if the user ID value is recorded as _2 in syscacheobjects, which indicates that the query submitted does not depend on implicit name resolution and can be shared among different users. This is the preferred method.

TIP
If your server has been running for a while, the syscacheobjects table can get quite large and awkward to work with. If you want to run your own tests to determine what gets put in cache, and when plans are reused, you will probably want to clear out the cache occasionally to keep its size manageable. This can be done with the command DBCC FREEPROCCACHE, which removes all cached plans from memory. In addition, DBCC FLUSHPROCINDB(<dbid>) can be used to clear all plans from one particular database. DO NOT use these commands on your production servers as it could impact the performance of your running applications.

When to Use Stored Procedures and Other Caching Mechanisms

Keep the following guidelines in mind when you are deciding whether to use stored procedures or one of the other mechanisms:

Stored procedures Use when multiple applications are executing batches in which the parameters are known.
Ad hoc caching Useful in limited scenarios. Don't design an application to use this.
Autoparameterization Use for applications that cannot be easily modified. Don't design an application to use this.
The sp_executesql procedure Use when a single user might use the same batch multiple times and when the parameters are known.
The prepare and execute method Use when multiple users are executing batches in which the parameters are known, or when a single user will definitely use the same batch multiple times.

Recompiling Stored Procedures

In spite of the benefits of the mechanisms just discussed, you should still use stored procedures whenever possible. Besides giving you the benefits of precompilation, they minimize network traffic by reducing the text that needs to be sent from your application to SQL Server, and they provide a security mechanism to control who can access data and under what conditions. Stored procedures can be recompiled automatically or manually under various circumstances. Chapter 11 discussed some of these issues.

You can actually watch automatic recompilation occurring by using SQL Profiler: Choose to trace the event in the Stored Procedures category called SP:Recompile. If you also trace the event called SP:StmtStarting, you can see at what point in the procedure it is being recompiled. In particular, you might notice that stored procedures can be recompiled multiple times during their execution. For example, if you build a temporary table and later create an index on that table, and then add data to the table, your stored procedure will be recompiled numerous times. One way to avoid this is to include all data definition statements dealing with your temporary tables, as well as the insertion of rows into the temporary tables, right at the beginning of the procedure. So if the procedure must be recompiled, it won't happen more than once. Another way to prevent recompilations is to include the query hint KEEPPLAN in your statements that access the temporary tables. I'll discuss query hints in Chapter 16.

Although we've been assuming that recompilation is something you will usually want to avoid, this is not always the case. If you know that updated statistics can improve the query plan, or if you know that you have wildly different possible parameter values, recompilation can be a good thing.

As you saw, the system table syscacheobjects keeps track of the compiled objects in cache at any time. This table is accessible only by system administrators, but you can write your own stored procedures to provide information for your own tuning and testing purposes. You can use a procedure called sp_procs_in_cache (which is on the companion CD) to return a list of all procedure plans in cache and the number of times each plan occurs. The list is organized by database. If you execute the procedure without passing a parameter, you see procedures in your current database only. Alternatively, you can provide a specific database name or use the parameter all to indicate that you want to see procedure plans in cache from all databases. If you want to pull different information out of syscacheobjects, you can customize the procedure in any way you like. Remember to create the procedure in the master database so that it can be called from anywhere. Also remember to grant appropriate permissions if anyone other than a system administrator will be running it. The sp_procs_in_cache procedure shows both compiled plans and executable plans.

Other Benefits of Stored Procedures

Beyond the significant performance and security advantages, stored procedures can provide a valuable level of indirection between your applications and the database design. Suppose your application calls a procedure such as get_customer_balance and expects a result set to be returned. If the underlying database is then changed, the application can be totally unaffected and unaware of the change as long as the procedure also changes to return the result set as expected. For example, if you decide to denormalize your database design to provide faster query performance, you can change the stored procedure to respecify the query. If many applications call the procedure, you can simply change the stored procedure once and never touch the application. In fact, a running application doesn't even need to be restarted—it executes the new version of the stored procedure the next time it is called.

Another good reason to use stored procedures is to minimize round-trips on the network (that is, the conversational TDS traffic between the client application and SQL Server for every batch and result set). If you will take different actions based on data values, try to make those decisions directly in the procedure. Strictly speaking, you don't need to use a stored procedure to do this—a batch also provides this benefit. Issue as many commands as possible in a batch. You can try a simple test by inserting 100 rows first as a single batch and then as every insert in its own batch (that is, 100 batches). You'll see a performance improvement of an order of magnitude even on a LAN, and on a slow network the improvement will be stunning. While LAN speeds of 10 megabits per second (Mbps) are fast enough that the network is generally not a significant bottleneck, speeds can be tremendously different on a slow network. A modem operating at 28.8 kilobits per second (Kbps) is roughly 300 times slower than a LAN. Although WAN speeds vary greatly, they typically can be 100 to 200 times slower than a LAN. The fast LAN performance might hide inefficient network use. You can often see a prime example of this phenomenon when you use cursors. Fetching each row individually might be tolerable on a LAN, but it is intolerable when you use dial-up lines.

If a stored procedure has multiple statements, by default SQL Server sends a message to the client application at the completion of each statement to indicate the number of rows affected for each statement. This is known as a DONE_IN_PROC message in TDS-speak. However, most applications do not need DONE_IN_PROC messages. So if you are confident that your applications do not need these messages, you can disable them, which can greatly improve performance on a slow network when there is otherwise little network traffic.

You can use the connection-specific option SET NOCOUNT ON to disable these messages for the application. While they are disabled, you cannot use the ODBC SQLRowCount function or its OLE DB equivalent. However, you can toggle NOCOUNT on and off as needed, and you can use select @@rowcount even when NOCOUNT is on. You can also suppress the sending of DONE_IN_PROC messages by starting the server with trace flag 3640, which lets you suppress what might be needless overhead without touching your application. However, some ODBC applications depend on the DONE_IN_PROC message, so you should test your application before using trace flag 3640 in production—it can break applications that are written implicitly to expect that token.

You should keep an eye on the traffic between your client applications and the server. An application designed and tuned for slow networks works great on a fast network, but the reverse is not true. If you use a higher-level development tool that generates the SQL statements and issues commands on your behalf, it is especially important to keep an eye on what's moving across the network. You can use the SQL Profiler to watch all traffic into the server. Just to monitor network traffic, though, this might be overkill; a network sniffer such as Network Monitor (available with Microsoft Windows NT Server, Microsoft Windows 2000 Server and Microsoft Systems Management Server) can work better if you want only to monitor network traffic. Network Monitor is easy to use, and even a networking neophyte can set it up quickly and with a few mouse clicks watch the traffic between machine pairs.