Executing Batches, or What s Stored About Stored Procedures (and Functions)?

Typically, when a batch of Transact-SQL commands is received from a client connection, the following high-level steps are performed:

Step One: Parse Commands and Create the Sequence Tree

The command parser checks for proper syntax and translates the Transact-SQL commands into an internal format that can be operated on. The internal format is known as a sequence tree or a query tree. The command parser handles these language events.

Step Two: Compile the Batch

An execution plan is generated from the sequence tree. The entire batch is compiled, queries are optimized, and security is checked. The execution plan contains the necessary steps to check any constraints that exist. If an after trigger exists for any data modification statement, the call to that trigger is appended to the execution plan for the statement. If an instead-of trigger exists, the call to that trigger replaces the execution plan for the data modification statement. Recall that a trigger is really a specialized type of stored procedure. Its plan is cached, and the trigger doesn't need to be recompiled every time it is invoked.

The execution plan includes the following:

All the necessary steps to carry out the commands in the batch or stored procedure.
The steps needed to enforce constraints. (For example, for a foreign key, this would involve checking values in another table.)
A branch to the stored procedure plan for a trigger, if one exists.

Step Three: Execute

During execution, each step of the execution plan is dispatched serially to a manager that's responsible for carrying out that type of command. For example, a data definition command (in DDL), such as CREATE TABLE, is dispatched to the DDL manager. DML statements, such as SELECT, UPDATE, INSERT, and DELETE, go to the DML manager. Miscellaneous commands, such as DBCC and WAITFOR, go to the utility manager. Calls to stored procedures (for example, EXEC sp_who) are dispatched to the stored procedure manager. A statement with an explicit BEGIN TRAN interacts directly with the transaction manager.

Contrary to what you might think, the execution plans for stored procedures and functions are not permanently stored on disk. (This is a feature—the execution plan is relatively dynamic.) Think for a moment about why it's important for the execution plan to be dynamic. As new indexes are added, preexisting indexes are dropped, constraints are added or changed, and triggers are added or changed; or as the amount of data changes, the plan can easily become obsolete.

So, what's stored about a stored procedure or function?

The SQL statements that were used to create the routine are stored in the system table syscomments. The first time a routine is executed after SQL Server was last restarted, the SQL text is retrieved and an execution plan is compiled. The execution plan is then cached in SQL Server's memory, and it remains there for possible reuse until it's forced out in a least recently used (LRU) manner. We'll look at more details about the SQL Server's caching mechanism and when plans are forced out of cache in Chapter 15.

Hence, a subsequent execution of the routine can skip not only step 1, parsing, but also step 2, compiling, and go directly to step 3, execution. Steps 1 and 2 always add some overhead and can sometimes be as costly as actually executing the commands. Obviously, if you can eliminate the first two steps in a three-step process, you've done well. That's what the stored routines let you do. Note that most of this discussion of routines applies to triggers as well as to stored procedures and functions, even though triggers are not listed in the SCHEMA VIEW called ROUTINES.

When you execute a routine, if a valid execution plan exists in the procedure cache, it will be used (eliminating the parsing and compiling steps). When the server is restarted, no execution plans will be in the cache; so the first time the server is restarted, routines will all be compiled the first time they are executed.

TIP
You can preload your procedure cache with execution plans for routines by defining a startup stored procedure that executes the routines you want to have compiled and cached.

After a routine executes, its plan remains in the cache and is reused the next time any connection executes the same routine. In addition, because multiple connections might try to execute the same plan concurrently, a part of the plan (called the shareable portion) is reentrant. Each concurrent user of that plan also requires an execution context. If you think of this in traditional programming terms, this execution context is needed to contain the dirty data.

The execution context tends to be fairly small compared to the plans. The execution contexts are themselves serially reusable for certain plans that don't contain special operators. Figures 11-1 and 11-2 show execution with and without a stored routine.

click to view at full size.

Figure 11-1. Efficient execution using a stored routine.

click to view at full size.

Figure 11-2. Less efficient execution with an ad hoc query.

In Chapter 15, you'll see that even ad hoc queries do not need to be recompiled every time they're run. But when and how their plans are stored and reused is not nearly as predictable as with stored routines. In addition, you have little explicit control over the recompilation of ad hoc queries.

Step Four: Recompile Execution Plans

By now, it should be clear that the SQL code for routines persists in the database but execution plans don't. Execution plans are cached in memory. But sometimes they can be invalidated and a new plan generated.

So, when is a new execution plan compiled?

The short answer is: whenever SQL Server needs to! This can include the following cases:

When a copy of the execution plan isn't available in memory.
When an index on a referenced object is dropped.
When updated statistics are available for any table used by the routine. (Chapter 15 covers statistics in detail.)
When an object referenced in the routine is altered using ALTER TABLE.
When the object has been named as the argument to sp_recompile. If the object is a routine, the routine will be marked so that it is recompiled the next time it is run. If the object is a table or view, all the routines that reference that object will be recompiled the next time they are run. The system procedure sp_recompile increments the schema column of sysobjects for a given object. This invalidates any plans for the object as well as any plans that reference the object, as described in the previous examples.
When a stored procedure has been created using the WITH RECOMPILE option. A stored procedure can be created using WITH RECOMPILE to ensure that its execution plan will be recompiled for every call and will never be reused. Note that the WITH RECOMPILE option is available only when you create stored procedures and is not available for function creation. Creating a stored procedure WITH RECOMPILE can be useful if procedures take parameters and the values of the parameters differ widely, resulting in a need for different execution plans to be formulated. For example, if a procedure is passed a value to be matched in the WHERE clause of a query, the best way to carry out that query can depend on the value passed. SQL Server maintains statistics for each index as a histogram to help it decide whether the index is selective enough to be useful.
For one given value, an index might be highly selective, and the distribution statistics might indicate that only 5 percent of the rows have that value. SQL Server might decide to use that index because it would exclude many pages of data from having to be visited. Using the index would be a good strategy, and the cached plan for the procedure might include the instructions to use the index. However, using another value, the index might not be so selective. Rather than use only the index to visit most of the data pages (ultimately doing more I/O because you're reading both the index and the data), you'd be better off simply scanning the data and not reading the index.

For example, suppose we have a nonclustered index on the color column of our automobile table. Forty percent of the cars are blue, 40 percent are red, and 5 percent each are yellow, orange, green, and purple. It's likely that a query based on color should scan the table if the color being searched on is blue or red, but it should use the index for the other colors. Without using the WITH RECOMPILE option, the execution plan created and saved would be based on the color value the first time the procedure was executed.

So if we passed yellow to the procedure the first time it executed, we'd get a plan that used an index. Subsequently, if we passed blue, we might be able to use the previous plan that was created for yellow. In this case, however, we'd be traversing a nonclustered index to access a large percentage of rows in the table. This could end up being far more expensive than simply scanning the entire table. In such a case, when there's a lot of variance in the distribution of data and execution plans are based on the parameters passed, it makes sense to use the WITH RECOMPILE option.

This example also shows that two execution plans for the same procedure can be different. Suppose we're not using WITH RECOMPILE, and we execute the procedure for both blue and green simultaneously from two different connections. Assume for a moment that no plan is cached. Each will generate a new plan, but one plan will use the index and the other won't. When a subsequent request for red arrives, the plan it uses will be a matter of chance. And if two simultaneous calls come in for red and each plan is available, the two equivalent requests will execute differently because they'll use different plans. If, as you're processing queries, you see significant deviations in the execution times of apparently identical procedures, think back to this example.
When the stored procedure or function is executed using the WITH RECOMPILE option. This case is similar to the preceding one, except that the routine isn't created with the option; rather, the option is specified when the routine is invoked. The WITH RECOMPILE option can always be added upon execution, forcing a new execution plan to be generated. Note that you can use this option with user-defined scalar functions, but only if you invoke the function with the EXEC keyword.

Storage of Routines

With each new routine, a row is created in the sysobjects table, as happens for all database objects. The text of a routine (including comments) is stored in syscomments, which is typically useful. Storing the text allows procedures such as sp_helptext to display the source code of a routine so that you can understand what's going on, and it allows function editors and debuggers to exist.

Most users and developers find it helpful to have the full text of a routine stored in clear text. There is a limit on the size of the text for a routine due to the amount of text that syscomments can store for a given routine, but you're not likely to run up against that limit soon. The syscomments.text field can hold up to 8000 bytes, and you can have multiple rows in syscomments for each routine. The sequence of the rows for a given routine is tracked in the colid column. Since colid is a smallint, there can be up to 32,767 rows for any routine. Any one routine should then be able hold 8000 × 32,767 bytes, or 250 MB. There's one additional limiting factor, however: the maximum size of a batch. To create the routine, the client must send the complete definition to SQL Server in the CREATE statement, and this must be sent as a single batch. SQL Server has a limit on the batch size, which is 65,536 times the network packet size. Because the default value for network packet size is 4096 bytes, we get a batch size of 256 MB. So the limit on the size of your routines text isn't worth worrying about right away.

Encrypting Routines

With the rollout of version 6, the SQL Server developers at Microsoft learned somewhat painfully that some users didn't appreciate that the text of certain programmable objects was available in the syscomments system table. Several ISVs had built integrated solutions or tools that created stored procedures to use with earlier versions of SQL Server. In most cases, these solutions were sophisticated applications, and the ISVs viewed the source code as their proprietary intellectual property. They noticed that the text of the procedure in syscomments didn't seem to do anything, so they set the text field to NULL and the procedure still ran fine. In this way, they avoided publishing their procedure source code with their applications.

Unfortunately, when it came time to upgrade a database to version 6, this approach created a significant problem. The internal data structures for the sequence plans had changed between versions 4.2 and 6. The ISVs had to re-create procedures, triggers, and views to generate the new structure. Although SQL Server's Setup program was designed to do this automatically, it accomplished the tasks by simply extracting the text of the procedure from syscomments and then dropping and re-creating the procedure using the extracted text. Obviously, this automatic approach failed for procedures in which the creator had deleted the text.

When the SQL Server developers at Microsoft learned of this problem after the release of the version 6 beta, they immediately understood why developers had felt compelled to delete the text. Nonetheless, they couldn't undo the work that they had already done. Developers with these ISVs had to dig out their original stored procedure creation scripts and manually drop and re-create all their procedures. It might have been possible to create a converter program that would operate purely on the internal data structures used to represent procedures and views, but it wasn't practical. Attempting this would have been like developing a utility to run against an executable program and have it reverse-engineer the precise source code (more than a disassembly) that was used to create the binary. The SQL source code compilation process is designed to be a descriptive process that produces the executable, not an equation that can be solved for either side.

After the beta release but before the final release of version 6, the developers added the ability to encrypt the text stored in syscomments for stored procedures, triggers, and views. This allowed programmers to protect their source code without making it impossible for the upgrade process to re-create stored procedures, triggers, and views in the future. You can now protect your source code by simply adding the modifier WITH ENCRYPTION to CREATE PROCEDURE. No decrypt function is exposed (which would defeat the purpose of hiding the textlike source code). Internally, SQL Server can read this encrypted text and upgrade the sequence trees when necessary. Because the text isn't used at runtime, no performance penalty is associated with executing procedures created using WITH ENCRYPTION.

NOTE
You give up some capabilities when you use WITH ENCRYPTION. For example, you can no longer use the sp_helptext stored procedure or object editors that display and edit the text of the stored procedure, and you can't use a source-level debugger for Transact-SQL. Unless you're concerned about someone seeing your procedures, you shouldn't use the WITH ENCRYPTION option.

With SQL Server 2000, if you create a procedure, function, trigger, or view using WITH ENCRYPTION, the status column of syscomments will have a value of 1 (it's first bit will be set) and if it's not encrypted, the status column will be 2 (the second bit will be set). The encrypted and texttype columns of syscomments will also reflect the encryption status, but these columns are both computed columns based on the value that is in the status column. The value of texttype is 6 for an encrypted routine and 2 otherwise. If you want to programmatically determine whether a routine is encrypted, it's safer to check the encrypted column for a value of 0 or 1. Also, the SCHEMA VIEW called ROUTINES will have NULL in the routine_definition column if the routine is encrypted.

I created two procedures—one encrypted and one not—to illustrate the effects on syscomments:

 CREATE PROCEDURE cleartext AS SELECT * FROM authors GO CREATE PROCEDURE hidetext WITH ENCRYPTION AS SELECT * FROM authors GO SELECT o.name, o.id, number, colid, c.status, texttype, encrypted, text FROM syscomments c JOIN sysobjects o ON o.id=c.id WHERE o.id=OBJECT_ID('hidetext') OR o.id=OBJECT_ID('cleartext')

Here's the output:

 name id number colid status texttype encrypted ------------- ----------- ------ ------ ------ -------- --------- cleartext 1189579276 1 1 2 2 0 hidetext 1205579333 1 1 1 6 1 text -------------------------------- CREATE PROCEDURE cleartext AS SELECT * FROM authors ???????l???????????????????????

To find created objects that have encrypted text, you can use a simple query:

 -- Find the names and types of objects that have encrypted text SELECT name, type FROM syscomments c JOIN sysobjects o ON o.id=c.id WHERE encrypted = 1

Here's the output:

 name type --------------------- ---- hidetext P

The values that you might see in the type column are P (procedure), TR (trigger), V (view), FN (scalar function), and TN (table function). If you try to run sp_helptext against an encrypted procedure, it will return a message stating that the text is encrypted and can't be displayed:

 EXEC sp_helptext 'hidetext' The object's comments have been encrypted.

Altering a Routine

SQL Server allows you to alter the definition of a routine. The syntax is almost identical to the syntax for creating the routine initially, except that the keyword CREATE is replaced by the keyword ALTER.

The big difference is that the routine name used with ALTER must already exist, and the definition specified replaces whatever definition the routine had before. Just like when you use the ALTER VIEW command, the benefit of ALTER PROCEDURE or ALTER FUNCTION comes from the fact that the routine's object_id won't change, so all the internal references to this routine will stay intact. If other routines reference this one, they'll be unaffected. If you've assigned permissions on this routine to various users and roles, dropping the routine removes all permission information. Altering the routine keeps the permissions intact.