Coding CLR Stored Procedures

Now that you understand how to build CLR functions, you're ready to step through the implementation of the other CLR executables. Thankfully, most of the other executable types share a few attributes in common, so you won't have to start from scratch. We start with stored procedureslike CLR functions, CLR stored procedures can be prototyped and tested quite easily with Visual Studio 2005. However, there are a number of tips I can provide to make the process of coding, deploying, and debugging your CLR stored procedure easier. Let's start by comparing functions and stored procedures. Table 13.1 highlights the fundamental differences.

Table 13.1. Functions Versus Stored Procedures
Function	Stored Procedure
Supports 0 to N input parameters	Supports 0 to N input parameters
No OUTPUT parameters	Can return N OUTPUT parameters
Returns a single value	Returns a SqlInt32
Can return a T-SQL table	Can return N Rowsets
Cannot return messages	Can return "PRINT" or "RAISERROR" messages

Note that functions are generally used in expressions and aren't meant to return more than a single rowset (as a SQL Table). The function's return value is more flexible, in that you can return almost any datatype, while a stored procedure RETURN returns only a SqlInt32 value. However, that's deceptive, in that a stored procedure can return virtually any number of OUTPUT parameters of almost any type. When stored procedures are coded, the arguments passed in the "parameters collection" (the arguments passed to the CLR stored procedure) must be specifically designated so the CLR compiler knows which parameters are input and which are output. Sure, an output parameter can also accept an input value, and in testing, you might have to force a value to get Visual Studio to accept it. Later, I'll show you how to return one or more rowsets and "PRINT" messages from a CLR stored procedureagain, something you can't return from a function.

Tip

Later in this chapter, I illustrate how to code and debug a CLR stored procedure that accepts a CLR UDT as an input parameter.

The example stored procedure I'm going to build addresses one of the reputed weaknesses in T-SQLnamely, string handling. The challenge is to search a varchar(max)^[11] column for a specific set of words and report the hit count. The CHARINDEX function in T-SQL does a pretty good job of this, but it's only part of the coded needed to manage the incoming list of search words and the resulting output rowset.

^[11] varchar(max), nvarchar(max) are designed to supplant the TEXT datatype.

IMHO

When building any CLR executable, you have to balance the performance cost or benefit with the amount of developer productivity you gain or lose.

In the process of building this example, I created four versions of the codeeach a bit more sophisticated than the last. This approach should help you get a better understanding of the issues one at a time and not overwhelm you with a completed (albeit somewhat complex) end-product. Each of these examples is driven from a delimited list of words to locate in the BookContent column of the BookText table in the BookSource database (created just for this chapter). These examples^[12] can be briefly described as:

^[12] All of these examples are on the DVD under "CLR Stored Procedures."

The first version implemented the code using a T-SQL stored procedure (CountWords) that calls the aforementioned delimited stringtotable function and another T-SQL function to count the words in a string. The stored procedure plus the T-SQL functions total about 100 lines of T-SQL, including the comments. This is the benchmark against which the CLR procedures will be judged. The code for this stored procedure and companion functions are all on the DVD.
The remaining examples are all built using CLR stored procedures. The first of these CLR implementations (V1) closely approximates the approach used by the T-SQL example, which approximates what a developer might implement when converting an existing T-SQL routine. It executes the CountWordsInString function in conjunction with a SELECT statement that's executed for each word in the list of words to locate. It uses a Structure (C# struct) collection to store the data in memory. The backend of this example spools the contents of the in-memory collection back to SQL Server as a rowset. I'll show you exactly how that's done later in this section.
The second CLR example (V2) saves the results of the initial SELECT to a DataTable that's populated using a DataReader. This was implemented to permit easy sorting of the output rowset (as implemented in the T-SQL routine). I had planned to use a high-speed dump of the DataTable back to SQL Server using an extruded DataReader, but as I describe later in this section, the SqlDataReader required by the SqlPipe interface was not compatible with the DataTableReader returned by the new 2.0 function.
The third CLR example (V3) took a radical approachit did not use the server-side function to locate the words in the selected strings. Instead, I rolled my own FindWordInString function within the CLR stored procedure. This simple-looking routine takes advantage of the new IndexOf function exposed by the 2.0 Framework. It took considerable time to debug this routine so it matched the functionality of the T-SQL implementation. The results were pretty spectacular as I'll show you when we get to the performance comparisons later in this section.

Building the Example CLR Stored Procedures

Creating a CLR stored procedure using Visual Studio is not much different than creating a CLR function, so I'll skip down to the stuff that is different. Thankfully, testing is usually pretty easy as welluntil Visual Studio decides that it's quitting time when it seems that nothing will get it to work right. I'll show you how to avoid those situations and when to just reboot and start over. Again, it's always a good idea to keep saving your applications and exit Visual Studio from time to time to let it clean its house of leftover application chad.

Let's walk through the process of creating your first example CLR stored procedure.


1.	Start a new SQL Server Database project and add a new itemin this case, a Stored Procedure. This opens the default stored procedure template (as shown in Figure 13.33), which is not completely correct, so it's not that helpful. I say that because the template does not hint about how to pass input or output parameters or set the RETURN value. The beta versions I saw had some of this functionality included in the template. That's okay; I can add it back fairly easily. Let's step through the process of adding these features to my stored procedure. Figure 13.33. The unmolested CLR stored procedure template.

2.	Because the template suggests that you should use a Sub for a stored procedure, you're going to need to make some fundamental changes. A stored procedure (like a function) returns a value set by the RETURN statement in a T-SQL stored procedure or the Return operator in Visual Basic .NET. Yes, in the case of a stored procedure, the return value always cast as a SqlInt32. I generally use the RETURN value to set a "success" value0 means the stored procedure worked, and something else like 1 means it didn't. So, I really need to implement a standard stored procedure. The first thing I must do is change the stored procedure CLR "subroutine" code to a function and add the AS SqlInt32 clause to the Function declaration, as shown in Figure 13.34. Sure, it would be a great idea to change the Visual Studio template as well so that each new CLR stored procedure is created correctly. Figure 13.34. The "function" version of the CLR stored procedure implementation. The Visual Basic language does not verify that the Return and output parameters are set in your code. This is something that the C# compiler flags as warnings when you build the project. This helps ensure that all of these values are set before exiting the routine. Note that in Figure 13.34 (and the C# version in Figure 13.35), I've filled in the three parameters to be passed to and from the stored procedure. New for the RTM version^[13] is the requirement that output parameters be marked with the "<Out()>" attribute in addition to being passed ByRef (by reference) instead of ByVal (by value, the default). The <Out()> attribute is not needed in C#, as the language supports parameter direction attributesthe C# version of the template is shown in Figure 13.35. Note that Microsoft incorrectly assumes that you need a "subroutine" instead of a "function" template. The fleshed-out code illustrates that you must set the output and return value before exiting the routine. ^[13] I worked with the "alpha", beta, and CTP versions of SQL Server and Visual Studio for about two years before seeing the final shipping version. It changed a lot over the years. Figure 13.35. The C# prototype uses somewhat different notation.

3.	The first task the example takes on is to convert the input parameter containing the delimited list of words to use as search arguments into an array. This is accomplished in a single line of code, as shown in Figure 13.36. Remember, this took about 20 lines of T-SQL to implement. At this point, you're already ready to set the first OUTPUT parameter. This is easy; just assign a value to the variable declared in the stored procedure function declaration header. Figure 13.36. Parsing the delimited list into an array of search words. The next job the CLR stored procedure has to accomplish is to fetch the string (varchar(max)) from the BookSource database' BookText table. This is done by creating and opening an ADO.NET Connection instance. The ConnectionString property, in this case, is specialunique to CLR stored procedures. While made much simpler than the early betas, this concept sometimes confuses developers. Consider that this stored procedure is being executed on a specific SQL Server instance, so there is no need to open a connection to the serveryou're already "connected." Yes, you do need to tell ADO.NET that the query you're about to execute is to be run on the current context's SQL Server instance by setting the ConnectionString to "context connection=true", as shown in Figure 13.37 (line 24). No, there is no connection pool involved here, but it's still a good idea to close this Connection when you're done. Figure 13.37. Opening a Connection to the current SQL Server instance.

4.	Until I'm ready to return the results from the stored procedure I can proceed just as if I have set up a remote Connection. I'm ready to set up the query using a SqlCommand and populate the Parameters collection. Figure 13.38 illustrates how this is done. It should look pretty familiar by nowit's just an ordinary SqlCommand object instantiation. Figure 13.38. Creating a simple Command object.
5.	Your next task is to execute the SqlCommand once for each word in the list. Yes, it might be more efficient to avoid all of these queries, but these "round-trips" are far cheaper than those executed from a networked client. Note that the code still has exception handlers, but you can't include any "debug.print" messages or expose a dialog to the user to notify him or her of an issuethere is no "user". This code is running on a server somewhere, and even if the CLR would let you expose a dialog, there's no display to show it on. It's the task of the code calling this stored procedure to deal with these exceptions. Figure 13.39. Execute the query for each word in the collection.

6.	Once the SqlDataReader returns rows, it's time to capture the data returned. Remember, this rowset has the string used as a search argument (the "word"), the hitcount, and the name of the original source file. As I said earlier, this code saves this rowset data to a collection, as shown in Figure 13.40. Yes, you'll have to close the DataReader between invocations of the ExecuteReader methodjust like in a client/server or ASP.NET application. Note that I'm not performing anything time-consuming in this routineI don't want to delay rowset population any longer than necessary, so as to free the locks as soon as possible. Figure 13.40. Persisting the returned rowset in a Collection.
7.	Once the SqlDataReader returns False from the Read, you know that there is no more data to process. You can close the connection. You're now ready to proceed to phase 2returning the results of the query as one or more resultsets.

Using the SqlContext Class to Return Resultsets

New for the 2.0 Framework, the SqlContext class is used to expose properties and methods that are passed directly to the local SQL Server instance hosting the CLR code you're executingthe local "context". This context is used whenever working with stored procedures or any other CLR code that needs to access the host server. For example, when implementing a CLR trigger (which is simply a form of stored procedure), the SqlContext exposes the pseudo "inserted" and "deleted" tablesjust like the pseudo tables exposed to T-SQL triggers.

Let's take a short tour of the SqlContext class and highlight the most interesting points of interest:

The IsAvailable property is useful to determine if the currently executing code is indeed running in processin context with a SQL Server instance. Unless IsAvailable is True, you won't be able to use any of the other SqlContext objects, properties, or methods.
The SqlPipe object is basically a pathway back to the local SQL Server context. You write resultsets to the pipe, as illustrated later in this section.
The SqlPipe supports several methods that can send strings, which appear in the same way as PRINT messages. You can also use other methods on the SqlPipe object to send single-row or multi-row resultsets, or execute a query and return the rowset. You can also create a SqlDataReader and return the data stream directly to SQL Server via the SqlPipe object. I'll show examples of each of these techniques later in this section.
The SqlPipe also permits you to as many multi-row resultsets as needed between calls to the SendResultsStart and SendResultsEnd methods. The data returned is formatted using the SqlDataRecord class to lay out the schema.
The SqlTriggerContext class describes the context in which triggers are invoked, including an XML EventData structure that describes the operation that fired the trigger. I illustrate how to code a CLR Trigger later in this chapter.

I use the SqlContext class to send the example's persisted rowset back to SQL Server as a resultset. Note that you don't need to have an open Connection for the SqlContext properties and methods to work. An open Connection (pointing to the local context) is required only when using ADO.NET to execute a SqlCommand in the local context. Let's continue stepping through phase 2 of the example.


1.	Start by signaling the client calling this stored procedure that Phase 1 of the stored procedure succeeded. This isn't really necessary, but it's a simple way to illustrate use of informational messages to report progress or other notable events. The code to send this message is shown in Figure 13.41. Figure 13.41. Returning a PRINT message to the client using the SqlContext object.
2.	Next, you need to return a multi-row resultset to the client^[14] containing the results from the earlier string-search algorithm, so you'll need to describe the columns (fields) of the row to be returned. This is easysimply build an array of SqlMetaData objects and populate the array using the New constructor that sets the appropriate properties. Then, create a new SqlDataRecord object using this array of SqlMetaData objects. All of this is illustrated in Figure 13.42. ^[14] The "client," in this case, is the routine calling the CLR stored procedure. Figure 13.42. Defining the resultset schema using the SqlMetaData and SqlDataRecord classes.

3.	Since you're sending a whole set of rows (and not just one), you need to set up the SqlContext Pipe, so it knows that it should bundle all of the rows together in a single resultset. This is accomplished by calling the SqlContext.Pipe.SendResultsStart method. Once the Pipe is "primed" for multiple rows, you must flush it by calling the SqlContext.Pipe.SendResultsEnd method, but before doing that, you need to send each data-populated row down the pipe using the SqlContext.Pipe.SendResultsRow method. All of this is illustrated in Figure 13.43. Figure 13.43. Passing individual rows into the SqlContext.Pipe. When declaring variables using SqlTypes, be sure to initialize them as the default to NULLnot 0, as some might expect. Remember, X + NULL = NULL.

4.	Remember that I had a couple of OUTPUT parameters defined for this stored procedure as well as a RETURN value? These parameter values are set by simply assigning values to the variables defined in the stored procedure declaration header (as shown in Figure 13.44). Figure 13.44. Setting the final OUTPUT and RETURN value parameters.

Basically, that's it for the CLR stored procedure code. Testing the procedure is our next challenge.

Testing the CLR Stored Procedure

You'll need a test script to try out the new CLR stored procedure. Visual Studio has created one of these for you (sort of)at least, it's given you a reasonable starting point to build a test script. Figure 13.45 is my test.sql script. This script should be installed in the projectassuming you want to debug the CLR stored procedure with Visual Studio. Nope, don't plan to evaluate performance using this techniqueas discussed earlier, Visual Studio adds a ton of overhead to the process of executing and stepping through the CLR code.

Figure 13.45. The test.sql test script for V1 of my CLR stored procedure.

Since I plan to debug using Visual Studio, this test script should be installed in the project using the Project Explorer. Yes, I discussed this earlier when building my first CLR function, so refer back there for more details on setting up a debug script if you skipped that section.

When you start the debug process (as described earlier), Visual Studio starts the test script. If you didn't set a breakpoint, the entire test script is executed and the resultsets are displayed in the Visual Studio "Output" (debug) window. The results are not formatted very neatly, but you can tell if the stored procedure threw an exception or returned the right set of data.

As you step through the code and reach the ExecuteReader code (line 38), Visual Studio displays a new window that contains the T-SQL about to be executed, along with the value of any parameter supplied. This feature is new for the RTM versionI can't recall seeing it in the Beta or CTP versions...but I could be wrong. Anyway, this is a handy way to determine just exactly what's getting executed. You can also inspect the contents of the input parameter using the new Visual Studio Intellisense feature (as shown in Figure 13.46).

Figure 13.46. The T-SQL being executed by the ExecuteReader method.

You'll also discover that as you debug your CLR executable, Visual Studio walks you right into the T-SQL functions invoked by the T-SQL code being executed. In this case, you're taken into the source code for the CountWordsInString function; again, you can use Intellisense to see the contents of the T-SQL variablesjust select the variable. No, this version does not support edit and continue. That's a bit of a tougher problem to crack when the code is running on a remote SQL Server. I don't expect that I'll see that feature before we're out of Iraq.

Figure 13.47. Debugging with Visual Studio walks you into the underlying functions.

Another interesting point that should be mentioned is how SQL Server manages the execution thread of this code. When debugging this routine (well before it was working), I found that stepping into the routines that worked with the resultset returned by the SqlDataReader was interrupted by jumps back into the CountWordsInString T-SQL function. That's because as SQL Server processed the query, it was running separate threads to manage both the "server side" of the query (the function invocation and row-fetch operations) and the "client side" (the CLR stored procedure code). This can be a bit confusing at first and might require that you trap exceptions and data value snapshots as you go.

Exploring the Remaining CountWords Examples

The second (V2) "count words" example is very similar to the first versionit simply uses a DataTable to persist the intermediate data instead of using a collection. I took this route because I wanted to illustrate how to use the DataTable object's DataView to sort the rowset before passing it back to the client. As I said before, I also wanted to illustrate the use of filling the SqlPipe with the results from a DataReader. Unfortunately, this approach failed, as the DataTable class's CreateDataReader does not extrude a data reader consumable by the SqlPipe method. This means that you can pass only a SqlDataReader to the SqlPipe object.

The third (V3) example is worth exploring a bit, as I took an entirely different approach to handling the job of locating words in the target column strings (varchar(max)). Instead of calling a T-SQL routine to find the words (as was done in the baseline T-SQL example), V3 takes on that task itself. The query used to return data to the CLR stored procedure is shown in Figure 13.48.

Figure 13.48. Returning the varchar(max) data to process.

The rowset returned from this query would overwhelm the client memory space if I created a DataTable to hold it, but I'm only going to work with a single varchar(max) data column at a timeas quickly as I can. No, this data is not being moved over the Net, so this approach should be dramatically faster than a client/server program that does the same job. You should also save time and CPU cycles by implementing the "search for words in a string" code locally instead of having SQL Server call a T-SQL routine. Sure, one alternative approach would have been to replace the T-SQL CountWordsInString function with an equivalent CLR function. Just to get an idea of what's going on behind the scenes, let's compare the two approaches. First, the T-SQL CountWordsInString function (shown in Figure 13.49).

Figure 13.49. The T-SQL CountWordsInString function.

The equivalent CLR CountWordsInString function (called as I out-process the rowset) is shown in Figure 13.50. Note that this routine would have been harder in earlier versions of .NET, as the IndexOf method was introduced in the 2.0 Framework. It's a pretty powerful method, as it permits you to choose a case-sensitive or non-case-sensitive search (along with other options).

Figure 13.50. The CLR CountWordsInString function.

Evaluating CLR vs. T-SQL Performance

As, with all of these versions, which do you expect to run faster than the T-SQL code? Given the pitiful performance of the temperature-conversion example, perhaps none of them. To test the routines, I set up a number of individual test scripts to run that include dumps of the operational statistics. You'll see what I mean in a minute. Let's take a look at one of the test scriptsthe one used to test and evaluate the T-SQL version. Remember, you can't use Visual Studio to test these routinesit adds too much overhead. The base test script is shown in Figure 13.51. Each example can be called from this templateI simply change the name of the stored procedure to execute.

Figure 13.51. Test script to exercise the CountWords examples.

To fairly measure the four examples, I ran this same script using SQL Server Management Studio with the "Show Client Statistics" option enabled, which is exposed as a button on the toolbar shared by the Execute button (as shown in Figure 13.52).

Figure 13.52. Enabling client statistics diagram in SQL Server Management Studio.

Once enabled, SQL Server Management Studio captures and persists operational statistics about each query executed in this window. By changing the name of the stored procedure executed for each test, you can to get a rough idea of the overall performance among the various implementations. I ran each version once, which might favor the first or last routine executed because of caching, so I re-ran the T-SQL example again to check the validity of the resultsthey were identical. Sure, I had run the test a dozen times beforehand, so the data was probably all sitting in the data cache, so disk I/O would not be a factor in this evaluation.

The results from all four tests (T-SQL and the three CLR implementations) are shown in Figure 13.53.

Figure 13.53. The client statistics chart for the CountWords examples.

If you look closely, you can see that the bottom-line results are pretty startling. Microsoft was actually rightthe CLR code is faster, a lot faster. The final CLR example shown as Trial 4 in the chart took only 78ms to execute, while the T-SQL version took 640msover eight times faster. Note that the other CLR implementations did not do any better (but not really any worse) until I incorporated my own CountWordsInString function in the CLR stored procedure.

Table 13.1. Functions Versus Stored Procedures

Building the Example CLR Stored Procedures

Figure 13.33. The unmolested CLR stored procedure template.

Figure 13.34. The "function" version of the CLR stored procedure implementation.

Figure 13.35. The C# prototype uses somewhat different notation.

Figure 13.36. Parsing the delimited list into an array of search words.

Figure 13.37. Opening a Connection to the current SQL Server instance.

Figure 13.38. Creating a simple Command object.

Figure 13.39. Execute the query for each word in the collection.

Figure 13.40. Persisting the returned rowset in a Collection.