Where Do CLR Executables Make Sense? | Hitchhikers Guide to Visual Studio and SQL Server: Best Practice Architectures and Examples, 7th Edition (Microsoft Windows Server System Series)

Consider that the code SQL Server executes for each connection is only one of many jobs/tasks/threads the server is expected to execute. SQL Server is and always has been a "database" server designed to locate and store data managed by remote clientseven if they are on the same system. Generally, the more code SQL Server has to execute, the fewer operations of all kinds it can execute in a given length of time. SQL Server has also been a host for its own brand of executablesfrom the earliest versions, SQL Server has supported T-SQL stored procedures.

Sure, there are some tasks that make sense to execute on the serverespecially when the operations require direct (and proximitous) access to the data. For example, if you execute a query that returns all airports close to a specified hotel based on GPS coordinates, you'll probably end up performing quite a few geometric equations that are a bit difficult for T-SQL code. You might try to move all of the pertinent data to the client and perform the calculations in an applicationand some applications take this approach. However, the expense of moving too many rows over the wire to a distant client can be prohibitively expensive. My tests show that GPS distance calculations are an ideal candidate for a non-T-SQL code functionyes, a CLR function. However, when working with sets of data (such as those returned by a SELECT statement), calling out to a CLR function from within SQL Server can actually hurt performance. T-SQL and SQL Server work closely together to leverage SQL Server's extensive cache systems to make sure relevant and frequently accessed data is left within easy (and proximitous) reach in memory.

Other factors play a role as well:

Does the CLR code leverage the ability to perform recursive or complex operations that cannot be done to sufficient accuracy (or at all) in T-SQL?
Is the code being executed already compiled, or must it be interpreted or compiled before first use? This applies to both the T-SQL and CLR routines. Is this a one-time performance expense or a price that must be paid on each execution?
Do features built into the CLR language or data access paradigm address problems that can't be easily or directly solved in T-SQL?
Are you using a CLR routine because you're not that familiar with T-SQL, or vice versa?
Does it take more time to call the CLR routine than it would to simply execute a potentially slower T-SQL routine? In other words, is SQL Server spending more time transitioning to and from the CLR function than actually executing the code?
Can the CLR code be written and implemented without having to resort to "fully trusted" security mode where the CLR object can do anything? Can the CLR code function in an "untrusted" or "partially trusted" environment?
Does your development team know how to write, test, debug, and ultimately support CLR code well enough to implement the procedures and other objects needed? Are you prepared to support this code indefinitely, or must you bring in consultants to repair or tune it?
Have your DBA and management team bought in on use of CLR and the potential security issues it raises? Are they prepared to support the use of CLR code indefinitely? Note that it's tough to go back once CLR executables are enabledit's an "all or nothing" switch on the server instance. It applies to all databases on the server instance.
When implementing CLR User-Defined types (UDTs), are you aware of the inability of the tools to automatically change, alter, or drop types and preserve existing data? Do you and the DBA understand the maintenance impact of CLR UDTs?

What Kind of Executables Can or Should Be Built?

To get an idea of where CLR code is possible, let's briefly outline the basic types and provide some insight into how they work, how they're different, and how they can be expected to behave on the server.

CLR Stored Procedures

A CLR stored procedure can accept (virtually) any number of input parameters and return (virtually) any number of OUTPUT parameters, an integer (SqlInt32) RETURN value, and zero or (virtually) any number of resultsets (including 0 to N rowsets). In theory, CLR stored procedures can be used anywhere a T-SQL stored procedure can be used, including "normal" database or master-based stored procedures, DML triggers, or DDL triggers. As with T-SQL triggers, the special virtual INSERTED and DELETED tables are exposed.

CLR Functions

Like a stored procedure, a CLR function can accept any number of input parameters, but it cannot return OUTPUT parameters or resultsetsonly a RETURN value variable that can be (almost) any type, including table-type variables or XML. In theory, CLR functions can be used anywhere a T-SQL function can be used, so they can be called from stored procedures, used in T-SQL expressions, or called from or other functions.

CLR User-Defined types

A CLR User-Defined type (UDT) is a way to build an object that's stored in place of a SQL Server system type in a SQL Server column. Unlike a T-SQL UDT that's simply an alias to another base type, a CLR UDT is a composed of many individual data elements. For example, you could create a "Currency" type that stored not only the monetary value, but also the unit (U.S. dollars, British pounds, Japanese yen, euro dollar), the monetary symbol ($, £, ¥, €), the exchange rate, the base currency and any other pertinent information. Consider that as the complexity of the UDT increases, its usefulness and performance decrease exponentially.

IMHO

Consider that as the complexity of the UDT increases, its usefulness and performance decrease exponentially.

Keep in mind that a column definition configured with a CLR UDT can't be easily removed or changed, as with system data typed columns. All references and dependencies to the UDT must be purged from all tables and procedures that reference it before it can be altered.

CLR Aggregates

A CLR aggregate is designed to perform the same type of operations that T-SQL aggregates perform. Examples of T-SQL aggregates are SUM, MIN, MAX, AVG, COUNT, and so forth. These functions accept a set of values and return a single value (ignoring NULL values). CLR aggregates make the most sense when you have to build a custom aggregating operation that's not supported by T-SQL, such as mean, frequency, or other statistical aggregations. One interesting use of a CLR aggregate would be to implement the SUM aggregate for the aforementioned Currency UDT.

CLR Triggers

A CLR trigger is designed to perform the same type of operations that a T-SQL trigger performs. Triggers are your last-ditch firewall of protection on your data tables. They can also manage any number of business functions or implement sophisticated business rules. SQL Server 2005 adds DML triggers to the mix, which means you can create triggers that monitor changes in the tables, users, or other objects (even other triggers). A trigger generally implements a business rule that tests the new (INSERTED) or updated rows about to be committed to a specified table. If the rule finds there's something wrong, it can choose to roll back the transaction. A CLR trigger is no different, except that it can execute far more sophisticated tests, as it's not hamstrung by the limitations of the T-SQL language. Later in this chapter, I illustrate a CLR trigger that executes one of the Framework's financial functions, just to illustrate this point.

Previewing the Examples

Once you've read this chapter, you'll get a pretty good idea of the CLR executable solutions that make sensethose that outperform the T-SQL they're replacing. The examples implemented in this chapter include:

A set of CLR functions to convert temperature values (actually, an example of what not to do)
A CLR function to return a Table-type variable generated from a delimited string
A CLR function to count the number of times a string appears in another string
A CLR function to compute the distance between two points of latitude and longitude
A CLR UDT to manage international currency data
A CLR stored procedure to count specific words found in a TEXT (varchar(max)) column
A CLR aggregate to sum values in the international currency UDT columns
A CLR trigger that accepts or rejects additions to a table based on the results of a financial analysis of new or changed row data

Along the way, I've done performance tests on these carefully chosen solutions to see which make sense. All but the first (temperature conversion) performed as well as their T-SQL counterparts (when the problem could be solved with T-SQL). I also came to some other conclusions:

When T-SQL has to perform string functions, CLR can usually outperform T-SQL. This includes searching, concatenation, and other bulk text operations. This is evidenced in several examples where I manipulated strings.
When T-SQL has to perform trigonometry, financial, or higher math functions, CLR does better or at least as well. The calculations I performed were complex for a sixth-grade math student (at least, in the U.S.) but not that hard for T-SQL. I also suspect that there are many situations where far more complex math is required, as when doing financial or statistical calculations or aggregations. In these cases, CLR code would certainly perform better, as T-SQL has no support for these operations.
In situations where the client must draw data from the server, munge, and return it, a server-side CLR executable can easily outperform the client executable. All too often, we've all seen developers pull heavy rowsets from the server, update a row based on a complex algorithm, and return the result to the server via an UPDATE. Part of the problem here is that many developers are not familiar with T-SQL and don't appreciate that the task can be more easily performed on the server. By implementing a CLR executable in a familiar language, these developers can solve a problem they could not even approach with T-SQLnot because of T-SQL limitations, but because of their lack of T-SQL skill.
Not all CLR and Framework functionality is available on all types of CLR executables. For example, an aggregate cannot use the local "context" connection to capture data. As we progress through the types of executables, we'll witness many other limitations.
When invoking functionality available only via .NET Framework classes, you don't have a T-SQL optionyou must forgo the functionality or code it in a CLR executable. For example, if you want to encrypt or compress data, use the financial functions, or access system internals, the Framework has built-in functionality that supports these operations that can't be accessed directly from T-SQL (at least, not without an ESP).

This should give you a rough idea of what works and why. After you've worked through these examples, you'll find how well (or if) CLR executables can solve your problems.

What Makes CLR Executables Fast and Safe?

If you've chosen the right job for a SQL Server executable, it can outperform a T-SQL executable or perform a task that T-SQL simply can't handle. SQL Server 2005 plays an important role in the performance of a CLR executable. One factor that helps performance is that the CLR executable is managed code that runs within the database service process. This is possible because the .NET Framework (thus, the CLR) is hosted within SQL Server's process space. This makes transitioning between SQL Server (T-SQL) and the CLR executable more efficient and permits SQL Server to manage cross-operation conflicts and integrate the memory demands of the CLR executable with the demands of other operations it's managing. To make this work, a CLR executable requests memory from SQL Servernot Windows. As with other SQL Server internals, like the query processor, SQL Server manages memory requests from a common cached pool of memory that it orchestrates based on frequency of use and other logic-related factors. Windows would have no idea how important a particular block of memory is in regard to your query or CLR executable. Because garbage collection is handled by SQL Server, this CPU-intensive operation can be coordinated with other SQL Server operations to optimize performance and idle time. As I'll illustrate later in this chapter, a CLR executable in need of data rows can access a local "context" connection that completely bypasses the network protocol and TDS translation layers. All of these features taken together also prevent a CLR executable from running away with the CPU or compromising the stability of SQL Server.^[4]

^[4] For more details, see Kimberly Tripp's article "The Database Administrator's Guide to the SQL Server Database Engine .NET Common Language Runtime Environment" (www.only4gurus.net/miscellaneous/dba_guide.pdf).

Tip

Remember that CLR executable support is "off" by default in all versions of SQL Server. It must be enabled before it can be used.