Implementing CLR Aggregate Executables

The next challenge you might choose to scale is building CLR aggregates. These are a requirement if you expect to get meaningful aggregated information from a column declared with a UDTespecially with a UDT such as the international currency typICurrency I've been discussing. Sure, you can still use the built-in T-SQL aggregators (like SUM, MIN, MAX, etc.), but when used against a column whose value is not really stored in the same terms as other values in the same column, the results won't be of much use. For example (as shown in Figure 13.92), you can execute a T-SQL SUM against the Products table in my UDTCurrency database. However, in this case, the sum aggregates U.S. and Australian dollars, yen, British pounds, and euro values without regard to their currency. Most banks outside of Texas don't use this kind of math.

Figure 13.92. Using a T-SQL aggregate against a CLR UDT column.

To solve this problem and any problem that requires a custom aggregate function (whether or not it's used against a UDT, you'll have to write your own custom CLR aggregate function. Again, this can seem like a very big challenge at first, as there seem to be a lot of details to worry about. Relax; I've blazed a pretty clear trail (which mostly involved figuring out how to actually implement the CLR aggregate, given the state of the Microsoft documentation and examples).

Unlike a UDT, coding, building, deploying, and debugging a CLR aggregate is really pretty easythat's because Visual Studio can correctly tear down and use T-SQL ALTER to replace an existing CLR aggregate. This means you'll be able to set up a test.sql script in Visual Studio and simply click F5 when you need to build, deploy, and walk though the codeeach time you need to test it.

What Is an Aggregate?

An aggregate is a block of code designed to "aggregate", accumulate, or total values presented to it. It can also be used to perform set^[22] functions on the values supplied. For example, an aggregator can find the maximum value of a set of numbers or the minimum length of a set of strings.

^[22] Set as in set arithmetic or set logic.

An accumulator can also format values and return a concatenated result in any format you choose. A CLR aggregate can perform any operation you choose on a given set of values. This means your "SUM" can include or exclude any values you choose. Yes, this smacks of the accounting done at Enron, but I doubt they used a CLR to do the work. Perhaps, they would not have been caught if they had.

When writing your CLR accumulator, you define the input and output data typewhich can be different. In the case of the accumulator, I also define (and coerce) the Precision and Scale to ensure that the data is correctly formatted when sent back to SQL Server. SQL Server (and T-SQL) does the work of picking off the values to aggregate and calls the Accumulate function to pass values to be aggregated, the Merge function to combine previously constructed "sub-totals", and the Terminate function to return the final result. It's the Terminate function that determines the datatype of the value ultimately returned by the aggregate. I'll focus quite a bit on this function.

Walking Through the CLR Aggregate Example

The example aggregate illustrated here is designed to total the mnyValue property values from a column defined with the CLR UDT typICurrencyV2. Before adding any non-U.S. currency to the total, the aggregator code converts the value to U.S. dollars. No, I'm not suggesting that the approach used here is a best practice. For one thing, it depends on fixed conversion rates, which would be suitable for ratios or conversion factors that don't change at all, but for international currency, exchange rates vary quite a bit over time. Perhaps a better example might be a UDT that aggregates temperature values in Fahrenheit, Celsius, and Kelvin and returns the values in a common scale.

I also tried to read the conversion rates from a rates table stored in the database. The problem with this approach is that a CLR aggregate cannot access the local context connection. Yes, it would be possible to open a "normal" shared memory provider ADO.NET connection, but that would significantly impact performance. However, it might be a viable option for some situations where you really can't make do with a CLR Function.

The MSDN documentation (available online or via Visual Studio help) is pretty good for aggregates. It is missing a few (too many, I think) key points that are emphasized here to keep you on track. However, I do recommend that you use the help topics for more in-depth information on attributes.

A CLR aggregate has a unique configuration of functions and attributes that makes it quite different in a few respects from a CLR function or stored procedure. These CLR aggregate entry points (as described below) are called by SQL Server to initialize, add selected values to your in-memory accumulator, and retrieve the final sum.

Let's take a high-level look at the code before venturing into the aggregator's back alleys and underground passages. Figure 13.93 shows the completed CLR SUMUSD aggregate program collapsed to the routine level.

Figure 13.93. The Visual Studio CLR Accumulator application.

Note that there are several routines each CLR aggregate must support and a few others that you might need to add:

The Function attributes: These attributes, shown in Figure 13.93, define how the aggregate is seen by Visual Studio and built on SQL Serverat least, to an extent^[23]. Here is where you indicate whether or not you want SQL Server to serialize the data values passed to and from the aggregate, the maximum size, and several other attributes settings that are well documented in the MSDN documentation. Because I'm working with the SqlDecimal datatype, I must provide my own serialization. As you'll see, this is not at all hard, as I'm serializing only one value. The function prototype does not set any of these attributes and defaults to Native serializationthat might be just fine for a simple aggregate.
^[23] The Return type is not set until you get to the Terminate function (as I'll show you later).
The Init Sub: This subroutine (prototyped by the Visual Studio CLR aggregate template) is coded to initialize the accumulator state. It's called once when the aggregate is first invoked. For this example, the internal state includes the local accumulatorthe value used to keep a running total of values declared as SqlBinary. Note that I set the Precision and Scale of the private variable sDecAccumulator (line 27). No, this might not be necessary in all cases, but I wanted some control over the way the decimal values were being kept in memory. I chose to use SqlDecimal to give me more control over the specific Precision and Scale value returned from the accumulator.
Next, I populated the list of conversion rates. I already discussed how it might be necessary to populate this table dynamically. The completed Init routine is shown in Figure 13.94. The AddRate function is home-grownit simply populates a collection of currency types and rates.

Figure 13.94. The CLR aggregate Init function.
The Accumulate Subroutine: This routine, shown in Figure 13.95 (prototyped by the Visual Studio CLR aggregate template), is used to accept a typed value and "add" it to the set of values being accumulated. This might mean concatenating the value to a string, adding it to a collection of values to be summarized later, or anything else you can imagine. This routine is called each time SQL Server finds another value to include in the accumulator value set.

Figure 13.95. The Accumulate function used to gather values.

For this example, the CLR aggregate is designed to accept a (non-NULL) value typed with the CLR UDT typICurrencyV2. The routine tests the currency "type" (USD, AUD, GBP, etc.), and if it's not U.S. dollars ("USD"), it converts the value to U.S. dollars and adds it to the local accumulator (sDecAccumulator) used to keep a running totalU.S. dollar values are simply added to the total without conversion. Note that I make sure the converted value remains cast to the correct Precision and Scale (15,6) by using the ConvertToPrecScale SqlType method.
The hand-made ConvertToDollars support function, shown in Figure 13.96, uses the conversion rate tables to figure the exchange rate value. Note that I also illustrate a couple of techniques to force the correct Precision and Scale. On line 92, I add a SqlFacet attribute to the Function. This gives the developer a way to specifically define how the return value is to be formatted. In this case, I set the SqlDecimal Precision and Scale. On line 97, I force the result to conform to this same (limited) precision and scale. Without these restrictions, the default behavior explodes the scale with more "accuracy" than makes sense (even to a mathematician).

Figure 13.96. The custom ConvertToDollars function.
The Merge subroutine: When SQL Server has to build totals and sub-totals from accumulated values, it calls the CLR accumulator's Merge routine to add these intermediate values to the set of values being accumulated. The routine (prototyped by the Visual Studio CLR aggregate template) is very simple in this caseI simply add the value to the local accumulator.

Figure 13.97. The Merge routine is used to combine intermediate results.
The Terminate function: The Terminate function (prototyped by the Visual Studio CLR aggregate template) is designed to summarize the set of values passed one by one to the accumulator and return that value back to SQL Server. This might mean computing the minimum, maximum, average, or other statistical summary value from the set of values. In this example, I simply pass back the current value of the accumulator used to hold the sum of the values presented.
An important point to note here: By default, if you build a CLR aggregate, Visual Studio deploys the CLR aggregate using default numeric or string values. For example, if you simply declare that the UDT is to return a SqlDecimal without forcing the Precision and Scale, Visual Studio generates a RETURNS[numeric](18,0) clause in the CREATE AGGREGATE statement. This means that if you intend to pass back a floating point number or any number with a fractional component, all you'll get is the whole numberthe fraction is stripped. Strings are also formatted as NVARCHAR(4000), so you might also want to hard-cast the length of these return values.
To make sure the entire value is sent to SQL Server, I force the Precision and Scale of the return value passed back from this routine using the obscure (but very handy) SqlFacet attribute, as shown in Figure 13.98. This "As" syntax that references the SqlFacet attribute is not documented in MSDN helpit was shown to me when I had trouble finding out how to force the compiler to set these datatype attributes.

Figure 13.98. The Terminate function illustrating use of SqlFacet.

If you want to verify what type and precision is being used to handle your CLR aggregate, script the deployed aggregate using SQL Server Management Studio, as shown in Figure 13.99.

Figure 13.99. Scripting the deployed aggregate in SQL Server Management Studio.
The Serializer and Deserializer: Because I'm not using one of the simpler types, SQL Server requires that I add my own serialization code. Since I don't have to worry about formatting or reformatting the values, my Read and Write routines are brutally simple, as shown in Figure 13.100.

Figure 13.100. The Read and Write routines serialize and deserialize the CLR aggregate value.

Testing the CLR Aggregate^[24]

^[24] The complete example is available on the DVD. See prjAggregateSumUSD.

The bulk of the work required to set up a test scenario has already been doneI did it when I built the test scripts for the CLR UDT typICurrencyV2. When I'm ready to test, I set up a simple test script. Yes, I was able to use the test.sql script sample generated by Visual Studio as a starting point. Our test script (shown in Figure 13.101) expanded on that quite a bit. I suggest using small amounts of test data that include values at each end of the possible values. Once you're happy with the results of the limited test (and the computations do reflect reality), you should move on to larger sets of data.

Figure 13.101. The test.sql script used to exercise the CLR aggregate code.

Because Visual Studio and SQL Server can work together to alter the CLR aggregate in place, it's easy to debug the routine just as you would debug a CLR function or subroutine.

Figure 13.92. Using a T-SQL aggregate against a CLR UDT column.

What Is an Aggregate?

Walking Through the CLR Aggregate Example

Figure 13.93. The Visual Studio CLR Accumulator application.

Figure 13.94. The CLR aggregate Init function.

Figure 13.95. The Accumulate function used to gather values.

Figure 13.96. The custom ConvertToDollars function.

Figure 13.97. The Merge routine is used to combine intermediate results.

Figure 13.98. The Terminate function illustrating use of SqlFacet.

Figure 13.99. Scripting the deployed aggregate in SQL Server Management Studio.

Figure 13.100. The Read and Write routines serialize and deserialize the CLR aggregate value.

Testing the CLR Aggregate[24]

Figure 13.101. The test.sql script used to exercise the CLR aggregate code.

Testing the CLR Aggregate^[24]