One of the more appealing features of CLR executables is the ability to define custom user-defined datatypes, or UDTs, for short. Like the sirens of Homer's Odyssey, CLR UDTs can be a very attractive way to gain functionality, but at the same time, it can be hard to prevent your development team from sailing up on the rocks. No, I don't recommend putting wax in their ears. While it worked for Odysseus, his crew was too deaf to avoid the Charybdis and Scylla later in the voyage. The challenges of choosing to implement CLR UDTs include:
Okay, given these challenges, why would one want to bother creating a CLR User-Defined type? Well, suppose you have a data structure that lends itself to being contained in a single column. Examples that come immediately to mind include:
So, yes, there are specific cases where CLR UDTs seem to make sense. If you consider that the UDT implementation can embody comprehensive validation logic to ensure that each component meets specific range criteria, you can see how a CLR UDT can help keep data validregardless of the source. Coding CLR UDTsAs with the other CLR executables I've created so far, you start building a CLR UDT using the same Visual Studio "Database" project type as previously discussed, but in this case, you should choose "User-Defined type" from the "Add New Items" menu. Name your UDT as neededin this case, I call it "typGPS". Our UDT is going to store the latitude and longitude for GPS data, as discussed earlier. IMHO I don't really expect this CLR UDT to improve overall performance. Once you choose the UDT template, Visual Studio generates a block of code that exposes prototypes for all of the required properties and functions, as well as a couple of optional methods that you might choose to implement. This generated template (in compressed form) is shown in Figure 13.54. As you can see, a UDT is defined as a Visual Basic .NET Structure or C# struct that implements INullable. Figure 13.54. The default CLR UDT template.Along with the critical functions and properties, the CLR UDT prototypes a pair of methods that you can flesh out if your UDT needs them. For example, if you need to call an initializer, you might want to implement it in one of these methods. Defining the CLR Public PropertiesWhen it comes time to use your UDT in code, you'll get direct access to the public properties you define. One of the most complex issues I had to figure out was how to make sure these properties are exposed as the desired datatype. As the CLR code is exposed to both SQL Server and .NET Framework applications, transmogrifying between these types can be problematic. I'll point out the extra code and special attributes I had to add to get this UDT working as it should.
One important point: When you instantiate any SqlClient datatype (SqlTypes), its default value is not 0it's NULL. This can be confusing, as without presetting a value, you'll discover that numbers added to a SqlDecimal don't seem to accumulate. That's because NULL + anything = NULL. You have some flexibility when defining a CLR UDT, or it would appear so. It turns out that if you choose to store fixed-length component values (like integers or floating-point numbers) that can be stored in data structures that don't expand and contract as values change, the task of creating a CLR UDT is a lot easier. However, if you need to store variable-length values such as strings, SqlDecimal, SqlMoney (and others), or XML in a UDT, you'll have to do the serialization yourself and expose appropriate functions (IBinarySerialize.Write and IBinarySerialize.Read) in the CLR UDT code to do the work. Don't get excited. Serialization can be done quite easily. The GPS UDT implemented next uses fixed-length values and "native" serialization. I'll use "custom" serialization for the typICurrencyV2 UDT built later in this chapter. Constructing the Custom Property CodeAt this point, you'll need to define the properties used to hold the component values. There are several important points here that you should not overlook:
Setting the UDT SqlUserDefinedTypeAttributeThe UDT attributes tell the compiler how the UDT is to be formatted and accessed. These attributes are coded within < > symbols ahead of the Function declaration, as shown in Figure 13.57. Basically, there are three types of "formatting" (set by the Format attribute) when defining UDTs that determine how SQL Server stores and retrieves data in the database column: Native (the default), UserDefined, and Unknown, as detailed shortly. The SqlUserDefinedTypeAttribute attribute is also used to define the StructLayout, which determines how the UDT is serialized (saved and restored from binary). The SqlUserDefinedTypeAttribute properties include:
Figure 13.57. CLR UDT attributes and Structure declaration.I'll implement the GPS UDT to illustrate how to code most of these attributes. The CLR UDT also includes the StructLayout attribute that defines how SQL Server should lay out the UDT structure in memory. This structure permits you to set the LayoutKind, CharSet, Pack, and Size. I won't be illustrating these finer points here. Suffice it to say that there are three options:
As you can see, you have a great deal of control over how the CLR UDT is configured, serialized, and laid out in memory. If you implement a UDT, I suggest you experiment with these options to find the optimal storage formatit can materially impact storage space, indexing performance (or ability), and fetch/store/search performance. The Public Structure DeclarationThe code shown in Figure 13.57 simply names the UDT structure[15] and defines the serialization scheme. It's the name used by T-SQL to refer to the UDT when extracting data or defining a SQL Server column using the UDT. Directly after this structure declaration, you'll likely want to declare your local variables that hold the UDT component elements.
The ToString FunctionThis code is responsible for implementing requests to return the binary UDT data as a string. You decide how the data is to be exposed. For example, if your UDT stores a phone number (country code, area code, prefix, number, extension), you could return a string such as "+011 (425) 555-1212 X123" when the UDT ToString method is called in code. For the GPS UDT example, you store eight parts: latitude and longitude degrees, minutes, seconds, and declination. These are returned as a string in the ToString function shown in Figure 13.58. Figure 13.58. Constructing a string to display the contents of the GPS UDT.The UDT ToString function is called when a T-SQL query executes a CONVERT function to retrieve the column, as shown in Figure 13.64, later in this section. The IsNull PropertyThis property is exceptionally simple; it just returns the state of the private null property. The Null PropertyThe Null property returns a new instance of the CLR UDT and sets the private null property to True, as shown in Figure 13.59. Figure 13.59. Managing NULL state with UDT properties.The Parse FunctionThe Parse function is by far the most complex part of the UDT. It's used to parse a formatted string submitted by a T-SQL query and transmogrify the values to the UDT public properties. Typically, this means your code has to convert from a string and save to another datatype. However, in this example, I simply parse a sexagesimal-formatted latitude and longitude string (such as "47° 40' 10" N 122° 3' 26" W") extracting the individual properties. Note that the string shown here in the book uses "smart quotes". This means that the single quote and double quote are different symbols than the ASCII single and double quote symbols. I incorporated this functionality into the UDT parse routine to make it easier to enter a GPS location and not have to worry about doubling up the single quotes. Yes, this is a variation in the "O'Malley" rule that I discussed when working with ADO.NET Command Parameter objects. The Parse function must walk through the input string (whose format you define) and set the value of each and every component of the UDT. In the GPS example, the Parse routine is particularly long, as it has to pick eight elements out of the string. It's shown in Figures 13.60 and 13.61. Figure 13.60. Finding the argument markers in the inbound formatted GPS string.Figure 13.61. Set the UDT component properties from the string values supplied.This code creates a number of integer pointers that are set based on the presence of specific markers for each of the eight component pieces of the latitude and longitude string. For example, you look for the degree symbol after the latitude and longitude degree value. If any of these markers are not found, the routine throws an exception and the operation is cancelled. That's an important part of the Parse function. It must reject any input string that does not comply with the specifications that you define. Sure, you can define as many different input string variations as you choose to support. When I implement the typICurrencyV2 UDT, I'll permit several variations of input string. This permits me to pass strings formatted for older versions to be accepted by the latest version. When the public properties defined for the GPS UDT are defined as SqlByte, I suggest using SqlTypes throughout to make movement of data into and out of the UDT easier. In several cases, I use the SqlByte.Parse function to convert the string representation of a number to the SqlByte value. If one of the Parse functions fails, the CLR throws an exception and returns control to the SQL Server operation attempting to execute the T-SQL. Yes, I could also add checks to ensure that the values supplied fall within the range of a proper latitude and longitude location. That is, the degrees latitude must be between 0 and 90, and degrees longitude must be between 0 and 180. While not implemented here for simplicity, I'll show how to add a validation routine to a UDT later in this section. Debugging the CLR UDTBecause of several serious shortcomings in Visual Studio 2005, you'll find that debugging a CLR UDT under construction is kinda tiresomeand dangerous[16]. I'll try to make it easy for you, but be prepared for an occasional lockup and Visual Studio crashing from time to time. Yes, save your work often. One of the problems you'll encounter is that the IDE always tries to redeploy your UDTeven though it might already be deployed from an earlier test (or an existing version). While SQL Server supports update-in-place for UDTs, Visual Studio does not use or give you the option of using this feature when starting a debug test run. This means you'll need to set up a separate T-SQL script and the means to execute it separate from the Visual Studio IDE. Nope, you can't just set up an "initialize" script and save it as a separate test.sql scriptthat's because the IDE tries to deploy on all scripts.
To build, deploy, and debug your CLR UDT, follow these steps:
Using SELECT to Query UDT DataEarly in the development of this functionality (during the numerous beta releases), developers had to jump through a lot of hoops to get T-SQL to return data from their UDTs. Along the way, I saw versions that required C++ addressing technique and a number of very strange ways to simply return the data stored in a UDT. The shipping version (thank goodness) makes it easy to get your data back using a SELECT statement. Figure 13.64 illustrates several ways to use SELECT to return data from a UDT column. The first technique first refers to the "GPS" column and next uses the T-SQL CONVERT function to return the data. The T-SQL CONVERT function calls the ToString function in your UDT so that the data is formatted as your code dictates. If you don't use CONVERT, the data returned is dumped out in hex, as shown in the "GPS" column in the first resultset. Figure 13.64. Using SELECT to return data from the GPS column.This hex notation needs a bit more scrutiny, so I created a detailed view of that data (as shown in Figure 13.65). Note that the SqlByte fields are actually stored as 16-bit values with the leading byte set to "01". If the column element is set to NULL, the leading byte is set to "00". Once you understand how the (all fixed-length datatype) UDT is stored, you can more easily debug and tune your code to optimize its use and database space consumption. As it is, I have been able to store eight (nullable) elements in a data structure that's only 17 bytes long. That's not too bad. Figure 13.65. Data stored in SQL Server for the GPS CLR UDT.
No, you don't have to use the CONVERT function to return the entire fully formatted GPS location value; you can instead choose to address the public UDT properties or simply supply the raw binary (perhaps to impress one of your coworkers). It could not be much simpler. Each UDT component is addressable by name (the name you chose when you defined the UDT properties) using dot notation, as shown in Figure 13.66. Yes, you might want to CONVERT the raw hex data (as done in the example), but that's up to you and the people (or programs) that have to consume the data. Note that public properties exposed by your UDT can have a NULL value (because I used SqlTypes). Figure 13.66. Using SELECT to address specific components of the UDT.Tuning Your UDT for PerformanceEvery line of code contained in your CLR UDT plays a role in how well or quickly it performs. First, consider that SQL Server's "native" datatypes also consume CPU resourcesI take that for granted. However, some native types consume more resources than others, and I also take that into account as I design my databases. For example, I chose tinyint over integer or varchar to save disk space and reduce I/O traffic and memory consumption. CLR UDTs are no different. As you add components to your type, keep this in mind. Use the smallest, lightest datatypes possible to get the job done. There are a number of things you can do to improve UDT performance, but these can mean compromised functionality. The version of the CLR UDT described earlier (let's call it "version 1") permits the inbound GPS location to be formatted in a human-readable format that's easy to visually scan for missing or out-of-place elements. It also supports NULL components, as I used the SqlTypes (SqlByte) to implement the type. If you choose a simpler (albeit harder to read) input string, you can use the Visual Basic .NET and C# Split function to quickly convert the delimited string to an array of strings that can be written to the UDT components. I implemented this change, and the code is shown in Figure 13.67. It's simpler and smaller (and faster). Figure 13.67. Using Split in the Parse function improves performance.Note that I also changed the UDT to store the CLR public property values in byte objects. The net result of changing to native Framework byte objects means that instead of 34 bytes per column entry, the same data is stored in 18 bytesa hefty savings. However, I did lose the ability to set the public properties to NULL. Yes, you can still set the entire UDT to NULL (assuming you set up the table column to permit NULL values). Since a GPS location does not really make sense if a component is missing, perhaps not supporting NULL components is a good compromise. Referencing and Updating UDT ValuesYou can code SELECT statements against the CLR UDT data in a number of ways. However, note that your UDT ToString is called only when using the CONVERT function. If you SELECT a UDT column, as in Figure 13.68 (line 2), without conversion, SQL Server does not call your UDT codeit simply returns the binary representation of the data stored in the database table column. This means that if you expect to reference the UDT public properties or the UDT column value as a whole, you'll need to include code in your project to decode itso any out-of-process application that consumes the UDT binary value must include a reference to your UDT in the project References section. Sure, T-SQL won't have any problem referencing and writing to your database's UDT. However, a Windows Forms, ASP, or Web Service application won't be able to reference it or even see it (in Intellisense) without adding a reference to the UDT structure. Figure 13.68. Returning data from a CLR UDT using SELECT.
Handling NULL ValuesWhen working with a structure like a UDT that exposes private properties cast as SqlTypes, remember that any one (or all) of these individual properties can be set to NULL. You have to decide if it makes sense that one of the properties has an "unknown" state. Does a GPS coordinate make sense if either the latitude or longitude hours are NULL? Not really. You can get a rough idea of a location given just the hours values without the minutes and seconds, but you still need the declination of both latitude and longitude for the coordinate to make sense. While I didn't go to the extreme of configuring the NULL handling in the GPS UDT in this manner, it's a good exercise for you to try. If the GPS column value is NULL, depending on your implementation, you won't be able to fetch the data for that row. In this case, you'll have to use the CONVERT routine (Figure 13.68, line 4) that calls your CLR UDT ToString function that interprets NULLs specifically, or, if the type supports it, SQL Server notes the column as NULL. You'll also get an exception "Mutator 'strSymbol' on '<UDT> variable' cannot be called on a null value." if you try to set a public property when the UDT is set to NULL (its default state). Just make sure to initialize the UDT value before trying to set public property values. Another approach is to set the OnNullCall method attribute. If this is set to False, the UDT returns NULL without evaluating the method when at least one of its input arguments is NULL. If OnNullCall is True (which is the default), the resulting value is determined by evaluating the method, regardless of the argument inputs. The public properties of the UDT can also be retrieved using dot notation and CONVERT, as shown in the SELECT in Figure 13.68, line 14. When you fetch specific public properties in code, your UDT property Get code is called. Likewise, when you set a value in code, your property Set code is called. Another approach is to invoke the UDT ToString function directly, as shown in line 24. Later in this chapter, I discuss how to reference a UDT from a "normal" (not hosted in SQL Server) application. Setting UDT Public PropertiesWhile it's possible to use the T-SQL CONVERT function against all of the public properties of a CLR UDT, it's also possible (and tempting) to use the SET statement to set specific public property values. The problem with this approach is that it's expensive, considering that each trip to the UDT code burns CPU resourceseach SET statement makes a round-trip through the SQL Server to CLR UDT interfaces and back. However, there may be places where you feel this works better as when you have an "instantiated" UDT and want to change a single component property. The other wrinkle here is that there is no "DECLARE X AS NEW Y" in T-SQL. This means you'll face the same object instantiation issues you face when declaring a variable in a CLR language like Visual Basic .NET or C#. You simply won't be able to reference the object without first creating an instance of the object. Because of this limitation, you'll still have to use the CONVERT function to set up the initial CLR UDT "instance" in a T-SQL applicationafter that, you can reuse it and set individual properties using the T-SQL SET statement, as shown in Figure 13.69. Of course, this is not an issue in a CLR language, as you can instantiate an instance of the UDT in code before referencing it. Figure 13.69. Using the SET operator to set individual CLR UDT component properties.
The UDT component property names are case-sensitive. This means you'll not only have to know the spelling of the property, but you'll also need to observe the same case as the defined CLR code. In this case, the code invokes the CLR UDT code four timesonce when I use the CONVERT and three more times when I use the SET statement on the value, symbol, and currency component properties. Tip You'll also find that you can't execute an UPDATE statement that sets more than one component property of the UDT. Stepping Through the UDT CodeWhen you reference your UDT from T-SQL or some other application, the UDT code might be called many (many) timesor not at all. If SQL Server can figure out what to write to the UDT column without calling your code, it will. For example (as illustrated in Figure 13.70), the UDT code is not called until line 20. At that point (when the code sets the value of the @Price variable), the following sequence takes place:
Figure 13.70. Setting UDT values.At this point, the UDT is stateless. It does not keep any values between invocations. When line 18 is called, the following sequence takes place:
Remember that a UDT does not maintain stateit can't. The data it knows how to parse and manage is kept in memory or stored in a column value in the database. A UDT might be called by a multitude of different processes, procedures, aggregates, and other server-side executables. Its entire function is to simply decode the values sent to it either in string form (when initially set) or in binary form when referenced from a query that returns a column value. Yes, it's also possible to set the value of a UDT directly via binaryif you feel brave. This bypasses the ToString initialization step, and it works something like Figure 13.71. Figure 13.71. Setting a UDT column value with a hex literal.Validating Your UDT DataThe process of storing your UDT data into the database is (of course) managed by SQL Server. It takes certain precautions to ensure that the data is valid and meets the serialization criteria, but if there are business rules or other criteria that you wish to impose on the data, it's up to you to implement those tests. The Framework exposes a mechanism for this, and it's hooked up when you first define your UDT. Unfortunately, it's not called very oftenapparently, it's called only when SQL Server has to do a "conversion," as when you convert from bytes or when a UDT comes in from an external source. The validation routine is not called when you call the CONVERT function or execute INSERT or UPDATE. This means you'll have to add it to your own Parse routine to do a last-second validation test. The ValidateGPSData function tests each component of the current UDT instance and returns False if it does not comply with the rules dictated by accepted latitude and longitude conventions. This routine and every other routine in your UDT code works on an instance of the UDT passed to it from SQL Server as it processes T-SQL statements. This means that several instances of this code could be running at onceeach processing its own instance of an in-progress UDT. Visual Studio and the language compilers make sure that the code conforms to this instance-centric approach. To "activate" the validation routine, you'll need to set the ValidationMethodName in the UDT attributes, as shown in Figure 13.73. Figure 13.72. Validating your UDT data structure using a custom function.
Figure 13.73. Activating UDT validation.
I put a call to the ValidateGPSData function just before returning the populated UDT instance in the Parse function, as shown in Figure 13.74. Your test script should include tests for each leg of your validation routine, to ensure that they all work. Figure 13.74. Testing the UDT structure just before returning it to SQL Server.Implementing UDT MethodsYou might feel the need to implement methods on your new UDTespecially since the Visual Studio UDT template creates a couple examples for you. The question is, do you want to implement additional code that might better be implemented elsewhere? Remember, each change to the UDT requires a complete tear-down of the column (including any data it contains), redeployment of the tested UDT binaries, and repopulation of the table. I implemented a simple method (as shown in Figure 13.75) to illustrate how to code and invoke UDT methods. Our method simply returns a string that describes the string format used to describe a GPS location. Figure 13.75. Implementing a simple CLR UDT method.Calling the method is also simple (sort of). As shown in Figure 13.76, without the TOP clause, it returns a row for each row in the GPSLocationV2 tablewhich is understandable since I didn't include a WHERE clause in the query. Figure 13.76. Calling the UDT QueryTemplate method.
Setting Custom Method AttributesHidden in the documentation, you might stumble across references to additional method attributes that can be required in some cases and simply helpful in others. These attributes are exposed by the SqlMethodAttribute property, and I've already discussed a few of them. For example, when you create a Table-Value Function (TVF), you must set the FillRowMethodName to identify the method used to return rows. The other property settings are shown in Table 13.2.
In the next section, I'll introduce SqlFacet, which is a method or property attribute that's used to configure the datatype being used to return data from a function, method, or attribute. I use SqlFacet to ensure that the SqlBinary Precision and Scale properties returned by the UDT value are coerced to the correct settings. |