Idioms | The Guru[ap]s Guide to SQL Server[tm] Stored Procedures, XML, and HTML

for RuBoard

As I've said, I believe every programming language is characterized by its idiomsthe techniques experienced developers use to solve common problemsand Transact-SQL is no exception. In this section, I'll discuss a few common Transact -SQL idioms. The list is by no means exhaustive, but there should be enough here to get you started thinking about the idioms that make up the language. Once you begin thinking this way, you'll begin to see idioms everywhere you look. A programming language's idioms are like the basic tools one might find in a toolbox. Every toolbox should have a certain basic set of them, although some toolboxes have more variety than others. Experienced developers tend to have lots of tools, just like the experienced craftsman or the veteran machinist.

Querying Meta-data

Although there are numerous ways to return meta-data about an object, experienced developers usually take this approach:

Use meta-data functions (e.g., OBJECTPROPERTY()) if possible.
If no function exists that returns the info you need, query the INFORMATION_SCHEMA views (e.g., INFORMATION_SCHEMA.PARAMETERS).
If no INFORMATION_SCHEMA view meets your needs, check the system catalog procedures (e.g., sp_tables).
If all else fails, query the system tables directly (e.g., sysobjects, sys columns , and so on).

Obviously, there are numerous ways to retrieve meta-data. For example, the two queries in Listing 3-1 are functionally identical:

Listing 3-1 Two ways of retrieving a database name using its identifier.

 SELECT dbid FROM master..sysdatabases where name='pubs' SELECT DB_ID('pubs')

The idiomatic , or conventional, technique among experienced Transact-SQL coders is the second one. It's not only shorter than the first technique, it's also impervious to changes in the sysdatabases table. This holds for most meta-data queries: As a rule, avoid querying the system tables directly. Instead, follow the four-step progression listed earlier.

Creating an Object

Experienced T-SQL coders check for the existence of an object before attempting to create it. There are three reasons for this. One, if the object already exists, but shouldn't, you may want to provide special handling (for example, you may want to exit the script or stop the procedure immediately). Two, if an object already exists but is not unexpected, you may want to drop it. Three, attempting to create an object that already exists will result in an error message being raised, but may go undetected by your script. For example, your T-SQL script may attempt to create a table that already exists within its own batch. When this fails, the batch will terminate, but processing will continue with the first statement in the next batch. If the remainder of the script expects columns or data to be present in the table that aren't, the result could be disastrous.

As with querying other types of meta-data, there are numerous ways to check for the existence of an object. Let's look a few of them. Listing 3-2 presents Method 1:

Listing 3-2 A method of checking for the existence of an object.

 IF EXISTS(SELECT * FROM sysobjects WHERE name = 'authors')   DROP TABLE dbo.authors GO CREATE TABLE dbo.authors ...

You see techniques like this mostly in code migrated from releases of SQL Server prior to 7.0. The code works, but has some fundamental flaws. The first flaw is that the code queries the sysobjects system table directly. If the layout of that table were to change, the code would break.

The second flaw is that the code doesn't check the owner of the object. So if a user other than dbo has created an object named authors, the DROP TABLE statement will be reached, regardless of whether dbo.authors exists. Let's look at another method (Listing 3-3):

Method 2:

Listing 3-3 Another method of checking for the existence of an object.

 IF EXISTS(SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'authors' AND TABLE_SCHEMA='dbo')   DROP TABLE dbo.authors GO CREATE TABLE dbo.authors ...

This method is much improved over the previous one because it does not query the system tables directly. Instead, it takes an ANSI-compliant approach: It uses the INFORMATION_SCHEMA.TABLES view to determine whether user dbo has a table named authors. Although this works, it's a little verbose and requires the use of a subquery and a view to check for the existence of an object. There's a much more common technique, as shown in Method 3 (Listing 3-4):

Listing 3-4 An idiomatic technique for checking object existence.

 IF OBJECT_ID('dbo.authors') IS NOT NULL   DROP TABLE dbo.authors GO CREATE TABLE dbo.authors ...

This technique is probably the most prevalent and is the one I use myself . It's not dependent on the layout of the sysobjects table; it's short and conciseas it should beand is efficient from an execution standpoint.

The only real problem with this technique is that it does not check the object type, an advantage Method 2 has because it implicitly eliminates all object types except tables and views by scanning INFORMATION_SCHEMA.TABLES.

A slight variation on this method solves the problem. Method 4 (Listing 3-5) uses the OBJECTPROPERTY() meta-data function to check the type of the object:

Listing 3-5 A refinement that also checks object type.

 IF (OBJECT_ID('dbo.authors') IS NOT NULL) AND (OBJECTPROPERTY(OBJECT_ID('dbo.authors'),'IsTable')=1)   DROP TABLE dbo.authors GO CREATE TABLE dbo.authors ...

Personally, I use Method 3 more than Method 4 because it's shorter and object names must be unique across types anyway. The bottom line is: Experienced T-SQL developers use either Method 3 or Method 4 almost exclusively. The idiomatic way of checking for the existence of an object is to use one of these two techniques.

Setting the Database Context

For scripts that depend on a specific database context, the conventional technique is to include USE as early as possible in the script to set the database context. Although you could change the current database using the combo box in Query Analyzer or via the -d OSQL command-line option, the idiomatic approachthe usual convention among capable developersis to include a USE statement.

That said, how do we check to see whether the USE statement worked? Not checking could have disastrous results. You may end up dropping objects in the master database, for example. The normal convention is to do something like that shown in Listing 3-6.

Listing 3-6 Changing the database context and checking it afterward.

 USE pubs2 GO IF DB_NAME()<>'pubs2' BEGIN   RAISERROR('Wrong database.',16,10)   RETURN END GO

Here, we use the DB_NAME() meta-data function to check the current database context. It may seem that we could just check the @@ERROR automatic variable after the USE to see whether it succeeded, but that's not the case. USE errors terminate the current command batch, so no error checking is possible. This means that code like this (Listing 3-7) doesn't work as we might like:

Listing 3-7 USE errors terminate the current batch, making error checking impossible .

 -- Bad TSQL - doesn't work USE pubs2 IF @@ERROR<>0 BEGIN   RAISERROR('Wrong DB.',16,10)   RETURN END GO

Emptying a Table

There are a couple of methods for completely emptying a table. The one you choose depends on your needs. The most obvious method of deleting all the rows in the table is to execute the DELETE statement with no filter (no WHERE clause), like this:

 DELETE Customers

This works, but every row deletion is recorded in the transaction log, making it impractical for large tables.

A faster and more conventional method of quickly deleting all the rows in a table is to use the TRUNCATE TABLE command, like this:

 TRUNCATE TABLE Customers

Normally, TRUNCATE TABLE will complete almost immediately, even for extremely large tables. TRUNCATE TABLE is faster than DELETE because it's a minimally logged operationonly its extent operations are recorded in the transaction log. Keep in mind that this means that using it impacts database recovery, so it's not without cost. Also keep in mind that you can't use TRUNCATE TABLE with a table with foreign key references.

So, the idiomatic or conventional method of quickly emptying a table is to use TRUNCATE TABLE unless its impact on database recovery is unacceptable or foreign keys reference the table.

Copying a Table

Copying the structure (and, optionally , the data) of an existing table to a new table is something we do fairly often in database applications. We start with a template of some type, then replicate it to a work table into which we can then insert new rows, add columns and indexes, define new constraints, and so forth.

There are several ways of copying a table, but only one of them is really conventional. The first method involves issuing a CREATE TABLE statement that happens to contain the same DDL as the original source table. There's no easy way to ensure that it actually does match the original table, and even if it does for now, changing the structure of the underlying table in the future will break the de facto link between the two.

Still another way is to use the sp_OA Automation stored procedures to instantiate a SQL-DMO Table object corresponding to the source table and call the object's Script method. This will generate a CREATE TABLE script for the table that can then be executed from T-SQL (either via SQL-DMO or by using xp_cmdshell to call OSQL.EXE). Although this would work (Chapter 21 includes an example procedure, sp_generate_script, that takes this approach), it's unnecessarily circuitous if all you want to do is copy a table. There's an easier wayand one that's more idiomatic too.

SELECTINTO is a wonderful T-SQL extension for easily creating a copy of a table (without its constraints) or a permanent copy of a query result, for that matter. Anything you can get to via SELECT, you can save to a permanent table. So, creating a copy of a table is as simple as Listing 3-8:

Listing 3-8 SELECTINTO is a quick way to copy a table.

 SELECT * INTO newtable FROM oldtable

To create an empty copy of the table, supply a false WHERE condition (Listing 3-9), like this:

Listing 3-9 SELECTINTO can also create an empty copy of an existing table.

 SELECT * INTO newtable FROM oldtable WHERE 0=1

Zero never equals one, so no data will be copied , but the table will be created, nonetheless. Like TRUNCATE TABLE, SELECTINTO is a minimally logged operation, so using it affects database recovery. That said, it's commonly used, especially to create temporary tables, and you should be able to recognize this idiom on sight. As used here, SELECTINTO is really an implementation of the GoF Prototype pattern, as I'll discuss later in the chapter.

Variable Assignment

Although SELECT can be used to assign variables, the SET command is increasingly becoming the preferred method of doing so. Even though they basically work the same way, SET is a tad briefer and can assign cursor variables as well as scalar variables. SELECT, on the other hand, can assign multiple variables at once and can assign values from a table or view without requiring a subquery, so it's still appropriate in certain situations, particularly for multivariable assignments.

Although the difference between the two may seem minor, and my discussion of it a bit fastidious, the acts of setting a variable versus that of returning a result set are fundamentally different actions, so it makes sense to use two different commands for them. When you see SET used in a block of T-SQL code, you know exactly what it's doingassigning a value to a variable. If you consistently follow a convention of assigning variables exclusively with SET and performing data selection with SELECT, you make your code easier to readboth for you and for those that follow you.

Looping

Although using the WHILE construct is easily the most straightforward way of looping in a T-SQL code block, there are other ways of repeatedly executing code. See Listing 3-10:

Listing 3-10 There are several ways to loop in Transact-SQL.

 DECLARE @var int SET @var=0 mytag: SET @var=@var+1 IF (@var<10) GOTO mytag

This works, but it's not idiomatic. Why not? Because it doesn't mesh with the approach capable developers would take to solve the same problem. It's not natural. It unnecessarily involves a nonstructured concept like GOTO with no real benefit. Listing 3-11 presents one that's a bit better, but that's also not idiomatic:

Listing 3-11 Using the WHILE construct isn't necessarily idiomatic.

 DECLARE @var int SET @var=10 WHILE (@var>0) BEGIN   SET @var=@var-1 END

Why isn't it idiomatic? Because it iterates backward through the loop for no good reason. Remember: A language's idioms are its set of natural approaches to common problems. Iterating backward through a loop without reason is not the most natural means of cycling through the loop; looping forward is. Here's the idiomatic form (Listing 3-12):

Listing 3-12 The idiomatic form of a loop in T-SQL.

 DECLARE @var int SET @var=0 WHILE (@var<10) BEGIN   SET @var=@var+1 END

Not only is the code shorter, it's more obvious. A capable developer can take one look at it and know intuitively that it performs the code between the BEGIN/END pair ten times. This is an example of what I mean when I say that T-SQL has idioms like any other language. Although you can "loop" using a variety of techniques, the forward WHILE technique is the most naturaland therefore the most idiomaticof the lot.

Nullability

The proper handling of NULL values has vexed database programmers for decades. The fact that the recommended methods of dealing with them have changed over the years and that they differ from vendor to vendor hasn't helped anything either. Still, there's a Transact-SQL idiom for dealing with NULL values in databases (Listing 3-13). It looks like this:

Listing 3-13 The nullability idiom.

 SELECT * FROM Customers WHERE Region IS NOT NULL

Notice that the WHERE clause isn't

 WHERE Region<>NULL

 WHERE ISNULL(Region,'')=''

Even though each of these might work (given the proper ANSI_NULLS setting), neither is the most straightforward or natural approach. The < > approach requires that ANSI_NULLS be set to FALSE because the ANSI/ISO SQL standard stipulates that comparisons with NULL always yield NULL. Failing to set ANSI_NULLS properly before the comparison (or, in the case of a stored procedure, before the procedure is compiled) results in no rows being returned. The ISNULL() approach needlessly converts NULL values to empty strings and fails to take into account the possibility that some Region values may indeed be empty strings. So, neither of the alternative approaches works as well or is as natural as the first one.

Retrieving the Topmost Rows

It's common to need to return the first n rows of a result set or table. There are several ways of doing this, but only one idiomatic way. The idiomatic form is shown in Listing 3-14:

Listing 3-14 The topmost idiom.

 SELECT TOP 10 * FROM Customers ORDER BY CompanyName

Because Transact-SQL provides a special extension just for returning the top n rows from a query, the most natural approach to the top n problem is to use that extension. As I've mentioned, there are other ways. Here's one of them (Listing 3-15):

Listing 3-15 A nonidiomatic topmost query.

 SET ROWCOUNT 10 SELECT * FROM Customers ORDER BY CompanyName SET ROWCOUNT 0

This technique needlessly involves SET ROWCOUNT even though the TOP extension is available. It's more code and doesn't perform as well in certain circumstances. Here's another alternative (Listing 3-16):

Listing 3-16 Another topmost approach.

 DECLARE c CURSOR FOR SELECT * FROM Customers ORDER BY CompanyName FOR READ ONLY OPEN c DECLARE @i int SET @i=0 FETCH c WHILE (@@FETCH_STATUS=0) AND (@i<9) BEGIN   SET @i=@i+1   FETCH c END CLOSE c DEALLOCATE c

Obviously, this approach is the worst of all. It's a horrendous mess. It involves a cursor, a variable, and a loopneedlessly and illogically. It returns each row as its own result set (by virtual of the separate calls to FETCH) and requires several times the code of the idiomatic form. It's also slower and more memory intensive . It's unfortunately the kind of approach inexperienced Transact-SQL coders sometimes take.

for RuBoard