Coding Conventions | The Guru[ap]s Guide to SQL Server[tm] Stored Procedures, XML, and HTML

for RuBoard

I'll cover common coding and design patterns in more detail in Chapter 3, but discussing a few basic coding conventions is certainly worthwhile here. You'll see these conventions applied throughout this book, so it makes sense to talk about a few of them in advance.

As I've said, I'm uncomfortable telling you exactly what conventions you should use. I think you should find a system that works for you. Nevertheless, I've structured the following recommendations as a series as dos and don'ts because you may want to adopt some of them as guidelines for your own work. You may not agree with or even use all of these. My advice would be to read through them and adopt the ones that you feel make the most sense.

Script Recommendations

The following recommendations apply to T-SQL scripts in general. Whether a script creates an object or constitutes a T-SQL command batch that you run on occasion, there are a number of conventions you can follow to make your life easier. Naturally, most of these are good practices regardless of whether you're authoring a script, a stored procedure, or some other type of SQL Server object.

Dropping Objects

I usually check for the existence of an object before I attempt to drop it. Not doing so needlessly generates error messages, even when the DROP command is segmented in its own T-SQL batch. An error message should be something that grabs your attention, not something you regularly ignore. I try to avoid generating unnecessary error messages to keep from becoming desensitized to them.

Comments

I decide what to "comment" in my code by balancing the need to clarify vague or ambiguous coding elements against the need to keep the code free of noise and needless clutter. Overcommenting is just as bad as undercommenting. I strive to write code that is self-documenting . Overcommenting a script can create substantially more work for you without really improving the readability of the code. The screen quickly fills with gobbledygook that anyone attempting to read the code later will have to wade through, not to mention the additional overhead of having to maintain these loquacious remarks every time you modify the code.

People working on overcommented code may be a bit confused by the sheer volume of comments because they won't know what's significant and what isn't. They won't know which comments to pay close attention to and which ones they can safely ignore. Comments that merely restate what is already abundantly clear from the code itself merely lengthen the script. And chances are, if code needs to be commented profusely in order to be readable, the code probably needs to be rewritten.

That said, when I'm forced to do something from a coding standpoint that isn't obvious and is something I feel those who work on the code down the road should know about, I comment it. I think every stored procedure of any significance should have a code block at the top of it that describes the procedure and what it does. As with any source code, tracking things like who changed the code last, when they changed it, and so forth, can also be quite handy. See Chapter 4 for more info .

And lest I forget to weigh in on the single biggest issue facing Transact-SQL developers in the 21st century: I don't have a problem with so-called "old-style" comments. I think slash-star (/*) comments are appropriate, even preferable, when a comment spans many lines. Whether this is the header at the top of a procedure or function (where you might occasionally reformat the text and, hence, find single-line comments a real pain), or deep within a procedure or script to prevent a block of code from executing, slash-star comments are certainly handier than double-hyphen comments in certain circumstances. As with all things programming: Use the right tool for the job. There's nothing wrong with slash-star comments when used properly.

Extended Properties

In the same way that logical names help make a database self-documenting, extended properties allow you to make your objects more descriptive. Extended properties are name /value pairs that you can associate with a database objecta user , a column, a table, and so forth. These are handy for adding descriptive text to the objects you create. You add extended properties using the sp_addextendedproperty stored procedure, or using the Enterprise Manager or Query Analyzer Object Browser tools. The system function fn_listextendedproperty() lists the extended properties for an object. The sp_dropextendedproperty procedure drops an extended property. Here's some code that demonstrates extended properties.

 USE Northwind GO CREATE TABLE CustomerList (c1 int identity, name varchar(30)) GO EXEC sp_addextendedproperty 'Label', 'Customer Number (NN-XX-NNNN)', 'user', dbo, 'table', CustomerList, 'column', c1 GO SELECT value FROM ::fn_listextendedproperty (NULL, 'user', 'dbo', 'table', 'CustomerList', 'column', default) GO DROP TABLE CustomerList

(Results)

 value ------------------------------------------------------------------- Customer Number (NN-XX-NNNN)

Script Files

There are certainly exceptions, but generally I store the source code for each database object I create in a separate script file. This provides maximum flexibility in terms of re-creating or modifying the object. It ensures that I don't accidentally drop or re-create objects I don't intend to, and that I don't apply procedure-specific settings such as QUOTED_IDENTIFIER to the wrong procedures. This also allows me to more easily manage the source code to the objects I create using a version control systema big plus with lots of objects.

Script Segments

If a script has multiple distinct segments that do not share local variables , I usually terminate each of them with GO in order to modularize the work the script is to perform. In this way, if there's an error in one of the segments, I can prevent it from causing the ones that follow it to fail. Conversely, if I want a script segment to execute only if its preceding segment doesn't generate an error, I leave out the GO and code the two batches as a single batch. If a serious enough error occurs early in the batch, the commands later in the batch will never be reached.

USE

If a script must be run from a given database, I include the appropriate USE command as early in the script as possible. As I said earlier in the book, this helps ensure that any objects the script creates end up where they're supposed to, and alleviates having to remember to change the database context manually each time the script is executed.

When I include a single USE in a script, I almost always put it at the top of the script. This makes it easy to find and sets the stage for the code that follows . Another developer looking at the script should have no problem determining where the objects it references reside.

Stored Procedures and Functions

The following recommendations relate specifically to stored procedures and functions. They're not unlike the recommendations you might encounter for any language: central variable declaration, modular routines, error checking, and so forth. For some reason, Transact-SQL developers often neglect these very basic practices. Following them can save you hours of bug chasing and can help you write code that is both maintainable and extendible.

Variable Declaration

If possible, I declare the variables a stored procedure or function will use centrallyin one locationpreferably at the start of the procedure or function. Although it's syntactically permissible to declare variables almost anywhere , having to search a routine to find a variable's declaration wastes time and makes code more difficult to follow.

Stored Procedure Return Values

If a stored procedure can return a value other than zero (many do), it's often worth the trouble to check this value and react appropriately. Accordingly , I try to remember to return values from the stored procedures I write. Generally, a value of 0 indicates success; nonzero values indicate that an error has occurred.

Parameters

I check the values passed into a stored procedure or UDF early, and return an error (or display a help message) when bad values are supplied. This makes the routine easier to use and prevents invalid values and/or operations from affecting my data.

Default Parameter Values

I think it's a good practice to provide default values for stored procedure and UDF parameters. It makes them easier to use, more flexible, and less error prone.

Errors

We'll cover this in more detail later in the book, but the mark of robust code is comprehensive error checking. I try to check for errors after key operations and respond accordingly. I usually check @@ERROR immediately after statements that may cause an error condition, and I check @@ROWCOUNT when the fact that a statement fails to affect any rows constitutes an error.

Modularity

A series of smaller, logically simple routines is easier to deal with and easier to understand than a single, extremely long behemoth of a stored procedure. When I can, I break complex routines into smaller ones. This increased granularity can help performance by allowing part of a routine to get kicked out of the cache without disposing of all of it, and it makes my code generally easier to work on.

Tables and Views

These recommendations relate to tables and views; specifically, how you use them in the code you write. Whether it's a script, a stored procedure, or a function, how you handle tablesthe data containers of your databaseis sometimes as important as what you do with them.

Temporary Tables

I try not to overuse temporary tables. There are two reasons for this: One, they can cause throughput problems because of resource contention in tempdb. Two, SQL Server is more aggressive with keeping statistics updated on temporary tables than it is with permanent tables. This also can cause performance problems as well as unexpected stored procedure recompilation. One strategy for avoiding temporary tables is to use table variables. You can do most of the things with them that you can do with temporary tables, and they cause fewer resource contention problems in tempdb. Like all variables, they're automatically dropped when they go out of scope.

Resource Cleanup

When I do use temporary tables, I try to remember to drop them when they're no longer needed. Leaving them lying around until a stored procedure returns or you log out of the server wastes system resources and may prevent future code from running in certain circumstances. The same is true for cursors : It's a good idea to close and deallocate them when you're finished with them. Your mother was right: Cleanliness is next to godliness. Clean up after yourself.

System Tables

Unless there's just no other way, I try to avoid querying system tables directly. Beginning with SQL Server 7.0, Transact-SQL sports a rich set of property functions (e.g., DATABASEPROPERTY(), COLUMNPROPERTY(), OBJECTPROPERTY(), etc.) that greatly reduce the need to access system tables for meta-data and system-level information. Querying system tables directly is bad for two reasons: One, system tables can change between releases. Code you write today may not run tomorrow if you depend on a particular system table layout. Two, referencing system tables directly to get system-level info is usually less readable than the equivalent property or meta-data functions. For example, assume we've created these column statistics on the Northwind Customers table:

 CREATE STATISTICS ContactTitle ON Customers(ContactTitle)

At some point in the future, we notice ContactTitle in an sp_helpindex listing and want to know whether it's a true index or merely a placeholder for statistics. Here are two queries that can tell us (Listings 2-16 and 2-17). One queries system tables directly; the other doesn't:

Listing 2-16 A meta-data query that references system tables directly.

 SELECT CASE WHEN i.status & 64 = 64 THEN 1 ELSE 0 END FROM sysindexes i JOIN sysobjects o ON (i.id=o.id) WHERE o.name='Customers' AND i.name='ContactTitle'

Listing 2-17 A meta-data query that's not only safer but also shorter.

 SELECT INDEXPROPERTY(OBJECT_ID('Customers'),'ContactTitle','IsStatistics')

Both queries return 1 if ContactTitle is a statistics index. Which one's more readable? Not only is the INDEXPROPERTY() query more readable, it's also considerably shorter. And it has the added advantage of being immune to changes in the system tables.

In addition to the property functions, SQL Server includes a number of views and system stored procedures to help you access meta-data info. For example, you can query INFORMATION_SCHEMA.VIEWS to retrieve a list of the views defined in a database (Listing 2-18):

Listing 2-18 You can use the INFORMATION_SCHEMA views to retrieve system-level info.

 SELECT TABLE_NAME AS VIEW_NAME, CHECK_OPTION, IS_UPDATABLE FROM INFORMATION_SCHEMA.VIEWS ORDER BY VIEW_NAME

(Results abridged)

 VIEW_NAME                              CHECK_OPTION  IS_UPDATABLE -------------------------------------  ------------  ------------- Alphabetical list of products          NONE          NO Category Sales for 1997                NONE          NO Current Product List                   NONE          NO Customer and Suppliers by City         NONE          NO Invoices                               NONE          NO Order Details Extended                 NONE          NO Order Subtotals                        NONE          NO Orders Qry                             NONE          NO Product Sales for 1997                 NONE          NO Products Above Average Price           NONE          NO Products by Category                   NONE          NO Quarterly Orders                       NONE          NO Sales by Category                      NONE          NO Sales Totals by Amount                 NONE          NO Summary of Sales by Quarter            NONE          NO Summary of Sales by Year               NONE          NO

And as I've said, a number of system stored procedures exist that return catalog and meta-data info. Using them is preferable and more convenient than querying the system tables directly. For example, sp_tables lists the tables in a database, and sp_stored_procedures lists stored procedure info. There are numerous others. See the topic "Catalog Stored Procedures" in the Books Online, and the script instcat.sql (located in the SQL Server install folder) for more info.

Transact-SQL

These last recommendations apply to Transact-SQL in general. Whether you're coding a stored procedure or just executing a T-SQL command batch from an application, following these practices can help you write better code.

Ad hoc T-SQL

When I can, I avoid executing ad hoc T-SQL via the EXEC() function. There are two reasons for this. One, the execution plans generated by ad hoc queries aren't as likely to be reused as those of stored procedures. Two, ad hoc T-SQL is notoriously difficult to debug. When I use it, I find myself executing lots of PRINT commands just to see what my T-SQL command variable contains at different points in the procedure. Because it all but defeats Query Analyzer's ability to locate coding errors automatically, you're pretty much on your own in terms of finding errors when you build and execute T-SQL dynamically.

So, if possible, I place my code in regular stored procedures, functions, etc., and call those objects. This is nearly always preferable to any of the dynamic T-SQL approaches. If I must execute T-SQL that's generated at runtime, I try to use the sp_executesql extended procedure. It's often considerably faster than EXEC(), and execution plans created to service sp_executesql are inserted into the procedure cache and can be reused.

COMPUTE and PRINT

COMPUTE is bad news because it actually causes multiple result sets to be created. You have to iterate through all of them to get all the rows returned by a COMPUTE query. The ROLLUP and CUBE operators are preferable because they do all that COMPUTE does and more, and they do it without requiring additional result sets.

PRINT is less than ideal because, as of this writing, ADO doesn't correctly return informational messages unless a message with a severity greater than 10 also happens to have been generated. In other words, when executing a query using ADO, you'll never see PRINT messages unless a real error message has also been generated for the same query.

for RuBoard