Source Formatting | The Guru[ap]s Guide to SQL Server[tm] Stored Procedures, XML, and HTML

for RuBoard

I'll begin by dealing with source code formatting. Remember: It's not important that you format code the way that I do. What's most important is that you (and your team) format code consistently and logically. Find a system that works for you; then follow it.

Capitalization

I uppercase Transact-SQL keywords, pretty much across the board. To me, this helps keywords stand out and lets me easily identify reserved words when I'm looking at a code listing, regardless of where that listing is. You might think this unnecessary given Query Analyzer's ability to highlight reserved words; however, I often need to view T-SQL code outside of Query Analyzer (for example, in the SQL Server error log, in a trace file, or in a text editor that doesn't highlight T-SQL-reserved words).

The exception to this would be data types: I lowercase data types. Why? Because I often create user-defined data types, and I don't uppercase user object names . By not uppercasing system-defined data types, I can keep all data types in the same case, which, to me, makes code more readable. The guiding principle I usually follow when deciding how to format code is that if a piece of code is formatted differently than other similar code, there should be a very good reason for it. Given that I already know which data types are system-defined, and can therefore distinguish them from user-defined types, uppercasing them serves no real purpose, and actually seems distracting when I view code formatted this way.

I use mixed case when referring to variables , parameters, column names, and object names such as tables, procedures, and views. To me, this helps set them apart from reserved words and supports the easy use of multiword names without requiring spaces or underbars.

I make an exception to this when I deal with system-defined objects or other objects that I did not create. With these objects, I usually match the original case of the object. So I type sysobjects, not SysObjects, and crdate, not CrDate. I do this so that my code will work on systems in which case sensitivity has been enabled, and because I often code in languages such as C++ and XML that are case sensitive. Almost unconsciously, I find myself matching the original case of the objects I reference in my code.

Naturally, this makes naming these objects sensibly in the first place all the more important. This is why I recommend you name the objects you create consistently and logically. Being consistent alone is not enough. It's possible to be consistently wrong. As Emerson said, "a foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." ^[1]

^[1] Emerson, Ralph Waldo. "Self Reliance." Self Reliance and Other Essays . Mineola, NY:Dover Publications, 1993.

Indentation and White Space

I try to be flexible as well as economical with the way that I indent code. I think a good formatting style should balance the need to make code easily readable while minimizing script length, because longer scripts tend to be more difficult to wade through. In other words, you can get carried away with spreading out code and actually make it more difficult to read because you've made it substantially longer. Someone trying to read it will have to scroll through a much larger number of script lines than he otherwise would have to.

Select

To me, there's no reason to break a short SELECT statement into multiple lines just so that it can be indented consistently with a much larger one. This is especially true of subqueries and derived columns . For example, I do this (Listing 2-1):

Listing 2-1 I use simple formatting for simple queries.

 IF EXISTS(SELECT * FROM Northwind.dbo.Customers)

I find this more readable and more sensible than the code shown in Listing 2-2.

Listing 2-2 A style that I don't care much for.

 IF EXISTS    (    SELECT    *    FROM    Northwind.dbo.Customers    )

As long as vertical space is at a premium (and it always will be as long as code tends to flow vertically rather than horizontally), I do what I can to balance readability with economy of space. To me, there's no need to break a simple SELECT statement like the one in Listing 2-2 into multiple lines: It doesn't make the code more readable (in fact, it makes it less readable, in my opinion), and it belies the fact that what's really happening in the code is a very simple operation. In other words, the code looks more complicated than it is. It distracts from the fact that all we're doing in the IF statement is checking the table for the existence of rows. In my opinion, conveying the relative significance of a block of code is just as important as formatting code consistently. You have to balance standardization with common sense. If a piece of code is relatively trivial, you shouldn't have to scan the page just to locate its logical boundaries (for example, its parentheses or its BEGIN END).

And even though formatting code using a style like the one in Listing 2-2 would certainly make this book longer and seemingly more substantial, I couldn't, with good conscience, do it. Substituting a single character or a single word (for example, " * " in Listing 2-2) for an entire line of text would obviously lengthen this book, but it wouldn't make the code any more readable. I try to conserve vertical space in this book just as you likely will in the code you write.

With more elaborate SELECT statements, I generally place each major clause on a separate line and left-align them. I usually place columns immediately to the right of the SELECT reserved word, and delimit them with commas as necessary. If there are too many columns to fit on the line, I merely continue the list on the next line, often indenting those on the second and following lines so that they line up with those on the first. Here's an example (Listing 2-3):

Listing 2-3 I align the columns in T-SQL statements when I can.

 SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address, City,        Region, PostalCode, Country, Phone, Fax FROM Northwind.dbo.Customers WHERE City IN('London', 'Madrid')

Here's another example featuring a bigger SELECT (Listing 2-4):

Listing 2-4 I usually left-align the major clauses in T-SQL commands.

 SELECT Region, COUNT(*) AS NumberOfCustomers FROM Northwind.dbo.Customers WHERE Region IS NOT NULL GROUP BY Region HAVING COUNT(*) > 1 ORDER BY Region

Clauses and Predicates

As with the columns in a SELECT statement, I usually align the subclauses and predicates that make up the major clauses in a T-SQL command. For example, I usually left-align the various clauses in a multitable join and the predicates in a compound WHERE or HAVING clause. To me, this all seems logical. If we're only going to left-align the major clauses, the minor ones that make them up obviously can't be left-aligned. If they can't be left-aligned, we could either not worry about aligning them at all, or align them with one another. I chose the latter because I think it makes code easier to read. So, I often do something like this (Listing 2-5) when I have a compound clause:

Listing 2-5 I often align the phrases in a compound clause.

 SELECT CustomerID, CompanyName, ContactName, ContactTitle, Address, City,        Region, PostalCode, Country, Phone, Fax FROM Northwind.dbo.Customers WHERE City='London'    OR City='Madrid'    OR City='Paris'

If I place the parts of a compound clause on the same line, I'll often delimit them with parentheses. See the section entitled "Parentheses" for more information.

Expressions

If a CASE expression is relatively simple, I'll often embed it within a single line of code. If it's more complex, I'll break it into multiple lines. Listings 2-6 and 2-7 provide an example of each:

Listing 2-6 Simple CASE expressions usually reside on a single line.

 SELECT CustomerID, CompanyName, ContactName, ContactTitle,        Phone, CASE WHEN Fax IS NULL THEN 'N' ELSE 'Y' END AS [Fax?] FROM Northwind.dbo.Customers WHERE City = 'London'

Listing 2-7 More complex CASE expressions often span several lines.

 SELECT CASE Region        WHEN 'WA' THEN 'Phil'        WHEN 'SP' THEN 'Xavier'        WHEN 'BC' THEN 'Jean-Marc'        ELSE 'Unknown'        END AS Salesman,        CustomerID, CompanyName, ContactName FROM Northwind.dbo.Customers ORDER BY Salesman

I basically take the same approach with functions and other types of expressions. If the expression is simple, I use simple formatting for it. If it's more complex, I allow it to span multiple lines and format it accordingly .

I regularly nest functions horizontally on a line, and I don't shy away from complex expressions if the need arises. For example, I often format code like this (Listing 2-8):

Listing 2-8 Relatively simple expressions are usually formatted as a single line.

 SELECT ContactName+'''s title is '+REPLACE(UPPER(ContactTitle),'SALES','MARKETING') FROM Northwind.dbo.Customers

Certainly, I could break the nested REPLACE expression into multiple lines, but a competent developer should be able to glance at the code and know intuitively that it merely changes the word SALES to MARKETING in the ContactTitle column before concatenating it with ContactName. If this is the case, spreading the expression across multiple lines would serve no purpose other than to lengthen the script.

BEGIN/END

I don't enclose stored procedures within an outer BEGIN/END pair. Doing so is unnecessary and just adds lines to a script. I also place BEGIN on the same line as the flow-control command with which it corresponds, and I align END with that same command. In other words, if BEGIN is being used by the Transact-SQL IF command to delimit a block of code, I place it on the same line as the IF and align END with the IF, not BEGIN. This differs from the way most people format Transact-SQL, but I have some pretty good reasons for this deviation from common practice. As I've said, I like to conserve vertical space in the scripts I write. If a piece of code doesn't deserve its own line of code from a significance standpoint, I often don't give it one. Because BEGIN and END do not actually correspond to executable code (you can't set a breakpoint in the Transact-SQL debugger on BEGIN or END, for example), they have less significance to me than the flow-control command to which they correspond. Visually pairing up END with BEGIN doesn't help me much, but matching it with its corresponding IF or WHILE might. For example, consider the stored procedure presented in Listing 2-9:

Listing 2-9 You can format BEGIN/END pairs in a variety of ways.

 CREATE PROC testd @var int AS BEGIN        IF @var=1        BEGIN          PRINT '1'        END        ELSE        BEGIN          PRINT 'not 1'        END END

There are only three lines of executable code in the entire procedure: the IF and the two PRINT statements. If you ran this procedure under the debugger in Query Analyzer, you could only set breakpoints on these three lines. By spreading the procedure out this way, your brain is forced to check through lots of "noise" lineslines of code that don't really do much. Sure, the BEGIN/END lines indicate flow-control, so they're certainly more significant than, say, a comment, but they only have meaning within the context of other commands. So, for the same reason and in a manner very similar to the way I format curly braces in C++, I demote these noise lines to the dependent commands they are and format them accordingly (Listing 2-10):

Listing 2-10 The previous procedure with fewer "noise" lines.

 CREATE PROC testd @var int AS IF @var=1 BEGIN   PRINT '1' END ELSE BEGIN   PRINT 'not 1' END

Note the placement of ELSE on the same line as the IF command's END. ELSE is another noise command in that it doesn't actually indicate executable codeyou can't place a debugger breakpoint on ELSE. It delimits executable code and indicates the flow of execution through the procedure, but it does not execute itself.

In cases when the code corresponding to an IF or ELSE condition requires a single line only, I usually drop the BEGIN/END pair and put that code on the same line as the IF or ELSE, like this (Listing 2-11):

Listing 2-11 The test procedure with no noise lines.

 CREATE PROC testd @var int AS IF @var=1 PRINT '1' ELSE PRINT 'not 1'

An argument against this convention is that the IF line combines two executable lines into one script linethe IF and the PRINTmaking the code difficult to trace in a debugger. However, the Query Analyzer debugger correctly indicates this by remaining on a line until all executable pieces have completed. In the case of the IF line, this means that stepping from the line with F10 will cause the debugger to remain on the line for an extra cycle so that the execution of the PRINT command can be shown (if 1 is passed into the procedure). In the case of the ELSE line, you simply won't reach the ELSE line unless the second PRINT executes. In other words, you don't need to place the code corresponding to the ELSE on its own line (to determine when it, rather than the ELSE, is being executed) because the ELSE will not otherwise be reached. Unlike the IF line, it does not itself correspond to executable code.

As you can see, reducing the number of noise lines in a script can have a significant impact on the total number of lines in the procedure. In this case, we reduced the procedure from eleven lines to just three.

Parentheses

I have a habit of using parentheses more often than necessary, particularly with logical expressions. For example, I often use parentheses to delimit the predicate phrases in a compound WHERE or HAVING clause, especially when I place them on the same line. I also usually enclose the terms of a JOIN clause's ON in parentheses, and I tend to wrap the logical conditions of the IF and WHILE commands in parentheses, as well. For me, this makes the code a bit more readable, although I'm sure the ultimate source of my tendency to do this is my work in languages such as C and C++, which require parentheses around logical expressions. Listing 2-12 presents an example:

Listing 2-12 I often enclose logical expressions in parentheses.

 CREATE PROC testd @var int AS IF (@var=1)   SELECT C.CompanyName, SUM(O.Freight) AS Freight   FROM Customers C JOIN Orders O ON (C.CustomerID=O.CustomerID)   WHERE (C.City='London') OR (C.City='Portland')   GROUP BY C.CompanyName   ORDER BY C.CompanyName ELSE   SELECT C.CompanyName, SUM(O.Freight) AS Freight   FROM Customers C JOIN Orders O ON (C.CustomerID=O.CustomerID)   WHERE (C.City='Paris') OR (C.City='Barcelona')   GROUP BY C.CompanyName   ORDER BY C.CompanyName GO

The most important thing to remember regarding parentheses is that they help ensure order of precedence. They control the order in which clauses within an expression are evaluated. So, beyond the merely aesthetic , parentheses actually affect how your code executes.

Horizontal Spacing

In the spirit of eliminating noise from T-SQL scripts, I'm often rather parsimonious when it comes to horizontal spacing. I don't place extra spaces around operators (e.g., "+," "=," "<>," and so on) or between parentheses and expressions. There's usually just one space between the SELECT keyword and its column list, and between the other major T-SQL command clauses and their subclauses.

Column and Table Aliases

For column and table aliases, I take an approach that combines ANSI and non-ANSI elements. For column aliases, I take the ANSI approach. Despite my per sonal preference for the Label=ColumnName format, I usually use ColumnName AS Label. Although the former is more compact and seems more straightforward to me, the ANSI method has grown enough in popularity that I normally use it. Table aliases, however, are a different matter. With table aliases, I omit the AS and simply follow the table or view name with its alias. To me, this helps distinguish table aliases from column aliases, and satisfies my predilection for avoiding noise words in the code I write. Note that I don't always use aliases of either type in my code. It depends on the situation. For short queries over a single table, I'll often omit table aliases altogether.

When a query involves at least two tables, views, or table-valued functions, it's a good idea to prefix column references with their respective table aliases, even when a given column appears only once in the objects referenced by a query. The reason for this is twofold: First, it makes the code more readable. No guesswork is required to determine where a column originates. Second, it makes the code more robust. If you later add a table to the query that happens to contain a column with the same name as one of the unqualified columns, you'll get the dreaded "ambiguous column name" error message, and nobody likes those. Do yourself a favor: Disambiguate the column names in your queries before SQL Server forces you to.

To keep from confusing myself, my normal habit is to use one- or two-character abbreviations for table aliases. If I have more than one instance of a table within a query, I'll often append a number to the alias to indicate its nesting level. This encourages me to always qualify column references with table aliases (because the aliases are so short) and helps me follow my own code when nested queries, subqueries, and derived tables enter the picture.

DDL

I follow the same sorts of formatting conventions with DDL statements as I do with other types of T-SQL. You've probably noticed that I put the parameter list and the AS for a stored procedure on the same line as the CREATE PROCEDURE. Again, this is in keeping with my philosophy of avoiding unnecessary noise in my code. AS doesn't represent executable code in a stored procedure, therefore it's relegated to share a line with its benefactor .

As for CREATE TABLE, I left-align the column names and sometimes the data types, but I'm not persnickety about it. As long as the formatting of a CREATE TABLE statement doesn't obscure the table layout, I don't think it's something worth spending a lot of time thinking about. Typically, you'll create far fewer tables than procedures and other types of objects. So, a basic CREATE TABLE statement looks like this in my code (Listing 2-13):

Listing 2-13 I keep DDL pretty simple.

 CREATE TABLE dbo.Customer (CustomerID int identity PRIMARY KEY, CustomerName varchar(40) NOT NULL, Address varchar(60) NULL, City varchar(20) NULL, State char(2) NULL, Zip varchar(10) NULL DEFAULT 'TX', Country varchar(20) NULL, Phone varchar(24) NULL, Fax varchar(24) NULL )

Owner Qualification

Because qualifying an object name with its owner can actually improve performance, I try to remember to owner-qualify all object references. Not only does this avoid ambiguity in object name references, it can actually speed up access to a stored procedure because not owner-qualifying a stored procedure reference results in a momentary compile lock being taken out on the procedure, then released when the procedure is finally located in the cache. The lock duration may be quite short; however, if the stored procedure name is owner-qualified in the first place, the lock won't occur at all unless the procedure actually needs to be recompiled. This means that

 EXEC dbo.sp_who

is preferable to

 EXEC sp_who

even though both will work. Likewise,

 CREATE PROCEDURE dbo.MyProc

is preferable to

 CREATE PROCEDURE MyProc

because it alleviates any ambiguity there may be in the object reference.

Owner-qualifying names is simply a good habit to get into, regardless of whether it materially affects the performance of your system. (Note that scalar UDF references must be owner-qualified. Unlike other kinds of objects, the owner prefix isn't optional.)

Abbreviations and Optional Keywords

I often abbreviate the keyword PROCEDURE in commands like CREATE PROCEDURE and DROP PROCEDURE. It's a minor point, but one worth mentioning: If syntax is optional and doesn't really make code any clearer, I see little reason to include it.

Keywords

This is also true with optional keywords. I often omit them from my code. Examples include the INTO keyword with the INSERT command and the FROM keyword with DELETE. ANSI compatibility concerns aside, I often omit unnecessary syntax from the code I write. The most reliable, most robust code around is the code that isn't there in the first place. It never produces a syntax error, never breaks, is never made obsolete by changes in the tool, and it takes no precious screen real estate.

Abbreviations for Common Words

When you abbreviate common words in the object names you create, try to be consistent. If you abbreviate number as "Num" in one name, abbreviate it similarly for all objects. Don't make it "No" in one table (e.g., CustNo) and "Number" in another (e.g., InvoiceNumber). Be consistent. It's a good idea to establish a standard set of these types of abbreviations before you even begin constructing objects.

Passing Parameters

I usually pass stored procedure parameter values by position rather than name unless the procedure has a large number of parameters and I only want to pass a few of them, or the meaning of the values I'm passing isn't obvious from their positions . I suppose this mostly goes back to laziness : I don't like typing what I don't have to, and omitting parameter names certainly requires less typing than including them.

Choosing Names

When I name an object, I'm careful to avoid naming conflicts between tables, views, UDFs, procedures, triggers, default and rule objects because their names must be unique. You can't, for example, have a stored procedure and a table with the same name. As I've mentioned before, I try to be descriptive without getting carried away.

One factor that's often overlooked when choosing names is what I call "speakability"how easy a name is to speak in common conversation. If a name is rooted too deeply in computerese and technobabble, you may sound foolish when saying it and have difficulty talking about it with other people. Consider the following two names: SWCustomers and CUSTOMERS_IN_THE_SOUTHWEST_REGION.

The second name is a real mouthful. It's simply too long and unwieldy to use in common conversation. People attempting to use it in conversation will instinctively shorten it. Why not start with a handier name in the first place? What real purpose does having such a long name serve? To follow a standard? Why not change the standard to follow common sense?

Tables

I often use plural one-word entity-type names for tables and views (e.g., Customers). For a table that links two other tables in a many-to-many relationship, I'll often name the linking table after the tables it joins, as in CustomerSuppliers.

Indexes

I name indexes after their keys. So, if an index has been built over the Customers table on the CompanyName and ContactName columns, I'll likely name it CompanyNameContactName, or something similar. Because index names do not have to be unique across a database, this strategy works well and lets me see, at a moment's glance, what the index's keys are. Occasionally, I also include a prefix indicating whether the index is clustered or nonclustered.

Triggers

For triggers, I use a name that indicates the actions that fire the trigger, as well as the name of the table with which the trigger is associated (e.g., DeleteCustomer or InsertUpdateOrder). If the trigger has special characteristics (e.g., it's an INSTEAD OF trigger), I usually indicate this via a prefix to the name (e.g., InsteadOfDeleteCustomer).

Variables

I never use more than a single at sign (@) when naming a local variable, and I frequently name variables after the columns to which they correspond (if applicable ). If a variable represents a counter of some type, I'll often name it with a single character, just as you might name a loop control variable "i" or "x" in C++ or Java.

Procedures

I often give procedures a verb-based name, as in PostPurchases or BuildHistory. Sometimes I give procedures and views a special prefix (e.g., sp or V_ ), depending on the number of procedures and views I have and the similarity of their names to other object names.

UDFs

I name UDFs similarly to stored procedures. Sometimes I find myself prefixing UDF names with Get because they return a value of some sort , but not often, because the fact that a UDF returns a value is implicit.

Constraints

With constraints, I often allow the system to name my constraints because I usually use GUI tools to manipulate them. When I name them myself, I usually prefix primary key constraints with PK_, foreign key constraints with FK_, unique key constraints with UK_, and check constraints with CK_.

Occasionally, I get a little crazy and give a constraint an extremely long name that indicates exactly what it does. This provides the nifty side benefit of causing a useful "message" to display in client applications when the constraint is violated. For example, sometimes I do something like this (Listing 2-14):

Listing 2-14 You can use long constraint names to implement rudimentary messages.

 CREATE TABLE Samples (SampleDate datetime NULL DEFAULT getdate(), EmployeeID int NULL, SampleAmount int NULL CONSTRAINT  [Sample Amount must not equal 0]  CHECK (SampleAmount<>0), CONSTRAINT  [Invalid Employee ID]  FOREIGN KEY (EmployeeID) REFERENCES Employees (EmployeeID) ) GO INSERT Samples (SampleAmount) VALUES (0) INSERT Samples (EmployeeID) VALUES (0)

(Results)

 Server: Msg 547, Level 16, State 1, Line 1 INSERT statement conflicted with COLUMN CHECK constraint '  Sample Amount must   not equal 0  '. The conflict occurred in database 'Northwind', table 'Samples', column 'SampleAmount'. The statement has been terminated. Server: Msg 547, Level 16, State 1, Line 1 INSERT statement conflicted with COLUMN FOREIGN KEY constraint '  Invalid Employee ID  '. The conflict occurred in database 'Northwind', table 'Employees', column 'EmployeeID'. The statement has been terminated.

Because a constraint's name is included in the error message generated when it's violated, you get some idea of what the problem is from the constraint name itself, even if you don't process the message further. Because object names can be up to 128 characters long, this is a poor man's method of "attaching" a user-defined message to a constraint violation (a feature Sybase has had for years ). Note that you can embed carriage returns in these ad hoc messages simply by embedding them in the constraint name, like this (Listing 2-15):

Listing 2-15 You can include carriage returns in long object names.

 CREATE TABLE Samples (SampleDate datetime NULL DEFAULT getdate(), EmployeeID int NULL, SampleAmount int NULL CONSTRAINT [  Sample Amount must not equal 0  ] CHECK (SampleAmount<>0), CONSTRAINT [  Invalid Employee ID  ] FOREIGN KEY (EmployeeID) REFERENCES Employees (EmployeeID) ) GO INSERT Samples (SampleAmount) VALUES (0) INSERT Samples (EmployeeID) VALUES (0)

(Results)

 Server: Msg 547, Level 16, State 1, Line 1 INSERT statement conflicted with COLUMN CHECK constraint '  Sample Amount must not equal 0  '. The conflict occurred in database 'Northwind', table 'Samples', column 'SampleAmount'. The statement has been terminated. Server: Msg 547, Level 16, State 1, Line 1 INSERT statement conflicted with COLUMN FOREIGN KEY constraint '  Invalid Employee ID  '. The conflict occurred in database 'Northwind', table 'Employees', column 'EmployeeID'. The statement has been terminated.

Of course, it's preferable to trap errors like this in your applications and display meaningful messages instead, but a human-readable constraint name is certainly better than one that reveals very little about a problem.

for RuBoard