Database Standards | ppk on JavaScript, 1/e

Good naming and data type standards in your company can save you tons of time and also speed the delivery time for your applications. Developing a company strategy for naming database objects can save turnover time for new employees and help when your developers move from project to project. Most companies have a high degree of developer movement in departments, and if a developer must relearn a new standard each time he moves to a new project, that's a lot of lost production time.

Naming Conventions

It's vital to have a good set of published naming conventions for your SQL Server objects. In the long duration of a business, it saves money and time as programmers are transferred internally and don't need to relearn object names. As learning curves are lowered, costs are lowered. This section covers some of the standards that work for me.

I receive lots of e-mails about this topic from people who disagree and agree with my standard. As we were developing standards and conventions at my company, I quickly realized how polarizing this topic can be. When you get ten DBAs in a room to discuss a standard, you're likely to come up with ten different options on the standard. I designed this standard for the lowest common denominator across all database platforms, hoping to find a more portable database model that would work just as well against DB2 as it did in SQL Server. By choosing the lowest common denominator, you lose some of SQL Server's nice features that you take for granted, such as upper- and lowercase object names. Before you send me hate mail on my standard, adapt whatever standard works for you, as long as you have one.

Note

As you come up with company or enterprise standards, it is of utmost importance to document them. After all, what good are standards if no one knows about them? Make sure your documents are posted on your intranet.

Table Names

The most important thing to keep in mind when it comes to naming any SQL Server object is that your business needs may change down the road. You may be thinking your application has a relatively small backend, but your next release may include some type of auditing, which makes your database grow a hundredfold.

It is because of this type of unpredictability that I generally choose the lowest common denominator. For example, Oracle and DB2 both have restrictions that will make your portability to them quite expensive if you don't take the lowest common denominator of both of the database systems.

It would be nice if your application could always be on SQL Server, but as politics move in your company, so may the database. I'm sure many of you have cringed at the sight of a database vendor taking your boss out to play golf. No good ever comes out of that (other than a possible shirt and coffee mug).

Two compatibility issues occur with DB2 and Oracle. For DB2, versions 5.0 and 6.0 permit only 18 characters. Version 7.0 supports a 30-character limit, but most companies have not yet upgraded to that version. I play it safe by using an 18-character limit.

Oracle supports only uppercase table and element names. This is the most painful of all the limitations for me because I love table names like InvestmentBanks that capitalize the first letter in each subject. If Oracle is even on your radar, make sure your SQL Server table names and column names are always in uppercase. If you issue a create table statement in Oracle, Oracle converts it all to uppercase. However, some of the migration utilities do not provide this luxury.

If you have application components, such as an accounting system you're going to integrate with other applications, you need to prefix those components for easy migration. For example, if you have a workflow system that you use in eight different applications, prefix those tables with the two-letter abbreviation WF (such as WF_QUEUE). If you do this, you can easily find the tables in Enterprise Manager. At the table level, avoid using abbreviations in other cases.

Column Names

Column names should be all uppercase and limited to 18 characters due to the previously mentioned DB2 and Oracle restrictions. If you have no intention of migrating your application to another DBMS, you can bend this rule. Column names should be descriptive to their purpose, and you should only use abbreviations for a select few names. I have a list of 24 abbreviations that I use on a regular basis (NAME=NM, DESCRIPTION=DE, FLAG=FG, and so on). For anything else, I use the entire word. Use underscores to separate words, such as FIRST_NM and APPROVED_FG. The underscores are for readability purposes. If you're using upper- and lowercase column names, you fulfill that requirement by just making the first letter of each word uppercase, such as FirstNm.

Stored Procedures

The same field length rules apply to stored procedures. Additionally, if you're developing a component to an application, versus the entire application, prefix the stored procedure name with a three-letter prefix. I begin my stored procedures with usp_ followed by the component prefix. You can reduce the learning curve tremendously by using descriptive names, and by making sure the name denotes whether your stored procedure selects, inserts, updates, or deletes data. The following are some examples:

usp_wfselectborrower (wf is the workflow system)
usp_updatepayeestatus
usp_apinsertbillrecord (ap is the accounts payable system)

Triggers

Triggers, like stored procedures, should designate what happens to the data (I = insert, U = update, D = delete). I begin all triggers with the letter T and the table name. For example, a trigger that inserts into the payable table would be TI_PAYABLE.

Databases

In the service industry, where you may be mass-producing applications, the common debate is whether to go with one database per client or a shared database. This really is a political decision more than anything technical. However, once you make the decision, stick to it.

My rule of thumb is to have a separate database for each client for anything that contains financial or client-level data. For data that can be shared, such as code tables or forms, I have a shared database.

This approach means you have the luxury of taking individual clients down as opposed to an entire system. In addition, you can rest comfortably at night knowing that a programmer didn't forget his WHERE ClientID=XXX clause, allowing Client A to see a competitor's data. Also you can back up the database in phases rather than having to back up a 100GB database. In the event of a disaster, you can restore a database and bring up one client as someone else is restoring the next client without having to bring your entire client base down.

Tip

It's not unusual to have a contractual obligation to separate clients.

Numeric Data Type Decisions

In this section, I'll discuss the numeric data types of SQL Server, and the pros and cons of each type. I'll start with exact numeric data types. This group gets its name from its ability to set the precision and scale. Precision is the total amount of digits to the right and left of the decimal, and scale is the number of places to the right of the decimal that SQL Server will store. SQL Server uses rounding to estimate any decimal spots past the scale. There are two exact data types:

Decimal or Numeric Can store values between -10^38+1 and 10^38-1. This type uses between 2 and 17 bytes, depending on the precision.

You probably noticed that the decimal and numeric data types are exactly the same. The numeric type is kept in SQL Server for backward compatibility. You may want to consider stopping the use of numeric data types so you are prepared in case the data type is retired (although chances are slim that this will ever occur).

The next category is absolute, or integer, numeric data types. These data types cannot be adjusted using a precision, and they can only hold whole numbers. There are four types of fixed data types:

BigInt Can store numbers between -9,223,372,036,854,775,807 and 9,223,372,036,854,775,807. This type uses 8 bytes of storage.
Int Can store numbers between -2,147,483,648 and 2,147,483,647. This type uses 4 bytes of storage.
SmallInt Can store numbers between -32,768 and 32,767. This type uses 2 bytes of storage.
TinyInt Can store numbers between 0 and 255. This type uses 1 byte of storage.

Since Intel processors work with 4-byte chunks of data at a time, the Int data type is the optimal type, as long as disk space is not a concern.

Other specialized data types exist, such as money and small money. Money can store values between -$922,337,203,685,477.5808 and $922,337,203,685,477.5807, and uses 8 bytes of storage. Small money, on the other hand, stores values between -$214,748.3648 and $214,748.3647, and uses 4 bytes of storage. Both of these data types will display the value 20.5 as $20.5. I prefer to use exact data types to represent these values and perform the data manipulation at the application level.

Tip

Try to use the smallest available data type possible. If you only need a yes and no flag, you can use a bit field to save space and overall performance because SQL Server will be able to fit more on each data page.

Character Data Types

The data types you use in your schema could impact the performance and the accuracy of your database. For obvious reasons, the decision to use a float versus an int field is a big one. The primary decision is between using varchar and char data types.

A varchar(10) or char(10) field named FName that stores a user's first name stores the value of 'Steve' in two completely different ways. Using a char(10), SQL Server uses 10 bytes to store the value 'Steve.' A varchar(10) field truncates trailing spaces, so it stores 'Steve' using only 5 bytes.

You take a performance hit when you use varchar fields. However, the benefit you get in storage space usually outweighs the performance hit at about 8 bytes. In other words, for a character field less than 8, use a char field. For a character field greater than 8, use a varchar.

The nvarchar data type is used to store Unicode data. If you do not plan to store Unicode data, do not use Unicode data types. A Unicode data type uses twice the amount of space as a character data type and performs considerably slower. For example, a nvarchar(10) storing the value 'Gold' uses 8 bytes. Another gotcha with nvarchar fields is that a nvarchar can only store 4,000 characters due to the fact that each character is stored into 2 bytes.

In SQL Server 6.5, it is not recommended that you use text data types, because of rare occurrences of data corruption. This problem is minimized in SQL Server 7.0, and even more so in SQL Server 2000. You can store up to 2 gigabytes of data in a text or image field. If you can avoid using text or image fields, then do so. Throughout this book, I've mentioned some of the limitations these data types will cause you. Rather than store an image in SQL Server, use a varchar field to point to the image field.

In SQL Server 7.0 and SQL Server 2000, with the increase of page size, you can store up to 8000 bytes into a char or varchar field. This makes it viable to store large chunks of data into fields of those data types. Where you would previously have stored a description in a text field, you should now use a varchar field. Although text field storage is much improved in 7.0, varchar is still more optimized.