Performance Tuning Your Database Model

In a database application, performance begins with a sound database model. With that in mind, here are a few ways you can performance tune your database model, so your queries will run more efficiently:

A little denormalization goes a long way. I hate seeing database models that have a table called Gender with three values in it (I'll let you guess what the third value is). Additionally, if you have a one-to-one table that is accessed with its parent table constantly, consider merging the tables. Experiment often with this type of change. Sometimes you'll gain performance; often, you'll make it worse.
Put more responsibility back on the application. Why develop a view to force the data to look a certain way when you could have the application smooth it much easier?
Horizontally partition your data. If you have a table that represents Web site hits, consider separating it into 12 tables by month. If you have even distribution of data, your load becomes 1/12 the size. This also gives you the option of placing some of the data on separate servers, and using distributed partitioned views to update and view it. (I'll discuss this in detail in Chapter 9.) Finally, by doing this, you can use more aggressive index strategies on the current month and use higher fill factors for past months.
Keep your rows as thin as possible. In other words, don't use a char(255) to store the value for a user's name; it wastes space.
Avoid using triggers for referential integrity. It is much quicker to use foreign key constraints, and constraints are better equipped to handle bulk operations.
Avoid using text fields. Try to find out exactly what the requirement is for the field. If you're inserting a maximum of 2,000 characters, it's more efficient to use a varchar(2000) column.
Unless you have a multilingual need, don't use Unicode data types such as nvarchar and ntext. However, keep in mind that when you use DTS to convert data from another data source, such as Access, DTS often creates Unicode columns on your server. If your company aims to be multilingual, you will need these data types.
Avoid creating clustered indexes on identity columns. Clustered indexes perform better on range queries, such as a date. When you have a clustered index on an identity column, you risk your data receiving hot spots, which are caused by many people updating the same data page.
Disallow NULLS on columns whenever possible. If you know you're always going to have address information, you don't need NULLS on the columns. Handling NULLS causes added overhead.
In some cases, it makes sense to avoid identity columns if you have unique data in the table. For example, if the user name is the unique value, don't create an extra unneeded key. This prevents the need to create unnecessary joins to determine the user name from child tables. This may cost you a little in storage, but it saves you tons of query time. Be flexible in this practice, but if your row already has a unique value, why artificially assign one?

Row Size

Beginning in SQL Server 7.0, the amount of space that could fit on a data page was extended to 8K (8060 bytes). This extended the amount of space that could fit on a single row to 8K, and increased the available size of character data to 8,000 characters per column in some cases.

When you create tables that hold a wide variety of information, make sure you don't risk breaching this 8K limit on row size. You can have a table that is wider than 8K, as long as you have variable columns, such as varchar columns. varchar columns only use the amount of storage space that is being used by the column. For example, if you have a varchar(40) and you store the value 'Brian,' the column is only storing 5 bytes.

A char field, on the other hand, is a fixed-width column and uses all the space given to it. The value of 'Brian' stored into a char(40) field is 40 bytes. These fields perform slightly faster than varchar fields, because there is no overhead to manage the variable length.

If you try to create a table that overruns the 8K storage limitation, you receive an error message. The following table exceeds that limit by far:

CREATE TABLE CustomerDemographics      (CustomerTypeID nchar(10) NOT NULL,      CustomerShortDesc char(8000) NULL,      CustomerAdd char(8000) NULL)

The above table will result in the following error upon creation:

Server: Msg 1701, Level 16, State 2, Line 1 Creation of table 'CustomerDemographics' failed because the  row size would be 16041, including internal overhead.  This exceeds the maximum allowable table row size, 8060.

You can create a table that is larger than 8K in width, as long as you use variable columns as shown here:

CREATE TABLE CustomerDemographics      (CustomerTypeID nchar(10) NOT NULL,      CustomerShortDesc varchar(8000) NULL,      CustomerAdd varchar(8000) NULL)

This will output the following warning after creating the table:

Warning: The table 'CustomerDemographics' has been created but its maximum row size (16045) exceeds the maximum number of bytes per row (8060). INSERT or UPDATE of a row in this table will fail if the resulting row length exceeds 8060 bytes.

If you try to insert more than 8K of data into the row, you receive an error and the query terminates with the following error:

Server: Msg 511, Level 16, State 1, Line 1 Cannot create a row of size 8093 which is greater  than the allowable maximum of 8060. The statement has been terminated.

The only workaround for this is to separate your table into multiple tables and create a one-to-one relationship among them.

Text, Ntext, and Image Fields

Text, ntext, and image data is normally not directly stored in the main page of your table's row. Instead, the column points to a separate location inside the database where the larger field is found. The pointer that directs requests to the other pages is 16 bytes.

This data is in a collection of pages where data for tables is not located. Since this type of data is not directly stored in the row, you are not restricted to the 8K limit, and these fields can be up to 2GB in size. This is also one of the reasons that text, ntext, and image fields are so slow. A handy new option was sneaked into SQL Server 2000 that gives you the ability to optionally store some small text, ntext, and image data in the actual data page of the row.

You can use the sp_tableoption stored procedure with the Text in Row parameter to turn on this option. Once turned on, small amounts of data (256 bytes by default) are stored in the row. This increases the performance of these fields significantly, since SQL Server doesn't have to leave its page to find the text field. If the data you try to insert into one of these fields is too large, only the 16-byte pointer is inserted. The text, ntext, or image field is placed in the collection of pages outside the row. To turn on this option, use the following syntax:

sp_tableoption N'<table name>', 'Text in Row', 'ON'

You can also specify a size limit in bytes for the column. For example, to set the table's text in row limit to 512 bytes, use the following syntax:

sp_tableoption N'<table name>', 'Text in Row', '512'

Tip

SQLServerCentral.com has a large collection of scripts stored with a text column. When tuning the script and FAQ section, I turned this option on and experienced about a 35 percent performance enhancement in these two sections when accessing the text data.

Once you convert to this option, text fields are not immediately converted to the in-row format. The data is slowly converted as UPDATE statements are issued for the rows. Any new rows are in the in-row format if they fall below the size requirement. You can expedite this process by issuing an UPDATE statement like this:

UPDATE <table name> SET <text column name> = <text column name>

For example, the following command will accelerate moving old data to the in-row format in the employees table:

UPDATE EMPLOYEES SET Notes = Notes

Caution

If you turn off the Text in Row option, there may be a long delay as the data is moved into the new area outside the table. While this is occurring, database performance slows significantly. Make sure you only do this during off-peak hours. This is a logged event and the table is locked.

Once you turn the Text in Row option on, you'll see some minor side effects. The rarely used READTEXT, UPDATETEXT, or WRITETEXT statements will no longer work on tables. Also, DBLibrary options such as dbreadtext and dbwritetext, which handle text, ntext, and image fields, will not work.

Minimizing Wasted Column Space

Don't use more column space than is necessary. For example, you don't want to have a varchar(255) column to store a user's name. So how wide do your columns really need to be to hold your data? One approach is to overestimate the amount of space you need, and then scale back when you start to see real data.

In the Trenches

Anyone who uses ADO may experience problems when selecting data from text, ntext, and image fields using ADO in ASP (see Knowledge Base article Q175239). Buggy behavior appears, as if data doesn't exist in the table when you select from it, or the procedure forces the following error:

Microsoft OLE DB Provider for ODBC Drivers error '80020009'

This is actually by design. You can avoid this problem by selecting these types of fields last in your query. If you have more than one of these field types, select the first large field first, followed sequentially by the others.

This problem was corrected in MDAC 2.1 SP 2 or with the 3.7 SQL Server driver. You may have the problem where developers run a query returning all rows with the SELECT * FROM <table name> command. If this is the case, always place your text, ntext, and image fields at the end of the table.

To find out how much space you're actually using in a column, you can use the max() and len() functions. The max() function tells you the largest amount of space used by the column in the table. The len() function tells you how much of the character field you're actually using. For example, let's look at the amount used in the Customers table in the Northwind database with the following query:

SELECT Max(Len(CompanyName)) MaximumLength from Customers

This query outputs the following results:

MaximumLength  -------------  36

Summarizing Data

On my Web site (http://www.sqlservercentral.com/), I have a voting mechanism where users can vote on how much they like or dislike an article. The user's vote goes into a table called UserVotes. Once every six hours, this number is averaged, and the table that holds the articles is updated. This process is scheduled via a SQL Server Agent job.

With information like this, summarization makes a lot of sense. You would never want your Web users calculating information like this on the fly. Amazon.com has a similar system with their sales ranking system.