Creating an Index | Inside Microsoft SQL Server 2000

The typical syntax for creating an index is straightforward:

 CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name ON table_name (column_name [ASC | DESC][,...n])

When you create an index, you must specify a name for it. You must also specify the table on which the index will be built, and then one or more columns. For each column, you can specify that the leaf level will store the key values sorted in either ascending (ASC) or descending (DESC) order. The default is ascending. You can specify that SQL Server must enforce uniqueness of the key values by using the keyword UNIQUE. If you don't specify UNIQUE, duplicate key values will be allowed. You can specify that the index be either clustered or nonclustered. Nonclustered is the default.

CREATE INDEX has some additional options available for specialized purposes. You can add a WITH clause to the CREATE INDEX command:

 [WITH [FILLFACTOR = fillfactor] [[,] [PAD_INDEX] [[,] IGNORE_DUP_KEY] [[,] DROP_EXISTING] [[,] STATISTICS_NORECOMPUTE] [[,] SORT_IN_TEMPDB] ]

FILLFACTOR is probably the most commonly used of these options. FILLFACTOR lets you reserve some space on each leaf page of an index. In a clustered index, since the leaf level contains the data, you can use FILLFACTOR to control how much space to leave in the table itself. By reserving free space, you can later avoid the need to split pages to make room for a new entry. Chapter 3 includes some discussion of index management and page splitting, and you'll find more details in Chapter 9. But remember that FILLFACTOR is not maintained; it indicates only how much space is reserved with the existing data at the time the index is built. If you need to, you can use the DBCC DBREINDEX command to rebuild the index and reestablish the original FILLFACTOR specified.

TIP
If you plan to rebuild all of a table's indexes, simply specify the clustered index with DBCC DBREINDEX. Doing so internally rebuilds the entire table and all nonclustered indexes.

FILLFACTOR isn't usually specified on an index-by-index basis, but you can specify it this way for fine-tuning. If FILLFACTOR isn't specified, the serverwide default is used. The value is set for the server via sp_configure, fillfactor. This value is 0 by default, which means that leaf pages of indexes are made as full as possible. FILLFACTOR generally applies only to the index's leaf page (the data page for a clustered index). In specialized and high-use situations, you might want to reserve space in the intermediate index pages to avoid page splits there, too. You can do this by specifying the PAD_INDEX option, which uses the same value as FILLFACTOR.

The DROP_EXISTING option specifies that a given index should be dropped and rebuilt as a single transaction. This option is particularly useful when you rebuild clustered indexes. Normally, when a clustered index is dropped, every nonclustered index has to be rebuilt to change its bookmarks to RIDs instead of the clustering keys. Then, if a clustered index is built (or rebuilt), all the nonclustered indexes need to be rebuilt again to update the bookmarks. The DROP_EXISTING option of the CREATE INDEX command allows a clustered index to be rebuilt without having to rebuild the nonclustered indexes twice. If you are creating the index on the exact same keys that it had previously, the nonclustered indexes do not need to be rebuilt at all. If you are changing the key definition, the nonclustered indexes are rebuilt only once, after the clustered index is rebuilt.

You can ensure the uniqueness of an index key by defining the index as UNIQUE or by defining a PRIMARY KEY or UNIQUE constraint. If an UPDATE or INSERT statement would affect multiple rows, and if even one row is found that would cause duplicate keys defined as unique, the entire statement is aborted and no rows are affected. Alternatively, when you create the unique index, you can use the IGNORE_DUP_KEY option so that a duplicate key error on a multiple-row INSERT won't cause the entire statement to be rolled back. The nonunique row will be discarded, and all other rows will be inserted or updated. IGNORE_DUP_KEY doesn't allow the uniqueness of the index to be violated; instead, it makes a violation in a multiple-row data modification nonfatal to all the nonviolating rows.

I'll discuss the STATISTICS_NORECOMPUTE option in Chapter 15, when I discuss statistics maintenance.

The SORT_IN_TEMPDB option allows you to control where SQL Server performs the sort operation on the key values needed to build an index. The default is that SQL Server uses space from the filegroup on which the index is to be created. While the index is being built, SQL Server scans the data pages to find the key values and then builds leaf-level index rows in internal sort buffers. When these sort buffers are filled, they are written to disk. The disk heads for the database can then move back and forth between the base table pages and the work area where the sort buffers are being stored. If, instead, your CREATE INDEX command includes the option SORT_IN_TEMPDB, performance can be greatly improved, particularly if your tempdb database is on a separate physical disk from the database you're working with, with its own controller. You can optimize head movement because two separate heads read the base table pages and manage the sort buffers. You can get even more improvement in index creation speed if your tempdb is on a faster disk than your user database and you use the SORT_IN_TEMPDB option. As an alternative to using the SORT_IN_TEMPDB option, you can create separate filegroups for a table and its indexes. (That is, the table is on one filegroup and its indexes are on another.) If the two filegroups are on different disks with their own controllers, you can also minimize the disk head movement.

Constraints and Indexes

As I mentioned in Chapter 6, when you declare a PRIMARY KEY or UNIQUE constraint, a unique index is created on one or more columns, just as if you had used the CREATE INDEX command. The names of indexes that are built to support these constraints are the same as the constraint names. In terms of internal storage and maintenance of indexes, there is no difference between unique indexes created using the CREATE INDEX command and indexes created to support constraints. The query optimizer makes decisions based on the presence of the unique index rather than on the fact that a column was declared as a primary key. How the index got there in the first place is irrelevant to the query optimizer.

When you create a table that includes PRIMARY KEY or UNIQUE constraints, you can specify whether the associated index will be clustered or nonclustered, and you can also specify the fillfactor. Since the fillfactor applies only at the time the index is created, and since there is no data when you first create the table, it might seem that specifying the fillfactor at that time is completely useless. However, if after the table is populated you decide to use DBCC DBREINDEX to rebuild all your indexes, you can specify a fillfactor of 0 to indicate that SQL Server should use the fillfactor that was specified when the index was created. You can also specify a fillfactor when you use ALTER TABLE to add a PRIMARY KEY or UNIQUE constraint to a table, and if the table already had data in it, the fillfactor value is applied when you build the index to support the new constraint.

If you check the documentation for CREATE TABLE and ALTER TABLE, you'll see that the SORT_IN_TEMPDB option is not available for either command. It really doesn't make sense to specify a sort location when you first create the table because there's nothing to sort. However, the fact that you can't specify this alternate location when you add a PRIMARY KEY or UNIQUE constraint to a table with existing data seems like an oversight. Also note that SORT_IN_TEMPDB is not an option when you use DBCC DBREINDEX. Again, there's no reason why it couldn't have been included, but it isn't available in this release.

The biggest difference between indexes created using the CREATE INDEX command and indexes that support constraints is in how you can drop the index. The DROP INDEX command allows you to drop only indexes that were built with the CREATE INDEX command. To drop indexes that support constraints, you must use ALTER TABLE to drop the constraint. In addition, to drop a PRIMARY KEY or UNIQUE constraint that has any FOREIGN KEY constraints referencing it, you must first drop the FOREIGN KEY constraint. This can leave you with a window of vulnerability while you redefine your constraints and rebuild your indexes. While the FOREIGN KEY constraint is gone, an INSERT statement can add a row to the table that violates your referential integrity.

One way to avoid this problem is to use DBCC DBREINDEX, which drops and rebuilds all your indexes on a table in a single transaction, without requiring the auxiliary step of removing FOREIGN KEY constraints. Alternatively, you can use the CREATE INDEX command with the DROP_EXISTING option. In most cases, you cannot use this command on an index that supports a constraint. However, there is an exception. For example, the following command attempts to rebuild the index on the title_id column of the titles table in the pubs database:

 CREATE CLUSTERED INDEX UPKCL_titleidind ON titles(title_id) WITH DROP_EXISTING

SQL Server returned this error message to me:

 Server: Msg 1907, Level 16, State 1, Line 1 Cannot re-create index 'UPKCL_titleidind'. The new index definition does not match the constraint being enforced by the existing index.

How could I have known that this index supported a constraint? First of all, the name includes UPKCL, which is a big clue. However, the output of sp_helpindex tells us only the names of the indexes, the property (clustered or unique), and the columns the index is on. It doesn't tell us if the index supports a constraint. However, if we execute sp_help on a table, the output will tell us that UPKCL_titleidind is a PRIMARY KEY constraint. The error message indicates that we can't use the DROP_EXISTING clause to rebuild this index because the new definition doesn't match the current index. We can use this command as long as the properties of the new index are exactly the same as the old. In this case, I didn't specify that the index was to be UNIQUE. We can rephrase the command as follows so that the CREATE INDEX is successful:

 CREATE UNIQUE CLUSTERED INDEX UPKCL_titleidind ON titles(title_id) WITH DROP_EXISTING