Creating an Index | Inside Microsoft SQL Server 2005: The Storage Engine (Solid Quality Learning)

The basic syntax for creating an index is straightforward:

CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name   ON table_name (column_name [ASC | DESC][,...n])

When you create an index, you must specify a name for it. You must also specify the table on which the index will be built and one or more columns. For each column, you can specify that the leaf level store the key values sorted in either ascending (ASC) or descending (DESC) order. The default is ascending. You can specify that SQL Server must enforce uniqueness of the key values by using the keyword UNIQUE. If you don't specify UNIQUE, duplicate key values will be allowed. You can specify that the index be either clustered or nonclustered. Nonclustered is the default.

CREATE INDEX has some additional options available for specialized purposes. You can add a WITH clause to the CREATE INDEX command:

CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name   ON table_name (column_name [ASC | DESC][,...n]) [WITH [FILLFACTOR = fillfactor] [[,] [PAD_INDEX] = { ON | OFF }] [[,] DROP_EXISTING  = { ON | OFF }] [[,] IGNORE_DUP_KEY = { ON | OFF }] [[,] SORT_IN_TEMPDB = { ON | OFF }] [[,] STATISTICS_NORECOMPUTE  = { ON | OFF }] [[,] ALLOW_ROW_LOCKS = { ON | OFF }] [[,] ALLOW_PAGE_LOCKS = { ON | OFF }] [[,] MAXDOP = max_degree_of_parallelism] [[,] ONLINE = { ON | OFF }] ]

Note

SQL Server 2000 does not use ON and OFF. If you want an option set to ON, specify the option; if you don't want the option, don't specify it. SQL Server 2005 offers a backward-compatible syntax without the ON and OFF keywords, but for new development the new syntax is recommended.

FILLFACTOR is probably the most commonly used of these options. FILLFACTOR lets you reserve some space on each leaf page of an index. In a clustered index, because the leaf level contains the data, you can use FILLFACTOR to control how much space to leave in the table itself. By reserving free space, you can later avoid the need to split pages to make room for a new entry. An important fact about FILLFACTOR is that the value is not maintained; it indicates only how much space is reserved with the existing data at the time the index is built. If you need to, you can use the ALTER INDEX command to rebuild the index and reestablish the original FILLFACTOR specified. I'll talk about rebuilding indexes in the section titled "Managing Indexes."

FILLFACTOR isn't usually specified on an index-by-index basis, but you can specify it this way for fine-tuning. If FILLFACTOR isn't specified, the serverwide default is used. The value is set for the server via the sp_configure procedure, with the fillfactor option. This configuration value is 0 by default, which means that leaf pages of indexes are made as full as possible. FILLFACTOR generally applies only to the index's leaf page (the data page for a clustered index). In specialized and high-use situations, you might want to reserve space in the intermediate index pages to avoid page splits there, too. You can do this by specifying the PAD_INDEX option, which instructs SQL Server to use the same FILLFACTOR value at all levels of the index. Just like for FILLFACTOR, PAD_INDEX is only applicable when an index is created (or re-created).

The DROP_EXISTING option specifies that a given index should be dropped and rebuilt as a single transaction. This option is particularly useful when you rebuild clustered indexes. Normally, when a developer drops a clustered index, SQL Server must rebuild every nonclustered index to change its bookmarks to RIDs instead of the clustering keys. Then, if a developer builds (or rebuilds) a clustered index, SQL Server must again rebuild all nonclustered indexes to update the bookmarks. The DROP_EXISTING option of the CREATE INDEX command allows a clustered index to be rebuilt without having to rebuild the nonclustered indexes twice. If you are creating the index on exactly the same keys that it had previously, the non-clustered indexes do not need to be rebuilt at all. If you are changing the key definition, the non-clustered indexes are rebuilt only once, after the clustered index is rebuilt. Instead of using the DROP_EXISTING option to rebuild an existing index, you can use the ALTER INDEX command, which is new in SQL Server 2005. I'll discuss this command in the section titled "ALTER INDEX" later in this chapter.

You can ensure the uniqueness of an index key by defining it as UNIQUE or by defining a PRIMARY KEY or UNIQUE constraint. If an UPDATE or INSERT statement would affect multiple rows, or if even one row is found that would cause duplicates of keys defined as unique, the entire statement is aborted and no rows are affected. Alternatively, when you create the unique index, you can use the IGNORE_DUP_KEY option so that a duplicate key error on a multiple-row INSERT won't cause the entire statement to be rolled back. The nonunique row will be discarded, and all other rows will be inserted or updated. IGNORE_DUP_KEY doesn't allow the uniqueness of the index to be violated; instead, it makes a violation in a multiple-row data modification nonfatal to all the nonviolating rows.

The SORT_IN_TEMPDB option allows you to control where SQL Server performs the sort operation on the key values needed to build an index. The default is that SQL Server uses space from the filegroup on which the index is to be created. While the index is being built, SQL Server scans the data pages to find the key values and then builds leaf-level index rows in internal sort buffers. When these sort buffers are filled, they are written to disk. If the SORT_IN_TEMPDB option is specified, the sort buffers are allocated from tempdb, so much less space is needed in the source database. If you don't specify SORT_IN_TEMPDB, not only will your source database require enough free space for the sort buffers and a copy of the index (or the data, if a clustered index is being built), but the disk heads for the database will need to move back and forth between the base table pages and the work area where the sort buffers are stored. If, instead, your CREATE INDEX command includes the option SORT_IN_TEMPDB, performance can be greatly improved if your tempdb database is on a separate physical disk from the database you're working with. You can optimize head movement because two separate heads read the base table pages and manage the sort buffers. You can speed up index creation even more if your tempdb database is on a faster disk than your user database and you use the SORT_IN_TEMPDB option. As an alternative to using the SORT_IN_TEMPDB option, you can create separate filegroups for a table and its indexesthat is, the table is on one filegroup and its indexes are on another. If the two filegroups are on different disks, you can also minimize the disk head movement.

The STATISTICS_NORECOMPUTE option determines whether the statistics on the index should be updated automatically. Every index maintains a histogram representing the distribution of values for the leading column of the index. SQL Server's query optimizer uses these statistics to determine the usefulness of a particular index when determining a query plan. As data is modified, the statistics can get out of date, and this can lead to less than optimal query plans if the statistics are not updated. In Chapter 4, I told you about a database option to enable all statistics in a database to be automatically updated when needed. The STATISTICS_NORECOMPUTE option allows a particular index to not have its statistics automatically updated when the data changes. Setting this option to OFF overrides an ON value for the AUTO_UPDATE_STATISTICS database option. If the database option is set to OFF, you cannot override that behavior for a particular index, and in that case, all statistics in the database must be manually updated using UPDATE STASTISTICS or sp_updatestats. I'll discuss statistics maintenance in a lot more detail in Inside Microsoft SQL Server 2005: Tuning and Optimization. I'll discuss the index options to ALLOW_PAGE_LOCKS or ALLOW_ROW_LOCKS in Chapter 8.

The option MAXDOP controls the maximum number of processors that can be used for the index creation operation. It can override the server configuration option max degree of parallelism for index building. Allowing multiple processors to be used for the index creation operation can greatly enhance the performance of index build operations. As with other parallel operations, the SQL Server optimizer determines at run time the actual number of processors to use, based on the current load on the system. The MAXDOP value just sets a maximum. Multiple processors can be used for index creation only when you run SQL Server 2005 Enterprise or Developer edition.

The final option available with the CREATE INDEX command is the ONLINE option, which is new in SQL Server 2005. This option actually deserves an entire section of its own, so I'll discuss it in detail later in the section titled "Online Index Building."

Included Columns

The key columns in SQL Server 2005 indexes are limited to 16 columns and a total of 900 bytes, just like in previous versions. However, SQL Server 2005 also allows you to define an index with included columns.

CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name   ON table_name (column_name [ASC | DESC][,...n])      [ INCLUDE ( column_name [ ,...n ] ) ]

These columns listed after the keyword INCLUDE allow you to exceed the 900-byte or 16-key column limits in the leaf level of a nonclustered index. The included columns appear only in the leaf level and do not control the sort order of the index rows in any way. Their purpose is to allow more information in the leaf level so you have more opportunity to use an index-tuning capability called covering indexes. A covering index is a nonclustered index in which all the information needed to satisfy a query can be found in the leaf level, so SQL Server doesn't have to access the data pages at all. In certain situations, SQL Server can silently add an included column to your indexes. This might happen when an index is created on a partitioned table and no ON filegroup or ON partition_scheme is specified. I'll discuss this situation in the section titled "Table and Index Partitioning."

Covering indexes are discussed in Inside Microsoft SQL Server 2005: T-SQL Querying and will be discussed in more detail in Inside Microsoft SQL Server: Tuning and Optimization.

Index Placement

A final clause in the CREATE INDEX command allows you to specify the placement of the index. You can specify that an index should either be placed on a particular filegroup or should be partitioned according to a predefined partition scheme. By default, if no filegroup or partition scheme is specified, the index will be placed on the same filegroup as the base table. I discussed filegroups in Chapter 4, and I'll tell you about partitioning later in this chapter.

CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name   ON table_name (column_name [ASC | DESC][,...n]) [ ON { partition_scheme_name ( column_name )        | filegroup_name       }

Constraints and Indexes

When you declare a PRIMARY KEY or UNIQUE constraint, a unique index is created on one or more columns, just as if you had used the CREATE INDEX command. The names of indexes that are built to support these constraints are the same as the constraint names. In terms of internal storage and maintenance of indexes, there is no difference between unique indexes created using the CREATE INDEX command and indexes created to support constraints. The query optimizer makes decisions based on the presence of the unique index rather than on the fact that a column was declared as a primary key. How the index got there in the first place is irrelevant to the query optimizer.

When you create a table that includes PRIMARY KEY or UNIQUE constraints, you can specify whether the associated index will be clustered or nonclustered and you can also specify the fillfactor. Because the fillfactor applies only at the time the index is created, and because there is no data when you first create the table, it might seem that specifying the fillfactor at that time is completely useless. However, if after the table is populated you decide to rebuild your indexes, if no new fillfactor is specified, the original value will be used. You can also specify a fillfactor when you use ALTER TABLE to add a PRIMARY KEY or UNIQUE constraint to a table; if the table already has data in it, the fillfactor value is applied when you build the index to support the new constraint.

The biggest difference between indexes created using the CREATE INDEX command and indexes that support constraints is in how you can drop the index. The DROP INDEX command allows you to drop only indexes that were built with the CREATE INDEX command. To drop indexes that support constraints, you must use ALTER TABLE to drop the constraint. In addition, to drop a PRIMARY KEY or UNIQUE constraint that has any FOREIGN KEY constraints referencing it, you must first drop the FOREIGN KEY constraint. This can leave you with a window of vulnerability if your goal is to drop indexes and immediately rebuild them, perhaps with a new fillfactor. While the FOREIGN KEY constraint is gone, an INSERT statement can add a row to the table that violates your referential integrity.

One way to avoid this problem is to use ALTER INDEX, which allows you to drop and rebuild one or all of your indexes on a table in a single statement, without requiring the auxiliary step of removing FOREIGN KEY constraints. Alternatively, you can use the CREATE INDEX command with the DROP_EXISTING option if you want to rebuild existing indexes without having to drop and re-create them in two steps. Although you can normally use CREATE INDEX with DROP_EXISTING to redefine the properties of an indexsuch as the key columns or included columns, or whether the index is uniqueif you use CREATE INDEX with DROP_EXISTING to rebuild an index that supports a constraint, you cannot make these kinds of changes. The index must be re-created with the same columns, in the same order, and the same values for uniqueness and clustering. We'll look at more details regarding rebuilding indexes in the section titled "Index Maintenance."

The issue of whether a unique index should be defined using a UNIQUE or PRIMARY KEY constraint is a common concern and a frequent cause of confusion. As I mentioned earlier, there is no internal difference in structure, or in the optimizer's choices, for a unique clustered index built using the CREATE INDEX command or one that was built to automatically support a PRIMARY KEY constraint. The difference is really a design issue, so it is beyond the scope of this book, which deals with internals. However, I will point out that a constraint is a logical construct and an index is a physical one. When you build an index, you are asking SQL Server to create a physical structure that takes up storage space and must be maintained during data modification operations. When you define a constraint, you are defining a property of your data and expecting SQL Server to enforce that property, but you are not telling SQL Server how to enforce it. In the current version, SQL Server supports PRIMARY KEY and UNIQUE constraints by creating unique indexes, but there is no requirement that it do so. In a future version, SQL Server might have some other way of enforcing uniqueness besides building an index, but SQL Server 2005 does not.