Indexes are the other significant user -defined, on-disk data structure (in addition to tables). An index provides fast access to data when the data can be searched by the value that is the index key. We discussed indexes in Chapter 3, but some of that information bears repeating and elaboration here in our discussion of tables. (If indexing is a topic of interest to you, make sure you read "The Index Manager" section of Chapter 3. Indexing is also discussed in the context of query tuning in Chapter 14.) Think of indexes in your everyday life. You're reading a SQL Server book, and you want to find entries for the word SELECT . You have two basic choices for doing this: you can open the book and scan through it page by page, or you can look in the index in the back, find the word SELECT , and then turn to the page numbers listed. That is exactly how an index works in SQL Server. SQL Server supports clustered and nonclustered indexes (discussed further in the next two sections). Both types use standard B-trees, as shown in Figure 6-10.
A B-tree provides fast access to data by searching on a key value of the index. B-trees cluster records with similar keys. The B stands for balanced, and balancing the tree is a core feature of a B-tree's usefulness . The trees are managed, and branches are grafted as necessary so that navigating down the tree to find a value and locate a specific record always takes only a few page accesses . Because the trees are balanced, finding any record requires about the same amount of resources, and retrieval speed will be consistent because the index has the same depth throughout.
Figure 6-10. A standard B-tree for a SQL Server index.
An index consists of a tree with a root from which the navigation begins, possible intermediate index levels, and bottom-level leaf pages. The index is used to find the correct leaf page. The number of levels in an index will vary depending on the number of rows in the table and the size of the key column or columns for the index. If you create an index using a large key, fewer entries will fit on a page, so more pages (and possibly more levels) will be needed for the index. On a qualified retrieval or delete, the correct leaf page will be the lowest page of the tree in which one or more rows with the specified key or keys reside. In any index, the leaf level contains every key value, in key sequence.
The leaf level of a clustered index contains the data pages, not just the index keys. A clustered index keeps the data in a table physically ordered around the key. Deciding which key to cluster on is an important performance consideration. When the index is traversed to the leaf level, the data itself has been retrieved, not simply pointed to .
Because data can be physically ordered in only one way, a table can have only one clustered index. The query optimizer strongly favors a clustered index because it allows the data to be found directly at the leaf level. Because it defines the actual order of the data, a clustered index allows especially fast access for queries looking for a range of values. The query optimizer detects that only a certain range of data pages must be scanned. Most tables should have a clustered index. If your table will have only one index, it generally should be clustered.
In SQL Server 7, all clustered indexes are unique. If a clustered index is built without specifying the unique keyword, SQL Server will force uniqueness by adding a uniqueifier to the rows when necessary. This uniqueifier is a 4-byte value added as a secondary sort key to only the rows that have duplicates of their primary sort key.
In a nonclustered index, the lowest level of the tree (the leaf level) contains a bookmark that tells SQL Server where to find the data row corresponding to the key in the index. As we saw in Chapter 3, a bookmark can have one of two forms. If the table has a clustered index, the bookmark is the clustered index key. If the table is a heap (in other words, it has no clustered index), the bookmark will be a RID, which is an actual row locator in the form File#:Page#:Slot#. (In contrast, in a clustered index, the leaf page is the data page.)
The presence or absence of a nonclustered index doesn't affect how the data pages are organized, so you're not restricted to having only one nonclustered index per table, as is the case with clustered indexes. Each table can include as many as 249 nonclustered indexes, but you'll usually want to have far less than this number.
If you're using an index to enforce uniqueness, there's a better way. PRIMARY KEY and UNIQUE constraints make use of indexing for enforcement. We'll discuss these constraints shortly.
Searching for data using a nonclustered index requires first that the index is traversed and then that the record pointed to is retrieved. For example, to get to a data page using an index with a depth of three ”a root page, one intermediate page, and the leaf page ”all three index pages must be traversed. If the leaf level contains a clustered index key, all the levels of the clustered index will be traversed to locate the specific row. The clustered index will probably have two levels because in most cases, the clustered index is one level shallower than a nonclustered index. The data page still must be retrieved, although it has been exactly identified, so there's no need to scan the entire table. Still, it takes six logical I/O operations to get one data page. You can see that a nonclustered index is a win only if it's highly selective.
Index pages are structured much like data pages. As with all other types of pages in SQL Server, index pages have a fixed size of 8 KB, or 8192 bytes. Index pages also have a 96-byte header but, unlike data pages, no offset array appears at the end of the page. Each index has a row in the sysindexes table, with an indid value of either 1, for a clustered index, or a number between 2 and 250, indicating a nonclustered index. (An indid value of 255 indicates text or image information.) The root column value contains a file number and page number where the root of the index can be found. You can then use DBCC PAGE to examine index pages, just as you do for data pages.
The typical syntax for creating an index is straightforward:
CREATE [UNIQUE] [CLUSTERED NONCLUSTERED] INDEX index_name ON table_name ( column_name [, column_name ]...)
CREATE INDEX has some additional options available for specialized purposes:
[WITH [PAD_INDEX] [[ , ] FILLFACTOR = fillfactor ] [[ , ] IGNORE_DUP_KEY] [[ , ] DROP_EXISTING] [[ , ] STATISTICS_NORECOMPUTE] ] [ON filegroup ]
FILLFACTOR is probably the most commonly used of these options. FILLFACTOR lets you reserve some space on each leaf page of an index. (In a clustered index, this equals the data page.) By reserving some free space with FILLFACTOR, you can later avoid the need to split pages to make room for an entry. (Refer to the discussion of index management and page splitting in Chapter 3.) But remember that FILLFACTOR is not maintained ; it indicates only how much space is reserved with the existing data. If you need to, you can use the DBCC DBREINDEX command to rebuild the index and to reestablish the original FILLFACTOR specified.
If you'll be rebuilding all of a table's indexes, simply specify the clustered index with DBCC DBREINDEX. Doing so internally rebuilds the entire table and all nonclustered indexes.
FILLFACTOR isn't usually specified on an index-by-index basis, but you can specify it this way for fine-tuning. If FILLFACTOR isn't specified, the serverwide default is used. The value is set for the server via sp_configure, fillfactor . This value is 0 by default, which means that leaf pages of indexes are made as full as possible. FILLFACTOR generally applies only to the index's leaf page (the data page for a clustered index). In specialized and high-use situations, you might want to reserve space in the intermediate index pages to avoid page splits there, too. You can do this by using the PAD_INDEX option, which uses the same value as FILLFACTOR.
The DROP_EXISTING option is valid only when creating a clustered index and specifies that the given table's clustered index should be dropped and rebuilt. Then all existing nonclustered indexes are updated. Because nonclustered indexes must go through the clustered index to gain access to rows in a table, the DROP_EXISTING option prevents the nonclustered indexes from having to be rebuilt twice. Normally, when a clustered index is dropped, every nonclustered index has to be rebuilt to change its bookmarks to RIDs instead of the clustering keys. Then, if a clustered index is built (or rebuilt), all the nonclustered indexes must be rebuilt again to update the bookmarks. The DROP_EXISTING option to the CREATE INDEX command allows a clustered index to be rebuilt without affecting any nonclustered indexes on the table.
You can ensure the uniqueness of a key by using the PRIMARY KEY and UNIQUE constraints, which we'll discuss in "PRIMARY KEY and UNIQUE Constraints" These constraints work by making a unique index on the key value or values. If an UPDATE or INSERT statement would affect multiple rows, and if even one row is found that would cause duplicate keys in the table, the entire statement is aborted and no rows are affected. With a unique index, you can use the IGNORE_DUP_KEY option so that a nonunique error on a multiple-row UPDATE or INSERT won't cause the entire statement to be rolled back. The nonunique row will be discarded, and all other rows will be inserted or updated. IGNORE_DUP_KEY doesn't allow the uniqueness of the index to be violated; instead, it makes a violation in a multiple-row data modification nonfatal to all the nonviolating rows.
The STATISTICS_NORECOMPUTE option will be discussed in Chapter 14 of the book, when we discuss statistics maintenance.