Designing Effective Indexes | Microsoft SQL Server 2000 Administrators Companion

3 4

The effectiveness—defined as the maximum efficiency and performance—of an index is determined by the design of the index and of the SQL statements that take advantage of it. It's not enough to just create indexes; you must also tailor the SQL statements to indexes' strengths. An index is used only when one or more of the index keys are included in the WHERE clause of the SQL statement. In this section, you'll learn the properties of a good index, as well as the best and worst times to create indexes.

Characteristics of an Effective Index

As we've seen, a good index helps you to retrieve your data by using fewer I/O operations and system resources than a table scan. Because an index scan requires traversing the tree to find an individual value, using an index is not efficient when you are retrieving large amounts of data.

NOTE
If a query accesses more than 20 percent of the rows in a table, a table scan is more efficient than using an index.

An effective index retrieves only a few rows—in fact, most queries end up using only a few rows anyway. To perform effectively, an index should be designed with good selectivity. The selectivity of an index is based on the number of rows per index key value. An index with poor selectivity has multiple rows per index key value; an index with good selectivity has a few rows or one row per index key value. A unique index has the highest selectivity. The selectivity of the index is stored within the index distribution statistics. You can view the selectivity of an index by using the command DBCC SHOW_STATISTICS. An index with good selectivity is more likely to be used by Query Optimizer.

You can enhance the selectivity of an index by using multiple columns to create a composite index. Several columns with poor selectivity can be joined in a composite index to form one index that has good selectivity. Although a unique index provides the best selectivity, be sure to choose an index type that fits your data model. For example, if you have several entries for the last name "Smith" in a Customers table, you won't be able to create a unique index on last names, but you might still find such an index useful.

When to Use Indexes

Indexes are best suited for tasks such as the following:

Queries that specify a narrow search criteria These queries should retrieve only a few rows that match the specific criteria.
Queries that specify a range of values These queries should also retrieve a small number of rows.
Searches that are used in a join Columns that are often used as join keys are good candidates for indexes.
Searches that retrieve data in a specific order If the resulting data set is to be sorted in the order of a clustered index, the sort is not necessary, because the resulting data is returned presorted. For example, if the clustered index is on the lastname, firstname columns and the application requires sorting by last name and then first name, it is not necessary to add the ORDER BY qualifiers.

Indexes should be used cautiously and sparingly on tables that have a large number of insert, update, and delete operations performed on them because each operation that changes the data must update the index pages as well.

Index Guidelines

You should follow a number of index guidelines to increase both the efficiency and the performance of the system:

Use indexes in moderation. A few indexes can be quite useful, but too many indexes can adversely affect the performance of the system. Because the indexes must be maintained, every time an insert, update, or delete operation is performed on the table, the index must be updated. If there are very many of these operations, the overhead of maintaining the index can be quite high.
Don't index small tables. It is sometimes much more efficient to perform table scans if the table is small (say, a few hundred rows). The additional overhead of the index is not worth the benefit.
Use as few index key columns as necessary to achieve good selectivity. The fewer columns, the better, but not at the expense of selectivity. An index with a few columns is called a narrow index, and an index with many key columns is called a wide index. Narrow indexes take up less space and require less maintenance overhead than do wide indexes.
Use covering queries whenever possible. A covering query is one in which all of the desired data is held in the index keys—that is, all of the index keys are also the selected columns. With a covering query, only the index is accessed; the table itself is bypassed. A covering index is an index in which all of the table columns are included. For example, if the index is on columns a and b and c and the SELECT statement is requesting data from only those columns, only the index needs to be accessed.