Using an Index | Inside Microsoft SQL Server 2000

This chapter has primarily been concerned with the structure of indexes and the nuances of creating and managing them. Hopefully, you're aware of at least some of the places that indexes can be useful in your SQL Server applications. I'll list a few of the situations in which you can benefit greatly from indexes; I'll also revisit most of these situations in Chapters 15 and 16 when I discuss SQL Server query processing and tuning in detail.

Looking for Rows

The most straightforward use of an index is to help SQL Server find one or more rows in a table that satisfy a certain condition. For example, if you're looking for all the customers who live in a particular state (for example, your WHERE clause is WHERE state = 'WI'), you can use an index on the state column to find those rows more quickly. SQL Server can traverse the index from the root, comparing index key values to 'WI', to determine whether 'WI' exists in the table and then find the data associated with that value.

Joining

A typical join tries to find all the rows in one table that match rows in anther table. Of course, you can have joins that aren't looking for exact matching, but you're looking for some sort of relationship between tables, and equality is by far the most common relationship. A query plan for a join frequently starts with one of the tables, finds the rows that match the search conditions, and then uses the join key in the qualifying rows to find matches in the other table. An index on the join column in the second table can be used to quickly find the rows that match.

Sorting

A clustered index stores the data logically in sorted order. The data pages are linked together in order of the clustering keys. If you have a query to ORDER BY the clustered keys or by the first column of a composite clustered key, SQL Server does not have to perform a sort operation to return the data in sorted order.

If you have a nonclustered index on the column you're sorting by, the sort keys themselves are in order at the leaf level. For example, consider the following query:

 SELECT firstname, lastname, state, zipcode FROM customers ORDER BY zipcode

The nonclustered index will have all the zip codes in sorted order. However, we want the data pages themselves to be accessed to find the firstname and lastname values associated with the zipcode. The query optimizer will decide if it's faster to traverse the nonclustered index leaf level and from there access each data row or to just perform a sort on the data. If you're more concerned with getting the first few rows of data as soon as possible and less concerned with the overall time to return all the results, you can force SQL Server to use the nonclustered index with the FAST hint. I'll tell you all about hints in Chapter 16.

Inverse Indexes

SQL Server allows you to sort in either ascending or descending order, and if you have a clustered index on the sort column, SQL Server can use that index and avoid the actual sorting. The pages at the leaf level of the clustered index are doubly linked, so SQL Server can use a clustered index on lastname to satisfy this query

 SELECT * FROM customers ORDER BY lastname DESC

as easily as it can use the same index for this query:

 SELECT * FROM customers ORDER BY lastname

Earlier in the chapter, I mentioned that SQL Server 2000 allows you to create descending indexes. Why would you need them if an ascending index can be scanned in reverse order? The real benefit is when you want to sort with a combination of ascending and descending order. For example, take a look at this query:

 SELECT * FROM employees ORDER BY lastname, salary DESC

The default sort order is ascending, so the query wants the names ordered alphabetically by last name, and within duplicate last names, the rows should be sorted with the highest salary first and lowest salary last. The only kind of index that can help SQL Server avoid actually sorting the data is one with the lastname and salary columns sorted in opposite orders, such as this one:

 CREATE CLUSTERED INDEX name_salary_indx ON employees (lastname, salary DESC)

If you execute sp_help or sp_helpindex on a table, SQL Server will indicate whether the index is descending by placing a minus (-) after the index key. Here's a subset of some sp_helpindex output:

 index_name       index_description         index_keys --------------------- -------------------------------- ------------------- lastname_salary_indx  nonclustered located on PRIMARY  LastName, Salary(-)

Grouping

One way that SQL Server can perform a GROUP BY operation is by first sorting the data by the grouping column. For example, if you want to find out how many customers live in each state, you can write a query with a GROUP BY state clause. A clustered index on state will have all the rows with the same value for state in logical sequence, so the sorting and grouping operations will be very fast.

Maintaining Uniqueness

Creating a unique index (or defining a PRIMARY KEY or UNIQUE constraint that builds a unique index) is by far the most efficient method of guaranteeing that no duplicate values are entered into a column. By traversing an index tree to determine where a new row should go, SQL Server can detect within a few page reads that a row already has that value. Unlike all the other uses of indexes described in this section, using unique indexes to maintain uniqueness isn't just one option among others. Although SQL Server might not always traverse a particular unique index to determine where to try to insert a new row, SQL Server will always use the existence of a unique index to verify whether a new set of data is acceptable.