Managing an Index | Inside Microsoft SQL Server 2000

SQL Server maintains your indexes automatically. As you add new rows, it automatically inserts them into the correct position in a table with a clustered index, and it adds new leaf-level rows to your nonclustered indexes that will point to the new rows. When you remove rows, SQL Server automatically deletes the corresponding leaf-level rows from your nonclustered indexes. In Chapter 9, you'll see some specific examples of the changes that take place within an index as data modification operations take place.

Types of Fragmentation

In general, you need to do very little in terms of index maintenance. However, indexes can become fragmented. The fragmentation can be of two types. Internal fragmentation occurs when space is available within your index pages—that is, when the indexes are not making the most efficient use of space. External fragmentation occurs when the logical order of pages does not match the physical order, or when the extents belonging to a table are not contiguous.

Internal fragmentation means that the index is taking up more space than it needs to. Scanning the entire table involves more read operations than if no free space were available on your pages. However, internal fragmentation is sometimes desirable—in fact, you can request internal fragmentation by specifying a low fillfactor value when you create an index. Having room on a page means that there is space to insert more rows without having to split a page. Splitting is a relatively expensive operation and can lead to external fragmentation because when a page is split, a new page must be linked into the indexes page chain, and usually the new page is not contiguous to the page being split.

External fragmentation is truly bad only when SQL Server is doing an ordered scan of all or part of a table or index. If you're seeking individual rows through an index, it doesn't matter where those rows are physically located—SQL Server can find them easily. If SQL Server is doing an unordered scan, it can use the IAM pages to determine which extents need to be fetched, and the IAM pages list the extents in disk order, so the fetching can be very efficient. Only if the pages need to be fetched in logical order, according to their index key values, do you need to follow the page chain. If the pages are heavily fragmented, this operation is more expensive than if there were no fragmentation.

Detecting Fragmentation

You can use the DBCC SHOWCONTIG command to report on the fragmentation of an index. Here's the syntax:

 DBCC SHOWCONTIG [ ( { table_name | table_id | view_name | view_id } [ , index_name | index_id ] )   ] [ WITH { ALL_INDEXES | FAST [ , ALL_INDEXES ] | TABLERESULTS [ , { ALL_INDEXES } ] [ , { FAST | ALL_LEVELS } ] } ]

You can ask to see the information for all the indexes on a table or view or just one index, and you can specify the object and index by name or by ID number. Alternatively, you can use the ALL_INDEXES option, which provides an individual report for each index on the object, regardless of whether a specific index is specified.

Here's some sample output from running a basic DBCC SHOWCONTIG on the order details table in the Northwind database:

 DBCC SHOWCONTIG scanning 'Order Details' table... Table: 'Order Details' (325576198); index ID: 1, database ID: 6 TABLE level scan performed. - Pages Scanned................................: 9 - Extents Scanned..............................: 6 - Extent Switches..............................: 5 - Avg. Pages per Extent........................: 1.5 - Scan Density [Best Count:Actual Count].......: 33.33% [2:6] - Logical Scan Fragmentation ..................: 0.00% - Extent Scan Fragmentation ...................: 16.67% - Avg. Bytes Free per Page.....................: 673.2 - Avg. Page Density (full).....................: 91.68%

By default, DBCC SHOWCONTIG scans the page chain at the leaf level of the specified index and keeps track of the following values:

Average number of bytes free on each page (Avg. Bytes Free per Page)
Number of pages accessed (Pages scanned)
Number of extents accessed (Extents scanned)
Number of times a page had a lower page number than the previous page in the scan (This value for Out of order pages is not displayed, but it is used for additional computations.)
Number of times a page in the scan was on a different extent than the previous page in the scan (Extent Switches)

SQL Server also keeps track of all the extents that have been accessed, and then it determines how many gaps are in the used extents. An extent is identified by the page number of its first page. So, if extents 8, 16, 24, 32, and 40 make up an index, there are no gaps. If the extents are 8, 16, 24, and 40, there is one gap. The value in DBCC SHOWCONTIG's output called Extent Scan Fragmentation is computed by dividing the number of gaps by the number of extents, so in this example the Extent Scan Fragmentation is ¼, or 25 percent. A table using extents 8, 24, 40, and 56 has 3 gaps, and its Extent Scan Fragmentation is ¾, or 75 percent. The maximum number of gaps is the number of extents - 1, so Extent Scan Fragmentation can never be 100 percent.

The value in DBCC SHOWCONTIG's output called Logical Scan Fragmentation is computed by dividing the number of Out Of Order Pages by the number of pages in the table. This value is meaningless in a heap.

You can use either the Extent Scan Fragmentation value or the Logical Scan Fragmentation value to determine the general level of fragmentation in a table. The lower the value, the less fragmentation there is. Alternatively, you can use the value called Scan Density, which is computed by dividing the optimum number of extent switches by the actual number of extent switches. A high value means that there is little fragmentation. Scan Density is not valid if the table spans multiple files, so all in all it is less useful than the other values.

You can use DBCC SHOWCONTIG to have the report returned in a format that can be easily inserted into a table. If you have a table with the appropriate columns and datatypes, you can execute the following:

 INSERT INTO CONTIG_TABLE EXEC ('dbcc showcontig ([order details]) with all_indexes, tableresults')

Here's a subset of the results:

 ObjectName     IndexId   Level    Pages    Rows ----------------- ----------- ----------- ----------- ----------- Order Details   1       0       9       2155 Order Details    2      0      4      2155 Order Details   3      0      4      2155

Many more columns are returned if you specify TABLERESULTS. In addition to the values that the nontabular output returns, you also get the values listed in Table 8-3.

Table 8-3. Additional columns returned when TABLERESULTS is specified.

Column	Meaning
ObjectName	Name of the table or view processed.
ObjectId	ID of the object processed.
IndexName	Name of the index processed (NULL for a heap).
IndexId	ID of the index (0 for a heap).
Level	Level of the index. Level 0 is the leaf (or data) level of the index. The level number increases moving up the tree toward the index root. The level is 0 for a heap. By default, only level 0 is reported. If you specify ALL_LEVELS along with TABLERESULTS, you get a row for each level of the index.
Pages	Number of pages comprising that level of the index or the entire heap.
Rows	Number of data or index records at that level of the index. For a heap, this is the number of data records in the entire heap.
MinimumRecordSize	Minimum record size in that level of the index or the entire heap.
MaximumRecordSize	Maximum record size in that level of the index or the entire heap.
AverageRecordSize	Average record size in that level of the index or the entire heap.
ForwardedRecords	Number of forwarded records in that level of the index or the entire heap.
Extents	Number of extents in that level of the index or the entire heap.

One last option available to DBCC SHOWCONTIG is the FAST option. The command takes less time to run when you specify this option, as you might imagine, because it gathers only data that is available from the IAM pages and the nonleaf levels of the index, and it returns only these values:

Pages Scanned
Extent Switches
Scan Density
Logical Scan Fragmentation

Since the level above the leaf has pointers to every page, SQL Server can determine all the page numbers and determine the Logical Scan Fragmentation. In fact, it can also use the IAM pages to determine the Extent Scan Fragmentation. However, the purpose of the FAST option is to determine whether a table would benefit from online defragmenting, and since online defragmenting cannot change the Extent Scan Fragmentation, there is little benefit in reporting it. I'll talk about online defragmenting in the next section.

Removing Fragmentation

Several methods are available for removing fragmentation from an index. First, you can rebuild the index and have SQL Server allocate all new contiguous pages for you. You can do this by using a simple DROP INDEX and CREATE INDEX, but I've already discussed some reasons why this is not optimal. In particular, if the index supports a constraint, you can't use the DROP INDEX command. Alternatively, you can use DBCC DBREINDEX, which can rebuild all the indexes on a table in a single operation, or you can use the drop_existing clause along with CREATE INDEX.

The drawback of these methods is that the table is unavailable while the index is being rebuilt. If you are rebuilding only nonclustered indexes, there is a shared lock on the table, which means that no modifications can be made, but other processes can SELECT from the table. Of course, they cannot take advantage of the index you're rebuilding, so the query might not perform as well as it should. If you're rebuilding the clustered index, SQL Server takes an exclusive lock, and no access is allowed at all.

SQL Server 2000 allows you to defragment an index without completely rebuilding it. In this release, DBCC INDEXDEFRAG reorders the leaf level pages into physical order as well as logical order, but using only the pages that are already allocated to the leaf level. It basically does an in-place ordering, similar to a sorting technique called bubble-sort. This can reduce the logical fragmentation to 0 to 2 percent, which means that an ordered scan through the leaf level will be much faster. In addition, DBCC INDEXDEFRAG compacts the pages of an index, based on the original fillfactor, which is stored in the sysindexes table. This doesn't mean that the pages always end up with the original fillfactor, but SQL Server uses that as a goal. The process does try to leave at least enough space for one average-size row after the defragmentation takes place. In addition, if a lock cannot be obtained on a page during the compaction phase of DBCC INDEXDEFRAG, SQL Server skips the page and doesn't go back to it. Any empty pages created as a result of this compaction are removed. Here's an example:

 DBCC INDEXDEFRAG(0, 'charge', 1)

Here's the output:

 Pages Scanned Pages Moved Pages Removed ------------- ----------- ------------- 673      668      12

The algorithm SQL Server uses for DBCC INDEXDEFRAG finds the next physical page in a file belonging to the leaf level and the next logical page in the leaf level with which to swap it. It finds the next physical page by scanning the IAM pages for that index. The single-page allocations are not included in this release. Pages on different files are handled separately. The algorithm finds the next logical page by scanning the leaf level of the index. After each page move, all locks and latches are dropped and the key of the last page moved is saved. The next iteration of the algorithm uses the key to find the next logical page. This allows other users to update the table and index while DBCC INDEXDEFRAG is running.

For example, suppose the leaf level of an index consists of these pages, in this order:

 47 22 83 32 12 90 64

The first step is to find the first physical page, which is 12, and the first logical page, which is 47. These pages are then swapped, using a temporary buffer as a holding area. After the first swap, the leaf level looks like this:

 12 22 83 32 47 90 64

The next physical page is 22, which is the same as the next logical page, so no work is done. The next physical page is 32, which is swapped with the next logical page, 83, to look like this:

 12 22 32 83 47 90 64

After the next swap of 47 with 83, the leaf level looks like this:

 12 22 32 47 83 90 64

The 64 is swapped with 83:

 12 22 32 47 64 90 83

And finally, 83 and 90 are swapped:

 12 22 32 47 64 83 90

Keep in mind that DBCC INDEXDEFRAG uses only pages that are already part of the index leaf level. No new pages are allocated. In addition, the defragmenting can take quite a while on a large table. You get a report every five minutes on the estimated percentage completed.