What Is an Index?

3 4

As mentioned, an index is an auxiliary data structure used by SQL Server to access data. Depending on its type, an index is stored with the data or separate from the data. Regardless of type, all indexes work in the same basic way, which you'll learn about in this section.

In systems without indexes, all data retrieval must be done by using table scans. In a table scan, all of the data in a table must be read and compared with the requested data. Table scans are generally avoided because of the amount of I/O that is generated by this operation—scanning large tables could take a long time and eat up a lot of system resources. By using an index, you can greatly reduce the number of I/O operations, speeding up access to data as well as freeing up system resources for other operations.

A database index is organized in a B-tree structure. Each page in an index is called an index page or an index node. The index structure begins with a root node at the top level. The root node marks the beginning of the index; it is the first data accessed when a data lookup occurs. The root node contains a number of index rows. These index rows contain a key value and a pointer to an index page (called a branch node), as illustrated in Figure 17-1. This configuration is necessary because in an average-size data table, an index consists of thousands or millions of index pages. By starting at the root node and traversing the branch nodes, SQL Server can zoom in on the data you want.

click to view at full size.

Figure 17-1. Root node and branch nodes.

Using a book as an analogy, an index works something like this: suppose the index started with a page that listed the numbers of the pages on which entries beginning with "a," "b," "c," and so on began. Then suppose those pages contained page numbers for entries in the ranges aa-ab, ac-ad, ae-af, and so on, and those pages contained pointers to entries in the ranges aaa-aab, aac-aad, aae-aaf, and so forth. With this arrangement, you could find what you are looking for quickly, with relatively few lookups. This is similar to a database table index, with the first page being the root node.

Like the root node, each branch node contains a number of index rows held in an index page. Each index row points to another branch node or a leaf node, as illustrated in Figure 17-2. The leaf nodes make up the last level of an index. Unlike the root node, each branch node also contains a linked list to other branch nodes on the same level. In other words, the node knows about adjacent nodes as well as lower nodes.

As the name "B-tree" implies, the branch nodes expand out from the root node in treelike fashion. Each group of branch nodes at the same level of the tree structure is known as an index level, as illustrated in Figure 17-3. The number of I/O operations that are needed to reach the leaf nodes (the nodes at the lowest level in the tree) is dependent on the number of index levels. If the database table contains only a small amount of data, the root node can point directly to the leaf nodes, and the index need not contain any branch nodes at all (an unlikely situation).

click to view at full size.

Figure 17-2. A search tree showing branch nodes and leaf nodes.

click to view at full size.

Figure 17-3. Index levels.

In a nonclustered index, the leaf node contains the key value and either a Row ID pointing to the desired row in the table or the clustered-index key if there is also a clustered index on the table. And in a clustered index, the data itself is in the leaf node. (Clustered and nonclustered indexes will be described in the section "Types of Indexes" later in this chapter.) The number of rows in a leaf node depends on the size of the index entries and, in the case of a clustered index, the size of the data.

NOTE


A Row ID is a pointer that SQL Server builds automatically from the File ID, the page number, and the data row number. With a Row ID, you can retrieve data with only one additional I/O operation. Because you know which page to retrieve and SQL Server knows where that page resides, the page is read into memory with a single I/O request. The simplicity of this process is the reason why indexes are so efficient at retrieving data and provide such a great performance enhancement.

Keep in mind that because the index is created in sorted order, any changes to the data might cause additional overhead. For example, if an insert causes a new index row to be inserted into a leaf node that is full, SQL Server must make room for the new row. It does so by moving approximately half of the rows in the leaf node to another page. This data movement is called a page split. A page split on one level of a tree can cause cascading splits up the tree. Page splits can be avoided by carefully tuning the fill factor, as described in the section "Using Fill Factor to Avoid Page Splits" later in this chapter.



Microsoft SQL Server 2000 Administrator's Companion
Microsoft SQL Server 2000 Administrators Companion
ISBN: B001HC0RPI
EAN: N/A
Year: 2005
Pages: 264

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net