Index Key Terms and Concepts

< Day Day Up >

Before proceeding into a discussion of optimal indexing, this section quickly discusses several crucial key terms and structures. To make the illustrations as clear as possible, you can first create, populate, and index a very simple table:

 CREATE TABLE sample_names (     id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,     last_name CHAR(30) NOT NULL,     age SMALLINT NOT NULL,     INDEX (last_name) );

Now, Figure 7.1 looks at proceedinga simplified diagram showing the structure of the last_name index.

Figure 7.1. Structure of the B-tree index on the `last_name` column.

Figure 7.1 represents a B-tree index, which is the index format MySQL uses for all of its data types (including those that may be used in FULLTEXT indexes) and storage engines, with two exceptions:

The MEMORY (also known as HEAP) storage engine gives you the option to use hash indexes instead. These very fast indexes are discussed a little later.
The spatial data types use a specialized version of the B-tree index, known as an R-tree index.

You can gain several very important insights by examining Figure 7.1:

In the case of the MyISAM storage engine, MySQL separates its data from its indexes. That is, the contents of the indexes are stored in a different physical location than the actual data. In certain situations, you might place indexes and data on different disk drives, which might greatly help performance. This possibility is discussed later.
If you run a query that retrieves last_name only, MySQL can get everything it needs just by reading this index.
Typically, some free space is available within both the index and data pages. In addition, MySQL offers you the option to compress keys: For clarity's sake, this is omitted from this diagram.
The leftmost structures are nonleaf nodes. They can contain pointers to either other nonleaf nodes or leaf nodes. A leaf node is the lowest-level node, and contains the pointers to the actual data.
This index is relatively shallow: It doesn't have that many levels of nonleaf nodes. Figure 7.2 shows a deeper index.
Figure 7.2. A deeper B-tree index, containing more levels of nonleaf nodes.

In the previous examples, the data is not stored in the order of the last_name index. However, it is stored in primary key order, so take a look at Figure 7.3, a diagram of the index on id.

Figure 7.3. A B-tree index with data stored in key order.

As you can see, the actual data is stored in id order. That is, each page contains a group of rows that have nothing in common other than each row's id number is one less than the one that preceded it. The pages themselves are sequenced by the values of their rows' id field. A little later, you'll see how you can change the physical order of your data.

Next, suppose that you used the MEMORY storage engine for this table and requested a hash index:

 CREATE TABLE sample_names (     id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,     last_name CHAR(30) NOT NULL,     age SMALLINT NOT NULL,     INDEX USING HASH (last_name) ) ENGINE = MEMORY;

Hash indexes provide very fast access to information but are a little harder to visualize (see Figure 7.4).

Figure 7.4. Hash index structure.

Without going into deep technical detail on the algorithms used to build these structures, understand that when you generate the index or insert a new row into the table, MySQL maps the index key into a hash bucket (which is a memory structure, not a container for breakfast or recreational drugs). The hash bucket then contains pointers to the data itself. Depending on the amount of available memory and other factors, MySQL generates a number of these hash buckets. Internal algorithms then determine which entries are then placed into the buckets. Generally, these algorithms have little to do with the data itself, but are instead focused on implementing a round-robin or other distribution strategy so that no particular bucket gets overloaded.

Notice how the data itself is stored in the order of the primary key (id), just as it would have been for a B-tree.

When it comes time to find the row (during a query or other operation), MySQL takes the lookup criteria, runs it through the hashing function, locates the correct hash bucket, finds the entry, and then retrieves the row.

For added speed and flexibility, MySQL uses dynamic hashing. However, as you can see from the diagram, there is no correlation between the order of entries in the hash index and their physical location on disk. This means that a hash index can't increase the performance of an ORDER BY.

< Day Day Up >

Figure 7.1. Structure of the B-tree index on the last_name column.

Figure 7.2. A deeper B-tree index, containing more levels of nonleaf nodes.

Figure 7.3. A B-tree index with data stored in key order.

Figure 7.4. Hash index structure.

Figure 7.1. Structure of the B-tree index on the `last_name` column.