22.2. Using Indexes for Optimization | MySQL 5.0 Certification Study Guide

When you create a table, consider whether it should have indexes, because they have important benefits:

Indexes contain sorted values. This allows MySQL to find rows containing particular values faster. The effect can be particularly dramatic for joins, which have the potential to require many combinations of rows to be examined.
Indexes result in less disk I/O. The server can use an index to go directly to the relevant table records, which reduces the number of records it needs to read. Furthermore, if a query retrieves only indexed columns, MySQL might be able to process it by reading only the index, without accessing data rows at all.
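The following is a minimal sketch of the index-only case. It uses a hypothetical table `demo_city` filled with illustrative rows.

- The first query reads only columns stored in the `country_name` index, so MySQL can usually answer it from the index alone. EXPLAIN typically reports this as `Using index` in its Extra column.
- The second query also needs `population`, which is not indexed, so the data rows must be read as well.

Actual plans depend on the storage engine, the data, and the server version. On very small tables the optimizer may prefer a full scan.

```sql
-- Hypothetical table used only for illustration.
DROP TABLE IF EXISTS demo_city;
CREATE TABLE demo_city (
    id         INT NOT NULL,
    country    CHAR(3) NOT NULL,
    name       CHAR(35) NOT NULL,
    population INT NOT NULL,
    PRIMARY KEY (id),
    INDEX country_name (country, name)
);

-- Illustrative rows; the numbers are not real figures.
INSERT INTO demo_city VALUES
    (1, 'FRA', 'Paris', 1000),
    (2, 'FRA', 'Lyon', 500),
    (3, 'DEU', 'Berlin', 1500);

-- country and name are both stored in country_name, so the index
-- alone can answer this query (Extra typically shows "Using index").
EXPLAIN SELECT country, name FROM demo_city WHERE country = 'FRA';

-- population is not in any index, so data rows must be read as well.
EXPLAIN SELECT name, population FROM demo_city WHERE country = 'FRA';
```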

22.2.1. Types of Indexes

MySQL supports several types of indexes:

A PRIMARY KEY is a unique-valued index. That is, every key value is required to be different from all others. Every value must be non-NULL.
A UNIQUE index is unique-valued, like a PRIMARY KEY, but it can be defined to allow NULL values. If so, NULL is an exception to uniqueness because NULL values may occur multiple times.
A non-unique index is one in which any key value may occur multiple times. This type of index is defined with the keyword INDEX or KEY.
A FULLTEXT index is specially designed for text searching.
A SPATIAL index can be used with the spatial data types.

To define indexes when you're initially creating a table, use CREATE TABLE. To add indexes to an already existing table, use ALTER TABLE or CREATE INDEX.
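The sketch below uses a hypothetical table `demo_person`. It does the following:

- declares a PRIMARY KEY, a UNIQUE index, and a non-unique index in CREATE TABLE;
- adds one more index with ALTER TABLE;
- adds another with CREATE INDEX.

CREATE INDEX requires an explicit index name. ALTER TABLE ... ADD INDEX, by contrast, names the index after its first column when no name is given.

```sql
-- Hypothetical table used only for illustration.
DROP TABLE IF EXISTS demo_person;
CREATE TABLE demo_person (
    id         INT NOT NULL,
    email      VARCHAR(100),            -- NULL allowed
    last_name  CHAR(40) NOT NULL,
    first_name CHAR(40) NOT NULL,
    birth_date DATE NOT NULL,
    PRIMARY KEY (id),                    -- unique, no NULLs
    UNIQUE (email),                      -- unique, but NULL may repeat
    INDEX (last_name)                    -- non-unique; KEY is a synonym
);

-- Add an index to the existing table with ALTER TABLE ...
ALTER TABLE demo_person ADD INDEX (first_name);

-- ... or with CREATE INDEX, which requires an index name.
CREATE INDEX birth_date_idx ON demo_person (birth_date);
```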

Index creation using the INDEX, UNIQUE, and PRIMARY KEY keywords is discussed in Section 8.6, "Indexes." FULLTEXT indexes provide a means for optimizing text searching in MyISAM tables. They are not covered here, but you can read about them in Section 38.3.3, "FULLTEXT Indexes." Spatial indexes are not covered in this study guide or on the exam.

22.2.2. Principles for Index Creation

An index helps MySQL perform retrievals more quickly than if no index is used, but indexes can be used with varying degrees of success. Keep the following index-related considerations in mind when designing tables:

Declare an indexed column NOT NULL if possible. Although NULL values can be indexed, NULL is a special value that requires additional decisions by the server when performing comparisons on key values. An index without NULL can be processed more simply and thus faster.
Avoid overindexing; don't index a column just because you can. If you never refer to a column in comparisons (such as in WHERE, ORDER BY, or GROUP BY clauses), there's no need to index it.
Another reason to avoid unnecessary indexing is that every index you create slows down table updates. If you insert a row, an entry must be added to each of the table's indexes. Indexes help when looking up values for UPDATE or DELETE statements, but any change to indexed columns requires the appropriate indexes to be updated as well.
One strategy the MySQL optimizer uses concerns large result sets. If it estimates that an index will return a large percentage of the records in the table, it may conclude that scanning the table is just as fast as incurring the overhead of processing the index. As a consequence, an index on a column that has very few distinct values is unlikely to do much good. Suppose that a column is declared as ENUM('Y','N') and the values are roughly evenly distributed, so that a search for either value returns about half of the records. In this case, an index on the column is unlikely to result in faster queries.
Choose unique and non-unique indexes appropriately. The choice might be influenced by the data type of a column. If the column is declared as an ENUM, the number of distinct values that can be stored in it is fixed. It equals the number of enumeration elements, plus one for the '' (empty string) element. That element is stored when you attempt to store an illegal value while strict SQL mode is not enabled. If you do index an ENUM column, a non-unique index is most likely the right choice. A PRIMARY KEY would allow only as many rows as there are distinct enumeration values. A UNIQUE index imposes a similar restriction, except that if the column is not declared NOT NULL, the index also allows NULL values.
Index a column prefix rather than the entire column. MySQL caches index information in memory whenever possible to avoid reading it from disk repeatedly. Shortening the length of key values can improve performance by reducing the amount of disk I/O needed to read the index and by increasing the number of key values that fit into the key cache. This technique is discussed in Section 22.2.3, "Indexing Column Prefixes."
Avoid creating multiple indexes that overlap (have the same initial columns). This is wasteful because MySQL can use a multiple-column index even when a query uses just the initial columns for lookups. For more information, see Section 22.2.4, "Leftmost Index Prefixes."
The index creation process itself can be optimized if you are creating more than one index for a given table. ALTER TABLE can add several indexes in the same statement, which is faster than processing each one separately. CREATE INDEX adds only one index per statement, and DROP INDEX likewise drops only one.
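Several of these guidelines can be combined in one sketch, using a hypothetical table `demo_order`:

- its indexed columns are declared NOT NULL;
- the two-valued `status` column is left unindexed;
- lookups on `customer_id` alone rely on the composite index `cust_date` instead of an overlapping index;
- both indexes are added in a single ALTER TABLE statement.

The final query is a rough, assumed heuristic for judging a candidate column: distinct values divided by total rows. It is not the optimizer's own cost calculation.

```sql
-- Hypothetical table used only for illustration.
DROP TABLE IF EXISTS demo_order;
CREATE TABLE demo_order (
    id          INT NOT NULL,
    customer_id INT NOT NULL,             -- indexed, so declared NOT NULL
    order_date  DATE NOT NULL,            -- indexed, so declared NOT NULL
    status      ENUM('Y','N') NOT NULL,   -- only two values: left unindexed
    PRIMARY KEY (id)
);

-- One ALTER TABLE adds both indexes instead of two separate statements.
-- cust_date also serves lookups on customer_id alone, so a separate
-- index on customer_id would be redundant.
ALTER TABLE demo_order
    ADD INDEX cust_date (customer_id, order_date),
    ADD INDEX order_date_idx (order_date);

-- Illustrative rows.
INSERT INTO demo_order VALUES
    (1, 10, '2006-01-05', 'Y'),
    (2, 10, '2006-02-11', 'N'),
    (3, 20, '2006-01-05', 'Y'),
    (4, 30, '2006-03-20', 'N');

-- Ratio of distinct values to rows: near 1 suggests a selective column,
-- near 0 suggests an index would rarely help. For status it is 0.5 here.
SELECT COUNT(DISTINCT status) / COUNT(*) AS distinct_ratio FROM demo_order;
```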

For indexed MyISAM or InnoDB tables, keeping the internal index statistics up to date helps the query optimizer process queries more efficiently. You can update the statistics with the ANALYZE TABLE statement. See Section 38.3.2, "Keep Optimizer Information Up to Date."
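Here is a minimal sketch using a hypothetical table `demo_stats`. After data is loaded, ANALYZE TABLE refreshes the index statistics. SHOW INDEX then displays the resulting estimate of distinct key values in its Cardinality column.

```sql
-- Hypothetical table used only for illustration.
DROP TABLE IF EXISTS demo_stats;
CREATE TABLE demo_stats (
    id   INT NOT NULL,
    code CHAR(3) NOT NULL,
    PRIMARY KEY (id),
    INDEX (code)
);

INSERT INTO demo_stats VALUES (1, 'FRA'), (2, 'FRA'), (3, 'DEU'), (4, 'ITA');

-- Recompute key distribution statistics for the optimizer.
ANALYZE TABLE demo_stats;

-- Cardinality shows the estimated number of distinct values per index.
SHOW INDEX FROM demo_stats;
```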

22.2.3. Indexing Column Prefixes

Short index values can be processed more quickly than long ones. Therefore, when you index a column, ask whether it's sufficient to index partial column values rather than complete values. This technique of indexing a column prefix can be applied to string data types.

Suppose that you're considering creating a table using this definition:

 CREATE TABLE t (
     name CHAR(255),
     INDEX (name)
 );

If you index all 255 characters of the values in the name column, index processing will be relatively slow:

It's necessary to read more information from disk.
Longer values take longer to compare.
The index cache is not as effective because fewer key values fit into it at a time.

It's often possible to overcome these problems by indexing only a prefix of the column values. For example, if you expect column values to be distinct most of the time in the first 15 characters, index only that many characters of each value, not all 255 characters.

To specify a prefix length for a column, follow the column name in the index definition by a number in parentheses. The following table definition is the same as the previous one, except that key values in the index use only the first 15 characters of the column values:

 CREATE TABLE t (
     name CHAR(255),
     INDEX (name(15))
 );
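For a table that already exists, the same prefix index can be added with ALTER TABLE. In this sketch the index name `name_prefix` is hypothetical. The prefix length is then confirmed through the SUB_PART column of information_schema.STATISTICS, which matches the Sub_part column of SHOW INDEX. It holds the number of indexed characters for a partially indexed column, or NULL when the whole column is indexed. Running the sketch replaces any existing table named `t`.

```sql
-- Caution: drops any existing table named t.
DROP TABLE IF EXISTS t;
CREATE TABLE t (
    name CHAR(255)
);

-- Index only the first 15 characters of each name value.
ALTER TABLE t ADD INDEX name_prefix (name(15));

-- SUB_PART should report 15 for name_prefix.
SELECT INDEX_NAME, COLUMN_NAME, SUB_PART
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 't';
```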

Indexing a column prefix can speed up query processing, but it works best when the prefix values have about the same degree of uniqueness as the original values. Don't use a prefix so short that it produces a very high frequency of duplicate values in the index. Longer index values provide good uniqueness, while shorter values compare more quickly but have more duplicates; finding the best balance between the two might require some testing. To determine the number of records in the table, the number of distinct values in the column, and the number of duplicates, use this query:

 SELECT
     COUNT(*) AS 'Total Rows',
     COUNT(DISTINCT name) AS 'Distinct Values',
     COUNT(*) - COUNT(DISTINCT name) AS 'Duplicate Values'
 FROM t;

The query gives you an estimate of the amount of uniqueness in the name values. Then run a similar query on the prefix values:

 SELECT
     COUNT(DISTINCT LEFT(name,n)) AS 'Distinct Prefix Values',
     COUNT(*) - COUNT(DISTINCT LEFT(name,n)) AS 'Duplicate Prefix Values'
 FROM t;

That tells you how the uniqueness characteristics change when you use an n-character prefix of the name values. Run the query with different values of n to determine an acceptable prefix length.
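Instead of rerunning the query once per value of n, several candidate lengths can be compared in one pass. The sketch below loads a few hypothetical names into `t` and counts distinct prefixes for lengths 5, 10, 15, and 20. The lengths are arbitrary choices, and running the sketch replaces any existing table named `t`.

```sql
-- Caution: drops any existing table named t.
DROP TABLE IF EXISTS t;
CREATE TABLE t (
    name CHAR(255)
);

-- Hypothetical sample values.
INSERT INTO t VALUES
    ('International Business'),
    ('International Trade'),
    ('Interstate Commerce'),
    ('Internal Audit');

-- For these rows the result is 4, 1, 3, 4, 4: a 15-character prefix
-- already keeps every value distinct, while 5 or 10 characters do not.
SELECT
    COUNT(DISTINCT name)           AS 'Distinct Values',
    COUNT(DISTINCT LEFT(name, 5))  AS 'Distinct 5',
    COUNT(DISTINCT LEFT(name, 10)) AS 'Distinct 10',
    COUNT(DISTINCT LEFT(name, 15)) AS 'Distinct 15',
    COUNT(DISTINCT LEFT(name, 20)) AS 'Distinct 20'
FROM t;
```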

If an index on a full column is a PRIMARY KEY or UNIQUE index, you'll probably have to change the index to be non-unique if you decide to index prefix values instead. When you index partial column values, it's more likely that the prefix values will contain duplicates.
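Here is a sketch of that change. It assumes a hypothetical table `demo_product` whose name column starts out with a full-column UNIQUE index `name_uniq`, and it makes both index changes in one ALTER TABLE statement.

MySQL also accepts a UNIQUE index on a column prefix, but it then requires only the prefixes to be unique. Two values that share their first 15 characters could not both be stored, which is usually why the prefix index is made non-unique.

```sql
-- Hypothetical table used only for illustration.
DROP TABLE IF EXISTS demo_product;
CREATE TABLE demo_product (
    name CHAR(255) NOT NULL,
    UNIQUE INDEX name_uniq (name)
);

-- Replace the full-column UNIQUE index with a non-unique prefix index.
ALTER TABLE demo_product
    DROP INDEX name_uniq,
    ADD INDEX name_prefix (name(15));
```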

22.2.4. Leftmost Index Prefixes

In a table that has a composite (multiple-column) index, MySQL can use leftmost index prefixes of that index. A leftmost prefix of a composite index consists of one or more of the initial columns of the index. MySQL's capability to use leftmost index prefixes enables you to avoid creating unnecessary (redundant) indexes.

The CountryLanguage table in the world database provides an example of how a leftmost prefix applies. The table has a two-part primary key on the CountryCode and Language columns:

 mysql> SHOW INDEX FROM CountryLanguage\G
 *************************** 1. row ***************************
        Table: CountryLanguage
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: CountryCode
    Collation: A
  Cardinality: NULL
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
 *************************** 2. row ***************************
        Table: CountryLanguage
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 2
  Column_name: Language
    Collation: A
  Cardinality: 984
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:

The index on the CountryCode and Language columns allows records to be looked up quickly based on a given country code and language. However, MySQL also can use the index when given only a country code. Suppose that you want to determine which languages are spoken in France:

 SELECT * FROM CountryLanguage WHERE CountryCode = 'FRA';

MySQL can see that CountryCode is a leftmost prefix of the primary key and use it as though it were a separate index. This means there's no need to define a second index on the CountryCode column alone.
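EXPLAIN is one way to confirm this, assuming the world database is the default database. For the first query below, the key column of the output should name PRIMARY even though the query supplies only CountryCode. Its key_len should also be smaller than for the second query, which uses both parts of the key. Exact values depend on the server version and character set.

```sql
-- Assumes the world sample database is the default database (USE world).
-- Only the leftmost key column is given: key should be PRIMARY.
EXPLAIN SELECT * FROM CountryLanguage WHERE CountryCode = 'FRA';

-- Both key columns are given: key should again be PRIMARY, with a
-- larger key_len because both parts of the key are used.
EXPLAIN SELECT * FROM CountryLanguage
WHERE CountryCode = 'FRA' AND Language = 'French';
```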

On the other hand, if you want to perform indexed searches using just the Language column of the CountryLanguage table, you do need to create a separate index because Language is not a leftmost prefix of the existing index.
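Here is a sketch of adding that index. It assumes the world database is the default database and uses `Language` as the index name, which is an arbitrary choice. Note that it modifies the sample table. Afterward, a search on Language alone can use the new index, and EXPLAIN should typically show it in the key column.

```sql
-- Assumes the world sample database is the default database.
-- Modifies the sample table by adding a secondary index.
ALTER TABLE CountryLanguage ADD INDEX Language (Language);

-- The key column should now typically name the Language index.
EXPLAIN SELECT CountryCode FROM CountryLanguage WHERE Language = 'French';
```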

Note that a leftmost prefix of an index and an index on a column prefix are two different things. A leftmost prefix of an index consists of leading columns in a multiple-column index. An index on a column prefix indexes the leading characters of values in the column.
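The hypothetical table below puts both kinds of prefix side by side:

- The composite index `full_name` can serve lookups on `last_name` alone, because that column is its leftmost prefix.
- `first5` indexes only the first five characters of each `first_name` value.

In information_schema.STATISTICS, the composite index appears as two rows distinguished by SEQ_IN_INDEX. The column-prefix index shows its length in SUB_PART.

```sql
-- Hypothetical table used only for illustration.
DROP TABLE IF EXISTS demo_contrast;
CREATE TABLE demo_contrast (
    last_name  CHAR(40) NOT NULL,
    first_name CHAR(40) NOT NULL,
    -- Leftmost prefixes of this index: (last_name) and (last_name, first_name).
    INDEX full_name (last_name, first_name),
    -- Column prefix: only the first 5 characters of first_name are indexed.
    INDEX first5 (first_name(5))
);

SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME, SUB_PART
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'demo_contrast'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;
```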