22.3. General Query Enhancement

This section discusses some general techniques that you can apply on a case-by-case basis to individual queries.

22.3.1. Query Rewriting Techniques

The way you write a query often affects how well indexes are used. Use the following principles to make your queries more efficient:

Don't refer to an indexed column within an expression that must be evaluated for every row in the table. Doing so prevents use of the index. Instead, isolate the column onto one side of a comparison when possible. Suppose that a table t contains a DATE column d that is indexed. One way to select rows containing date values from the year 1994 and up is as follows:
```
 SELECT * FROM t WHERE YEAR >= 1994; 
```
In this case, the value of YEAR must be evaluated for every row in the table, so the index cannot be used. Instead, write the query like this:
```
 SELECT * FROM t WHERE d >= '1994-01-01'; 
```
In the rewritten expression, the indexed column stands by itself on one side of the comparison and MySQL can apply the index to optimize the query.
For situations like this, the EXPLAIN statement is useful for verifying that one way of writing a query is better than another. Section 22.3.2, "Using EXPLAIN to Obtain Optimizer Information," demonstrates why this is so.
Indexes are particularly beneficial for joins that compare columns from two tables. Consider the following join:
```
 SELECT * FROM Country, CountryLanguage WHERE Country.Code = CountryLanguage.CountryCode; 
```
If neither the Code nor CountryCode column is indexed, every pair of column values must be compared to find those pairs that are equal. For example, for each Code value from the Country table, MySQL would have to compare it with every CountryCode value from the CountryLanguage table. If instead CountryCode is indexed, then for each Code value that MySQL retrieves, it can use the index on CountryCode to quickly look up the rows with matching values. (In practice, you'd normally index both of the joined columns when you use inner joins because the optimizer might process the tables in either order.)
When comparing an indexed column to a value, use a value that has the same data type as the column. For example, you can look for rows containing a numeric id value of 18 with either of the following WHERE clauses:
```
 WHERE id = 18 WHERE id = '18' 
```
MySQL will produce the same result either way, even though the value is specified as a number in one case and as a string in the other case. However, for the string value, MySQL must perform a string-to-number conversion, which might cause an index on the id column not to be used.
In certain cases, MySQL can use an index for pattern-matching operations performed with the LIKE operator. This is true if the pattern begins with a literal prefix value rather than with a wildcard character. An index on a name column can be used for a pattern match like this:
```
 WHERE name LIKE 'de%' 
```
That's because the pattern match is logically equivalent to a search for a range of values:
```
 WHERE name >= 'de' AND name < 'df' 
```
On the other hand, the following pattern makes LIKE more difficult for the optimizer:
```
 WHERE name LIKE '%de%' 
```
When a pattern starts with a wildcard character as just shown, MySQL cannot make efficient use of any indexes associated with that column. (That is, even if an index is used, the entire index must be scanned.)

22.3.2. Using `EXPLAIN` to Obtain Optimizer Information

In the preceding section, an example is shown in which it is stated that of the following two queries, the second can be executed more efficiently because the optimizer can use an index on the column d:

 SELECT * FROM t WHERE YEAR >= 1994; SELECT * FROM t WHERE d >= '1994-01-01';

To verify whether MySQL actually will use an index to process the second query, use the EXPLAIN statement to get information from the optimizer about the execution plans it would use. For the two date-selection queries just shown, you might find that EXPLAIN tells you something like this:

 mysql> EXPLAIN SELECT * FROM t WHERE YEAR >= 1994\G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: t          type: ALL possible_keys: NULL           key: NULL       key_len: NULL           ref: NULL          rows: 867038         Extra: Using where mysql> EXPLAIN SELECT * FROM t WHERE d >= '1994-01-01'\G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: t          type: range possible_keys: d           key: d       key_len: 4           ref: NULL          rows: 70968         Extra: Using where

These results indicate that the second query is indeed better from the optimizer's point of view. MySQL can perform a range scan using the index for the column d, drastically reducing the number of rows that need to be examined. (The rows value drops from 867,038 to 70,968.)

The use of EXPLAIN is not covered further here. See Section 37.2, "Using EXPLAIN to Analyze Queries," for more information.

22.3.3. Optimizing Queries by Limiting Output

Some optimizations can be done independently of whether indexes are used. A simple but effective technique is to reduce the amount of output a query produces.

One way to eliminate unnecessary output is by using a LIMIT clause. If you don't need the entire result set, specify how many rows the server should return by including LIMIT in your query. This helps in two ways:

Less information need be returned over the network to the client.
In many cases, LIMIT allows the server to terminate query processing earlier than it would otherwise. Some row-sorting techniques have the property that the first n rows can be known to be in the final order even before the sort has been done completely. This means that when LIMIT n is combined with ORDER BY, the server might be able to determine the first n rows and then terminate the sort operation early.

Don't use LIMIT as a way to pull out just a few rows from a gigantic result set. For example, if a table has millions of rows, the following statement does not become efficient simply because it uses LIMIT:

 SELECT * FROM t LIMIT 10;

Instead, try to use a WHERE clause that restricts the result so that the server doesn't retrieve as many rows in the first place.

Another way to reduce query output is to limit it "horizontally." Select only the columns you need, rather than using SELECT * to retrieve all columns. Suppose that you want information about countries having names that begin with 'M'. The following query produces that information, but it also produces every other column as well:

 SELECT * FROM Country WHERE Name LIKE 'M%';

If all you really want to know is the country names, don't write the query like that. Most of the information retrieved will be irrelevant to what you want to know, resulting in unnecessary server effort and network traffic. Instead, select specifically just the Name column:

 SELECT Name FROM Country WHERE Name LIKE 'M%';

The second query is faster because MySQL has to return less information when you select just one column rather than all of them.

In addition, if an index on Name exists, you get even more improvement for two reasons:

The index can be used to determine quickly which Name values satisfy the condition in the WHERE clause. This is faster than scanning the entire table.
Depending on the storage engine, the server might not read the table rows at all. If the values requested by the query are in the index, then by reading the index MySQL already has the information that the client requested. For example, the MyISAM engine can read the index file to determine which values satisfy the query, and then return them to the client without reading the data file at all. Doing so is faster than reading both the index file and the data file.

22.3.4. Using Summary Tables

Suppose that you run an analysis consisting of a set of retrievals that each perform a complex SELECT of a set of records (perhaps using an expensive join), and that differ only in the way they summarize the records. That's inefficient because it unnecessarily does the work of selecting the records repeatedly. A better technique is to select the records once, and then use them to generate the summaries. In such a situation, consider the following strategy:

Select the set of to-be-summarized records into a temporary table. In MySQL, you can do this easily with a CREATE TABLE … SELECT statement. If the summary table is needed only for the duration of a single client connection, you can use CREATE TEMPORARY TABLE … SELECT and the table will be dropped automatically when you disconnect.
Create any appropriate indexes on the temporary table.
Select the summaries using the temporary table.

The technique of using a summary table has several benefits:

Calculating the summary information a single time reduces the overall computational burden by eliminating most of the repetition involved in performing the initial record selection.
If the original table is a type that is subject to table-level locking, such as a MyISAM table, using a summary table leaves the original table available more of the time for updates by other clients by reducing the amount of time that the table remains locked.
If the summary table is small enough that it's reasonable to hold in memory, you can increase performance even more by making it a MEMORY table. Queries on the table will be especially fast because they require no disk I/O. When the MEMORY table no longer is needed, drop it to free the memory allocated for it.

The following example creates a summary table containing the average GNP value of countries in each continent. Then it compares the summary information to individual countries to find those countries with a GNP much less than the average and much more than the average.

First, create the summary table:

 mysql> CREATE TABLE ContinentGNP     -> SELECT Continent, AVG(GNP) AS AvgGNP     -> FROM Country GROUP BY Continent; mysql> SELECT * FROM ContinentGNP; +---------------+---------------+ | Continent     | AvgGNP        | +---------------+---------------+ | Asia          | 150105.725490 | | Europe        | 206497.065217 | | North America | 261854.789189 | | Africa        |  10006.465517 | | Oceania       |  14991.953571 | | Antarctica    |      0.000000 | | South America | 107991.000000 | +---------------+---------------+

Next, compare the summary table to the original table to find countries that have a GNP less than 1% of the continental average:

 mysql> SELECT     ->     Country.Continent, Country.Name,     ->     Country.GNP AS CountryGNP,     ->     ContinentGNP.AvgGNP AS ContinentAvgGNP     -> FROM Country, ContinentGNP     -> WHERE     ->     Country.Continent = ContinentGNP.Continent     ->     AND Country.GNP < ContinentGNP.AvgGNP * .01     -> ORDER BY Country.Continent, Country.Name; +-----------+---------------+------------+-----------------+ | Continent | Name          | CountryGNP | ContinentAvgGNP | +-----------+---------------+------------+-----------------+ | Asia      | Bhutan        |     372.00 |   150105.725490 | | Asia      | East Timor    |       0.00 |   150105.725490 | | Asia      | Laos          |    1292.00 |   150105.725490 | | Asia      | Maldives      |     199.00 |   150105.725490 | | Asia      | Mongolia      |    1043.00 |   150105.725490 | | Europe    | Andorra       |    1630.00 |   206497.065217 | | Europe    | Faroe Islands |       0.00 |   206497.065217 | | Europe    | Gibraltar     |     258.00 |   206497.065217 | ...

Use the summary table again to find countries that have a GNP more than 10 times the continental average:

 mysql> SELECT     ->     Country.Continent, Country.Name,     ->     Country.GNP AS CountryGNP,     ->     ContinentGNP.AvgGNP AS ContinentAvgGNP     -> FROM Country, ContinentGNP     -> WHERE     ->     Country.Continent = ContinentGNP.Continent     ->     AND Country.GNP > ContinentGNP.AvgGNP * 10     -> ORDER BY Country.Continent, Country.Name; +---------------+---------------+------------+-----------------+ | Continent     | Name          | CountryGNP | ContinentAvgGNP | +---------------+---------------+------------+-----------------+ | Asia          | Japan         | 3787042.00 |   150105.725490 | | Europe        | Germany       | 2133367.00 |   206497.065217 | | North America | United States | 8510700.00 |   261854.789189 | | Africa        | South Africa  |  116729.00 |    10006.465517 | | Oceania       | Australia     |  351182.00 |    14991.953571 | +---------------+---------------+------------+-----------------+

Use of summary tables has the disadvantage that the records they contain are up to date only as long as the original values remain unchanged, and thus so are any summaries calculated from them. If the original table rarely or never changes, this might be only a minor concern. For many applications, summaries that are close approximations are sufficiently accurate.

The summary table technique can be applied at multiple levels. Create a summary table that holds the results of an initial summary, and then summarize that table in different ways to produce secondary summaries. This avoids the computational expense of generating the initial summary repeatedly.

When a summary consists of a single value, you need not create a table at all. If you assign the value to a user variable, you can use the variable for comparison purposes in subsequent queries without having to calculate the value again.

22.3.5. Optimizing Updates

The optimizations discussed so far have been shown for SELECT statements, but optimization techniques can be used for statements that update tables, too:

For a DELETE or UPDATE statement that uses a WHERE clause, try to write it in a way that allows an index to be used for determining which rows to delete or update. The techniques for this that were discussed earlier for SELECT statements apply to DELETE and UPDATE as well.
EXPLAIN is used with SELECT queries, but you might also find it helpful for analyzing UPDATE and DELETE statements. Write a SELECT that has the same WHERE clause as the UPDATE or DELETE and analyze that.
Use multiple-row INSERT statements instead of multiple single-row INSERT statements. For example, instead of using three single-row statements like this:
```
 mysql> INSERT INTO t (id, name) VALUES(1,'Bea'); mysql> INSERT INTO t (id, name) VALUES(2,'Belle'); mysql> INSERT INTO t (id, name) VALUES(3,'Bernice'); 
```
You could use a single multiple-row statement that does the same thing:
```
 mysql> INSERT INTO t (id, name) VALUES(1,'Bea'),(2,'Belle'),(3,'Bernice'); 
```
The multiple-row statement is shorter, which is less information to send to the server. More important, it allows the server to perform all the updates at once and flush the index a single time, rather than flushing it after each of the individual inserts. This optimization can be used with any storage engine.
If you're using an InnoDB table, you can get better performance even for single-row statements by grouping them within a transaction rather than by executing them with autocommit mode enabled:
```
 mysql> START TRANSACTION; mysql> INSERT INTO t (id, name) VALUES(1,'Bea'); mysql> INSERT INTO t (id, name) VALUES(2,'Belle'); mysql> INSERT INTO t (id, name) VALUES(3,'Bernice'); mysql> COMMIT; 
```
Using a transaction allows InnoDB to flush all the changes at commit time. In autocommit mode, InnoDB flushes the changes for each INSERT individually.
For any storage engine, LOAD DATA INFILE is even faster than multiple-row INSERT statements.
You can disable index updating when loading data into an empty MyISAM table to speed up the operation. LOAD DATA INFILE does this automatically for non-unique indexes if the table is empty; it disables index updating before loading and enables it again after loading.
To replace existing rows, use REPLACE rather than DELETE plus INSERT.

22.3.1. Query Rewriting Techniques

22.3.2. Using EXPLAIN to Obtain Optimizer Information

22.3.3. Optimizing Queries by Limiting Output

22.3.4. Using Summary Tables

22.3.5. Optimizing Updates

22.3.2. Using `EXPLAIN` to Obtain Optimizer Information