37.2. Using EXPLAIN to Analyze Queries


37.2. Using EXPLAIN to Analyze Queries

When a SELECT query does not run as quickly as you think it should, use the EXPLAIN statement to ask the MySQL server for information about how the query optimizer processes the query. This information is useful in several ways:

  • EXPLAIN can provide information that points out the need to add an index.

  • If a table already has indexes, you can use EXPLAIN to find out whether the optimizer is using them.

  • If indexes exist but aren't being used, you can try writing a query different ways. EXPLAIN can tell you whether the rewrites are better for helping the server use the available indexes.

When using EXPLAIN to analyze a query, it's helpful to have a good understanding of the tables involved. If you need to determine a table's structure, remember that you can use DESCRIBE to obtain information about a table's columns, and SHOW INDEX for information about its indexes. (See Section 8.8, "Obtaining Table and Index Metadata.")

EXPLAIN works with SELECT queries, but can be used in an indirect way for UPDATE and DELETE statements as well: Write a SELECT statement that has the same WHERE clause as the UPDATE or DELETE and use EXPLAIN to analyze the SELECT.

The following discussion describes how EXPLAIN works. Section 22.3, "General Query Enhancement," discusses some general query-writing principles that help MySQL use indexes more effectively. You can apply those principles in conjunction with EXPLAIN to determine the best way of writing a query.

37.2.1. How EXPLAIN Works

To use EXPLAIN, write your SELECT query as you normally would, but place the keyword EXPLAIN in front of it. As a very simple example, take the following statement:

 SELECT 1; 

To see what EXPLAIN will do with it, issue the statement like this:

 mysql> EXPLAIN SELECT 1; +----------------+ | Comment        | +----------------+ | No tables used | +----------------+ 

In practice, it's unlikely that you'd use EXPLAIN very often for a query like that because the output tells you nothing interesting about optimization. Nevertheless, the example illustrates the important principle that EXPLAIN can be applied to any SELECT query. One of the implications of this principle is that you can use EXPLAIN with simple queries while you're learning how to use it and how to interpret its results. You don't have to begin with a complicated multiple-table join.

With that in mind, consider these two simple single-table queries:

 SELECT * FROM Country WHERE Name = 'France'; SELECT * FROM Country WHERE Code = 'FRA'; 

Both queries produce the same output (information about the country of France), but they are not equally efficient. How do you know that? Because EXPLAIN tells you so. When you use EXPLAIN with each of the two queries, It provides the following information about how the MySQL optimizer views them:

 mysql> EXPLAIN SELECT * FROM Country WHERE Name = 'France'\G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: Country          type: ALL possible_keys: NULL           key: NULL       key_len: NULL           ref: NULL          rows: 239         Extra: Using where mysql> EXPLAIN SELECT * FROM Country WHERE Code = 'FRA'\G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: Country          type: const possible_keys: PRIMARY           key: PRIMARY       key_len: 3           ref: const          rows: 1         Extra:  

EXPLAIN produces several columns of information. In the example just shown, NULL in the possible_keys and key columns shows for the first query that no index is considered available or usable for processing the query. For the second query, the table's PRIMARY KEY column (the Code column that contains three-letter country codes) can be used, and is in fact the index that the optimizer would choose. The rows column of the EXPLAIN output shows the effect of this difference. Its value indicates the number of rows that MySQL estimates it will need to examine while processing the query:

  • For the first query, the value is 239, which happens to be the number of rows in the Country table. In other words, MySQL would scan all rows of the table, which is inefficient.

  • For the second query, only one row need be examined. This is because MySQL can use the table's primary key to go directly to the single relevant row.

This example briefly indicates the kind of useful information that EXPLAIN can provide, even for simple queries. The conclusion to draw is that, if possible, you should use the Code column rather than the Name column to look up Country table records. However, the real power of EXPLAIN lies in what it can tell you about joins SELECT queries that use multiple tables.

EXPLAIN is especially important for join analysis because joins have such enormous potential to increase the amount of processing the server must do. If you select from a table with a thousand rows, the server might need to scan all one thousand rows in the worst case. But if you perform a join between two tables with a thousand rows each, the server might need to examine every possible combination of rows, which is one million combinations. That's a much worse worst case. EXPLAIN can help you reduce the work the server must do to process such a query, so it's well worth using.

37.2.2. Analyzing a Query

The following example demonstrates how to use EXPLAIN to analyze and optimize a sample query. The purpose of the query is to answer the question, "Which cities have a population of more than 8 million?" and to display for each city its name and population, along with the country name. This question could be answered using only city information, except that to get each country's name rather than its code, city information must be joined to country information.

The example uses tables created from world database information. Initially, these tables will have no indexes, so EXPLAIN will show that the query is not optimal. The example then adds indexes and uses EXPLAIN to determine the effect of indexing on query performance.

Begin by creating the initial tables, CountryList and CityList, as follows. These are derived from the Country and City tables, but need contain only the columns involved in the query:

 mysql> CREATE TABLE CountryList ENGINE = MyISAM     -> SELECT Code, Name FROM Country; Query OK, 239 rows affected (0.06 sec) Records: 239  Duplicates: 0  Warnings: 0 mysql> CREATE TABLE CityList ENGINE = MyISAM     -> SELECT CountryCode, Name, Population FROM City; Query OK, 4079 rows affected (0.10 sec) Records: 4079  Duplicates: 0  Warnings: 0 

The query that retrieves the desired information in the required format looks like this:

 mysql> SELECT CountryList.Name, CityList.Name, CityList.Population     -> FROM CountryList, CityList     -> WHERE CountryList.Code = CityList.CountryCode     -> AND CityList.Population > 8000000; +--------------------+------------------+------------+ | Name               | Name             | Population | +--------------------+------------------+------------+ | Brazil             | São Paulo        |    9968485 | | Indonesia          | Jakarta          |    9604900 | | India              | Mumbai (Bombay)  |   10500000 | | China              | Shanghai         |    9696300 | | South Korea        | Seoul            |    9981619 | | Mexico             | Ciudad de México |    8591309 | | Pakistan           | Karachi          |    9269265 | | Turkey             | Istanbul         |    8787958 | | Russian Federation | Moscow           |    8389200 | | United States      | New York         |    8008278 | +--------------------+------------------+------------+ 10 rows in set (0.03 sec) 

While the tables are in their initial unindexed state, applying EXPLAIN to the query yields the following result:

 mysql> EXPLAIN SELECT CountryList.Name, CityList.Name, CityList.Population     -> FROM CountryList, CityList     -> WHERE CountryList.Code = CityList.CountryCode     -> AND CityList.Population > 8000000\G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: CountryList          type: ALL possible_keys: NULL           key: NULL       key_len: NULL           ref: NULL          rows: 239         Extra:  *************************** 2. row ***************************            id: 1   select_type: SIMPLE         table: CityList          type: ALL possible_keys: NULL           key: NULL       key_len: NULL           ref: NULL          rows: 4079         Extra: Using where 

The information displayed by EXPLAIN shows that no optimizations could be made:

  • The type value in each row shows how MySQL will read the corresponding table. For CountryList, the value of ALL indicates a full scan of all rows. For CityList, the value of ALL indicates a scan of all its rows to find a match for each CountryList row. In other words, all combinations of rows will be checked to find country code matches between the two tables.

  • The number of row combinations is given by the product of the rows values, where rows represents the optimizer's estimate of how many rows in a table it will need to check at each stage of the join. In this case, the product is 239 x 4,079, or 974,881.

EXPLAIN shows that MySQL would need to check nearly a million row combinations to produce a query result that contains only 10 rows. Clearly, this query would benefit from the creation of indexes that allow the server to look up information faster.

Good columns to index are those that you typically use for searching, grouping, or sorting records. The query does not have any GROUP BY or ORDER BY clauses, but it does use columns for searching:

  • The query uses CountryList.Code and CityList.CountryCode to match records between tables.

  • The query uses CityList.Population to cull records that do not have a large enough population.

To see the effect of indexing, try creating indexes on the columns used to join the tables. In the CountryList table, Code can be used as a primary key because it uniquely identifies each row. Add the index using ALTER TABLE:

 mysql> ALTER TABLE CountryList ADD PRIMARY KEY (Code); 

In the CityList table, CountryCode must be a non-unique index because multiple cities can share the same country code:

 mysql> ALTER TABLE CityList ADD INDEX (CountryCode); 

After creating the indexes, EXPLAIN reports a somewhat different result:

 mysql> EXPLAIN SELECT CountryList.Name, CityList.Name, CityList.Population     -> FROM CountryList, CityList     -> WHERE CountryList.Code = CityList.CountryCode     -> AND CityList.Population > 8000000\G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: CityList          type: ALL possible_keys: CountryCode           key: NULL       key_len: NULL           ref: NULL          rows: 4079         Extra: Using where *************************** 2. row ***************************            id: 1   select_type: SIMPLE         table: CountryList          type: eq_ref possible_keys: PRIMARY           key: PRIMARY       key_len: 3           ref: world.CityList.CountryCode          rows: 1         Extra:  

Observe that EXPLAIN now lists the tables in a different order. CityList appears first, which indicates that MySQL will read rows from that table first and use them to search for matches in the second table, CountryList. The change in table processing order reflects the optimizer's use of the index information that is now available for executing the query.

MySQL still will scan all rows of the CityList table (its type value is ALL), but now the server can use each of those rows to directly look up the corresponding CountryList row. This is seen by the information displayed for the CountryList table:

  • The type value of eq_ref indicates that an equality test is performed by referring to the column named in the ref field, CityList.CountryCode.

  • The possible_keys value of PRIMARY shows that the optimizer sees the primary key as a candidate for optimizing the query, and the key field indicates that it will actually use the primary key when executing the query.

The result from EXPLAIN shows that indexing CountryList.Code as a primary key improves query performance. However, it still indicates a full scan of the CityList table. The optimizer sees that the index on CountryCode is available, but the key value of NULL indicates that it will not be used. Does that mean the index on the CountryCode column is of no value? It depends. For this query, the index is not used. In general, however, it's good to index joined columns, so you likely would find for other queries on the CityList table that the index does help.

The product of the rows now is just 4,079. That's much better than 974,881, but perhaps further improvement is possible. The WHERE clause of the query restricts CityList rows based on their Population values, so try creating an index on that column:

 mysql> ALTER TABLE CityList ADD INDEX (Population); 

After creating the index, run EXPLAIN again:

 mysql> EXPLAIN SELECT CountryList.Name, CityList.Name, CityList.Population     -> FROM CountryList, CityList     -> WHERE CountryList.Code = CityList.CountryCode     -> AND CityList.Population > 8000000\G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: CityList          type: range possible_keys: CountryCode,Population           key: Population       key_len: 4           ref: NULL          rows: 78         Extra: Using where *************************** 2. row ***************************            id: 1   select_type: SIMPLE         table: CountryList          type: eq_ref possible_keys: PRIMARY           key: PRIMARY       key_len: 3           ref: world.CityList.CountryCode          rows: 1         Extra:  

The output for the CountryList table is unchanged compared to the previous step. That is not a surprise; MySQL already found that it could use a primary key for lookups, which is very efficient. On the other hand, the result for the CityList table is different. The optimizer now sees two indexes in the table as candidates. Furthermore, the key value shows that it will use the index on Population to look up records. This results in an improvement over a full scan, as seen in the change of the rows value from 4,079 to 78.

The query now is optimized. Note that the product of the rows values, 78, still is larger than the actual number of rows produced by the query (10 rows). This is because the rows values are only estimates. The optimizer cannot give an exact count without actually executing the query.

To summarize:

  • With unindexed tables, the rows product was 974,881.

  • After indexing the join columns, the rows product dropped to 4,079, a 99.6% improvement.

  • After indexing the Population column, the rows product dropped to 78, a further improvement of 98.1% over the previous step.

The example shows that using indexes effectively can substantially reduce the work required by the server to execute a query, and that EXPLAIN is a useful tool for assessing the effect of indexing.

37.2.3. EXPLAIN Output Columns

The EXPLAIN statement produces one row of output for each table named in each SELECT of the analyzed statement. (A statement can have more than one SELECT if it uses subqueries or UNION.) To use EXPLAIN productively, it's important to know the meaning of the columns in each row of output:

  • id indicates which SELECT in the analyzed statement that the EXPLAIN output row refers to.

  • select_type categorizes the SELECT referred to by the output row. This column can have any of the values shown in the following table. The word DEPENDENT indicates that a subquery is correlated with the outer query.

    select_type Value

    Meaning

    SIMPLE

    Simple SELECT statement (no subqueries or unions)

    PRIMARY

    The outer SELECT

    UNION

    Second or later SELECT in a union

    DEPENDENT UNION

    Second or later SELECT in a union that is dependent on the outer query

    UNION RESULT

    Result of a union

    SUBQUERY

    First SELECT in a subquery

    DEPENDENT SUBQUERY

    First SELECT in a subquery that is dependent on the outer query

    DERIVED

    Subquery in the FROM clause


  • table is the name of the table to which the information in the row applies. The order of the tables indicates the order in which MySQL will read the tables to process the query. This is not necessarily the order in which you name them in the FROM clause, because the optimizer attempts to determine which order will result in the most efficient processing. The example in the preceding section showed this: The table order displayed by successive EXPLAIN statements changed as indexes were added.

  • type indicates the join type. The value is a measure of how efficiently MySQL can scan the table. The possible type values are described later in this section.

  • possible_keys indicates which of the table's indexes MySQL considers to be candidates for identifying rows that satisfy the query. This value can be a list of one or more index names, or NULL if there are no candidates. The word PRIMARY indicates that MySQL considers the table's primary key to be a candidate.

  • key indicates the optimizer's decision about which of the candidate indexes listed in possible_keys will yield most efficient query execution. If the key value is NULL, it means no index was chosen. This might happen either because there were no candidates or because the optimizer believes it will be just as fast to scan the table rows as to use any of the possible indexes. A table scan might be chosen over an index scan if the table is small, or because the index would yield too high a percentage of the rows in the table to be of much use.

  • key_len indicates how many bytes of index rows are used. From this value, you can derive how many columns from the index are used. For example, if you have an index consisting of three INT columns, each index row contains three 4-byte values. If key_len is 12, you know that the optimizer uses the all three columns of the index when processing the query. If key_len is 4 or 8, it uses only the first one or two columns (that is, it uses a leftmost prefix of the index).

    If you've indexed partial values of string columns, take that into account when assessing the key_len value. Suppose that you have a composite index on two CHAR(8) columns that indexes only the first 4 bytes of each column. In this case, a key_len value of 8 means that both columns of the index would be used, not just the first column.

  • ref indicates which indexed column or columns are used to choose rows from the table. const means key values in the index are compared to a constant expression, such as in Code='FRA'. NULL indicates that neither a constant nor another column is being used, indicating selection by an expression or range of values. It might also indicate that the column does not contain the value specified by the constant expression. If neither NULL nor const is displayed, a table_name.column_name combination will be shown, indicating that the optimizer is looking at column_name in the rows returned from table_name to identify rows for the current table.

  • rows is the optimizer's estimate of how many rows from the table it will need to examine. The value is an approximation because, in general, MySQL cannot know the exact number of rows without actually executing the query. For a multiple-table query, the product of the rows values is an estimate of the total number of row combinations that need to be read. This product gives you a rough measure of query performance. The smaller the value, the better.

  • Extra provides other information about the join. The possible values are described later in this section.

The value in the type column of EXPLAIN output indicates the join type, but joins may be performed with varying degrees of efficiency. The type value provides a measure of this efficiency by indicating the basis on which rows are selected from each table. The following list shows the possible values, from the best type to the worst:

  • system

    The table has exactly one row.

  • const

    The table has exactly one matching row. This type value is similar to system, except that the table may have other, non-matching rows. The EXPLAIN output from the query with WHERE Code='FRA' is an example of this:

     mysql> EXPLAIN SELECT * FROM Country WHERE Code = 'FRA'\G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: Country          type: const possible_keys: PRIMARY           key: PRIMARY       key_len: 3           ref: const          rows: 1         Extra:  

    The query has a type value of const because only one row out of all its rows need be read. If the table contained only the row for France, there would be no non-matching rows and the type value would be system rather than const.

    For both system and const, because only one row matches, any columns needed from it can be read once and treated as constants while processing the rest of the query.

  • eq_ref

    Exactly one row is read from the table for each combination of rows from the tables listed earlier by EXPLAIN. This is common for joins where MySQL can use a primary key to identify table rows.

  • ref

    Several rows may be read from the table for each combination of rows from the tables listed earlier by EXPLAIN. This is similar to eq_ref, but can occur when a non-unique index is used to identify table rows or when only a leftmost prefix of an index is used. For example, the CountryLanguage table has a primary key on the CountryCode and Language columns. If you search using only a CountryCode value, MySQL can use that column as a leftmost prefix, but there might be several rows for a country if multiple languages are spoken there.

  • ref_or_null

    Similar to ref, but MySQL also looks for rows that contain NULL.

  • index_merge

    MySQL uses an index merge algorithm.

  • unique_subquery

    Similar to ref, but used for IN subqueries that select from the primary key column of a single table.

  • index_subquery

    Similar to unique_subquery, but used for IN subqueries that select from an indexed column of a single table.

  • range

    The index is used to select rows that fall within a given range of index values. This is common for inequality comparisons such as id < 10.

  • index

    MySQL performs a full scan, but it scans the index rather than the data rows. An index scan is preferable: The index is sorted and index rows usually are shorter than data rows, so index rows can be read in order and more of them can be read at a time.

  • ALL

    A full table scan of all data rows. Typically, this indicates that no optimizations are done and represents the worst case. It is particularly unfortunate when tables listed later in EXPLAIN output have a join type of ALL because that indicates a table scan for every combination of rows selected from the tables processed earlier in the join.

The Extra column of EXPLAIN output provides additional information about how a table is processed. Some values indicate that the query is efficient:

  • Using index

    MySQL can optimize the query by reading values from the index without having to read the corresponding data rows. This optimization is possible when the query selects only columns that are in the index.

  • Where used

    MySQL uses a WHERE clause to identify rows that satisfy the query. Without a WHERE clause, you get all rows from the table.

  • Distinct

    MySQL reads a single row from the table for each combination of rows from the tables listed earlier in the EXPLAIN output.

  • Not exists

    MySQL can perform a LEFT JOIN "missing rows" optimization that quickly eliminates rows from consideration.

By contrast, some Extra values indicate that the query is not efficient:

  • Using filesort

    Rows that satisfy the query must be sorted, which adds an extra processing step.

  • Using temporary

    A temporary table must be created to process the query.

  • Range checked for each record

    MySQL cannot determine in advance which index from the table to use. For each combination of rows selected from previous tables, it checks the indexes in the table to see which one will be best. This is not great, but it's better than using no index at all.

Using filesort and Using temporary generally are the two indicators of worst performance.

To use EXPLAIN for query analysis, examine its output for clues to ways the query might be improved. Modify the query, and then run EXPLAIN again to see how its output changes. Changes might involve rewriting the query or modifying the structure of your tables.

The following query rewriting techniques can be useful:

  • If the keys value is NULL even when there are indexes available, you can try adding a USE INDEX option as a hint to the optimizer which index is relevant for the query. To force MySQL to use the index, use FORCE INDEX. To tell MySQL to ignore an index that it chose and choose a different one instead, use IGNORE INDEX. Each of these options is used in the FROM clause, following the table name containing the index you want to control. The option is followed by parentheses containing a comma-separated list of one or more index names. PRIMARY means the table's primary key.

     SELECT Name FROM CountryList USE INDEX(PRIMARY) WHERE Code > 'M'; SELECT Name FROM CountryList IGNORE INDEX(Population) WHERE Code < 'B' AND Population > 50000000; 

    The keyword KEY may be used instead of INDEX in all three options.

  • If you want to force MySQL to join tables in a particular order, begin the query with SELECT STRAIGHT_JOIN rather than SELECT, and then list the tables in the desired order in the FROM clause.

  • Sometimes a table in a query has an index available, but the query is written in such a way that prevents the index from being used. If you can rewrite the query into an equivalent form that allows use of the index, do so. Rewriting techniques are discussed in Section 22.3, "General Query Enhancement."

Another way to provide the optimizer with better information on which to base its decisions is to change the structure of your tables:

  • If the possible_keys value is NULL in the output from EXPLAIN, it means MySQL finds no applicable index for processing the query. See whether an index can be added to the columns that identify which records to retrieve. For example, if you perform a join by matching a column in one table with a column in another, but neither of the columns is indexed, try indexing them.

  • Keep table index statistics up to date to help MySQL choose optimal indexes. If the table is a MyISAM or InnoDB table, you can update its statistics with the ANALYZE TABLE statement. As a table's contents change, the statistics go out of date and become less useful to the optimizer in making good decisions about query execution strategies. You should run ANALYZE TABLE more frequently for tables that change often than for those that are updated rarely.

Be careful when using EXPLAIN to analyze a statement that includes a subquery in the FROM clause, if the subquery itself is slow. For such a subquery, MySQL must execute it to determine what it returns so that the optimizer can formulate an execution plan for the outer query.



MySQL 5 Certification Study Guide
MySQL 5.0 Certification Study Guide
ISBN: 0672328127
EAN: 2147483647
Year: 2006
Pages: 312

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net