SQL Tuning Statements and Practices | Tuning Stored Programs and Their SQL

MySQL provides several statements and utilities that assist with tuning SQL, and you need to be familiar with these resources. The statements and utilities are described in the following sections.

19.3.1. EXPLAIN Statement

The most important SQL tuning statement in the MySQL language is EXPLAIN. EXPLAIN exposes the execution plan that the optimizer will use to resolve a particular SQL statement. Without EXPLAIN, you are doomed to trial-and-error tuning.

EXPLAIN has a simple syntax:

 EXPLAIN sql_text;

EXPLAIN returns a result set consisting of at least one row for each table referenced in the SQL. Additional rows might be returned to indicate how subqueries or derived tables are used in the query. Example 19-3 is a simple demonstration of an explain plan for a two-table join (we used the G option to print the output with each column on a separate line).

Example 19-3. Example of EXPLAIN output

mysql> EXPLAIN SELECT customer_name
 -> FROM employees join customers
 -> ON(customers.sales_rep_id=employees.employee_id)
 -> WHERE employees.surname='GRIGSBY'
 -> AND employees.firstname='RAY' G

*************************** 1. row ***************************
 id: 1
 select_type: SIMPLE
 table: employees1
 type: ref
possible_keys: PRIMARY,i_employees_name3
 key: i_employees_name4
 key_len: 80
 ref: const,const
 rows: 15
 Extra: Using where; Using index6
*************************** 2. row ***************************
 id: 1
 select_type: SIMPLE
 table: customers2
 type: ref
possible_keys: i_customers_sales_rep
 key: i_customers_sales_rep8
 key_len: 9
 ref: sqltune.employees.EMPLOYEE_ID7
 rows: 55589
 Extra: Using where
2 rows in set (0.04 sec)

Let's take a look at the most important pieces of information from these plans. Numbers used in the explanation below correspond to superscripts in the EXPLAIN output above.

For joins, the order of the rows output by EXPLAIN corresponds to the join order, so the presence of the employees table in the first row indicates that employees was the first table in the join.
customers is the second table in the join.
MySQL had a choice between the primary key index and the i_employees_name index.
MySQL chose the i_employees_name index to retrieve rows from employees. This index was on (surname, firstname).
MySQL has determined that it will fetch only a single row in this stage of the query (e.g., it determined that there was only one employees row with that particular surname+firstname combination).
Because the columns in the i_employees_name index were the only employees columns included in the SQL, MySQL was able to satisfy this part of the query using the index aloneaccessing rows in the table itself was unnecessary.
MySQL was required to find rows in the customers table that matched specific values of employees.employee_id.
MySQL used the i_customers_sales_rep index to retrieve these rows (this was an index on customers.sales_rep_id).
MySQL expected to retrieve about 5558 rows from customers. The value here refers to the number of rows that are expected to be processed each time this step is executedwhich, in this case, is only once.

We'll look at a variety of EXPLAIN outputs for common query scenarios in the next few chapters. For now, the main thing to recognize and accept is that if you are going to be tuning SQL statements, you will need to get familiar with the EXPLAIN statement and learn how to interpret the EXPLAIN output.

The EXPLAIN statement is the primary tool in your SQL tuning toolbox. You should become competent in the interpretation of EXPLAIN output.

19.3.2. EXPLAIN and Stored Programs

Unfortunately, there is no way to directly obtain EXPLAIN output for the SQL statements inside stored programs. EXPLAIN will generate an error if asked to explain a CALL statement or a stored program name.

We hope that this restriction will be relaxed in future releases. In the meantime, to tune the SQL in your stored programs, you need to work with the SQL outside of the stored program and only add it to the program when you are satisfied that it is optimized.

19.3.3. Details of the EXPLAIN Output

The output from the EXPLAIN statement consists of lines containing the following columns:

Identifies the individual SELECT statement within a SQL statement that contains multiple SELECT clauses. There will be multiple SELECT statements in SQL statements that contain subqueries, in-line views, or UNION operations. All rows in the EXPLAIN output that have the same ID will belong to the same SELECT statement.

select_type

This column identifies the type of the SELECT statement responsible for this step. Table 19-1 lists the possible values.

Table 19-1. Possible values for the select_type column of the EXPLAIN statement output
select_type	Explanation
SIMPLE	A simple `SELECT` statement that does not involve either subqueries or `UNIONs`.
PRIMARY	If the SQL contains subqueries or `UNION`s, `PRIMARY` indicates the outermost SQL. `PRIMARY` could be the `SELECT` statement that contains subqueries within it or the first `SELECT` in a `UNION`.
UNION	The second or subsequent `SELECT` statements contributing to a `UNION` operation.
UNION RESULT	The result set of a `UNION` operation.
SUBQUERY	A subquery that returns rows that are not "dependent" on the rows in the outer `SELECT`. In practice, this means that the subquery does not contain references to columns in other `SELECT` statements.
DEPENDENT SUBQUERY	A subquery whose results are dependent on the values in an outer `SELECT`. This is typical of `EXISTS` subqueries and of `IN` subqueries (which MySQL rewrites as `EXISTS`).
DEPENDENT UNION	The second or subsequent `SELECT` in a `UNION` that is dependent on rows from an outer `SELECT`.
DERIVED	`SELECT` that appears within a subquery within the `FROM` clause of another SQL.

table

Indicates the name of the table involved in this step. If the table is aliased within the SQL statement, then the name of the alias rather than the name of the table will be reported.

type

Indicates the method by which rows will be selected from the table involved. Table 19-2 shows the possible values for the type column.

Table 19-2. Possible values for the type column of the EXPLAIN statement output
type	Explanation
all	All rows in the table concerned will be read. This occurs primarily when no suitable index exists to retrieve the rows, or when MySQL determines that a full scan of the table will be less expensive than an index lookup.
const	An index is used to retrieve all values from the table matching a constant value supplied in the `WHERE` clause.
eq_ref	An index is used to retrieve all rows from the table that match the rows supplied by a previous `SELECT`. `eq_ref` is typically seen in conjunction with a well-optimized, indexed join. `eq_ref` indicates that all parts of a unique or primary key index are used.
ref	`L`ike `eq_ref` except that either only part of the index can be used or the index is not unique or primary.
ref_or_null	`L`ike `ref` except that the condition also includes a search for null values.
index merge	`O`ccurs when MySQL merges multiple indexes to retrieve the results.
unique_subquery	An index lookup is used to satisfy the result of a subquery.
range	An index is used to retrieve a range of values from the table. This occurs typically when >, <, or `BETWEEN` operators are involved.
index	A full scan of the index is undertaken to find the necessary rows.

possible_keys

Lists all of the keys (indexes) that MySQL considered as having potential to resolve this step. If an index is listed here, but is not used to resolve the step, you can consider using optimizer hints to force or encourage the use of the index. If the index is not listed, then in all probability MySQL cannot use it.

key

Indicates the key (index) that MySQL used to resolve the query.

key_len

Shows the length of the columns in the index used to resolve the query. If there is more than one column in the index, key_len might indicate that only part of the index is used.

ref

Shows which columns are used to select rows from the table. ref may list columns from other tables (join columns from other tables) or the word const if a constant value will be used (this constant value might have come from a WHERE clause literal, or might have been obtained earlier in the query execution).

rows

Indicates the number of rows that MySQL estimates will be processed by this step.

Extra

Contains additional information about the execution step. Possible values for Extra are shown in Table 19-3. Multiple values from this column may appear in the Extra column, separated by semicolons.

Table 19-3. Possible values for the extra column of the EXPLAIN statement output
Extra	Explanation
distinct	MySQL will stop searching for more rows after the first match is found.
not exists	Occurs in a `LEFT JOIN` when there is an additional `WHERE` clause condition that indicates that the `WHERE` clause condition will never be satisfied. A `LEFT JOIN` with an `IS NULL` condition will generate this output. This allows the optimizer to eliminate the table from further processing.
range checked for each record	There is no good general-purpose index to support a join. MySQL will determine on a row-by-row basis whether to use an index and/or which index to use.
Using filesort	MySQL needs to return rows in order, and no index is available to support that ordering. MySQL will need to sort the rows and may need to write to disk during that sorting. Even if there is sufficient memory to avoid a disk sort, you will still see this tag if a sort is necessary.
Using index	This step could be resolved by reading an index alone. Typically, this occurs when all of the columns required to resolve the step are present in an index.
Using index for group-by	Same as `Using index`, but used to support a `GROUP BY` operation.
Using temporary	A temporary table is created to hold intermediate results. Often seen in conjunction with `using filesort`.
Using where	The results returned by this step are filtered to satisfy the `WHERE` clause condition.
Using sort_union	Similar to `using union` except that the rows had to be sorted before the `UNION` could be performed, usually because range conditions are involved.
Using union	A form of index merge in which rows that appeared in any of the index scans are returned. Typically used to support `WHERE` clause conditions that include `OR` conditions.
Using intersect	A form of index merge in which only the rows appearing in all of the index scans are returned. Typically used to support `WHERE` clause conditions that include only `AND` conditions.

19.3.4. Extended EXPLAIN

An undocumented feature of the EXPLAIN statement can be used to reveal the rewrites that MySQL performs on a statement prior to execution.

If you issue the statement EXPLAIN EXTENDED sql, followed by SHOW WARNINGS, MySQL will print the SQL that it actually executes, including any rewrites applied to the SQL by the optimizer. For instance, in Example 19-4, we see how MySQL rewrites an IN subquery to an EXISTS subquery.

Example 19-4. Using EXPLAIN EXTENDED

mysql> EXPLAIN EXTENDED SELECT COUNT(*) FROM ta_5000 WHERE sales_id IN (SELECT sales_
id FROM tb_5000)G
*************************** 1. row ***************************
 id: 1
 select_type: PRIMARY
 table: ta_5000
 type: ALL
possible_keys: NULL
 key: NULL
 key_len: NULL
 ref: NULL
 rows: 5131
 Extra: Using where
*************************** 2. row ***************************
 id: 2
 select_type: DEPENDENT SUBQUERY
 table: tb_5000
 type: ALL
possible_keys: NULL
 key: NULL
 key_len: NULL
 ref: NULL
 rows: 4985
 Extra: Using where
2 rows in set, 1 warning (0.04 sec)

mysql> SHOW WARNINGS G
*************************** 1. row ***************************
 Level: Note
 Code: 1003
Message: select count(0) AS 'count(*)' from 'sqltune'.'ta_5000' where 
('sqltune'.'ta_5000'.'SALES_ID',(select
1 AS 'Not_used' from 'sqltune'.'tb_5000' where (('sqltune'.'ta_5000'.
'SALES_ID') = 'sqltune'.'tb_5000'.'SALES_ID')))
1 row in set (0.05 sec)

Most of the time, MySQL rewrites are not particularly significant. However, if you are completely at a loss to understand MySQL's refusal to use an index or some other execution plan decision, examining the rewrite might be useful.

19.3.5. Optimizer Hints

Optimizer hints are instructions that you can embed in your SQL that do not change the meaning of the SQL, but rather instruct or suggest to the optimizer how you would like the SQL to be executed.

Most of the time, you will not need to add hints. In fact, hints can be dangerous because they limit the choices the optimizer has available, and if data in the tables change or if new indexes are added to the table, MySQL may be unable to adapt because of your hints. However, there definitely will be situations where you will discover that the optimizer has made a less than perfect decision and you will want to give the optimizer specific instructions.

Table 19-4 lists the commonly used optimizer hints . We will see examples of each of these hints in the next two chapters.

Table 19-4. MySQL optimizer hints
Hint	Where it appears	What it does
STRAIGHT_JOIN	After the `SELECT` clause	Forces the optimizer to join the tables in the order in which they appear in the `FROM` clause. Use this if you want to force tables to be joined in a particular order.
USE INDEX(`index` [,`index`...])	After a table name in the `FROM` clause	Instructs MySQL to only consider using the indexes listed. MySQL may choose to use none of the indexes if it calculates that using them would not be faster than scanning the entire table.
FORCE INDEX(`index` [,`index`...])	After a table name in the `FROM` clause	Instructs MySQL to use one of the indexes listed. This differs from `USE INDEX` in that MySQL is instructed not to perform a table scan of the data unless it is impossible to use any of the indexes listed.
IGNORE INDEX(`index` [,`index`...])	After a table name in the `FROM` clause	Instructs MySQL not to consider any of the listed indexes when working out the execution plan.

19.3.6. Measuring SQL and Stored Program Execution

When we execute a SQL statement from the MySQL command line, MySQL is kind enough to report on the elapsed time taken to execute the statement:

 mysql> CALL TestProc1( );
 Query OK, 0 rows affected (9.35 sec)

Elapsed time is a good first measurement of SQL or stored program performance, but there are lots of reasons why elapsed time might vary between runs that may have absolutely nothing to do with how well the SQL statement is optimized:

Other users may be running jobs on the host while we execute our SQL statements; we will be contending with them for CPU, disk I/O, and locks.
The number of physical I/Os necessary to execute our statement will vary depending on the amount of data cached in the operating system, the MyISAM key cache, the InnoDB buffer pool, and/or some other storage engine-specific cache.

For these reasons, it is sometimes better to obtain additional metrics to work out whether our tuning efforts are successful. Useful execution statistics can be obtained from the SHOW STATUS statement, although the level of detail will vary depending on our storage engine, with InnoDB currently offering the most comprehensive selection of statistics.

Generally, we will want to compare before and after variables for each statistic andbecause the statistics are sometimes computed across all sessions using the MySQL serverensure that our session has exclusive use of the server while the statement runs.

In Example 19-5, we calculate the number of logical and physical reads performed while counting the number of rows on the InnoDB-based sales table. Logical reads are the number of block requests from the InnoDB buffer pool, while physical reads reflect the number of blocks that actually had to be read from disk.

Example 19-5. Examining InnoDB execution statistics before and after SQL statement execution

mysql> /* Logical reads before execution*/
SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests';
+----------------------------------+-------+
| Variable_name | Value |
+----------------------------------+-------+
| Innodb_buffer_pool_read_requests | 598 |
+----------------------------------+-------+
1 row in set (0.01 sec)

mysql> /* Physical reads before execution*/
SHOW STATUS LIKE 'Innodb_data_reads';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Innodb_data_reads | 79 |
+-------------------+-------+
1 row in set (0.01 sec)

mysql>
mysql> SELECT count(*) from sales;
+----------+
| count(*) |
+----------+
| 2500000 |
+----------+
1 row in set (27.67 sec)

mysql>
mysql> /* Logical reads after execution*/
SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests';
+----------------------------------+--------+
| Variable_name | Value |
+----------------------------------+--------+
| Innodb_buffer_pool_read_requests | 365177 |
+----------------------------------+--------+
1 row in set (0.46 sec)

mysql> /* Physical reads after execution*/
SHOW STATUS LIKE 'Innodb_data_reads';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Innodb_data_reads | 17472 |
+-------------------+-------+
1 row in set (0.01 sec)

Subtracting the before values from the after values gives us a logical read count of 364,579 and a physical read count of 17,393. We also note the elapsed time of 27.67 seconds.

The next time we execute this query, we might see a lower physical read count and a lower elapsed time because the data we need is already in cache. However, we would not expect the logical read count to change unless the data in the table was changed. This makes the logical read statistics (Innodb_buffer_pool_read_requests) arguably the most useful statistics for determining if our SQL tuning efforts have been successful.

Table 19-5 shows the SHOW STATUS variables that are most useful for measuring SQL execution performance.

Table 19-5. SHOW STATUS statistics that are useful when measuring SQL performance
SHOW STATUS statistic	Explanation
Innodb_buffer_pool_read_requests	Number of requests from the InnoDB buffer pool. This statistic is sometimes called logical reads since it reflects the absolute number of data reads required to satisfy a query. This value will remain constant between runs provided that our data does not change. If we observe a reduction in this statistic, then we have almost certainly improved the performance of our query.
Innodb_data_reads	Number of blocks from disk that InnoDB had to read to execute the query. If the cache is empty, then this value will be equal to `Innodb_buffer_pool_read_requests`. If all of the required blocks are in the cache, then this statistic will be 0. Usually, the value will be somewhere in between. If two executions of the same SQL have different response times, we can look at this statistic to determine if the difference is because one execution required more physical I/O[a].
Innodb_rows_read	Number of rows read by InnoDB to satisfy the query. For some SQL statements, we may see excessive values for this statistic, which generally indicates that the SQL is inefficient (because it is accessing the same rows twice, or because it is accessing more rows than are required).
Last_query_cost	Optimizer's "cost" estimate for the last SQL executed. Unlike the other metrics, this statistic does not require us to have to calculate a delta value. Higher costs indicate that the optimizer thinks the SQL will take longer to run.
Sort_rows	Number of rows that had to be sorted.
Sort_merge_passes	Number of disk sort "merge runs" that had to be performed. The fewer merge runs, the faster the sort. Chapter 21 describes sort optimization in detail.

[a] For example, if we execute a new SQL statement twice, the second execution will usually have a lower elapsed time because the first execution brings the required blocks into the InnoDB buffer pool or the MyISAM key cache.

19.3.7. The Slow Query Log

One way to identify SQL statements or stored programs that may need tuning is to enable the MySQL slow query log. We can do this by adding the following lines to our MySQL initialization files:

 log_slow_queries
 long_query_time=N

This will cause MySQL to write any queries that exceed an elapsed time exceeding N seconds to a log file. The log file can be found in the MySQL data directory and is named hostname-slow.log. For each SQL statement identified, MySQL will print the SQL statement along with a few execution statistics, as shown in Example 19-6.

Example 19-6. Example of slow query log contents

Time Id Command Argument
# Time: 050214 23:42:30
# User@Host: root[root] @ localhost [127.0.0.1]
# Query_time: 67 Lock_time: 0 Rows_sent: 1 Rows_examined: 101199
use sqltune;
select count(*) from customers where contact_surname not in (select surname from 
employees);

The slow query log execution statistics are not particularly enlightening, and there is no EXPLAIN output, so we would normally paste the SQL into our MySQL client for further analysis.

Starting with MySQL 5.1.6, the slow query log can be directed to the database table mysql.slow_log. This allows us to more easily access the information from MySQL clients and gives us the power to analyze the information using SQL statements. We enable logging to this table by specifying log_output=TABLE in our initialization file.

Part I: Stored Programming Fundamentals

Introduction to MySQL Stored Programs

MySQL Stored Programming Tutorial

Language Fundamentals

Blocks, Conditional Statements, and Iterative Programming

Using SQL in Stored Programming

Error Handling

Part II: Stored Program Construction

Creating and Maintaining Stored Programs

Transaction Management

MySQL Built-in Functions

Stored Functions

Triggers

Part III: Using MySQL Stored Programs in Applications

Using MySQL Stored Programs in Applications

Using MySQL Stored Programs with PHP

Using MySQL Stored Programs with Java

Using MySQL Stored Programs with Perl

Using MySQL Stored Programs with Python

Using MySQL Stored Programs with .NET

Part IV: Optimizing Stored Programs

Stored Program Security

Tuning Stored Programs and Their SQL

Basic SQL Tuning

Advanced SQL Tuning

Optimizing Stored Program Code

Best Practices in MySQL Stored Program Development