Optimizing Selects | Mastering MySQL 4

In a join, you can calculate the number of rows MySQL needs to search by multiplying all the rows together. In the following example, MySQL would need to examine 5*8*1 rows, giving a total of 40:

mysql> EXPLAIN SELECT * FROM customer,sales_rep,sales  WHERE sales.sales_rep=sales_rep.employee_number  AND customer.id=sales.id; +-----------+--------+---------------+---------+---------+----------+------+------------+ | table     | type   | possible_keys | key     | key_len | ref      | rows | Extra      | +-----------+--------+---------------+---------+---------+----------+------+------------+ | sales_rep | ALL    | PRIMARY       | NULL    |    NULL | NULL     |    5 |            | | sales     | ALL    | NULL          | NULL    |    NULL | NULL     |    8 | where used | | customer  | eq_ref | PRIMARY       | PRIMARY |       4 | sales.id |    1 |            | +-----------+--------+---------------+---------+---------+----------+------+------------+ 3 rows in set (0.00 sec)

You can see that the more tables that are joined, the greater the rows searched. Part of good database design is balancing the choice between smaller database tables that require more joins with larger tables that are harder to maintain. Chapter 8, "Database Normalization," will introduce some useful techniques to help you achieve this.

For purposes of optimizing, I'm going to concentrate on the join between sales_rep and sales from the previous query. I've re-created just that join below (using the alternative LEFT JOIN syntax):

 mysql> EXPLAIN SELECT * FROM sales_rep LEFT JOIN sales   ON sales.sales_rep = sales_rep.employee_number; +-----------+------+---------------+------+---------+------+------+-------+ | table     | type | possible_keys | key  | key_len | ref  | rows | Extra | +-----------+------+---------------+------+---------+------+------+-------+ | sales_rep | ALL  | NULL          | NULL |    NULL | NULL |    5 |       | | sales     | ALL  | NULL          | NULL |    NULL | NULL |    8 |       | +-----------+------+---------------+------+---------+------+------+-------+ 2 rows in set (0.00 sec)

The number of rows to examine in this query is five times eight (from the rows column), which gives you 40. No indexes are used here. If you examine the structure of the first two tables, you can see why:

mysql> DESCRIBE sales; +-----------+---------+------+-----+---------+-------+ | Field     | Type    | Null | Key | Default | Extra | +-----------+---------+------+-----+---------+-------+ | code      | int(11) |      | PRI | 0       |       | | sales_rep | int(11) | YES  |     | NULL    |       | | id        | int(11) | YES  |     | NULL    |       | | value     | int(11) | YES  |     | NULL    |       | +-----------+---------+------+-----+---------+-------+ 4 rows in set (0.00 sec) mysql> DESCRIBE sales_rep; +-----------------+-------------+------+-----+---------+-------+ | Field           | Type        | Null | Key | Default | Extra | +-----------------+-------------+------+-----+---------+-------+ | employee_number | int(11)     |      | PRI | 0       |       | | surname         | varchar(40) | YES  |     | NULL    |       | | first_name      | varchar(30) | YES  |     | NULL    |       | | commission      | tinyint(4)  | YES  | MUL | NULL    |       | | date_joined     | date        | YES  |     | NULL    |       | | birthday        | date        | YES  |     | NULL    |       | +-----------------+-------------+------+-----+---------+-------+ 6 rows in set (0.00 sec)

Because you have no WHERE condition, the query will return all records from the first table—sales_rep. The join is then performed between the employee_number field from the sales_rep table (which you cannot use an index for because you are returning all records) and the sales_rep field from the sales table (which is not indexed). The problem is that the join condition does not make use of an index. If you added an index to the sales_rep field, you'd improve performance:

mysql> CREATE INDEX sales_rep ON sales(sales_rep); Query OK, 8 rows affected (0.01 sec) Records: 8  Duplicates: 0  Warnings: 0 mysql> EXPLAIN SELECT * FROM sales_rep LEFT JOIN sales  ON sales.sales_rep = sales_rep.employee_number; +-----------+------+---------------+-----------+---------+---------------------------+------+-------+ | table     | type | possible_keys | key       | key_len | ref                       | rows | Extra | +-----------+------+---------------+-----------+---------+---------------------------+------+-------+ | sales_rep | ALL  | NULL          | NULL      |    NULL | NULL                      |    5 |       | | sales     | ref  | sales_rep     | sales_rep |       5 | sales_rep.employee_number |    2 |       | +-----------+------+---------------+-----------+---------+---------------------------+------+-------+ 2 rows in set (0.00 sec)

You've reduced the number of rows that MySQL needs to read from 40 to 10 (5*2), a fourfold improvement.

If you performed the LEFT JOIN the other way around (with the sales table containing the possible nulls, not the sales_rep table), you'd get a different result with EXPLAIN:

mysql> EXPLAIN SELECT * FROM sales LEFT JOIN sales_rep ON   sales.sales_rep = sales_rep.employee_number; +-----------+--------+---------------+---------+---------+-----------------+------+-------+ | table     | type   | possible_keys | key     | key_len | ref             | rows | Extra | +-----------+--------+---------------+---------+---------+-----------------+------+-------+ | sales     | ALL    | NULL          | NULL    |    NULL | NULL            |    8 |       | | sales_rep | eq_ref | PRIMARY       | PRIMARY |       4 | sales.sales_rep |    1 |       | +-----------+--------+---------------+---------+---------+-----------------+------+-------+ 2 rows in set (0.00 sec)

Only eight rows are examined because while all rows from the sales table are being returned, the primary key on the sales_rep table (employee_number) is being used to perform the join.

As a further example, examine the following:

mysql> EXPLAIN SELECT * FROM sales_rep LEFT JOIN sales  ON sales.sales_rep = sales_rep.employee_number  WHERE sales.sales_rep IS NULL; +-----------+------+---------------+-----------+---------+---------------------------+------+------------+ | table     | type | possible_keys | key       | key_len | ref                       | rows | Extra      | +-----------+------+---------------+-----------+---------+---------------------------+------+------------+ | sales_rep | ALL  | NULL          | NULL      |    NULL | NULL                      |    5 |            | | sales     | ref  | sales_rep     | sales_rep |       5 | sales_rep.employee_number |    2 | where used | +-----------+------+---------------+-----------+---------+---------------------------+------+------------+ 2 rows in set (0.00 sec)

Now, if you change the sales_rep field to not allow nulls, you'll see some changes:

mysql> ALTER TABLE sales CHANGE sales_rep sales_rep INT NOT NULL; Query OK, 8 rows affected (0.00 sec) Records: 8  Duplicates: 0  Warnings: 0 mysql> EXPLAIN SELECT * FROM sales_rep LEFT JOIN sales ON   sales.sales_rep = sales_rep.employee_number WHERE   sales.sales_rep IS NULL; +-----------+------+---------------+-----------+---------+---------------------------+------+------------------------+ | table     | type | possible_keys | key       | key_len | ref                       | rows | Extra                  | +-----------+------+---------------+-----------+---------+---------------------------+------+------------------------+ | sales_rep | ALL  | NULL          | NULL      | NULL    | NULL                      |    5 |                        | | sales     | ref  | sales_rep     | sales_rep |    4    | sales_rep.employee_number |    2 | where used; Not exists | +-----------+------+---------------+-----------+---------+---------------------------+------+------------------------+ 2 rows in set (0.00 sec)

Notice that the index length (shown in key_len) is now 4, not 5. Because nulls are no longer permissible, each record does not have to store whether the field is null or not, so overall it is one byte shorter. Also, notice the Not exists comment in the extra column. Because the sales_rep field can no longer contain nulls, once MySQL finds a record that matches the LEFT JOIN criteria, it no longer needs to search for any more.

The order in which the tables are presented to MySQL can, in some cases, also make a difference in the speed of the query. MySQL tries its best to choose the best options, but it doesn't always know in advance which the quickest route will be. The next section explains how to help MySQL garner in advance as much information as possible about the index composition, but, as the following example demonstrates, sometimes even this isn't enough.

First, create four identical tables:

mysql> CREATE TABLE t1 (f1 int unique not null, primary key(f1)); Query OK, 0 rows affected (0.15 sec) mysql> CREATE TABLE t2 (f2 int unique not null, primary key(f2)); Query OK, 0 rows affected (0.15 sec) mysql> CREATE TABLE t3 (f3 int unique not null, primary key(f3)); Query OK, 0 rows affected (0.15 sec) mysql> CREATE TABLE t4 (f4 int unique not null, primary key(f4)); Query OK, 0 rows affected (0.15 sec)

Next, add two records to each:

mysql> INSERT INTO t1 VALUES(1),(2); Query OK, 2 rows affected (0.12 sec) Records: 2  Duplicates: 0  Warnings: 0 mysql> INSERT INTO t2 VALUES(1),(2); Query OK, 2 rows affected (0.12 sec) Records: 2  Duplicates: 0  Warnings: 0 mysql> INSERT INTO t3 VALUES(1),(2); Query OK, 2 rows affected (0.12 sec) Records: 2  Duplicates: 0  Warnings: 0 mysql> INSERT INTO t4 VALUES(1),(2); Query OK, 2 rows affected (0.12 sec) Records: 2  Duplicates: 0  Warnings: 0

Now, imagine you need to perform a join on these tables. The following query will give you the required results:

mysql> SELECT * FROM t1,t2 LEFT JOIN t3 ON (t3.f3=t1.f1)  LEFT JOIN t4 ON (t4.f4=t1.f1) WHERE t2.f2=t4.f4; +----+----+------+------+ | f1 | f2 | f3   | f4   | +----+----+------+------+ |  1 |  1 |    1 |    1 | |  2 |  2 |    2 |    2 | +----+----+------+------+ 2 rows in set (0.02 sec)

If you use EXPLAIN to examine the record, you'll notice the following:

mysql> EXPLAIN SELECT * FROM t1,t2 LEFT JOIN t3 ON  (t3.f3=t1.f1) LEFT JOIN t4 ON (t4.f4=t1.f1)  WHERE t2.f2=t4.f4; +-------+--------+---------------+---------+---------+-------+------+-------------------------+ | table | type   | possible_keys | key     | key_len | ref   | rows | Extra                   | +-------+--------+---------------+---------+---------+-------+------+-------------------------+ | t1    | index  | NULL          | PRIMARY |       4 | NULL  |    2 | Using index             | | t2    | index  | PRIMARY,b,f2  | PRIMARY |       4 | NULL  |    2 | Using index             | | t3    | eq_ref | PRIMARY,c,f3  | PRIMARY |       4 | t1.f1 |    1 | Using index             | | t4    | eq_ref | PRIMARY,d,f4  | PRIMARY |       4 | t1.f1 |    1 | where used; Using index | +-------+--------+---------------+---------+---------+-------+------+-------------------------+ 4 rows in set (0.00 sec)

There are two index scans, one on t1 and another on t2, meaning that MySQL needs to scan the entire index. Looking carefully at the query, you'll see that the LEFT JOIN is what requires t2 to be read before t4. You can get around this by changing the order of the tables and separating t2 from the LEFT JOIN:

mysql> EXPLAIN SELECT * FROM t2,t1 LEFT JOIN t3  ON (t3.f3=t1.f1) LEFT JOIN t4 ON (t4.f4=t1.f1)  WHERE t2.f2=t4.f4; +-------+--------+---------------+---------+---------+-------+------+-------------+ | table | type   | possible_keys | key     | key_len | ref   | rows | Extra       | +-------+--------+---------------+---------+---------+-------+------+-------------+ | t1    | index  | NULL          | PRIMARY |       4 | NULL  |    2 | Using index | | t3    | eq_ref | PRIMARY,c,f3  | PRIMARY |       4 | t1.f1 |    1 | Using index | | t4    | eq_ref | PRIMARY,d,f4  | PRIMARY |       4 | t1.f1 |    1 | Using index | | t2    | eq_ref | PRIMARY,b,f2  | PRIMARY |       4 | t4.f4 |    1 | Using index | +-------+--------+---------------+---------+---------+-------+------+-------------+ 4 rows in set (0.01 sec)

Notice the difference! According to the rows column, only 2*1*1*1 rows need to be read (2 in total), as opposed to 2*2*1*1 (4 in total) with the earlier query. Of course, the results are the same:

mysql> SELECT * FROM t2,t1 LEFT JOIN t3 ON (t3.f3=t1.f1)  LEFT JOIN t4 ON (t4.f4=t1.f1) WHERE t2.f2=t4.f4; +----+----+------+------+ | f2 | f1 | f3   | f4   | +----+----+------+------+ |  1 |  1 |    1 |    1 | |  2 |  2 |    2 |    2 | +----+----+------+------+ 2 rows in set (0.00 sec)

Once again, this demonstrates the importance of testing your queries with EXPLAIN. Without a good understanding of the inner workings of MySQL, you may never have known which of the above queries was quicker, even if you had an inkling there would be a difference. EXPLAIN quantifies your hunches, which develop with experience.

Helping MySQL's Query Optimizer with ANALYZE

The brain inside MySQL that decides which key, if any, to use is called the query optimizer. It takes a quick look at the index to see which indexes are the best to use. A human does something similar when looking for a book. For example, say that you were searching for a book by the author Zakes Mda entitled Ways of Dying, and you knew only that there were two indexes. If one is alphabetical by author name and consists of 4,000 entries, and the other is alphabetical by book title and consists of 12,000 entries, you'd probably decide to use the author index. But if you knew that Zakes Mda had written 200 books but that there was only one book entitled Ways of Dying, you'd probably use the other index. MySQL too performs better if it has an idea of what each index contains. You can provide this kind of information (called the cardinality, or number of unique values) by running the following command: ANALYZE TABLE tablename. There's not much point in running this on the sample tables, but on large tables with many inserts, updates, and deletes, regularly analyzing the table can help the performance.

ANALYZE TABLE updates the key distribution for the table if it is not up-to-date. (Running ANALYZE is equivalent to running myisamchk -a or myismachk --analyze. See Chapter 12, "Database Replication," for more information.)

Warning

This only works with MyISAM and BDB tables, and the table is locked with a read lock for the duration of this process. Therefore, you don't want to do this when your database is busy.

You can see the information available to MySQL by running the command SHOW INDEX:

mysql> SHOW INDEX FROM customer; +----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+---------+ | Table    | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment | +----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+---------+ | customer |          0 | PRIMARY  |            1 | id          | A         |           8 |     NULL | NULL   |         | | customer |          1 | surname  |            1 | surname     | A         |           8 |     NULL | NULL   |         | | customer |          1 | surname  |            2 | initial     | A         |           8 |     NULL | NULL   |         | | customer |          1 | surname  |            3 | first_name  | A         |           8 |     NULL | NULL   |         | +----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+---------+ 4 rows in set (0.01 sec)

Table 4.5 explains the meanings of the columns returned by a SHOW INDEX statement.

Table 4.5: Meanings of the Columns Returned by SHOW INDEX
Column	Description
Table	Name of the table you're looking at.
Non_unique	0 or 1. 0 means the index can't contain duplicates (a primary key or unique index), and1 means it can.
Key_name	Name of the index.
Seq_in_index	Order of columns in the index, starting with 1.
Column_name	Column name.
Collation	A or NULL. A means the index is sorted in ascending order, and NULL means it's not sorted.
Cardinality	Number of unique values in the index. This specifically is updated by running ANALYZE TABLE or myisamchk -a.
Sub_part	NULL if the entire column is indexed, otherwise the size of the index, in characters.
Packed	Whether the index is packed or not.
Null	YES if the column can contain NULL values.
Comment	Various remarks.

Deletes and updates can leave gaps in the table (especially when your tables contain TEXT, BLOB, or VARCHAR fields. This means the drive is doing too much work because the head needs to skip over these gaps when reading.

OPTIMIZE TABLE solves this problem, removing the gaps in the data by rejoining fragmented records, doing the equivalent of a defrag on the table data.

Tip

See Chapter 10, "Basic Administration," for more on the OPTIMIZE statement. Note that the table is locked for the duration of the process, so you don't want to run it during peak hours!

Optimizing SELECT Statements and Security

The more complex your permissions (see Chapter 14, "Database Security"), the more overhead you'll experience on your queries. This is hardly a reason to skimp on security, but if you have some high-volume queries, and another set of low-volume queries with complex permissions, it may be useful to keep the low-volume set as separate as possible.

Benchmarking Functions

The BENCHMARK() function tells you how long it takes MySQL to perform a function a certain number of times. It can help to give a basic idea of the difference in capabilities between machines. The syntax is as follows:

SELECT BENCHMARK(number_of_repetitions,expression)

Compare the difference between the following benchmarks, where MySQL calculates the square root of 999 10 million times. The first one was run on a fairly inactive 1GB Duron running Windows 98, and the second was run on a slightly busier Pentium III 850MhZ running Linux Red Hat 7. Just for good measure, I ran it a third time, on an ancient Cyrix 200MMX running FreeBSD 4.6, doing nothing else:

mysql> SELECT BENCHMARK(10000000,SQRT(999)); +-------------------------------+ | BENCHMARK(10000000,SQRT(999)) | +-------------------------------+ |                             0 | +-------------------------------+ 1 row in set (0.66 sec) mysql> SELECT BENCHMARK(10000000,SQRT(999)); +-------------------------------+ | BENCHMARK(10000000,SQRT(999)) | +-------------------------------+ |                             0 | +-------------------------------+ 1 row in set (2.73 sec) mysql> SELECT BENCHMARK(10000000,SQRT(999)); +-------------------------------+ | BENCHMARK(10000000,SQRT(999)) | +-------------------------------+ |                             0 | +-------------------------------+ 1 row in set (13.24 sec)

Warning

You should use BENCHMARK() with caution to compare machines. For a live database, there are many other factors, such as disk speed, which are not taken into account with this test. Its main aim is to help in the optimization of functions.