Problem Queries | MySQL Database Design and Tuning

< Day Day Up >

After looking into the query situation, you realize that, basically, two types of problem queries exist. The first is encountered when a user tries to look up the status of a shipment. However, it is not consistent: It appears to happen sporadically for most users. The second problem query happens to everyone whenever they attempt to accept a new package for shipment.

Package Status Lookup

Internal employees and external website users have begun complaining that it takes too long to look up the shipping status for a package. What makes this more perplexing is that it doesn't happen all the time. Some queries run very fast, whereas others can take minutes to complete.

The principal tables for tracking package status include the following:

 CREATE TABLE package_header (     package_id INTEGER PRIMARY KEY AUTO_INCREMENT,     dropoff_location_id SMALLINT(3),     destination_location_id SMALLINT(3),     sender_first_name VARCHAR(20),     sender_last_name VARCHAR(30), ...     recipient_first_name VARCHAR(20),     recipient_last_name VARCHAR(30), ...     recipient_fax VARCHAR(30), ...     INDEX (sender_last_name, sender_first_name),     INDEX (recipient_last_name, recipient_first_name),     INDEX (recipient_fax) ) ENGINE = INNODB; CREATE TABLE package_status (     package_status_id INTEGER PRIMARY KEY AUTO_INCREMENT,     package_id INTEGER NOT NULL REFERENCES package_header(package_id), ... ...     package_location_id SMALLINT(3) NOT NULL,     activity_timestamp DATETIME NOT NULL,     comments TEXT,     INDEX (package_id) ) ENGINE = INNODB;

Diagnosis

As an experienced MySQL expert, you know that MySQL offers a number of valuable tools to help spot performance problems. One of them is the slow query log, as discussed in Chapter 2, "Performance Monitoring Options." By simply enabling this log, you can sit back and wait for the troubled queries to make their presence known.

Sure enough, after a few minutes you see some candidates:

 # Time: 060306 17:26:18 # User@Host: [fpembleton] @ localhost [] # Query_time: 6  Lock_time: 0  Rows_sent: 12  Rows_examined: 573992012 SELECT ph.*, ps.* FROM package_header ph, package_status ps WHERE  ph.package_id = ps.package_id AND ph.recipient_fax like '%431-5979%'; # Time: 060306 17:26:19 # User@Host: [wburroughs] @ localhost [] # Query_time: 9  Lock_time: 0  Rows_sent: 0  Rows_examined: 5739922331 SELECT ph.*, ps.* FROM package_header ph, package_status ps WHERE  ph.package_id = ps.package_id AND ph.recipient_fax like '%785-4551%'; # Time: 060306 17:26:21 # User@Host: [nikkis] @ localhost [] # Query_time: 9  Lock_time: 0  Rows_sent: 0  Rows_examined: 5739922366 SELECT ph.*, ps.* FROM package_header ph, package_status ps WHERE  ph.package_id = ps.package_id AND ph.recipient_fax like '%341-1142%';

Now that you've found what appears to be a problem query, your next step is to run EXPLAIN to see what steps the MySQL optimizer is following to obtain results:

 mysql> EXPLAIN     -> SELECT ph.*, ps.*     -> FROM package_header ph, package_status ps     -> WHERE ph.package_id = ps.package_id     -> AND ph.recipient_fax like '%431-5979%'\ G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: ph          type: ALL possible_keys: PRIMARY           key: NULL       key_len: NULL           ref: NULL          rows: 521750321         Extra: Using where *************************** 2. row ***************************            id: 1   select_type: SIMPLE         table: ps          type: ref possible_keys: package_id           key: package_id       key_len: 4           ref: high_hat.ph.package_id          rows: 1         Extra: 2 rows in set (0.00 sec)

This output provides the answer: MySQL is performing an expensive table scan on package_header every time a user searches on recipient fax. Considering the sheer size of the table, it's apparent that this leads to very lengthy queries. It also explains the sporadic nature of the query problem: Most status queries use some other lookup criteria.

When you interview the developer of the query, you learn that this query exists to serve customers, who might not always know the area code for the recipient fax. To make the query more convenient, the developer allowed users to just provide a phone number, and he places a wildcard before and after the number to find all possible matches. He's aghast to learn that this type of query frequently renders existing indexes useless.

Solution

When faced with a large-table query that is not correctly taking advantage of indexes, you have two very different options: Fix the query or add a new index. In this case, it's probably easiest and wisest to just correct the query. The application logic should force the user to enter an area code and fax number. In combination, these two values will be able to employ the index:

 mysql> EXPLAIN     -> SELECT ph.*, ps.*     -> FROM package_header ph, package_status ps     -> WHERE ph.package_id = ps.package_id     -> AND ph.recipient_fax like '516-431-5979'\ G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: ph          type: range possible_keys: PRIMARY,recipient_fax           key: recipient_fax       key_len: 30           ref: NULL          rows: 1         Extra: Using where *************************** 2. row ***************************            id: 1   select_type: SIMPLE         table: ps          type: ref possible_keys: package_id           key: package_id       key_len: 4           ref: high_hat.ph.package_id          rows: 1         Extra: 2 rows in set (0.00 sec)

As you saw earlier during the review of MySQL's optimizer in Chapter 6, "Understanding the MySQL Optimizer," version 5.0 offers better index utilization. In this case, the developer might elect to allow the user to query on several area codes. The new optimizer capabilities mean that MySQL can still take advantage of the index:

 mysql> EXPLAIN     -> SELECT ph.*, ps.*     -> FROM package_header ph, package_status ps     -> WHERE ph.package_id = ps.package_id     -> AND ((ph.recipient_fax like '516-431-5979')     -> OR (ph.recipient_fax like '212-431-5979'))\ G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: ph          type: range possible_keys: PRIMARY,recipient_fax           key: recipient_fax       key_len: 30           ref: NULL          rows: 2         Extra: Using where *************************** 2. row ***************************            id: 1   select_type: SIMPLE         table: ps          type: ref possible_keys: package_id           key: package_id       key_len: 4           ref: high_hat.ph.package_id          rows: 1         Extra: 2 rows in set (0.00 sec)

Shipping Option Lookup

To wring more profit from its shipping service, High-Hat implemented a complex pricing mechanism, with thousands of possible prices based on weight, distance, potential value of the customer, currency, language, and so on. All of this information is stored in a single, vital lookup table:

 CREATE TABLE shipping_prices (     price_id INTEGER PRIMARY KEY AUTO_INCREMENT,     price_code CHAR(17) NOT NULL,     from_zone SMALLINT(3) NOT NULL,     to_zone SMALLINT(3) NOT NULL,     min_weight DECIMAL(6,2) NOT NULL,     max_weight DECIMAL(6,2) NOT NULL,     ...     price_in_usd decimal(5,2) NOT NULL,     price_in_euro decimal(5,2) NOT NULL,     price_in_gbp decimal(5,2) NOT NULL,     ...     price_in_zambia_kwacha DECIMAL(15,2) NOT NULL,     price_rules_in_english LONGTEXT NOT NULL,     price_rules_in_spanish LONGTEXT NOT NULL,     ...     price_rules_in_tagalog LONGTEXT NOT NULL,     price_rules_in_turkish LONGTEXT NOT NULL,     ...     INDEX (price_code),     INDEX (from_zone),     INDEX (to_zone),     INDEX (min_weight),     INDEX (max_weight) ) ENGINE = MYISAM;

Users are complaining that it takes too long to look up the potential price to ship a package. In several cases, customers have either hung up on the High-Hat sales representative or even stormed out of the package drop-off centers.

Diagnosis

Given how frequently this data is accessed by users, it seems that it should be resident in memory most of the time. However, this is not what your analysis shows.

The first thing you check is the size of the table and its indexes. You're surprised to see that this table has hundreds of thousands of very large rows, which consumes enormous amounts of space and makes full memory-based caching unlikely.

The next observation that you make is that this is a heavily denormalized table. This means that when a High-Hat representative retrieves the necessary rows to quote a price to a customer in France, each row that she accesses contains vastly larger amounts of information (such as the price in all currencies and shipping rules in all languages), even though this data is irrelevant in her circumstance.

Finally, you examine the query cache to see how many queries and results are being buffered in memory. You're disappointed to see that the query cache hit rate is very low. However, this makes sense: Recall that if two queries differ in any way, they cannot leverage the query cache.

Solution

The underlying problem here is that the database design is horribly inefficient: Had the designers done a better job of normalization, there would be reduced memory requirements for the essential lookup columns; extraneous columns would not even be included in most result sets. Alas, a database redesign is out of the question, so your next course of action is to make the best of a bad situation and help MySQL do a better job of caching information given the dreadful database design.

Often, the least aggravating and time-consuming approach to raising cache performance is to simply plug more memory into your database server. However, in this case, the server has no more storage capacity, so you need to come up with an alternative strategy. The only remaining choice is to focus on MySQL configuration.

You have several choices when deciding how to cache heavily accessed tables containing critical lookup information that is infrequently updated.

Switch to a MEMORY table These fast, RAM-based tables were explored in Chapter 4, "Designing for Speed," overview of MySQL's storage engines. If there was sufficient RAM, you could theoretically load the entire shipping_prices table into memory. However, there isn't enough storage, so this option is not workable.
Increase utilization of the key cache As you saw in Chapter 11, "MyISAM Performance Enhancement," the MyISAM key cache leverages memory to hold index values, thereby reducing costly disk access. However, memory is already a precious commodity on this server, so it's unlikely that you'll be able to extract some additional RAM from your administrators. In addition, this isn't an index problem; instead, the fault lies with the sheer amount of data in each row.
Make better use of the query cache As you have seen, the query cache buffers frequently used queries and result sets. However, there are several important requirements before a query can extract results from this buffer. One crucial prerequisite is that a new query must exactly match already-cached queries and result sets. If the new query does not match, the query cache will not be consulted to return results.
In this case, you know from your analysis that there is a high degree of variability among queries and result sets, which means that even the largest query cache won't help.
Employ replication Recall from earlier in this chapter, the replication discussion that significant performance benefits often accrue by simply spreading the processing load among multiple machines. In this case, placing this fundamental lookup table on its own dedicated machines is very wise. Because there are no other tables with which to contend, it will have the lion's share of memory, so caching hit rates should be somewhat improved.
The application will then be pointed at this server to perform lookups, which should require minimal code changes. However, given the mammoth amount of data found in this table, it's vital that the replicated server have sufficient memory and processor speed to effectively serve its clients. Last but not least, these large rows have the potential to crowd your network, so it's important that the replicated server be placed on a fast network with ample bandwidth. If not, there's a real risk that you might trade one performance problem for another.

< Day Day Up >