Speeding MyISAM Operations | MySQL Database Design and Tuning

< Day Day Up >

Now that you've examined how to make the most of MyISAM's disk and memory structures, this section devotes attention to improving performance for those applications that use this storage engine.

Loading Information

Operations that load data can consume massive amounts of time and system resources. Chapter 15, "Improving Import and Export Operations," which is dedicated to import and export processing, discusses general best practices for these types of procedures. For now, the following sections look at several key MyISAM-specific parameters that affect data load performance.

Specifying Table Size

Chapter 4, "Designing for Speed," reviewed the implications of tuning the average row length and row size parameters (AVG_ROW_LENGTH and MAX_ROWS, respectively) for MyISAM tables.

If you are unsure of what these values should be, but still want the option to create very large MyISAM tables, you can raise the value of the myisam_data_pointer_size parameter. Its default value of four (that is, 32 bits) translates to sufficient row pointer space to support a maximum table size of 4GB, assuming that your operating system allows a file this large. By raising it (to a maximum value of eight, or 64 bits), you increase the potential size of the table to a truly enormous value, but only for those tables created after the parameter has been changed; existing tables still face the previous size limitations. The trade-off is a slightly increased overhead cost for tables that are not this large.

Unless you really need automatic support for such big tables (such as for system-generated temporary tables that exceed 4GB), it's wise to just leave this parameter set to its default; you always have the option to raise the MAX_ROWS value or use MySQL's built-in RAID capabilities to support any necessary enormous user-created tables.

Delaying Key Writes

The MyISAM key cache was first mentioned in Chapter 7; this vital engine feature is discussed in more detail a little later in this chapter. For now, this section spends a moment discussing the benefits versus drawbacks of delaying key writes.

When you create a MyISAM table, MySQL gives you the option to request, by appending DELAY KEY WRITE=1 to your CREATE TABLE statement, that updates made to the key cache not be flushed to disk until the table is closed. Although this postpones the inevitable disk writing until it can be completed all at once, you do run the risk of lost or corrupted data should your server crash before the write has been completed. You can use the delay_key_write server variable to further configure this functionality.

For example, if it is set to OFF, MyISAM flushes the key cache when necessary, regardless of whether the table creator wanted delayed key writes.

Bulk Insert Buffering

MySQL offers special in-memory buffers for situations when you perform certain kinds of data loading operations for nonempty tables, such as multi-row INSERT, INSERT ... SELECT, and LOAD DATA INFILE statements. You can control the size of this buffer by setting the bulk_insert_buffer_size server variable.

To measure the performance impact of tuning bulk_insert_buffer_size, we created several empty MyISAM-based sample tables, composing each table with a variety of data types. To create a baseline performance measurement, we ran mysqlimport and loaded approximately 100 million records into each table.

Next, we deleted all rows from each table, set bulk_insert_buffer_size to zero (which canceled out the internal cache), restarted the server, and then reran mysqlimport. Interestingly, we saw no performance degradation when reloading these tables; it took the same amount of time as when the internal cache was enabled. Why? Quite simply, MySQL doesn't use this cache on completely empty tables. To take advantage of this cache, be certain that your tables contain some data, even if it is a single sample row that you will delete post-reload.

We then reran the test with each table containing only one row. With these settings, there was a dramatic difference in performance: This time, when the cache was disabled, the import operations took, on average, 40% longer than with an 8MB cache enabled. Finally, we tested the impact of larger caches. As you would expect, the law of diminishing returns set in; enormous values for this parameter did not translate into corresponding performance gains.

Deactivating Indexes

Deactivating index updates is one way to speed inserts of large amounts of data into an indexed MyISAM table. To do this, append --keys-used to myisamchk, along with a bitmask indicating the index that you want to deactivate. If you pass in a zero, all indexes are ignored when new data is inserted into the table.

After you have loaded your information, run myisamchk with the r option to check the table's integrity and then rebuild the indexes.

Improving FULLTEXT Searches

Given that free-form, character-based data makes up a significant portion of the information tracked by many relational database applications, MyISAM's FULLTEXT search capabilities make it much easier to quickly locate and retrieve relevant results.

As entire books have been written regarding optimal search algorithms, this section primarily focuses on ways to advance the speed at which these specific types of searches operate.

Before getting started, how does MySQL implement FULLTEXT searches? First, your table must be designed to use the MyISAM storage engine, and cannot use MyISAM-based MERGE tables. Next, MySQL will only create FULLTEXT indexes for columns that are defined as CHAR, VARCHAR, or TEXT. The column cannot use the ucs2 character set. Finally, all columns in a single FULLTEXT index must have the same character set and collation. You can request these indexes as part of any of these operations:

CREATE TABLE
ALTER TABLE
CREATE INDEX

To build the index, MySQL walks through the text in the identified column(s), breaking the text down into individual words that are then placed into the index, along with a pointer back to the data where the word is to be found. After the index is in place, you can then search it via the MATCH() function.

Now that we've briefly explained how these indexes are created and used, let's describe a scenario and then set up a sample table to help us illustrate how to design efficient FULLTEXT queries.

For years, High-Hat Airways' executives have told the CEO that their airline's customer satisfaction metrics are the highest in the world. The CEO, thrilled with this ongoing customer contentment, decides to launch a contest to find the happiest High-Hat client. Customers will be encouraged to write letters to the airline, describing their best High-Hat experience. The winner of the contest will be determined by a panel of distinguished judges, who will award a fabulous prize an entire wardrobe of High-Hat merchandise, from baseball caps to t-shirts to luxury slippers. These judges will use a web-based user interface to search the millions of expected entries for certain keywords and phrases, and then vote for their favorites.

As designer of the application, you create a new table to hold these entries:

 CREATE TABLE customer_letters (         id INTEGER UNSIGNED PRIMARY KEY AUTO_INCREMENT,         letter_date DATE NOT NULL,         customer_id INTEGER NOT NULL,         letter_body LONGTEXT NOT NULL ) ENGINE = MYISAM;

After determining your table design, as well as the fact that you need FULLTEXT search capabilities, how should you configure this capability?

Configuring FULLTEXT Options

You should keep several key parameters in mind as you set up your FULLTEXT search options:

Words to ignore MySQL maintains a list of words that are disregarded during FULLTEXT searches. For the curious and/or source code literate, you can find these words in the ft_static.c file, found in the myisam directory:
```
 #ifdef COMPILE_STOPWORDS_IN /* This particular stopword list was taken from SMART distribution    ftp://ftp.cs.cornell.edu/pub/smart/smart.11.0.tar.Z    it was slightly modified to my taste, though  */   "a's",   "able",   "about", ... ...   "yourself",   "yourselves",   "zero", #endif NULL } ; 
```
As entries begin to flow in, you are dismayed to observe that many of them are filled with rude remarks, obscenities, assertions about the matriarchal parentage of the CEO, and anatomically impossible suggestions.
Clearly, it will be embarrassing to the company if the distinguished judges view these entries; more importantly, you might also get fired. Fortunately, MySQL lets you specify your own list of ignored words by simply setting the ft_stopword_file variable to point at this site-specific list, which should be in a similar format to the ft_static.c file as shown earlier in this section. After the list is in place and all existing FULLTEXT indexes rebuilt, the judges won't even see these uncouth entries in their search results.
Maximum word size By setting the ft_max_word_len variable, you can specify the maximum number of characters that MySQL will include in a FULLTEXT search.
Minimum word size You also have control over the minimum word length for FULLTEXT search. The default value of four might be too low for your site because there are many distinct three-letter words that would be legitimate search candidates. If you need to change this behavior, alter the ft_min_word_len setting.
It's important to note that you need to rebuild your FULLTEXT indexes after changing any of the previous three parameters. To rebuild these indexes as quickly as possible, append the QUICK directive to the REPAIR TABLE statement.
Query expansion behavior Query expansion is an extremely helpful feature. When requested (via the WITH QUERY EXTENSION syntax), MySQL performs two search passes, using the most relevant matches from the initial resultset to find additional matches that the user might have wanted but was not able to correctly specify.
You can control how many of the initial results are recursively fed back into the query expansion. By raising the ft_query_expansion_limit from its default of 20, MySQL passes additional values to the follow-on query, whereas a lower value means fewer results will be considered during the second pass.

Building FULLTEXT Indexes

The next important decision to make about FULLTEXT searching is what parameters to set and when to build the index. If you try to launch a standard FULLTEXT search without the proper index in place, MySQL returns an error:

 mysql> SELECT * FROM customer_letters     -> WHERE MATCH(letter_body) AGAINST ('appalled'); ERROR 1191 (HY000): Can't find FULLTEXT index matching the column list

You can still run queries against the underlying table; you can even use MATCH() in Boolean mode, but you are limited in your usage until the index is in place. Building a FULLTEXT index can take a very long time to complete, especially if the table is extremely large. For this reason, if at all possible it's wise to delay creating your index until the table is fully loaded. However, it's even more important that you correctly define your temporary sort settings (myisam_sort_buffer_size, myisam_max_sort_file_size, and myisam_max_extra_soft_file_size) when rebuilding your FULLTEXT indexes; these settings were described earlier in this chapter.

For example, we generated more than 4.5 million rows of random data and then loaded this information into the customer_letters table. The letter_body column contained, on average, 30 meaningful words.

After the table was built, we created a FULLTEXT index on the letter_body column. With our temporary sort settings defined correctly (that is, forcing the use of a disk-based sorting method rather than a key cache based approach), it took three hours to build the index.

After the data was loaded and indexed, we then unloaded all information into a 4GB flat file. After dropping the table and restarting the server, we reloaded it with the FULLTEXT index already defined. It still took about three hours to complete the reload and index rebuild.

Finally, we lowered values of the temporary sort settings to levels at which MySQL would be forced to use the key cache method of sorting for index creation. After restarting the server, we dropped the table and then re-created an empty copy. It took 11 hours (more than three times longer than before) to finish the task of loading and indexing the table. This highlights how important it is to use the proper settings when building a FULLTEXT index.

Using the FULLTEXT Index

After you've created the index, how can you tell if it's being used? As with all MySQL queries, the EXPLAIN command provides the answer. Let's look at several queries and their associated plans.

This first example runs a simple search through the customer correspondence:

 mysql> EXPLAIN     -> SELECT * FROM customer_letters     -> WHERE MATCH(letter_body) AGAINST ('incompetent')\ G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: customer_letters          type: fulltext possible_keys: cl_ix1           key: cl_ix1       key_len: 0           ref:          rows: 1         Extra: Using where 1 row in set (0.00 sec)

You can see that this query will correctly use the FULLTEXT index to locate rows. Next, suppose that you want to take advantage of query expansion:

 mysql> EXPLAIN     -> SELECT * FROM customer_letters     -> WHERE MATCH(letter_body) AGAINST ('incompetent' WITH QUERY EXPANSION)\ G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: customer_letters          type: fulltext possible_keys: cl_ix1           key: cl_ix1       key_len: 0           ref:          rows: 1         Extra: Using where 1 row in set (0.00 sec)

Again, no surprises in the EXPLAIN output. Now, suppose that you want to find all rows that contain two different words:

 mysql> EXPLAIN     -> SELECT * FROM customer_letters     -> WHERE (MATCH(letter_body) AGAINST ('delighted') AND     -> MATCH(letter_body) AGAINST ('delicious'))\ G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: customer_letters          type: fulltext possible_keys: cl_ix1           key: cl_ix1       key_len: 0           ref:          rows: 1         Extra: Using where 1 row in set (0.00 sec)

This is a good, efficient query plan; there are several other ways to achieve the same results. For example, you could also use a UNION statement:

 mysql> EXPLAIN     -> SELECT * FROM customer_letters     -> WHERE MATCH(letter_body) AGAINST ('delighted')     -> UNION     -> SELECT * FROM customer_letters     -> WHERE MATCH(letter_body) AGAINST ('unexpectedly')\ G *************************** 1. row ***************************            id: 1   select_type: PRIMARY         table: customer_letters          type: fulltext possible_keys: cl_ix1           key: cl_ix1       key_len: 0           ref:          rows: 1         Extra: Using where *************************** 2. row ***************************            id: 2   select_type: UNION         table: customer_letters          type: fulltext possible_keys: cl_ix1           key: cl_ix1       key_len: 0           ref:          rows: 1         Extra: Using where *************************** 3. row ***************************            id: NULL   select_type: UNION RESULT         table: <union1,2>          type: ALL possible_keys: NULL           key: NULL       key_len: NULL           ref: NULL          rows: NULL         Extra: 3 rows in set (0.00 sec)

What about cases in which you want results containing at least one of two words?

 mysql> EXPLAIN     -> SELECT * FROM customer_letters     -> WHERE (MATCH(letter_body) AGAINST ('delighted') OR     -> MATCH(letter_body) AGAINST ('delicious'))\ G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: customer_letters          type: ALL possible_keys: NULL           key: NULL       key_len: NULL           ref: NULL          rows: 4704049         Extra: Using where 1 row in set (0.00 sec)

This is not what you want to see. MySQL performs an extremely expensive table scan for queries written this way. Fortunately, there are better ways to create OR queries:

 mysql> EXPLAIN     -> SELECT * FROM customer_letters     -> WHERE MATCH(letter_body) AGAINST ('delighted delicious')\ G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: customer_letters          type: fulltext possible_keys: cl_ix1           key: cl_ix1       key_len: 0           ref:          rows: 1         Extra: Using where 1 row in set (0.02 sec)

Boolean searches are also helpful for queries that need more flexible search criteria:

 mysql> EXPLAIN     -> SELECT * FROM customer_letters     -> WHERE MATCH(letter_body)     -> AGAINST ('+delighted +delicious' IN BOOLEAN MODE)\ G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: customer_letters          type: fulltext possible_keys: cl_ix1           key: cl_ix1       key_len: 0           ref:          rows: 1         Extra: Using where 1 row in set (0.00 sec)

Finally, the following looks at a simple join between two tables, using a FULLTEXT search as part of the filtering criteria:

 mysql> EXPLAIN     -> SELECT cm.*, cl.*     -> FROM customer_master cm, customer_letters cl     -> WHERE cm.customer_id = cl.customer_id     -> AND MATCH(cl.letter_body) AGAINST ('lawsuit')\ G *************************** 1. row ***************************            id: 1   select_type: SIMPLE         table: cl          type: fulltext possible_keys: cl_ix1           key: cl_ix1       key_len: 0           ref:          rows: 1         Extra: Using where *************************** 2. row ***************************            id: 1   select_type: SIMPLE         table: cm          type: eq_ref possible_keys: PRIMARY           key: PRIMARY       key_len: 4           ref: high_hat.cl.customer_id          rows: 1         Extra: Using where 2 rows in set (0.02 sec)

MySQL processes this query as you would expect. In general, if you're new to FULLTEXT searches, always run EXPLAIN before launching a noteworthy query.

General FULLTEXT Performance Suggestions

To get the most out of your FULLTEXT searches, keep these suggestions in mind:

MySQL AB continually releases newer software versions; generally, more modern versions have better FULLTEXT search performance.
For consistency's sake, verify that the server and myisamchk utility are both informed about your wishes for any FULLTEXT configuration settings. If you fail to do so, myisamchk will use the default values, not your customized settings. To avoid this problem, define the settings with identical values in two places within your configuration file: under both the [mysqld] and [myisamchk] sections.
It's a good idea to clear the query cache after changing your FULLTEXT settings or rebuilding your index.
Use indexed columns beginning in the leftmost position. This rule was covered in Chapter 7's indexing review and Chapter 8's study of advanced SQL. The same regulation applies for FULLTEXT indexes.
If possible, cleanse your data prior to loading it into your database. A FULLTEXT index is only as useful as the data that it catalogs. If your data set contains misspellings, abbreviations, and other inconsistencies, it makes FULLTEXT searches much harder to successfully invoke.

Concurrency Considerations

Chapter 9, "Developing High-Speed Applications," spent a considerable amount of time discussing concurrency. The following sections look at two MyISAM-specific settings that can affect concurrency and performance.

Concurrent Inserts

The concurrent_insert variable lets you dictate whether MyISAM allows simultaneous SELECT and INSERT statements on a table that has no empty space in its middle. For most applications, the concurrency-friendly default of ON is the appropriate choice.

Query Cache Locking

If you want to block other queries from seeing the contents of the query cache while another process has a write lock in place on a MyISAM table (but has not yet modified the table), set the query_cache_wlock_invalidate variable to ON/1. The main reason to restrict this behavior is if you are concerned that one of these other queries might suffer from very bad timing, and receive a stale, query cache based result. Of course, once the lock-owning process has altered the table's data, the query cache is invalidated no matter what you do.

The more liberal, default setting of OFF/0 lets these additional queries continue to access the query cache, even if the underlying table is locked for writing.

< Day Day Up >