< Day Day Up > |
Now that you've examined how to make the most of MyISAM's disk and memory structures, this section devotes attention to improving performance for those applications that use this storage engine. Loading InformationOperations that load data can consume massive amounts of time and system resources. Chapter 15, "Improving Import and Export Operations," which is dedicated to import and export processing, discusses general best practices for these types of procedures. For now, the following sections look at several key MyISAM-specific parameters that affect data load performance. Specifying Table SizeChapter 4, "Designing for Speed," reviewed the implications of tuning the average row length and row size parameters (AVG_ROW_LENGTH and MAX_ROWS, respectively) for MyISAM tables. If you are unsure of what these values should be, but still want the option to create very large MyISAM tables, you can raise the value of the myisam_data_pointer_size parameter. Its default value of four (that is, 32 bits) translates to sufficient row pointer space to support a maximum table size of 4GB, assuming that your operating system allows a file this large. By raising it (to a maximum value of eight, or 64 bits), you increase the potential size of the table to a truly enormous value, but only for those tables created after the parameter has been changed; existing tables still face the previous size limitations. The trade-off is a slightly increased overhead cost for tables that are not this large. Unless you really need automatic support for such big tables (such as for system-generated temporary tables that exceed 4GB), it's wise to just leave this parameter set to its default; you always have the option to raise the MAX_ROWS value or use MySQL's built-in RAID capabilities to support any necessary enormous user-created tables. Delaying Key WritesThe MyISAM key cache was first mentioned in Chapter 7; this vital engine feature is discussed in more detail a little later in this chapter. For now, this section spends a moment discussing the benefits versus drawbacks of delaying key writes. When you create a MyISAM table, MySQL gives you the option to request, by appending DELAY KEY WRITE=1 to your CREATE TABLE statement, that updates made to the key cache not be flushed to disk until the table is closed. Although this postpones the inevitable disk writing until it can be completed all at once, you do run the risk of lost or corrupted data should your server crash before the write has been completed. You can use the delay_key_write server variable to further configure this functionality. For example, if it is set to OFF, MyISAM flushes the key cache when necessary, regardless of whether the table creator wanted delayed key writes. Bulk Insert BufferingMySQL offers special in-memory buffers for situations when you perform certain kinds of data loading operations for nonempty tables, such as multi-row INSERT, INSERT ... SELECT, and LOAD DATA INFILE statements. You can control the size of this buffer by setting the bulk_insert_buffer_size server variable. To measure the performance impact of tuning bulk_insert_buffer_size, we created several empty MyISAM-based sample tables, composing each table with a variety of data types. To create a baseline performance measurement, we ran mysqlimport and loaded approximately 100 million records into each table. Next, we deleted all rows from each table, set bulk_insert_buffer_size to zero (which canceled out the internal cache), restarted the server, and then reran mysqlimport. Interestingly, we saw no performance degradation when reloading these tables; it took the same amount of time as when the internal cache was enabled. Why? Quite simply, MySQL doesn't use this cache on completely empty tables. To take advantage of this cache, be certain that your tables contain some data, even if it is a single sample row that you will delete post-reload. We then reran the test with each table containing only one row. With these settings, there was a dramatic difference in performance: This time, when the cache was disabled, the import operations took, on average, 40% longer than with an 8MB cache enabled. Finally, we tested the impact of larger caches. As you would expect, the law of diminishing returns set in; enormous values for this parameter did not translate into corresponding performance gains. Deactivating IndexesDeactivating index updates is one way to speed inserts of large amounts of data into an indexed MyISAM table. To do this, append --keys-used to myisamchk, along with a bitmask indicating the index that you want to deactivate. If you pass in a zero, all indexes are ignored when new data is inserted into the table. After you have loaded your information, run myisamchk with the r option to check the table's integrity and then rebuild the indexes. Improving FULLTEXT SearchesGiven that free-form, character-based data makes up a significant portion of the information tracked by many relational database applications, MyISAM's FULLTEXT search capabilities make it much easier to quickly locate and retrieve relevant results. As entire books have been written regarding optimal search algorithms, this section primarily focuses on ways to advance the speed at which these specific types of searches operate. Before getting started, how does MySQL implement FULLTEXT searches? First, your table must be designed to use the MyISAM storage engine, and cannot use MyISAM-based MERGE tables. Next, MySQL will only create FULLTEXT indexes for columns that are defined as CHAR, VARCHAR, or TEXT. The column cannot use the ucs2 character set. Finally, all columns in a single FULLTEXT index must have the same character set and collation. You can request these indexes as part of any of these operations:
To build the index, MySQL walks through the text in the identified column(s), breaking the text down into individual words that are then placed into the index, along with a pointer back to the data where the word is to be found. After the index is in place, you can then search it via the MATCH() function. Now that we've briefly explained how these indexes are created and used, let's describe a scenario and then set up a sample table to help us illustrate how to design efficient FULLTEXT queries. For years, High-Hat Airways' executives have told the CEO that their airline's customer satisfaction metrics are the highest in the world. The CEO, thrilled with this ongoing customer contentment, decides to launch a contest to find the happiest High-Hat client. Customers will be encouraged to write letters to the airline, describing their best High-Hat experience. The winner of the contest will be determined by a panel of distinguished judges, who will award a fabulous prize an entire wardrobe of High-Hat merchandise, from baseball caps to t-shirts to luxury slippers. These judges will use a web-based user interface to search the millions of expected entries for certain keywords and phrases, and then vote for their favorites. As designer of the application, you create a new table to hold these entries: CREATE TABLE customer_letters ( id INTEGER UNSIGNED PRIMARY KEY AUTO_INCREMENT, letter_date DATE NOT NULL, customer_id INTEGER NOT NULL, letter_body LONGTEXT NOT NULL ) ENGINE = MYISAM; After determining your table design, as well as the fact that you need FULLTEXT search capabilities, how should you configure this capability? Configuring FULLTEXT OptionsYou should keep several key parameters in mind as you set up your FULLTEXT search options:
Building FULLTEXT IndexesThe next important decision to make about FULLTEXT searching is what parameters to set and when to build the index. If you try to launch a standard FULLTEXT search without the proper index in place, MySQL returns an error: mysql> SELECT * FROM customer_letters -> WHERE MATCH(letter_body) AGAINST ('appalled'); ERROR 1191 (HY000): Can't find FULLTEXT index matching the column list You can still run queries against the underlying table; you can even use MATCH() in Boolean mode, but you are limited in your usage until the index is in place. Building a FULLTEXT index can take a very long time to complete, especially if the table is extremely large. For this reason, if at all possible it's wise to delay creating your index until the table is fully loaded. However, it's even more important that you correctly define your temporary sort settings (myisam_sort_buffer_size, myisam_max_sort_file_size, and myisam_max_extra_soft_file_size) when rebuilding your FULLTEXT indexes; these settings were described earlier in this chapter. For example, we generated more than 4.5 million rows of random data and then loaded this information into the customer_letters table. The letter_body column contained, on average, 30 meaningful words. After the table was built, we created a FULLTEXT index on the letter_body column. With our temporary sort settings defined correctly (that is, forcing the use of a disk-based sorting method rather than a key cache based approach), it took three hours to build the index. After the data was loaded and indexed, we then unloaded all information into a 4GB flat file. After dropping the table and restarting the server, we reloaded it with the FULLTEXT index already defined. It still took about three hours to complete the reload and index rebuild. Finally, we lowered values of the temporary sort settings to levels at which MySQL would be forced to use the key cache method of sorting for index creation. After restarting the server, we dropped the table and then re-created an empty copy. It took 11 hours (more than three times longer than before) to finish the task of loading and indexing the table. This highlights how important it is to use the proper settings when building a FULLTEXT index. Using the FULLTEXT IndexAfter you've created the index, how can you tell if it's being used? As with all MySQL queries, the EXPLAIN command provides the answer. Let's look at several queries and their associated plans. This first example runs a simple search through the customer correspondence: mysql> EXPLAIN -> SELECT * FROM customer_letters -> WHERE MATCH(letter_body) AGAINST ('incompetent')\ G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: customer_letters type: fulltext possible_keys: cl_ix1 key: cl_ix1 key_len: 0 ref: rows: 1 Extra: Using where 1 row in set (0.00 sec) You can see that this query will correctly use the FULLTEXT index to locate rows. Next, suppose that you want to take advantage of query expansion: mysql> EXPLAIN -> SELECT * FROM customer_letters -> WHERE MATCH(letter_body) AGAINST ('incompetent' WITH QUERY EXPANSION)\ G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: customer_letters type: fulltext possible_keys: cl_ix1 key: cl_ix1 key_len: 0 ref: rows: 1 Extra: Using where 1 row in set (0.00 sec) Again, no surprises in the EXPLAIN output. Now, suppose that you want to find all rows that contain two different words: mysql> EXPLAIN -> SELECT * FROM customer_letters -> WHERE (MATCH(letter_body) AGAINST ('delighted') AND -> MATCH(letter_body) AGAINST ('delicious'))\ G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: customer_letters type: fulltext possible_keys: cl_ix1 key: cl_ix1 key_len: 0 ref: rows: 1 Extra: Using where 1 row in set (0.00 sec) This is a good, efficient query plan; there are several other ways to achieve the same results. For example, you could also use a UNION statement: mysql> EXPLAIN -> SELECT * FROM customer_letters -> WHERE MATCH(letter_body) AGAINST ('delighted') -> UNION -> SELECT * FROM customer_letters -> WHERE MATCH(letter_body) AGAINST ('unexpectedly')\ G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: customer_letters type: fulltext possible_keys: cl_ix1 key: cl_ix1 key_len: 0 ref: rows: 1 Extra: Using where *************************** 2. row *************************** id: 2 select_type: UNION table: customer_letters type: fulltext possible_keys: cl_ix1 key: cl_ix1 key_len: 0 ref: rows: 1 Extra: Using where *************************** 3. row *************************** id: NULL select_type: UNION RESULT table: <union1,2> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: 3 rows in set (0.00 sec) What about cases in which you want results containing at least one of two words? mysql> EXPLAIN -> SELECT * FROM customer_letters -> WHERE (MATCH(letter_body) AGAINST ('delighted') OR -> MATCH(letter_body) AGAINST ('delicious'))\ G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: customer_letters type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 4704049 Extra: Using where 1 row in set (0.00 sec) This is not what you want to see. MySQL performs an extremely expensive table scan for queries written this way. Fortunately, there are better ways to create OR queries: mysql> EXPLAIN -> SELECT * FROM customer_letters -> WHERE MATCH(letter_body) AGAINST ('delighted delicious')\ G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: customer_letters type: fulltext possible_keys: cl_ix1 key: cl_ix1 key_len: 0 ref: rows: 1 Extra: Using where 1 row in set (0.02 sec) Boolean searches are also helpful for queries that need more flexible search criteria: mysql> EXPLAIN -> SELECT * FROM customer_letters -> WHERE MATCH(letter_body) -> AGAINST ('+delighted +delicious' IN BOOLEAN MODE)\ G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: customer_letters type: fulltext possible_keys: cl_ix1 key: cl_ix1 key_len: 0 ref: rows: 1 Extra: Using where 1 row in set (0.00 sec) Finally, the following looks at a simple join between two tables, using a FULLTEXT search as part of the filtering criteria: mysql> EXPLAIN -> SELECT cm.*, cl.* -> FROM customer_master cm, customer_letters cl -> WHERE cm.customer_id = cl.customer_id -> AND MATCH(cl.letter_body) AGAINST ('lawsuit')\ G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: cl type: fulltext possible_keys: cl_ix1 key: cl_ix1 key_len: 0 ref: rows: 1 Extra: Using where *************************** 2. row *************************** id: 1 select_type: SIMPLE table: cm type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: high_hat.cl.customer_id rows: 1 Extra: Using where 2 rows in set (0.02 sec) MySQL processes this query as you would expect. In general, if you're new to FULLTEXT searches, always run EXPLAIN before launching a noteworthy query. General FULLTEXT Performance SuggestionsTo get the most out of your FULLTEXT searches, keep these suggestions in mind:
Concurrency ConsiderationsChapter 9, "Developing High-Speed Applications," spent a considerable amount of time discussing concurrency. The following sections look at two MyISAM-specific settings that can affect concurrency and performance. Concurrent InsertsThe concurrent_insert variable lets you dictate whether MyISAM allows simultaneous SELECT and INSERT statements on a table that has no empty space in its middle. For most applications, the concurrency-friendly default of ON is the appropriate choice. Query Cache LockingIf you want to block other queries from seeing the contents of the query cache while another process has a write lock in place on a MyISAM table (but has not yet modified the table), set the query_cache_wlock_invalidate variable to ON/1. The main reason to restrict this behavior is if you are concerned that one of these other queries might suffer from very bad timing, and receive a stale, query cache based result. Of course, once the lock-owning process has altered the table's data, the query cache is invalidated no matter what you do. The more liberal, default setting of OFF/0 lets these additional queries continue to access the query cache, even if the underlying table is locked for writing. |
< Day Day Up > |