To conclude this chapter, we show you selected techniques for improving the performance of your databases, queries, and MySQL server. We focus on how to choose and design indexes, tips for querying and database design, how to tune the server parameters, and how to use MySQL's query cache. 15.9.1 Index DesignAs discussed in Chapter 5, each table should have a PRIMARY KEY definition as part of its CREATE TABLE statement. A primary key is an attribute (or set of attributes) that uniquely identifies a row in a table. Storing two rows with the same primary key isn't permitted and an attempt to INSERT duplicate primary keys produces an error (unless you use the IGNORE modifier). The attribute values of the primary key are stored in an index to allow fast access to a row using the primary key values. The default index type for a MyISAM table type is fast for queries that find a specific row, a range of rows, for joins between tables, grouping data, ordering data, and finding minimum and maximum values. Indexes don't provide any speed improvement for retrieving all the rows in a table or for other query types. As discussed briefly in Chapter 5, indexes are also useful for fast access to rows by values other than those in the primary key. For example, in the customer table, you might define an index by adding it using: ALTER TABLE customer ADD INDEX namecity (surname,firstname,city); After you define this index, some queries that select a particular customer through a WHERE clause automatically use it. Consider an example: SELECT * FROM customer WHERE surname = 'Marzalla' AND firstname = 'Dimitria' AND city = 'St Albans'; This query can use the new index to quickly locate the row that matches the search criteria. Without the index, the server must scan all the rows in the customer table and compare each row to the WHERE clause. This might be quite slow and certainly requires significantly more disk activity than the index-based approach (assuming the table has more than a few rows). A particular feature of database servers is that they develop a query evaluation strategy and optimize it without any interaction from the user or programmer. If an index is available, and it makes sense to use it in the context of a query, the server does this automatically. All you need to do is identify which queries are common, and make an index available for those common queries by adding the KEY clause to the CREATE TABLE statement or using ALTER TABLE on an existing table. If you've created the namecity index, and you want to check that MySQL will use it for the previous query, you can do so with the EXPLAIN statement: EXPLAIN SELECT * FROM customer WHERE surname = 'Marzalla' AND firstname = 'Dimitria' AND city = 'St Albans'; This reports that: +----+-------------+----------+------+---------------+----------+---------+---- ---------------+------+--------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+---------------+----------+---------+---- ---------------+------+--------+ | 1 | SIMPLE | customer | ref | namecity | namecity | 153 | const, const,const | 1 | Using where | +----+-------------+----------+------+---------------+----------+---------+---- ---------------+------+--------+ 1 row in set (0.06 sec) You can see that the namecity index is listed as a possible choice in the possible_keys column and, as expected, it's the index that'll be used to evaluate the query as shown in the key column. The EXPLAIN statement is a useful diagnostic tool for understanding and optimizing complex queries but we don't discuss it in detail here; you can find out more about it in Section 5.2.1 of the MySQL manual. Careful index design is important. The namecity index we have defined can also speed queries other than those that supply a complete surname, firstname, and city. For example, consider a query: SELECT * FROM customer WHERE surname = 'LaTrobe' AND firstname = 'Anthony'; This query can also use the index namecity, because the index permits access to rows in sorted order first by surname, then firstname, and then city. With this sorting, all "LaTrobe, Anthony" index entries are clustered together in the index. Indeed, the index can also be used for the query: SELECT * FROM customer WHERE surname LIKE 'Mar%'; Similarly, all surnames beginning with "Mar" are clustered together in the index. You can use EXPLAIN to check that the index is being used. However, the index can't be used for a query such as: SELECT * FROM customer WHERE firstname = 'Dimitria' AND city = 'St Albans'; The index can't be used because the leftmost attribute named in the index, surname, isn't part of the WHERE clause. In this case, all rows in the customer table must be scanned and the query is much slower (again assuming there are more than a few rows in the customer table, and assuming there is no other index that could be used).
There are other cases in which an index can't be used, such as when a query contains an OR that isn't on an indexed attribute: SELECT * FROM customer WHERE surname = 'Marzalla' OR zipcode = "3001"; Again, the customer table must be completely scanned, because the second condition, zipcode="3001", requires all rows to be retrieved as there is no index available on the attribute zipcode. Also, the attributes that are combined together with OR must be the leftmost attributes in the index; otherwise the query requires a complete scan of the customer table. The following example requires a complete scan: SELECT * FROM customer WHERE firstname = 'Dimitria' OR surname = 'Marzalla'; If all the attributes in the index are used in all the queries, to optimize index size, the leftmost attribute in the KEY clause should be the attribute with the highest number of duplicate entries. Because indexes speed up queries, why wouldn't you create indexes on all the attributes you can possibly search on? The answer is that while indexes are fast for searching, they consume space and require updates each time rows are added or deleted, or key attributes are changed. So, if a database is largely static, additional indexes have low overheads, but if a database changes frequently, each additional index slows down the update process significantly. In either case, indexes consume additional space on disk and in memory, and unnecessary indexes should be avoided. One way to reduce the size of an index and speed updates is to create an index on a prefix of an attribute. Our namecity index uses considerable space: for each row in the customer table, an index entry is up to 150 characters in length because it is created from the combined values of the surname, firstname, and city attributes.[2]
To reduce space, you can define the index as: ALTER TABLE customer ADD INDEX namecity (surname(10),firstname(3),city(2)); This uses only the first 10 characters of surname, 3 of firstname, and 2 of city to distinguish index entries. This is quite reasonable, because 10 characters from a surname distinguishes between most surnames, and the addition of a few characters from a first name and the prefix of their city should be sufficient to uniquely identify almost all customers. Having a smaller index with less information can also mean that queries are actually faster, because more index information can be retrieved from disk per second, more of the index can fit into spare memory, and disk retrieval speed is almost always the bottleneck in query performance. The space saving is significant with a reduced index. A new index entry requires only 15 characters, a savings of up to 135 characters, so index insertions, deletions, and modifications are now likely to be much faster. Note that for TEXT and BLOB attribute types, a prefix must be taken when indexing, because indexing the entire attribute is impractical and isn't permitted by the MySQL server. 15.9.2 Design TipsCareful index design is one technique that improves speed and reduces resource requirements. However, design of your database, tables, attributes, and queries is also important. As discussed previously, accessing a hard disk is slow and is usually the bottleneck in database server performance. Therefore, most techniques described in this section improve performance by minimizing disk space and disk use. Reducing disk space requirements improves both disk seek and read performance. Disk read performance is improved because less data is required to be transferred, while seek performance is improved because the disk head has to move less on average when randomly accessing a smaller file than when accessing a larger file. Here are some simple ways to improve database server performance:
15.9.3 Server Tuning TipsComprehensive database tuning is a complex topic that fills many books. We include in this section only a few practical ideas to help you to begin to improve the performance of a database system. You can refer to the books in Appendix G for more information and also read Section 5.5 of the MySQL manual. MySQL includes is the mysqladmin tool for database administration. Details of the system setup can be found by running the following command in a Unix shell: % /usr/local/mysql/bin/mysqladmin -uroot -ppassword variables In Microsoft Windows, type the following into the Run dialog that's accessible through the Start menu: "C:\Program Files\EasyPHP1-7\mysql\bin\mysqladmin.exe" -uroot -ppassword variables Both commands assume you've followed our installation instructions in Appendix A through Appendix C. This shows, in part, the following selected system parameters: join_buffer current value: 131072 key_buffer current value: 8388600 net_buffer_length current value: 16384 record_buffer current value: 131072 sort_buffer current value: 2097144 table_cache current value: 64 Some of the important parameters are those that impact disk use. MySQL has several main-memory buffer parameters that control how much data is kept in memory for processing. These include:
In general, the larger these buffers, the more data from disk is cached or stored in memory and the fewer disk accesses are required. However, if the sum of these parameters is near to exceeding the size of the memory installed in the server, the operating system will start to swap data between disk and memory, and the MySQL server will be slow. In any case, careful experimentation based on the application is likely to improve server performance. Section 5.5.2 of the MySQL manual suggests parameter settings when starting the MySQL server. First, for machines with more than 256 MB of free memory, large tables in the database, and a moderate number of users, start your MySQL in Unix with: % /usr/local/mysql/bin/mysqld_safe -O key_buffer=64M -O table_cache=256 \ -O sort_buffer=4M -O read_buffer_size=1M & The following setting is appropriate for an application such as the online winestore, because many users are expected, the queries are largely index-based, and the database is small: mysqld_safe -O key_buffer=512k -O sort_buffer=16k \ -O table_cache=32 -O read_buffer_size=8k -O net_buffer_length=1K & There are two other parameters used in this example that we've not discussed. The table_cache parameter manages the maximum number of open tables per user connection, while the net_buffer parameter sets the minimum size of the network query buffer in which incoming queries are kept before they are executed. The SHOW STATUS and SHOW VARIABLES commands that are described at the beginning of this chapter can be used to report on MySQL's use and behavior. SHOW VARIABLES does the same thing as the mysqladmin command at the beginning of this section. SHOW STATUS gives a brief point-in-time summary of the server status and can help find more about the number of user connections, queries, and table use. This is useful input into changing the startup parameters we've just described. 15.9.4 Query CachingMySQL 4 features an optional query cache. When you activate it, the results of queries are stored in a memory buffer. If an identical query arrives later, the results are returned from the cache rather than the query being rerun. This is an excellent feature if your application runs many identical queries and your database doesn't change too often. Typically, this makes it an ideal tool for web database applications, and we recommend you use it for your applications; if you followed our Unix installation instructions in Appendix A through Appendix C, you've already enabled the query cache with default parameters. Consider an example. In our online winestore, the following query is executed every time any user visits the homepage. The query finds the latest three wines that have been stocked at the winestore and have been reviewed by a wine writer: SELECT wi.winery_name, w.year, w.wine_name, w.wine_id, w.description FROM wine w, winery wi, inventory i WHERE w.winery_id = wi.winery_id AND w.wine_id = i.wine_id AND w.description IS NOT NULL GROUP BY w.wine_id ORDER BY i.date_added DESC LIMIT 3; The query isn't fast to run: it uses three tables, two join conditions, a conditional check for a description, a GROUP BY clause, an ORDER BY clause, and the LIMIT modifier. However, it's an ideal candidate for query caching: the homepage is popular and new wines, wine reviews, and inventory are infrequently added. From simple experiments with our online winestore from the first edition of this book, we've found that adding query caching makes visiting the homepage almost three times faster. Query caching has several features:
However, there are some situations in which it isn't useful and you need to be careful:
15.9.4.1 Configuring query cachingQuery caching is off by default when you install MySQL 4.1. To turn it on, you need to edit your MySQL configuration file that you installed when following the installation procedure in Appendix A through Appendix C. In a Unix environment, edit the file /etc/my.cnf, and in Microsoft Windows edit the my.ini file in C:\winnt for Windows 2000/2003/NT or C:\windows for Windows XP. Find the section that beings with the heading: # The MySQL server [mysqld] Add the following statements to the end of that section: query_cache_size = 16M query_cache_type = 1 query_cache_min_res_unit = 4K query_cache_limit = 1M You need to add the first parameter query_cache_size that defines how much memory the cache uses (we've decided on 16 MB): the default is 0, which disables caching. The remaining three statements are optional, but we've included them with their default settings anyway. The query_cache_type parameter defines whether caching is off (0), on (1), or only used when you ask for it (2); we discuss how to control which queries are cached in the next section. The third parameter query_cache_min_res_unit defines the minimum memory block size for a cached result set, and the default of 4 KB. works well. The last parameter query_cache_limit defines the maximum size of a result set that can be cached, and the 1 MB default is a sensible choice. After you've made the changes, save the file, and restart your MySQL using the method described for your platform in Appendix A through Appendix C. Alternatively, reboot your machine. You now have caching enabled. If you repeat a query such as SELECT * FROM customer twice or more, you can check that the cache is in action by using the SHOW STATUS command from the command interpreter. This outputs, in part, the following: | Qcache_queries_in_cache | 1 | | Qcache_inserts | 1 | | Qcache_hits | 3 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 0 | | Qcache_free_memory | 16709128 | | Qcache_free_blocks | 1 | | Qcache_total_blocks | 4 | In our example, you can see that our system has just started up: there's one query in the cache, only one query has ever been inserted, it's been re-run three times, and no queries that couldn't be cached have ever been run. There's also plenty of memory free. 15.9.4.2 Controlling query cachingAfter you've got caching turned on, you can control whether an individual query is cached or not. If you don't prevent caching, and the query_cache_type parameter is set to its default of 1, all queries are cached (with the exception of those queries that it doesn't make sense to cache, as discussed previously). If the query_cache_type parameter is set to 2, only those queries you ask to be cached will be cached. Here's an example of how to explicitly cache a query: SELECT SQL_CACHE * FROM customer; Here's an example of how not to cache a query: SELECT SQL_NO_CACHE * FROM customer; Caching only works with SELECT queries. It doesn't make sense to cache dynamic DELETE, INSERT, or UPDATE queries (or their variants such as TRUNCATE). |