Database Optimization


As a conclusion to this chapter, I want to mention several performance-related techniques with respect to databases. The busier a site is, the more important each little tweak will become.

The performance of your database is primarily dependent upon its structure and indexes. When creating databases, try to

  • Choose the best storage engine

  • Use the smallest data type possible for each column

  • Define columns as NOT NULL whenever possible

  • Use integers as primary keys

  • Judiciously define indexes, selecting the correct type and applying them to the right column or columns

  • Limit indexes to a certain number of characters, if applicable

Along with these tips, there are two simple techniques for optimizing databases. I'll discuss each next.

Tips

  • Suppose you define an index on multiple columns, like this:

     ALTER TABLE tablename ADD INDEX  (col1, col2, col3) 

    This effectively creates an index for searches on col1, on col1 and col2 together, or on all three columns together. It does not provide an index for searching just col2 or col3 or those two together.

  • Another way to improve the speed of your databases is to tune how MySQL runs. You can do this by specifying different options, such as the key_buffer (memory allotted for indexes), max_connections (how many connections can be handled at one time), and table_cache (table buffer). More information on these parameters can be found in the MySQL manual.


Optimizing tables

MySQL manages its tables rather nicely, and you normally do not need to worry about them. But deleting records, as just one example, creates gaps in the tables, resulting in extra overhead.

One way to improve MySQL's performance is to run an OPTIMIZE command on such tables. This query will rid a table of any unnecessary overhead, thereby speeding any interactions with it.

To optimize a table

1.

Check the table's status (Figure 5.41).

 SHOW TABLE STATUS \G 

Figure 5.41. The url_associations table has 44 unused bytes.


Using the query explained earlier in the chapter, I'll inspect each table for gaps (the \G option makes it easier to read). The Data_free value shows the number of bytes that have been allocated but are not being used.

2.

Optimize the url_associations table (Figure 5.42).

 OPTIMIZE TABLE url_associations; 

Figure 5.42. Running an OPTIMIZE query on a table.


After running this query, MySQL will indicate the status of the table.

3.

Recheck the table's status.

 SHOW TABLE STATUS \G 

You should now see 0 for the Data_free value.

Tips

  • Optimizing a table to clear out 44 bytes is definitely overkill, used to demonstrate a process. You should not need to take this step very often, except after significant changes like deleting lots of rows or changing variable-length column definitions.

  • A common misconception for people new to the concept of an automatically incrementing primary key (like url_id) is that there shouldn't be gaps in the sequence (which occur after deleting a record). Such gaps are perfectly fine, and you should in no way consider renumbering the sequence (because primary keys shouldn't change).

  • MySQL tables of the MyISAM table type (the default in current versions of the software) can also be optimized using the separate myisamchk utility.


Explaining queries

The heart of a MySQL-driven application is the queries being run on the database, so making these as efficient as possible is of the utmost importance. To improve a query's efficiency, it helps to understand how exactly MySQL will run that query. This can be accomplished using the EXPLAIN SQL keyword. For example (Figure 5.43):

 EXPLAIN SELECT * FROM urls WHERE url_id=1 

Figure 5.43. Use EXPLAIN to show how MySQL handles queries.


The explanation given by MySQL for the query will help you understand where to add indexes to your tables and how to better tailor your SQL. The explanation will list tables involved, the type of join used, what possible indexes (or keys) MySQL might use for this query, which index is actually used, and the number of rows that MySQL must look at to run this query. I'll briefly describe what each value means and how it relates to you.

To start, select_type is the type of query being run. Normally this will be SIMPLE, unless you are using subqueries. The table column shows the table being used by the query. Queries involving multiple tables will list each in the order that the query will access them. The type is very important, reflected the type of join being used. Table 5.10 lists these options, from most efficient to least. If your type value says const, eq_ref, or ref, you're in good shape. If it says all, you may have a much slower query than you should.

Table 5.10. These are the most common type values you'll see in EXPLAIN results (system and const are the best; all is the worst).

EXPLAIN type Values

VALUE

NOTES

system

A special kind of const

const

Great: only one possible matching row (e.g., url_id=2)

eq_ref

Very good sign; indicates that primary or unique indexes are being used

ref

Not as good as eq_ref because it can't use the proper indexes

ref_or_null

Like ref, but may involve NULL values

index

Like all but an index is used

all

Worst possible value; add indexes!


The rows information is a major factor for how long a query takes to run. It indicates approximately how many rows will need to be scanned in order to execute the query. By multiplying the row values from all of the tables involved in a query, you get a sense of how big the scanning process is.

Finally, the possible_keys value indicates what indexes are available for use. The key value is what index will actually be used. If either or both of these are NULL, there may be a problem.

This is a lot of information and somewhat sophisticated at that, so I'll run through a quick sample as to how you might use this information.

To use explain

1.

Run an EXPLAIN query (Figure 5.44).

 EXPLAIN SELECT category, url FROM  url_associations AS ua, urls  AS u, url_categories AS uc WHERE  u.url_id=ua.url_id AND ua.url_  category_id=uc.url_category_id  ORDER BY uc.category \G 

Figure 5.44. Here an EXPLAIN shows how MySQL will handle a join across three tables.


This is an advanced join that uses all three tables and is therefore a good candidate for explanation. As you can see in the resulting image,

  • The final two tables both have a type of eq_ref and PRIMARY key values, which is great.

  • Only one row in each of the final two tables needs to be referenced for each row in the first, which is perfect. (The product of 4 * 1 * 1 is very small, and even if there are hundreds of records, the sum will still only be X * 1 * 1.)

  • The first table has a type of index, which isn't great but is better than all and is not uncommon with the first table involved in a join.

  • The first table does use an index.

All in all, this query is as efficient as it can be. The Using filesort value for the ua table reflects the fact that I'm sorting the query results. If I remove the ORDER BY clause, the query will run somewhat faster.

2.

If necessary, change your query or add indexes as needed.

This particular query is fine as is, because I already added the proper indexes to the tables. If indexes aren't available, you would want to add some. Alternatively, you can try redefining your query so that it's as efficient as possible.

3.

Run an ANALYZE TABLE query to update MySQL's index information (Figure 5.45).

 ANALYZE TABLE urls, url_associations,  url_categories; 

Figure 5.45. Have MySQL ANALYZE your tables to make sure it makes the most of your indexes.


The ANALYZE TABLE query forces MySQL to update its statistics for a table. It may improve the effectiveness of queries and EXPLAINs.

4.

Run the EXPLAIN query a second time to see if improvements have been made.

Not all queries can be improved, but what you should keep an eye on are:

  • The type values

  • If keys are being used

  • The product of all the rows values

Looking Ahead to MySQL 5

As MySQL ages, the developers have been adding more and more features (features long present in other database applications, to be frank). Some of the new topics discussed in this chapter were added in versions 4 and 4.1 of MySQL, and there is a lot planned for version 5.

The two big new expected features are views and stored procedures. Views are like saved query results. They're great if you have complex queries, like joins, that you frequently run or that require further querying.

Stored procedures are saved SQL commands. They offer better performance and greater security over standard SQL commands hard-coded into your applications.


Tips

  • For more information on EXPLAIN and ANALYZE, see the MySQL manual.

  • If your query relies on equating two columns (...WHERE col1=col2), MySQL cannot take advantage of any indexes if the two columns are of different sizes.




    PHP and MySQL for Dynamic Web Sites. Visual QuickPro Guide
    PHP and MySQL for Dynamic Web Sites: Visual QuickPro Guide (2nd Edition)
    ISBN: 0321336577
    EAN: 2147483647
    Year: 2005
    Pages: 166
    Authors: Larry Ullman

    flylib.com © 2008-2017.
    If you may any questions please contact us: flylib@qtcs.net