This section discusses some general techniques that you can apply on a case-by-case basis to individual queries. 22.3.1. Query Rewriting TechniquesThe way you write a query often affects how well indexes are used. Use the following principles to make your queries more efficient:
22.3.2. Using EXPLAIN to Obtain Optimizer InformationIn the preceding section, an example is shown in which it is stated that of the following two queries, the second can be executed more efficiently because the optimizer can use an index on the column d: SELECT * FROM t WHERE YEAR >= 1994; SELECT * FROM t WHERE d >= '1994-01-01'; To verify whether MySQL actually will use an index to process the second query, use the EXPLAIN statement to get information from the optimizer about the execution plans it would use. For the two date-selection queries just shown, you might find that EXPLAIN tells you something like this: mysql> EXPLAIN SELECT * FROM t WHERE YEAR >= 1994\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 867038 Extra: Using where mysql> EXPLAIN SELECT * FROM t WHERE d >= '1994-01-01'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t type: range possible_keys: d key: d key_len: 4 ref: NULL rows: 70968 Extra: Using where These results indicate that the second query is indeed better from the optimizer's point of view. MySQL can perform a range scan using the index for the column d, drastically reducing the number of rows that need to be examined. (The rows value drops from 867,038 to 70,968.) The use of EXPLAIN is not covered further here. See Section 37.2, "Using EXPLAIN to Analyze Queries," for more information. 22.3.3. Optimizing Queries by Limiting OutputSome optimizations can be done independently of whether indexes are used. A simple but effective technique is to reduce the amount of output a query produces. One way to eliminate unnecessary output is by using a LIMIT clause. If you don't need the entire result set, specify how many rows the server should return by including LIMIT in your query. This helps in two ways:
Don't use LIMIT as a way to pull out just a few rows from a gigantic result set. For example, if a table has millions of rows, the following statement does not become efficient simply because it uses LIMIT: SELECT * FROM t LIMIT 10; Instead, try to use a WHERE clause that restricts the result so that the server doesn't retrieve as many rows in the first place. Another way to reduce query output is to limit it "horizontally." Select only the columns you need, rather than using SELECT * to retrieve all columns. Suppose that you want information about countries having names that begin with 'M'. The following query produces that information, but it also produces every other column as well: SELECT * FROM Country WHERE Name LIKE 'M%'; If all you really want to know is the country names, don't write the query like that. Most of the information retrieved will be irrelevant to what you want to know, resulting in unnecessary server effort and network traffic. Instead, select specifically just the Name column: SELECT Name FROM Country WHERE Name LIKE 'M%'; The second query is faster because MySQL has to return less information when you select just one column rather than all of them. In addition, if an index on Name exists, you get even more improvement for two reasons:
22.3.4. Using Summary TablesSuppose that you run an analysis consisting of a set of retrievals that each perform a complex SELECT of a set of records (perhaps using an expensive join), and that differ only in the way they summarize the records. That's inefficient because it unnecessarily does the work of selecting the records repeatedly. A better technique is to select the records once, and then use them to generate the summaries. In such a situation, consider the following strategy:
The technique of using a summary table has several benefits:
The following example creates a summary table containing the average GNP value of countries in each continent. Then it compares the summary information to individual countries to find those countries with a GNP much less than the average and much more than the average. First, create the summary table: mysql> CREATE TABLE ContinentGNP -> SELECT Continent, AVG(GNP) AS AvgGNP -> FROM Country GROUP BY Continent; mysql> SELECT * FROM ContinentGNP; +---------------+---------------+ | Continent | AvgGNP | +---------------+---------------+ | Asia | 150105.725490 | | Europe | 206497.065217 | | North America | 261854.789189 | | Africa | 10006.465517 | | Oceania | 14991.953571 | | Antarctica | 0.000000 | | South America | 107991.000000 | +---------------+---------------+ Next, compare the summary table to the original table to find countries that have a GNP less than 1% of the continental average: mysql> SELECT -> Country.Continent, Country.Name, -> Country.GNP AS CountryGNP, -> ContinentGNP.AvgGNP AS ContinentAvgGNP -> FROM Country, ContinentGNP -> WHERE -> Country.Continent = ContinentGNP.Continent -> AND Country.GNP < ContinentGNP.AvgGNP * .01 -> ORDER BY Country.Continent, Country.Name; +-----------+---------------+------------+-----------------+ | Continent | Name | CountryGNP | ContinentAvgGNP | +-----------+---------------+------------+-----------------+ | Asia | Bhutan | 372.00 | 150105.725490 | | Asia | East Timor | 0.00 | 150105.725490 | | Asia | Laos | 1292.00 | 150105.725490 | | Asia | Maldives | 199.00 | 150105.725490 | | Asia | Mongolia | 1043.00 | 150105.725490 | | Europe | Andorra | 1630.00 | 206497.065217 | | Europe | Faroe Islands | 0.00 | 206497.065217 | | Europe | Gibraltar | 258.00 | 206497.065217 | ... Use the summary table again to find countries that have a GNP more than 10 times the continental average: mysql> SELECT -> Country.Continent, Country.Name, -> Country.GNP AS CountryGNP, -> ContinentGNP.AvgGNP AS ContinentAvgGNP -> FROM Country, ContinentGNP -> WHERE -> Country.Continent = ContinentGNP.Continent -> AND Country.GNP > ContinentGNP.AvgGNP * 10 -> ORDER BY Country.Continent, Country.Name; +---------------+---------------+------------+-----------------+ | Continent | Name | CountryGNP | ContinentAvgGNP | +---------------+---------------+------------+-----------------+ | Asia | Japan | 3787042.00 | 150105.725490 | | Europe | Germany | 2133367.00 | 206497.065217 | | North America | United States | 8510700.00 | 261854.789189 | | Africa | South Africa | 116729.00 | 10006.465517 | | Oceania | Australia | 351182.00 | 14991.953571 | +---------------+---------------+------------+-----------------+ Use of summary tables has the disadvantage that the records they contain are up to date only as long as the original values remain unchanged, and thus so are any summaries calculated from them. If the original table rarely or never changes, this might be only a minor concern. For many applications, summaries that are close approximations are sufficiently accurate. The summary table technique can be applied at multiple levels. Create a summary table that holds the results of an initial summary, and then summarize that table in different ways to produce secondary summaries. This avoids the computational expense of generating the initial summary repeatedly. When a summary consists of a single value, you need not create a table at all. If you assign the value to a user variable, you can use the variable for comparison purposes in subsequent queries without having to calculate the value again. 22.3.5. Optimizing UpdatesThe optimizations discussed so far have been shown for SELECT statements, but optimization techniques can be used for statements that update tables, too:
|