Recipe 12.6. Finding Rows Containing Per-Group Minimum or Maximum Values


Problem

You want to find which row within each group of rows in a table contains the maximum or minimum value for a given column. For example, you want to determine the most expensive painting in your collection for each artist.

Solution

Create a temporary table to hold the per-group maximum or minimum values, and then join the temporary table with the original one to pull out the matching row for each group. If you prefer a single-query solution, use a subquery in the FROM clause rather than a temporary table.

Discussion

Many questions involve finding largest or smallest values in a particular table column, but it's also common to want to know what the other values are in the row that contains the value. For example, when you are using the artist and painting tables, it's possible to answer questions like "What is the most expensive painting in the collection, and who painted it?" One way to do this is to store the highest price in a user-defined variable and then use the variable to identify the row containing the price so that you can retrieve other columns from it:

mysql> SET @max_price = (SELECT MAX(price) FROM painting); mysql> SELECT artist.name, painting.title, painting.price     -> FROM artist INNER JOIN painting     -> ON painting.a_id = artist.a_id     -> WHERE painting.price = @max_price; +----------+---------------+-------+ | name     | title         | price | +----------+---------------+-------+ | Da Vinci | The Mona Lisa |    87 | +----------+---------------+-------+ 

The same thing can be done by creating a temporary table to hold the maximum price and then joining it with the other tables:

mysql> CREATE TABLE tmp SELECT MAX(price) AS max_price FROM painting; mysql> SELECT artist.name, painting.title, painting.price     -> FROM artist INNER JOIN painting INNER JOIN tmp     -> ON painting.a_id = artist.a_id     -> AND painting.price = tmp.max_price; +----------+---------------+-------+ | name     | title         | price | +----------+---------------+-------+ | Da Vinci | The Mona Lisa |    87 | +----------+---------------+-------+ 

The techniques of using a user-defined variable or a temporary table as just shown were illustrated originally in Section 8.5. Their use here is similar except that now we are applying them to multiple tables.

On the face of it, using a temporary table and a join is just a more complicated way of answering the question than with a user-defined variable. Does this technique have any practical value? Yes, it does, because it leads to a more general technique for answering more difficult questions. The previous statements show information only for the single most expensive painting in the entire painting table. What if your question is, "What is the most expensive painting for each artist?" You can't use a user-defined variable to answer that question, because the answer requires finding one price per artist, and a variable can hold only a single value at a time. But the technique of using a temporary table works well, because the table can hold multiple rows, and a join can find matches for all of them.

To answer the question, select each artist ID and the corresponding maximum painting price into a temporary table. The table will contain not just the maximum painting price but the maximum within each group, where "group" is defined as "paintings by a given artist." Then use the artist IDs and prices stored in the tmp table to match rows in the painting table, and join the result with the artist table to get the artist names:

mysql> CREATE TABLE tmp     -> SELECT a_id, MAX(price) AS max_price FROM painting GROUP BY a_id; mysql> SELECT artist.name, painting.title, painting.price     -> FROM artist INNER JOIN painting INNER JOIN tmp     -> ON painting.a_id = artist.a_id     -> AND painting.a_id = tmp.a_id     -> AND painting.price = tmp.max_price; +----------+-------------------+-------+ | name     | title             | price | +----------+-------------------+-------+ | Da Vinci | The Mona Lisa     |    87 | | Van Gogh | The Potato Eaters |    67 | | Renoir   | Les Deux Soeurs   |    64 | +----------+-------------------+-------+ 

To obtain the same result with a single statement, use a subquery in the FROM clause that retrieves the same rows contained in the temporary table:

mysql> SELECT artist.name, painting.title, painting.price     -> FROM artist INNER JOIN painting INNER JOIN     -> (SELECT a_id, MAX(price) AS max_price FROM painting GROUP BY a_id)     ->   AS tmp     -> ON painting.a_id = artist.a_id     -> AND painting.a_id = tmp.a_id     -> AND painting.price = tmp.max_price; +----------+-------------------+-------+ | name     | title             | price | +----------+-------------------+-------+ | Da Vinci | The Mona Lisa     |    87 | | Van Gogh | The Potato Eaters |    67 | | Renoir   | Les Deux Soeurs   |    64 | +----------+-------------------+-------+ 

Yet another way to answer maximum-per-group questions is to use a LEFT JOIN that joins a table to itself. The following statement identifies the highest-priced painting per artist ID (we are using IS NULL to select all the rows from p1 for which there is no row in p2 with a higher price):

mysql> SELECT p1.a_id, p1.title, p1.price     -> FROM painting AS p1 LEFT JOIN painting AS p2     -> ON p1.a_id = p2.a_id AND p1.price < p2.price     -> WHERE p2.a_id IS NULL; +------+-------------------+-------+ | a_id | title             | price | +------+-------------------+-------+ |    1 | The Mona Lisa     |    87 | |    3 | The Potato Eaters |    67 | |    5 | Les Deux Soeurs   |    64 | +------+-------------------+-------+ 

To display artist names rather than ID values, join the result of the LEFT JOIN to the artist table:

mysql> SELECT artist.name, p1.title, p1.price     -> FROM painting AS p1 LEFT JOIN painting AS p2     -> ON p1.a_id = p2.a_id AND p1.price < p2.price     -> INNER JOIN artist ON p1.a_id = artist.a_id     -> WHERE p2.a_id IS NULL; +----------+-------------------+-------+ | name     | title             | price | +----------+-------------------+-------+ | Da Vinci | The Mona Lisa     |    87 | | Van Gogh | The Potato Eaters |    67 | | Renoir   | Les Deux Soeurs   |    64 | +----------+-------------------+-------+ 

The selfLEFT JOIN method is perhaps somewhat less intuitive than using a temporary table or a subquery.

The techniques just shown work for other kinds of values, such as temporal values. Consider the driver_log table that lists drivers and trips that they've taken:

mysql> SELECT name, trav_date, miles     -> FROM driver_log     -> ORDER BY name, trav_date; +-------+------------+-------+ | name  | trav_date  | miles | +-------+------------+-------+ | Ben   | 2006-08-29 |   131 | | Ben   | 2006-08-30 |   152 | | Ben   | 2006-09-02 |    79 | | Henry | 2006-08-26 |   115 | | Henry | 2006-08-27 |    96 | | Henry | 2006-08-29 |   300 | | Henry | 2006-08-30 |   203 | | Henry | 2006-09-01 |   197 | | Suzi  | 2006-08-29 |   391 | | Suzi  | 2006-09-02 |   502 | +-------+------------+-------+ 

One type of maximum-per-group problem for this table is "show the most recent trip for each driver." It can be solved with a temporary table like this:

mysql> CREATE TABLE tmp     -> SELECT name, MAX(trav_date) AS trav_date     -> FROM driver_log GROUP BY name; mysql> SELECT driver_log.name, driver_log.trav_date, driver_log.miles     -> FROM driver_log INNER JOIN tmp     -> ON driver_log.name = tmp.name AND driver_log.trav_date = tmp.trav_date     -> ORDER BY driver_log.name; +-------+------------+-------+ | name  | trav_date  | miles | +-------+------------+-------+ | Ben   | 2006-09-02 |    79 | | Henry | 2006-09-01 |   197 | | Suzi  | 2006-09-02 |   502 | +-------+------------+-------+ 

You can also use a subquery in the FROM clause like this:

mysql> SELECT driver_log.name, driver_log.trav_date, driver_log.miles     -> FROM driver_log INNER JOIN     -> (SELECT name, MAX(trav_date) AS trav_date     -> FROM driver_log GROUP BY name) AS tmp     -> ON driver_log.name = tmp.name AND driver_log.trav_date = tmp.trav_date     -> ORDER BY driver_log.name; +-------+------------+-------+ | name  | trav_date  | miles | +-------+------------+-------+ | Ben   | 2006-09-02 |    79 | | Henry | 2006-09-01 |   197 | | Suzi  | 2006-09-02 |   502 | +-------+------------+-------+ 

Which technique is better: the temporary table or the subquery in the FROM clause? For small tables, there might not be much difference either way. If the temporary table or subquery result is large, a general advantage of the temporary table is that you can index it after creating it and before using it in a join.

See Also

This recipe shows how to answer maximum-per-group questions by selecting summary information into a temporary table and joining that table to the original one or by using a subquery in the FROM clause. These techniques have application in many contexts. One of them is calculation of team standings, where the standings for each group of teams are determined by comparing each team in the group to the team with the best record. Section 12.7 discusses how to do this.




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net