Recipe 3.16. Choosing Appropriate LIMIT Values


Problem

LIMIT doesn't seem to do what you want it to.

Solution

Be sure that you understand what question you're asking. It may be that LIMIT is exposing some interesting subtleties in your data that you have not considered.

Discussion

LIMIT n is useful in conjunction with ORDER BY for selecting smallest or largest values from a result set. But does that actually give you the rows with the n smallest or largest values? Not necessarily! It does if your rows contain unique values, but not if there are duplicates. You may find it necessary to run a preliminary query first to help you choose the proper LIMIT value.
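One way to run such a preliminary query is to count how many rows share each value. The following is a minimal sketch, not part of the recipes distribution. It assumes Python's built-in sqlite3 module as a stand-in for a MySQL server so that the example is self-contained, it loads the al_winner rows listed later in this recipe, and the helper name tie_counts is hypothetical.

```python
import sqlite3

# Rows copied from the al_winner listing shown in this recipe.
ROWS = [
    ("Mulder, Mark", 21), ("Clemens, Roger", 20), ("Moyer, Jamie", 20),
    ("Garcia, Freddy", 18), ("Hudson, Tim", 18), ("Abbott, Paul", 17),
    ("Mays, Joe", 17), ("Mussina, Mike", 17), ("Sabathia, C.C.", 17),
    ("Zito, Barry", 17), ("Buehrle, Mark", 16), ("Milton, Eric", 15),
    ("Pettitte, Andy", 15), ("Radke, Brad", 15), ("Sele, Aaron", 15),
]


def tie_counts(rows):
    """Return (wins, number_of_pitchers) pairs, largest wins value first."""
    # Assumption: an in-memory SQLite database stands in for MySQL here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE al_winner (name TEXT, wins INTEGER)")
    conn.executemany("INSERT INTO al_winner VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT wins, COUNT(*) FROM al_winner "
        "GROUP BY wins ORDER BY wins DESC"
    ).fetchall()
    conn.close()
    return result


# Print one line per distinct wins value with the number of rows sharing it.
for wins, count in tie_counts(ROWS):
    print(wins, count)
```

Any wins value with a count greater than 1 marks a place where a LIMIT cutoff could split a group of tied rows. The GROUP BY statement itself uses only standard SQL, so it should run unchanged in the mysql client.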

To see why this is, consider the following dataset, which shows the American League pitchers who won 15 or more games during the 2001 baseball season (you can find this data in the al_winner.sql file in the tables directory of the recipes distribution):

mysql> SELECT name, wins FROM al_winner
    -> ORDER BY wins DESC, name;
+----------------+------+
| name           | wins |
+----------------+------+
| Mulder, Mark   |   21 |
| Clemens, Roger |   20 |
| Moyer, Jamie   |   20 |
| Garcia, Freddy |   18 |
| Hudson, Tim    |   18 |
| Abbott, Paul   |   17 |
| Mays, Joe      |   17 |
| Mussina, Mike  |   17 |
| Sabathia, C.C. |   17 |
| Zito, Barry    |   17 |
| Buehrle, Mark  |   16 |
| Milton, Eric   |   15 |
| Pettitte, Andy |   15 |
| Radke, Brad    |   15 |
| Sele, Aaron    |   15 |
+----------------+------+

If you want to know who won the most games, adding LIMIT 1 to the preceding statement gives you the correct answer because the maximum value is 21, and there is only one pitcher with that value (Mark Mulder). But what if you want the four highest game winners? The proper statements depend on what you mean by that, which can have various interpretations:

  • If you just want the first four rows, sort the rows, and add LIMIT 4:

    mysql> SELECT name, wins FROM al_winner
        -> ORDER BY wins DESC, name
        -> LIMIT 4;
    +----------------+------+
    | name           | wins |
    +----------------+------+
    | Mulder, Mark   |   21 |
    | Clemens, Roger |   20 |
    | Moyer, Jamie   |   20 |
    | Garcia, Freddy |   18 |
    +----------------+------+

    That may not suit your purposes because LIMIT imposes a cutoff that occurs in the middle of a set of pitchers with the same number of wins (Tim Hudson also won 18 games).

  • To avoid making a cutoff in the middle of a set of rows with the same value, select rows with values greater than or equal to the value in the fourth row. Find out what that value is with LIMIT, and then use it in the WHERE clause of a second query to select rows:

    mysql> SELECT wins FROM al_winner
        -> ORDER BY wins DESC, name
        -> LIMIT 3, 1;
    +------+
    | wins |
    +------+
    |   18 |
    +------+
    mysql> SELECT name, wins FROM al_winner
        -> WHERE wins >= 18
        -> ORDER BY wins DESC, name;
    +----------------+------+
    | name           | wins |
    +----------------+------+
    | Mulder, Mark   |   21 |
    | Clemens, Roger |   20 |
    | Moyer, Jamie   |   20 |
    | Garcia, Freddy |   18 |
    | Hudson, Tim    |   18 |
    +----------------+------+

    To select these results in a single statement, without having to substitute the cutoff value from one statement manually into the other, use the first statement as a subquery of the second:

    mysql> SELECT name, wins FROM al_winner
        -> WHERE wins >=
        ->   (SELECT wins FROM al_winner
        ->   ORDER BY wins DESC, name
        ->   LIMIT 3, 1)
        -> ORDER BY wins DESC, name;
    +----------------+------+
    | name           | wins |
    +----------------+------+
    | Mulder, Mark   |   21 |
    | Clemens, Roger |   20 |
    | Moyer, Jamie   |   20 |
    | Garcia, Freddy |   18 |
    | Hudson, Tim    |   18 |
    +----------------+------+

  • If you want to know all the pitchers with the four largest wins values, another approach is needed. Determine the fourth-largest value with DISTINCT and LIMIT, and then use it to select rows:

    mysql> SELECT DISTINCT wins FROM al_winner
        -> ORDER BY wins DESC
        -> LIMIT 3, 1;
    +------+
    | wins |
    +------+
    |   17 |
    +------+
    mysql> SELECT name, wins FROM al_winner
        -> WHERE wins >= 17
        -> ORDER BY wins DESC, name;
    +----------------+------+
    | name           | wins |
    +----------------+------+
    | Mulder, Mark   |   21 |
    | Clemens, Roger |   20 |
    | Moyer, Jamie   |   20 |
    | Garcia, Freddy |   18 |
    | Hudson, Tim    |   18 |
    | Abbott, Paul   |   17 |
    | Mays, Joe      |   17 |
    | Mussina, Mike  |   17 |
    | Sabathia, C.C. |   17 |
    | Zito, Barry    |   17 |
    +----------------+------+

    As in the previous example, these statements can be combined into one by using a subquery:

    mysql> SELECT name, wins FROM al_winner
        -> WHERE wins >=
        ->   (SELECT DISTINCT wins FROM al_winner
        ->   ORDER BY wins DESC
        ->   LIMIT 3, 1)
        -> ORDER BY wins DESC, name;
    +----------------+------+
    | name           | wins |
    +----------------+------+
    | Mulder, Mark   |   21 |
    | Clemens, Roger |   20 |
    | Moyer, Jamie   |   20 |
    | Garcia, Freddy |   18 |
    | Hudson, Tim    |   18 |
    | Abbott, Paul   |   17 |
    | Mays, Joe      |   17 |
    | Mussina, Mike  |   17 |
    | Sabathia, C.C. |   17 |
    | Zito, Barry    |   17 |
    +----------------+------+

For this dataset, each method yields a different result for "four highest." The moral is that the way you use LIMIT may require some thought about what you really want to know.
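The three interpretations can be compared side by side in a short script. This is a sketch under stated assumptions rather than code from the recipes distribution: it uses Python's built-in sqlite3 module as a stand-in for a MySQL server, the function name four_highest and the dictionary labels are hypothetical, and the subqueries are written as LIMIT 1 OFFSET 3, which selects the same row as LIMIT 3, 1.

```python
import sqlite3

# Rows copied from the al_winner listing shown in this recipe.
ROWS = [
    ("Mulder, Mark", 21), ("Clemens, Roger", 20), ("Moyer, Jamie", 20),
    ("Garcia, Freddy", 18), ("Hudson, Tim", 18), ("Abbott, Paul", 17),
    ("Mays, Joe", 17), ("Mussina, Mike", 17), ("Sabathia, C.C.", 17),
    ("Zito, Barry", 17), ("Buehrle, Mark", 16), ("Milton, Eric", 15),
    ("Pettitte, Andy", 15), ("Radke, Brad", 15), ("Sele, Aaron", 15),
]

SELECT = "SELECT name, wins FROM al_winner"
ORDER = " ORDER BY wins DESC, name"

# One statement per interpretation of "four highest game winners".
QUERIES = {
    # Plain cutoff after four rows; may split a group of tied rows.
    "first four rows": SELECT + ORDER + " LIMIT 4",
    # Everyone with at least as many wins as the fourth row.
    "ties with fourth row": SELECT
    + " WHERE wins >= (SELECT wins FROM al_winner"
    + " ORDER BY wins DESC, name LIMIT 1 OFFSET 3)"
    + ORDER,
    # Everyone whose wins value is among the four largest distinct values.
    # The DISTINCT subquery sorts only by the column it selects.
    "four largest values": SELECT
    + " WHERE wins >= (SELECT DISTINCT wins FROM al_winner"
    + " ORDER BY wins DESC LIMIT 1 OFFSET 3)"
    + ORDER,
}


def four_highest(rows):
    """Run each interpretation and return {label: list of (name, wins)}."""
    # Assumption: an in-memory SQLite database stands in for MySQL here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE al_winner (name TEXT, wins INTEGER)")
    conn.executemany("INSERT INTO al_winner VALUES (?, ?)", rows)
    results = {label: conn.execute(sql).fetchall()
               for label, sql in QUERIES.items()}
    conn.close()
    return results


# Summarize how many rows each interpretation returns and where it stops.
for label, result in four_highest(ROWS).items():
    print(f"{label}: {len(result)} rows, last is {result[-1][0]}")
```

Running the three statements against the same table makes the difference concrete: the row counts differ only because of how each statement treats tied wins values.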




MySQL Cookbook
ISBN: 059652708X
Year: 2004
Pages: 375
Authors: Paul DuBois
