9.3. Specifying Which Rows to Retrieve

If you specify no criteria for selecting records from a table, a SELECT statement retrieves every record in the table. This is often more information than you need, particularly for large tables. To be more specific about which rows are of interest, include a WHERE clause that describes the characteristics of those rows.

A WHERE clause can be as simple or complex as necessary to identify the rows that are relevant for your purposes. For example, to retrieve records from the Country table for those countries that achieved independence after the year 1990, it's sufficient to use a WHERE clause that specifies a single condition:

 SELECT * FROM Country WHERE IndepYear > 1990;

More complex WHERE clauses specify multiple conditions, which may be combined using logical operators such as AND and OR. The following statement returns rows with Population values in the range from 1 million to 2 million:

 SELECT * FROM Country WHERE Population >= 1000000 AND Population <= 2000000;

For testing values in a range, you can also use the BETWEEN operator:

 SELECT * FROM Country WHERE Population BETWEEN 1000000 AND 2000000;

Some operators have higher precedence than others. For example, AND has a higher precedence than OR. To control the order of evaluation of terms within a complex expression (or simply to make the evaluation order explicit), use parentheses to group expression terms. Consider the following WHERE clause:

 WHERE GNP < 1000 AND Continent = 'Africa' OR Continent = 'Asia'

Because AND has a higher precedence than OR, the preceding expression is equivalent to the following one:

 WHERE (GNP < 1000 AND Continent = 'Africa') OR Continent = 'Asia'

That expression finds all records with a GNP value less than 1000 that also have a Continent value of 'Africa', as well as all records with a Continent value of 'Asia' (regardless of their GNP value). However, a different placement of parentheses results in a very different meaning:

 WHERE GNP < 1000 AND (Continent = 'Africa' OR Continent = 'Asia')

That expression finds records that have a GNP value less than 1000 and a Continent value of either 'Africa' or 'Asia'.

More information on writing expressions can be found in Chapter 10, "SQL Expressions." Detailed descriptions of the operators and functions that you can use in expressions are provided in the MySQL Reference Manual.

It's possible to prevent SELECT statements that might generate a great deal of output from returning more than 1,000 rows. The mysql client supports this feature if you invoke it with the --safe-updates option. For more information, see Section 2.9, "Using the --safe-updates Option."

9.3.1. Using `ORDER BY` to Sort Query Results

By default, the rows in the result set produced by a SELECT statement are returned by the server to the client in no particular order. When you issue a query, the server is free to return the rows in any convenient order. This order can be affected by factors such as the order in which rows are actually stored in the table or which indexes are used to process the query. If you require output rows to be returned in a specific order, include an ORDER BY clause that indicates how to sort the results.

The examples in this section demonstrate ORDER BY using a table t that has the following contents (id is numeric, last_name and first_name are strings, and birth contains dates):

 mysql> SELECT id, last_name, first_name, birth FROM t; +------+-----------+------------+------------+ | id   | last_name | first_name | birth      | +------+-----------+------------+------------+ |    1 | Brown     | Bill       | 1972-10-14 | |    2 | Larsson   | Sven       | 1965-01-03 | |    3 | Brown     | Betty      | 1971-07-12 | |    4 | Larsson   | Selma      | 1968-05-29 | +------+-----------+------------+------------+

ORDER BY provides a great deal of flexibility for sorting result sets. It has the following characteristics:

You can name one or more columns, separated by commas, to use for sorting. With a single sort column, rows are sorted based on the values in that column:

 mysql> SELECT id, last_name, first_name, birth FROM t     -> ORDER BY birth; +------+-----------+------------+------------+ | id   | last_name | first_name | birth      | +------+-----------+------------+------------+ |    2 | Larsson   | Sven       | 1965-01-03 | |    4 | Larsson   | Selma      | 1968-05-29 | |    3 | Brown     | Betty      | 1971-07-12 | |    1 | Brown     | Bill       | 1972-10-14 | +------+-----------+------------+------------+

If there are additional sort columns, rows with the same value in the first sort column are sorted together, and are then further sorted using the values in the second and remaining sort columns. The following query sorts the Browns before the Larssons, and then within each group of rows with the same last name, sorts them by first name:

 mysql> SELECT id, last_name, first_name, birth FROM t     -> ORDER BY last_name, first_name; +------+-----------+------------+------------+ | id   | last_name | first_name | birth      | +------+-----------+------------+------------+ |    3 | Brown     | Betty      | 1971-07-12 | |    1 | Brown     | Bill       | 1972-10-14 | |    4 | Larsson   | Selma      | 1968-05-29 | |    2 | Larsson   | Sven       | 1965-01-03 | +------+-----------+------------+------------+

By default, ORDER BY sorts values in ascending order (smallest to largest). Any sort column may be followed with ASC if you want to specify ascending order explicitly. These ORDER BY clauses are equivalent:

 ORDER BY last_name, first_name ORDER BY last_name ASC, first_name ASC

To sort values in descending order (largest to smallest), follow the sort column name with DESC:

 mysql> SELECT id, last_name, first_name, birth FROM t     -> ORDER BY id DESC; +------+-----------+------------+------------+ | id   | last_name | first_name | birth      | +------+-----------+------------+------------+ |    4 | Larsson   | Selma      | 1968-05-29 | |    3 | Brown     | Betty      | 1971-07-12 | |    2 | Larsson   | Sven       | 1965-01-03 | |    1 | Brown     | Bill       | 1972-10-14 | +------+-----------+------------+------------+

When you name a column followed by ASC or DESC, the sort direction specifier applies only to that column. It doesn't affect sort direction for any other columns listed in the ORDER BY clause.

ORDER BY typically refers to table columns by name:
```
 SELECT last_name, first_name FROM t ORDER BY last_name, first_name; 
```
However, it's possible to refer to columns in other ways. If a column is given an alias in the output column list, you should refer to that column in the ORDER BY column by its alias:
```
 SELECT last_name AS last, first_name AS first FROM t ORDER BY last, first; 
```
Or you can specify a number corresponding to the column's position in the column output list (1 for the first output column, 2 for the second, and so forth) :
```
 SELECT last_name, first_name FROM t ORDER BY 1, 2; 
```
However, the syntax for specifying columns by position has been removed from the SQL Standard (in SQL:1999) and is obsolete. Application developers should consider using one of the other column specification methods.
It's possible to perform a sort using an expression result. If the expression appears in the output column list, you can use it for sorting by repeating it in the ORDER BY clause. Alternatively, you can refer to the expression by an alias given to it. The following queries each sort the output rows by month of the year:
```
 SELECT id, last_name, first_name, MONTH(birth)     FROM t ORDER BY MONTH(birth); SELECT id, last_name, first_name, MONTH(birth) AS m     FROM t ORDER BY m; 
```
You can also refer to the expression by its column position, although this is not recommended.

Output sorting can be based on values that don't appear in the output at all. The following statement displays month names in the output, but sorts the rows using the numeric month value:

 mysql> SELECT id, last_name, first_name, MONTHNAME(birth) FROM t     -> ORDER BY MONTH(birth); +------+-----------+------------+------------------+ | id   | last_name | first_name | MONTHNAME(birth) | +------+-----------+------------+------------------+ |    2 | Larsson   | Sven       | January          | |    4 | Larsson   | Selma      | May              | |    3 | Brown     | Betty      | July             | |    1 | Brown     | Bill       | October          | +------+-----------+------------+------------------+

ORDER BY doesn't require the sorted columns to be indexed, although a query might run faster if such an index does exist.
ORDER BY is useful together with LIMIT for selecting a particular section of a set of sorted rows. (See Section 9.3.3, "Limiting a Selection Using LIMIT.")
ORDER BY can be used with DELETE or UPDATE to force rows to be deleted or updated in a certain order. (These uses of ORDER BY are covered in Chapter 11, "Updating Data.")

9.3.2. The Natural Sort Order of Data Types

Each type of data managed by MySQL has its own natural sort order. For the most part, these orders are fairly intuitive. The rules for string sorting are the most complex because they depend on whether the strings are non-binary, binary, or come from ENUM or SET columns.

A numeric column sorts in ascending numeric order by default, or descending order if DESC is specified.
A temporal column sorts in ascending time order by default, with oldest values first and most recent values last. The order is reversed if DESC is specified.
The sort order for a string column that has a data type other than ENUM or SET depends on whether the column contains non-binary or binary values. Non-binary strings sort in the order defined by their collation. This order can be case sensitive or not, depending on the collation. Binary strings sort based on the numeric values of the bytes contained in the strings. For example, assume that a table t has a CHAR column c that has the latin1 character set and that contains the following values:
```
 mysql> SELECT c FROM t; +------+ | c    | +------+ | a    | | A    | | B    | | A    | | b    | | a    | +------+ 
```
A CHAR column is non-binary, so its contents sort according to the column's collation. If the collation is not case sensitive, values sort lexically without regard to lettercase:
```
 mysql> SELECT c FROM t ORDER BY c; +------+ | c    | +------+ | a    | | A    | | A    | | a    | | B    | | b    | +------+ 
```
Notice that the results come out in letter order, but the rows for a given letter are not further sorted by lettercase.
If the collation is case sensitive, lettercase becomes significant. You can force a string column sort to be case sensitive by using the COLLATE operator with a case-sensitive collation:
```
 mysql> SELECT c FROM t ORDER BY c COLLATE latin1_general_cs; +------+ | c    | +------+ | A    | | A    | | a    | | a    | | B    | | b    | +------+ 
```
If the collation is binary, numeric character values are the determining factor:
```
 mysql> SELECT c FROM t ORDER BY c COLLATE latin1_bin; +------+ | c    | +------+ | A    | | A    | | B    | | a    | | a    | | b    | +------+ 
```
The sort order for members of an ENUM or SET column is based on their internal numeric values. These values correspond to the order in which the enumeration or set members are listed in the column definition. Suppose that a table t contains a column mon that is an ENUM listing abbreviations for months of the year:
```
 CREATE TABLE t (     mon ENUM('Jan','Feb','Mar','Apr','May','Jun',              'Jul','Aug','Sep','Oct','Nov','Dec') ); 
```
Assume that table t has 12 rows, one for each of the possible enumeration values. When you sort this column, the values come out in month-of-year order:
```
 mysql> SELECT mon FROM t ORDER BY mon; +------+ | mon  | +------+ | Jan  | | Feb  | | Mar  | | Apr  | | May  | | Jun  | | Jul  | | Aug  | | Sep  | | Oct  | | Nov  | | Dec  | +------+ 
```
This occurs because 'Jan' through 'Dec' are assigned internal values 1 through 12 based on their order in the column definition, and those values determine the sort order. To produce a lexical string sort instead, use CAST() to convert the enumeration values to CHAR values:
```
 mysql> SELECT mon FROM t ORDER BY CAST(mon AS CHAR); +------+ | mon  | +------+ | Apr  | | Aug  | | Dec  | | Feb  | | Jan  | | Jul  | | Jun  | | Mar  | | May  | | Nov  | | Oct  | | Sep  | +------+ 
```
SET columns also sort using the internal values of the set's legal members. The ordering is more complex than with ENUM because values may consist of multiple SET members. For example, the following SET column contains three members:
```
 CREATE TABLE t (hue SET('red','green','blue')); 
```
Assume that t contains the following rows:
```
 mysql> SELECT hue FROM t; +----------------+ | hue            | +----------------+ | red,green      | | red,green,blue | | red,blue       | | green,blue     | +----------------+ 
```
The SET members 'red', 'green', and 'blue' have internal values of 1, 2, and 4, respectively. Thus, the rows of the table have internal numeric values of 1+2 = 3, 1+2+4 = 7, 1+4 = 5, and 2+4 = 6. An ORDER BY on the column sorts using those numeric values:
```
 mysql> SELECT hue FROM t ORDER BY hue; +----------------+ | hue            | +----------------+ | red,green      | | red,blue       | | green,blue     | | red,green,blue | +----------------+ 
```
As with ENUM, SET values can be sorted lexically by using CAST() to convert them to strings:
```
 mysql> SELECT hue FROM t ORDER BY CAST(hue AS CHAR); +----------------+ | hue            | +----------------+ | green,blue     | | red,blue       | | red,green      | | red,green,blue | +----------------+ 
```
NULL values in a column sort together at the beginning for ascending sorts and at the end for descending sorts.

9.3.3. Limiting a Selection Using `LIMIT`

MySQL supports a LIMIT clause in SELECT statements, which tells the server to return only some of the rows selected by the statement. This is useful for retrieving records based on their position within the set of selected rows.

LIMIT may be given with either one or two arguments:

 LIMIT row_count LIMIT skip_count, row_count

Each argument must be given as an integer constant. You cannot use expressions, user variables, and so forth.

When followed by a single integer, row_count, LIMIT returns the first row_count rows from the beginning of the result set. To select just the first 10 rows of a result set, use LIMIT 10:

 SELECT * FROM Country LIMIT 10;

When followed by two integers, skip_count and row_count, LIMIT skips the first skip_count rows from the beginning of the result set, and then returns the next row_count rows. To skip the first 20 rows and then return the next 10 rows, do this:

 SELECT * FROM Country LIMIT 20,10;

The single-argument form of LIMIT is applicable only when the rows you want to retrieve appear at the beginning of the result set. The two-argument form is more general and can be used to select an arbitrary section of rows from anywhere in the result set.

When you need only some of the rows selected by a query, LIMIT is an efficient way to obtain them. For a client application that fetches rows from the server, you get better performance by adding LIMIT to the query than by having the client fetch all the rows and discard all but the ones of interest. By using LIMIT, the unwanted rows never cross the network at all.

It's often helpful to include an ORDER BY clause to put the rows in a particular order when you use LIMIT. When ORDER BY and LIMIT are used together, MySQL applies ORDER BY first and then LIMIT. One common use for this is to find the row containing the smallest or largest values in a particular column. For example, to find the row in a table t containing the smallest id value, use this statement:

 SELECT * FROM t ORDER BY id LIMIT 1;

To find the largest value instead, use DESC to sort the rows in reverse:

 SELECT * FROM t ORDER BY id DESC LIMIT 1;

The two-argument form of LIMIT is useful in conjunction with ORDER BY for situations in which you want to process successive sections of a result set. For example, in Web applications, it's common to display the result of a large search across a series of pages that each present one section of the result. To retrieve sections of the search result this way, issue a series of statements that all specify the same number of rows to return in the LIMIT clause, but vary the number of initial rows to skip:

 SELECT * FROM t ORDER BY id LIMIT  0, 20; SELECT * FROM t ORDER BY id LIMIT 20, 20; SELECT * FROM t ORDER BY id LIMIT 40, 20; SELECT * FROM t ORDER BY id LIMIT 60, 20; ...

It's possible to abuse the LIMIT feature. For example, it isn't a good idea to use a clause such as LIMIT 1000000, 10 to return 10 rows from a query that normally would return more than a million rows. The server must still process the query to determine the first million rows before returning the 10 rows. It's better to use a WHERE clause to reduce the query result to a more manageable size, and then use LIMIT to pull rows from that reduced result. This also makes the use of ORDER BY with LIMIT more efficient because the server need not sort as large a row set before applying the limit.

The UPDATE and DELETE statements also support the use of LIMIT, which causes only a certain number of rows to be updated or deleted. See Chapter 11, "Updating Data."

9.3.4. Using `DISTINCT` to Eliminate Duplicates

If a query returns a result that contains duplicate rows, you can remove duplicates to produce a result set in which every row is unique. To do this, include the keyword DISTINCT after SELECT and before the output column list.

Suppose that a query returns a result set that contains duplicated rows:

 mysql> SELECT last_name FROM t; +-----------+ | last_name | +-----------+ | Brown     | | Larsson   | | Brown     | | Larsson   | +-----------+

Adding DISTINCT removes the duplicates and returns only unique rows:

 mysql> SELECT DISTINCT last_name FROM t; +-----------+ | last_name | +-----------+ | Brown     | | Larsson   | +-----------+

Duplicate elimination for string values happens differently for non-binary and binary strings. The strings 'ABC', 'Abc', and 'abc' are considered distinct if they're binary strings. If they are non-binary strings, they are considered distinct if they have different values based on their collation.

DISTINCT treats all NULL values within a given column as having the same value. Suppose that a table t contains the following rows:

 mysql> SELECT i, j FROM t; +------+------+ | i    | j    | +------+------+ |    1 |    2 | |    1 | NULL | |    1 | NULL | +------+------+

For purposes of DISTINCT, the NULL values in the second column are the same, so the second and third rows are identical. Adding DISTINCT to the query eliminates one of them as a duplicate:

 mysql> SELECT DISTINCT i, j FROM t; +------+------+ | i    | j    | +------+------+ |    1 |    2 | |    1 | NULL | +------+------+

Using DISTINCT is logically equivalent to using GROUP BY on all selected columns with no aggregate function. For such a query, GROUP BY just produces a list of distinct grouping values. If you display and group by a single column, the query produces the distinct values in that column. If you display and group by multiple columns, the query produces the distinct combinations of values in the column. For example, the following two queries produce the same set of rows:

 SELECT DISTINCT id FROM t; SELECT id FROM t GROUP BY id;

As do these:

 SELECT DISTINCT id, name FROM t; SELECT id, name FROM t GROUP BY id, name;

Another correspondence between the behavior of DISTINCT and GROUP BY is that for purposes of assessing distinctness, DISTINCT considers all NULL values the same. This is analogous to the way that GROUP BY groups NULL values.

A difference between DISTINCT and GROUP BY is that DISTINCT doesn't cause row sorting. In MySQL, GROUP BY does cause sorting.

DISTINCT can be used with the COUNT() function to count how many distinct values a column contains. In this case, NULL values are ignored:

 mysql> SELECT j FROM t; +------+ | j    | +------+ |    2 | | NULL | | NULL | +------+ mysql> SELECT COUNT(DISTINCT j) FROM t; +-------------------+ | COUNT(DISTINCT j) | +-------------------+ |                 1 | +-------------------+

COUNT(DISTINCT) is discussed further in Section 9.4.3, "The COUNT() Aggregate Function."

9.3.1. Using ORDER BY to Sort Query Results

9.3.2. The Natural Sort Order of Data Types

9.3.3. Limiting a Selection Using LIMIT

9.3.4. Using DISTINCT to Eliminate Duplicates

9.3.1. Using `ORDER BY` to Sort Query Results

9.3.3. Limiting a Selection Using `LIMIT`

9.3.4. Using `DISTINCT` to Eliminate Duplicates