Recipe 8.4. Using DISTINCT to Eliminate Duplicates


Problem

You want to know which values are present in a set of values, without displaying duplicate values multiple times. Or you want to know how many distinct values there are.

Solution

Use DISTINCT to select unique values or COUNT(DISTINCT) to count them.

Discussion

One summary operation that doesn't use aggregate functions is to determine which values or rows are contained in a dataset by eliminating duplicates. Do this with DISTINCT (or DISTINCTROW, which is synonymous). DISTINCT is useful for boiling down a query result, and often is combined with ORDER BY to place the values in more meaningful order. For example, to determine the names of the drivers listed in the driver_log table, use the following statement:

mysql> SELECT DISTINCT name FROM driver_log ORDER BY name; +-------+ | name  | +-------+ | Ben   | | Henry | | Suzi  | +-------+ 

A statement without DISTINCT produces the same names, but is not nearly as easy to understand, even with a small dataset:

mysql> SELECT name FROM driver_log; +-------+ | name  | +-------+ | Ben   | | Suzi  | | Henry | | Henry | | Ben   | | Henry | | Suzi  | | Henry | | Ben   | | Henry | +-------+ 

To determine how many different drivers there are, use COUNT(DISTINCT):

mysql> SELECT COUNT(DISTINCT name) FROM driver_log; +----------------------+ | COUNT(DISTINCT name) | +----------------------+ |                    3 | +----------------------+ 

COUNT(DISTINCT) ignores NULL values. Should you wish to count NULL as one of the values in the set if it's present, use one of the following expressions:

COUNT(DISTINCT val) + IF(COUNT(IF(val IS NULL,1,NULL))=0,0,1) COUNT(DISTINCT val) + IF(SUM(ISNULL(val))=0,0,1) COUNT(DISTINCT val) + (SUM(ISNULL(val))!=0) 

DISTINCT queries often are useful in conjunction with aggregate functions to obtain a more complete characterization of your data. Suppose that you have a customer table that contains a state column indicating the state where customers are located. Applying COUNT(*) to the customer table indicates how many customers you have, using DISTINCT on the state values in the table tells you the number of states in which you have customers, and COUNT(DISTINCT) on the state values tells you how many states your customer base represents.

When used with multiple columns, DISTINCT shows the different combinations of values in the columns and COUNT(DISTINCT) counts the number of combinations. The following statements show the different sender/recipient pairs in the mail table and how many such pairs there are:

mysql> SELECT DISTINCT srcuser, dstuser FROM mail     -> ORDER BY srcuser, dstuser; +---------+---------+ | srcuser | dstuser | +---------+---------+ | barb    | barb    | | barb    | tricia  | | gene    | barb    | | gene    | gene    | | gene    | tricia  | | phil    | barb    | | phil    | phil    | | phil    | tricia  | | tricia  | gene    | | tricia  | phil    | +---------+---------+ mysql> SELECT COUNT(DISTINCT srcuser, dstuser) FROM mail; +----------------------------------+ | COUNT(DISTINCT srcuser, dstuser) | +----------------------------------+ |                               10 | +----------------------------------+ 

DISTINCT works with expressions, too, not just column values. To determine the number of hours of the day during which messages in the mail are sent, count the distinct HOUR⁠(⁠ ⁠ ⁠) values:

mysql> SELECT COUNT(DISTINCT HOUR(t)) FROM mail; +-------------------------+ | COUNT(DISTINCT HOUR(t)) | +-------------------------+ |                      12 | +-------------------------+ 

To find out which hours those were, list them:

mysql> SELECT DISTINCT HOUR(t) AS hour FROM mail ORDER BY hour; +------+ | hour | +------+ |    7 | |    8 | |    9 | |   10 | |   11 | |   12 | |   13 | |   14 | |   15 | |   17 | |   22 | |   23 | +------+ 

Note that this statement doesn't tell you how many messages were sent each hour. That's covered in Section 8.15.




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net