9.5. Grouping Results


If a query does not contain a GROUP BY clause to place rows of the result set into groups, an aggregate function produces a result that is based on all the selected rows. A GROUP BY clause may be added to generate a more fine-grained summary that produces values for subgroups within a set of selected rows.

Suppose that a table named personnel contains the following information about company employees:

 mysql> SELECT * FROM personnel; +---------+--------+---------+-------------+----------+ | pers_id | name   | dept_id | title       | salary   | +---------+--------+---------+-------------+----------+ |       1 | Wendy  |      14 | Supervisor  | 38000.00 | |       2 | Wally  |       7 | Stock clerk | 28000.00 | |       3 | Ray    |       7 | Programmer  | 41000.00 | |       4 | Burton |      14 | Secretary   | 32000.00 | |       5 | Gordon |      14 | President   | 78000.00 | |       6 | Jeff   |       7 | Stock clerk | 29000.00 | |       7 | Doris  |       7 | Programmer  | 48000.00 | |       8 | Daisy  |       7 | Secretary   | 33000.00 | |       9 | Bea    |       7 | Accountant  | 40000.00 | +---------+--------+---------+-------------+----------+ 

Use of COUNT(*) to count rows when there is no GROUP BY produces a single value for the entire set of rows:

 mysql> SELECT COUNT(*) FROM personnel; +----------+ | COUNT(*) | +----------+ |        9 | +----------+ 

Adding a GROUP BY clause arranges rows using the values in the grouping column or columns. The result is that COUNT(*) produces a count for each group. To find out how many times each title occurs, do this:

 mysql> SELECT title, COUNT(*) FROM personnel     -> GROUP BY title; +-------------+----------+ | title       | COUNT(*) | +-------------+----------+ | Accountant  |        1 | | President   |        1 | | Programmer  |        2 | | Secretary   |        2 | | Stock clerk |        2 | | Supervisor  |        1 | +-------------+----------+ 

To count the number of people in each department, group by department number:

 mysql> SELECT dept_id, COUNT(*) FROM personnel     -> GROUP BY dept_id; +---------+----------+ | dept_id | COUNT(*) | +---------+----------+ |       7 |        6 | |      14 |        3 | +---------+----------+ 

A GROUP BY that names multiple columns arranges rows according to the combinations of values in those columns. For example, to find out how many times each job title occurs in each department, group by both department and title:

 mysql> SELECT dept_id, title, COUNT(*) FROM personnel     -> GROUP BY dept_id, title; +---------+-------------+----------+ | dept_id | title       | COUNT(*) | +---------+-------------+----------+ |       7 | Accountant  |        1 | |       7 | Programmer  |        2 | |       7 | Secretary   |        1 | |       7 | Stock clerk |        2 | |      14 | President   |        1 | |      14 | Secretary   |        1 | |      14 | Supervisor  |        1 | +---------+-------------+----------+ 

The preceding queries use COUNT(*) to count rows, but you can also use summary functions to compute results based on values in specific columns of the rows in each group. For example, numeric functions can tell you about the salary characteristics of each title or department:

 mysql> SELECT title, MIN(salary), MAX(salary), AVG(salary)     -> FROM personnel     -> GROUP BY title; +-------------+-------------+-------------+--------------+ | title       | MIN(salary) | MAX(salary) | AVG(salary)  | +-------------+-------------+-------------+--------------+ | Accountant  |    40000.00 |    40000.00 | 40000.000000 | | President   |    78000.00 |    78000.00 | 78000.000000 | | Programmer  |    41000.00 |    48000.00 | 44500.000000 | | Secretary   |    32000.00 |    33000.00 | 32500.000000 | | Stock clerk |    28000.00 |    29000.00 | 28500.000000 | | Supervisor  |    38000.00 |    38000.00 | 38000.000000 | +-------------+-------------+-------------+--------------+ mysql> SELECT dept_id, MIN(salary), MAX(salary), AVG(salary)     -> FROM personnel     -> GROUP BY dept_id; +---------+-------------+-------------+--------------+ | dept_id | MIN(salary) | MAX(salary) | AVG(salary)  | +---------+-------------+-------------+--------------+ |       7 |    28000.00 |    48000.00 | 36500.000000 | |      14 |    32000.00 |    78000.00 | 49333.333333 | +---------+-------------+-------------+--------------+ 

If you want the results from AVG() to be displayed to two decimals, use ROUND(AVG(salary),2).

If you combine the GROUP_CONCAT() function with GROUP BY, GROUP_CONCAT() produces a concatenated result from each group of strings. The following example creates lists of the countries that have a particular form of government on the South American continent:

 mysql> SELECT GovernmentForm, GROUP_CONCAT(Name) AS Countries     -> FROM Country     -> WHERE Continent = 'South America'     -> GROUP BY GovernmentForm\G *************************** 1. row *************************** GovernmentForm: Dependent Territory of the UK      Countries: Falkland Islands *************************** 2. row *************************** GovernmentForm: Federal Republic      Countries: Argentina,Venezuela,Brazil *************************** 3. row *************************** GovernmentForm: Overseas Department of France      Countries: French Guiana *************************** 4. row *************************** GovernmentForm: Republic      Countries: Chile,Uruguay,Suriname,Peru,Paraguay,Bolivia,                 Guyana,Ecuador,Colombia 

The default string separator used by GROUP_CONCAT() is ',' (comma). Records are added to the resulting string in the order in which the database server reads them. To change the separator and the concatenation order, add SEPARATOR and ORDER BY clauses, respectively, within the parentheses. For ORDER BY, you can specify ASC or DESC, just as when you use it in other contexts:

 mysql> SELECT GovernmentForm,     ->        GROUP_CONCAT(Name ORDER BY Name ASC SEPARATOR ' - ')     ->        AS Countries     -> FROM Country     -> WHERE Continent = 'South America'     -> GROUP BY GovernmentForm\G *************************** 1. row *************************** GovernmentForm: Dependent Territory of the UK      Countries: Falkland Islands *************************** 2. row *************************** GovernmentForm: Federal Republic      Countries: Argentina - Brazil - Venezuela *************************** 3. row *************************** GovernmentForm: Overseas Department of France      Countries: French Guiana *************************** 4. row *************************** GovernmentForm: Republic      Countries: Bolivia - Chile - Colombia - Ecuador - Guyana -                 Paraguay - Peru - Suriname - Uruguay 

The next example for this function returns the continents that contain countries that have a name beginning with 'I', as well as the form of government for those countries. The example demonstrates that GROUP_CONCAT() accepts a DISTINCT clause to remove duplicates from the concatenated list. The first query shows what the result looks like without DISTINCT, and the second uses DISTINCT to display each form of government only once:

 mysql> SELECT Continent,     ->        GROUP_CONCAT(GovernmentForm ORDER BY GovernmentForm ASC)     ->        AS 'Government Form'     -> FROM Country     -> WHERE Name LIKE 'I%'     -> GROUP BY Continent; +-----------+--------------------------------------------------------------+ | Continent | Government Form                                              | +-----------+--------------------------------------------------------------+ | Asia      | Federal Republic,Islamic Republic,Republic,Republic,Republic | | Europe    | Republic,Republic,Republic                                   | +-----------+--------------------------------------------------------------+ mysql> SELECT Continent,     ->        GROUP_CONCAT(DISTINCT GovernmentForm     ->                     ORDER BY GovernmentForm ASC)     ->        AS 'Government Form'     -> FROM Country     -> WHERE Name LIKE 'I%'     -> GROUP BY Continent; +-----------+--------------------------------------------+ | Continent | Government Form                            | +-----------+--------------------------------------------+ | Asia      | Federal Republic,Islamic Republic,Republic | | Europe    | Republic                                   | +-----------+--------------------------------------------+ 

Note that in each of the preceding queries, the output columns consist only of the columns listed in the GROUP BY clause, and values produced by summary functions. If you try to retrieve table columns other than those listed in the GROUP BY clause, the values displayed for the extra columns are unpredictable.

9.5.1. GROUP BY and Sorting

In MySQL, a GROUP BY clause has the side effect of sorting rows. If you already have a GROUP BY clause in your query that produces the desired sort order, there's no need for an ORDER BY. Use of ORDER BY is necessary with GROUP BY only to produce a different sort order than that resulting from the GROUP BY. However, this isn't a portable behavior. For database engines other than MySQL, GROUP BY might not sort rows. To write more portable queries, add an ORDER BY even if MySQL does not require it.

9.5.2. Selecting Groups with HAVING

It could be when you use GROUP BY that you're interested only in groups that have particular summary characteristics. To retrieve just those groups and eliminate the rest, use a HAVING clause that identifies the required group characteristics. HAVING acts in a manner somewhat similar to WHERE, but occurs at a different stage of query processing:

1.

WHERE, if present, identifies the initial set of records to select from a table.

2.

GROUP BY arranges the selected records into groups.

3.

Aggregate functions compute summary values for each group.

4.

HAVING identifies which groups to retrieve for the final result set.

The following example shows how this progression works, using the personnel table shown earlier in the chapter:

1.

A query with no GROUP BY clause or aggregate functions selects a list of records. This list provides details, not overall characteristics:

 mysql> SELECT title, salary     -> FROM personnel WHERE dept_id = 7; +-------------+----------+ | title       | salary   | +-------------+----------+ | Stock clerk | 28000.00 | | Programmer  | 41000.00 | | Stock clerk | 29000.00 | | Programmer  | 48000.00 | | Secretary   | 33000.00 | | Accountant  | 40000.00 | +-------------+----------+ 

2.

Adding GROUP BY and aggregate functions arranges rows into groups and computes summary values for each.

 mysql> SELECT title, COUNT(*), AVG(salary)     -> FROM personnel WHERE dept_id = 7     -> GROUP BY title; +-------------+----------+--------------+ | title       | COUNT(*) | AVG(salary)  | +-------------+----------+--------------+ | Accountant  |        1 | 40000.000000 | | Programmer  |        2 | 44500.000000 | | Secretary   |        1 | 33000.000000 | | Stock clerk |        2 | 28500.000000 | +-------------+----------+--------------+ 

3.

Finally, adding HAVING places an additional constraint on the output rows. In the following query, only those groups consisting of two or more people are displayed:

 mysql> SELECT title, salary, COUNT(*), AVG(salary)     -> FROM personnel WHERE dept_id = 7     -> GROUP BY title     -> HAVING COUNT(*) > 1; +-------------+----------+----------+--------------+ | title       | salary   | COUNT(*) | AVG(salary)  | +-------------+----------+----------+--------------+ | Programmer  | 41000.00 |        2 | 44500.000000 | | Stock clerk | 28000.00 |        2 | 28500.000000 | +-------------+----------+----------+--------------+ 

Sometimes it's possible to place selection criteria in either the WHERE clause or the HAVING clause. In such cases, it's better to do so in the WHERE clause because that eliminates rows from consideration sooner and allows the query to be processed more efficiently. Choosing values in the HAVING clause might cause the query to perform group calculations on groups in which you have no interest.

9.5.3. Using GROUP BY and WITH ROLLUP

The WITH ROLLUP modifier can be used in the GROUP BY clause to produce multiple levels of summary values. Suppose that you need to generate a listing of the population of each continent, as well as the total of the population on all continents. One way to do this is by running one query to get the per-continent totals and another to get the total for all continents:

 mysql> SELECT Continent, SUM(Population) AS pop     -> FROM Country     -> GROUP BY Continent; +---------------+------------+ | Continent     | pop        | | Asia          | 3705025700 | | Europe        | 730074600  | | North America | 482993000  | | Africa        | 784475000  | | Oceania       | 30401150   | | Antarctica    | 0          | | South America | 345780000  | +---------------+------------+ mysql> SELECT SUM(Population) AS pop     -> FROM Country; +------------+ | pop        | +------------+ | 6078749450 | +------------+ 

Another way to get the results requires some application programming: The application can retrieve the per-continent values and sum those to calculate the total population value.

To avoid either of those approaches, use WITH ROLLUP. This enables you to use a single query to get both the detailed results as well as the total sum of all rows, eliminating the need for multiple queries or extra processing on the client side:

 mysql> SELECT Continent, SUM(Population) AS pop     -> FROM Country     -> GROUP BY Continent WITH ROLLUP; +---------------+------------+ | Continent     | pop        | +---------------+------------+ | Asia          | 3705025700 | | Europe        |  730074600 | | North America |  482993000 | | Africa        |  784475000 | | Oceania       |   30401150 | | Antarctica    |          0 | | South America |  345780000 | | NULL          | 6078749450 | +---------------+------------+ 

The difference in the output from this statement compared to one without WITH ROLLUP occurs on the last line, where the Continent value contains NULL and the pop value contains the total sum of all populations.

WITH ROLLUP performs a "super-aggregate" operation. It does not simply generate a sum of the numbers that appear in the pop column. Instead, the final line comprises applications of the given aggregate function, as it is written in the SELECT clause, on every single row selected.

To illustrate this, consider the following example in which we calculate columns using the AVG() function rather than SUM(). The final rollup line contains the overall average, not the sum of averages:

 mysql> SELECT Continent, AVG(Population) AS avg_pop     -> FROM Country     -> GROUP BY Continent WITH ROLLUP; +---------------+---------------+ | Continent     | avg_pop       | +---------------+---------------+ | Asia          | 72647562.7451 | | Europe        | 15871186.9565 | | North America | 13053864.8649 | | Africa        | 13525431.0345 | | Oceania       |  1085755.3571 | | Antarctica    |        0.0000 | | South America | 24698571.4286 | | NULL          | 25434098.1172 | +---------------+---------------+ 

In other words, the rollup line contains the numbers that would appear had there been no grouping columns for the query:

 mysql> SELECT AVG(Population) AS avg_pop     -> FROM Country; +---------------+ | avg_pop       | +---------------+ | 25434098.1172 | +---------------+ 

Without WITH ROLLUP, getting the per-continent and overall AVG() results produced would require two separate statements: one to get the per-continent data and one to get the overall totals. For large data sets, WITH ROLLUP is more efficient because the data need be scanned only once.

The use of WITH ROLLUP gets more interesting when several columns are grouped at once. The results include a summary for each column named in the GROUP BY clause, as well as a final summary row:

 mysql> SELECT Continent, Region,     ->        SUM(Population) AS pop,     ->        AVG(Population) AS avg_pop     -> FROM Country     -> GROUP BY Continent, Region WITH ROLLUP; +---------------+---------------------------+------------+----------------+ | Continent     | Region                    | pop        | avg_pop        | +---------------+---------------------------+------------+----------------+ | Asia          | Eastern Asia              | 1507328000 | 188416000.0000 | | Asia          | Middle East               |  188380700 |  10465594.4444 | | Asia          | Southeast Asia            |  518541000 |  47140090.9091 | | Asia          | Southern and Central Asia | 1490776000 | 106484000.0000 | | Asia          | NULL                      | 3705025700 |  72647562.7451 | | Europe        | Baltic Countries          |    7561900 |   2520633.3333 | | Europe        | British Islands           |   63398500 |  31699250.0000 | ... | Europe        | Eastern Europe            |  307026000 |  30702600.0000 | | Europe        | Nordic Countries          |   24166400 |   3452342.8571 | | Europe        | Southern Europe           |  144674200 |   9644946.6667 | ... | Europe        | Western Europe            |  183247600 |  20360844.4444 | | Europe        | NULL                      |  730074600 |  15871186.9565 | | North America | Caribbean                 |   38140000 |   1589166.6667 | | North America | Central America           |  135221000 |  16902625.0000 | | North America | North America             |  309632000 |  61926400.0000 | | North America | NULL                      |  482993000 |  13053864.8649 | | Africa        | Central Africa            |   95652000 |  10628000.0000 | ... | Africa        | Eastern Africa            |  246999000 |  12349950.0000 | | Africa        | Northern Africa           |  173266000 |  24752285.7143 | | Africa        | Southern Africa           |   46886000 |   9377200.0000 | ... | Africa        | Western Africa            |  221672000 |  13039529.4118 | | Africa        | NULL                      |  784475000 |  13525431.0345 | | Oceania       | Australia and New Zealand |   22753100 |   4550620.0000 | ... | Oceania       | Melanesia                 |    6472000 |   1294400.0000 | | Oceania       | Micronesia                |     543000 |     77571.4286 | | Oceania       | Micronesia/Caribbean      |          0 |         0.0000 | ... | Oceania       | Polynesia                 |     633050 |     63305.0000 | | Oceania       | NULL                      |   30401150 |   1085755.3571 | | Antarctica    | Antarctica                |          0 |         0.0000 | | Antarctica    | NULL                      |          0 |         0.0000 | | South America | South America             |  345780000 |  24698571.4286 | | South America | NULL                      |  345780000 |  24698571.4286 | | NULL          | NULL                      | 6078749450 |  25434098.1172 | +---------------+---------------------------+------------+----------------+ 

Note how the groupwise summaries are presented in the result: In addition to the final summary line, the output includes an intermediate summary of the rows for a given continent whenever the Continent value changes. In these intermediate summary lines, Region is set to NULL.



MySQL 5 Certification Study Guide
MySQL 5.0 Certification Study Guide
ISBN: 0672328127
EAN: 2147483647
Year: 2006
Pages: 312

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net