Aggregate Functions, GROUP BY and HAVING | Java Oracle Database Development

Aggregate functions are functions that take a set of values a column or an expression including a column name in a result set and return a single value. For example, we can query a table that contains students' test scores to find the average by using the AVG() function. In SQL, the basic aggregate functions are:

AVG() Calculates average
COUNT() Returns the number of rows
MAX() The highest value in the set of values
MIN() The smallest value in the set
SUM() The sum of the values

When we use an aggregate function in a select list, we can't use anything else but aggregate functions in the select list unless we use a GROUP BY clause.

In the previous chapter, we considered a table of fruit.

 Fruit       Color    Quantity  Price  Picked ----------  -------  --------  -----  --------- Apple       Green          12     .5  12-SEP-02 Apple       Red            12     .5  15-SEP-02 Mango       Yellow         10    1.5  22-SEP-02 Mangosteen  Purple          5      2  25-SEP-02 Durian                      2     15 Orange      Orange         10      1  28-

This is what happens when we try to mix an aggregate function with regular select list items:

 SQL> SELECT FRUIT, PRICE, AVG(PRICE) FROM FRUITS; SELECT FRUIT, PRICE, AVG(PRICE) FROM FRUITS        * ERROR at line 1: ORA-00937: not a single-group group function

The problem, of course, is that this SELECT statement would return six rows for FRUIT and PRICE but only one for AVG(PRICE).

 SQL> SELECT AVG(PRICE) FROM FRUITS; AVG(PRICE) ---------- 3.41666667

We can't reconcile the difference between aggregate functions that return only a single row and select list items that can potentially return multiple rows. But we can break down rows into smaller subsets based on particular columns, using a GROUP BY clause. The basic format of a SELECT statement with a GROUP BY clause is:

 SELECT select_list FROM table_list [WHERE condition] GROUP BY group_by_list;

The group_by_list is essentially a select_list of columns that together are used to generate unique sets of values. In this table, if we were to group fruit together on the FRUIT column, all rows that have the same value in FRUIT would be combined into a single row. This, by itself, is no different than using the keyword DISTINCT with the select list. What is different is that when we use GROUP BY, aggregate functions, instead of applying to the whole result set, apply to each group of distinct rows.

When we use GROUP BY, items in the select list must either appear in the GROUP BY clause or they must include aggregate functions. Here we use the FRUIT column in the GROUP BY clause and in the select list; we additionally use the aggregate SUM() function in the select list.

 SQL> SELECT FRUIT, SUM(QUANTITY) FROM FRUITS   2     GROUP BY FRUIT; FRUIT      SUM(QUANTITY) ---------- ------------- Apple                 24 Durian                 2 Mango                 10 Mangosteen             5 Orange                10

We have two types of apples, 12 of each type, and we can see that here they've been combined into a single row with the total quantity 24.

GROUP BY has an optional subclause, HAVING. HAVING is similar to a WHERE clause for groups. The principal difference is that you can use it only with aggregate functions or with the select list items specified in the GROUP BY clause. Suppose that we want to list only fruits with QUANTITY greater than or equal to 10:

 SQL> SELECT FRUIT, SUM(QUANTITY) FROM FRUITS   2  GROUP BY FRUIT   3  HAVING SUM(QUANTITY)>=10; FRUIT      SUM(QUANTITY) ---------- ------------- Apple                 24 Mango                 10 Orange                10

We cannot use columns that do not appear in the select list in the HAVING clause, but we can use aggregate expressions that don't appear elsewhere. For example, we can limit the results to fruits that have a value totaling 12 or more, even though this calculation doesn't appear in the select list:

 SQL> SELECT FRUIT, SUM(QUANTITY) FROM FRUITS   2  GROUP BY FRUIT   3  HAVING SUM(QUANTITY*PRICE)>=12; FRUIT      SUM(QUANTITY) ---------- ------------- Apple                 24 Durian                 2 Mango                 10

Here is an example with multiple columns in the GROUP BY list and a more complicated condition:

 SQL> SELECT FRUIT, COLOR, SUM(QUANTITY) FROM FRUITS   2  GROUP BY FRUIT, COLOR   3  HAVING SUM(QUANTITY)>=10 AND AVG(PRICE)>1; FRUIT      COLOR      SUM(QUANTITY) ---------- ---------- ------------- Mango      Yellow                10

The GROUP BY clause groups the rows where the combination of FRUIT and COLOR are unique (in this case, all rows are unique, so this returns six rows), finds anywhere the average price is greater than 1 (which rules out apples and oranges), and finally, finds those with a quantity greater than or equal to 10, which leaves only mangos.