4.2 The GROUP BY Clause

   

The GROUP BY clause, along with the aggregate functions, groups a result set into multiple groups, and then produces a single row of summary information for each group. For example, if you want to find the total number of orders for each customer, execute the following query:

SELECT cust_nbr, COUNT(order_nbr)  FROM cust_order  GROUP BY cust_nbr;   CUST_NBR COUNT(ORDER_NBR) ---------- ----------------         1                8         4                4         5                6         8                2

This query produces one summary line of output for each customer. This is the essence of a GROUP BY query. You asked Oracle to GROUP the results BY cust_nbr; therefore, it produced one output row for each distinct value of cust_nbr. Each data value for a given customer represents a summary based on all rows for that customer.

4.2.1 Correspondence Between SELECT and GROUP BY

When you write a query with a GROUP BY clause, there are a number of rules you need to be aware of that govern the correspondence between the columns in the SELECT and GROUP BY clauses. Generally speaking, any nonaggregate expression in your SELECT clause must also be reflected in your GROUP BY clause.

4.2.1.1 Aggregate expressions generally require a GROUP BY clause

The nonaggregate expression cust_nbr in the SELECT list of the query in our most recent example also appears in the GROUP BY clause. If you have a mix of aggregate and nonaggregate expressions in the SELECT list, SQL expects that you are trying to perform a GROUP BY operation, and you must also include a GROUP BY clause in your query. Oracle returns an error if you fail to do so. For example, if you omit the GROUP BY clause, the following error is returned:

SELECT cust_nbr, sales_emp_id, COUNT(order_nbr) FROM cust_order; SELECT cust_nbr, sales_emp_id, COUNT(order_nbr)        * ERROR at line 1: ORA-00937: not a single-group group function

There is one case in which you can write aggregate expressions in a SELECT list without also writing a GROUP BY clause and that is when you wish those aggregate expressions to apply to the entire result set. In such a case, your SELECT list must consist only of aggregate expressions. The queries earlier in this chapter, introducing the aggregate functions, are good examples of this case.

4.2.1.2 GROUP BY clause must include all nonaggregate expressions

If you forget to include all nonaggregate expressions from the SELECT list in the GROUP BY clause, SQL returns the following error:

SELECT cust_nbr, sales_emp_id, COUNT(order_nbr) FROM cust_order GROUP BY cust_nbr; SELECT cust_nbr, sales_emp_id, COUNT(order_nbr)                  * ERROR at line 1: ORA-00979: not a GROUP BY expression

4.2.1.3 Aggregate functions not allowed in GROUP BY clause

You can't use a group function (aggregate function) in the GROUP BY clause. You will get an error if you attempt to do so, as in the following example:

SELECT cust_nbr, COUNT(order_nbr) FROM cust_order GROUP BY cust_nbr, COUNT(order_nbr); GROUP BY cust_nbr, COUNT(order_nbr)                    * ERROR at line 3: ORA-00934: group function is not allowed here

4.2.1.4 Constants can be omitted from the GROUP BY clause

If you have a constant in your SELECT list, you don't need to include it in the GROUP BY clause. However, including the constant in the GROUP BY clause doesn't alter the result. Therefore, both the following statements will produce the same output:

SELECT 'CUSTOMER', cust_nbr, COUNT(order_nbr)  FROM cust_order  GROUP BY cust_nbr; SELECT 'CUSTOMER', cust_nbr, COUNT(order_nbr)  FROM cust_order GROUP BY 'CUSTOMER', cust_nbr; 'CUSTOMER'   CUST_NBR COUNT(ORDER_NBR) ---------- ---------- ---------------- CUSTOMER            1                8 CUSTOMER            4                4 CUSTOMER            5                6 CUSTOMER            8                2

4.2.1.5 Scalar functions may be grouped by their underlying column

If a scalar function has been applied to a column in the SELECT list, the syntax doesn't force you to include the scalar function in the GROUP BY clause. For example:

SELECT SUBSTR(lname,1,1), COUNT(*) FROM employee GROUP BY lname; S   COUNT(*) - ---------- A          1 A          1 B          1 C          1 F          1 J          1 J          1 K          1 M          1 M          1 S          1 S          1 T          1 W          1 14 rows selected.

In this example, the SELECT list has SUBSTR(lname,1,1); however, the GROUP BY clause contains just lname, without the SUBSTR function on it. Though this query is syntactically correct, if you look at the result set, you will notice that there are multiple rows with the same value for SUBSTR(lname,1,1). This means that the GROUP BY operation takes place for the entire lname, but only the substring is displayed. If you really want the result set to be grouped by the substring expression, you should include that expression in the GROUP BY clause, as shown in the following example:

SELECT SUBSTR(lname,1,1), COUNT(*) FROM employee GROUP BY SUBSTR(lname,1,1); S   COUNT(*) - ---------- A          2 B          1 C          1 F          1 J          2 K          1 M          2 S          2 T          1 W          1 10 rows selected.

Notice the difference. This time, there is only one row in the result set for each value returned by SUBSTR(lname,1,1). The rows have been grouped on the exact same expression as is displayed.

4.2.1.6 Concatenated columns may be grouped in either of two ways

If an expression in a SELECT list concatenates two columns, you can specify the GROUP BY clause in one of the following two ways both giving the same result:

SELECT manager_emp_id || job_id, COUNT(*) FROM employee GROUP BY manager_emp_id || job_id; SELECT manager_emp_id || job_id, COUNT(*) FROM employee GROUP BY manager_emp_id, job_id; MANAGER_EMP_ID||JOB_ID      COUNT(*) ------------------------- ---------- 672                                1 7566669                            2 7698                               1 7698667                            1 7698670                            3 7782667                            1 7788                               1 7839671                            3 7902667                            1

4.2.1.7 You can sometimes exclude a nonaggregate expression from the GROUP BY clause

There are certain situations in which you want an expression in the SELECT list, but don't want to group by the same. For example, you might want to display a line number along with summary information for each customer. Attempt to do so using the following query, and you will get an error:

SELECT ROWNUM, cust_nbr, COUNT(order_nbr) FROM cust_order GROUP BY cust_nbr; SELECT ROWNUM, cust_nbr, COUNT(order_nbr)        * ERROR at line 1: ORA-00979: not a GROUP BY expression

If you include ROWNUM in the GROUP BY clause, you'll get the following, unexpected result:

SELECT ROWNUM, cust_nbr, COUNT(order_nbr)  FROM cust_order GROUP BY ROWNUM, cust_nbr;     ROWNUM   CUST_NBR COUNT(ORDER_NBR) ---------- ---------- ----------------          1          1                1          2          1                1          3          5                1          4          4                1          5          4                1          6          8                1          7          1                1          8          5                1          9          5                1         10          1                1         11          1                1         12          1                1         13          5                1         14          4                1         15          4                1         16          8                1         17          1                1         18          5                1         19          5                1         20          1                1 20 rows selected.

You certainly didn't want this result, did you? You wanted to receive one summary row for each customer, and then to display ROWNUM for those lines. But when you include ROWNUM in the GROUP BY clause, it produces one summary row for each row selected from the table cust_order. To get the expected result, you should use the following SQL:

SELECT ROWNUM, v.* FROM (SELECT cust_nbr, COUNT(order_nbr)        FROM cust_order GROUP BY cust_nbr) v;     ROWNUM   CUST_NBR COUNT(ORDER_NBR) ---------- ---------- ----------------          1          1                8          2          4                4          3          5                6          4          8                2

The construct in the FROM clause is called an inline view. Read more about inline views in Chapter 5.

4.2.1.8 You are not required to show your GROUP BY columns

Syntactically, it is not mandatory to include all the expressions of the GROUP BY clause in the SELECT list. However, those expressions not in the SELECT list will not be represented in the output; therefore, the output may not make much sense. For example:

SELECT COUNT(order_nbr)  FROM cust_order  GROUP BY cust_nbr; COUNT(ORDER_NBR) ----------------                8                4                6                2

This query produces a count of orders for each customer (by grouping based on cust_nbr), but without the cust_nbr in the output you can't associate the counts with the customers. Extending the previous example, you can see that without a consistent SELECT list and GROUP BY clause, the output may be a bit confusing. The following example produces output that at first glance seems useful:

SELECT cust_nbr, COUNT(order_nbr)  FROM cust_order  GROUP BY cust_nbr, order_dt;   CUST_NBR COUNT(ORDER_NBR) ---------- ----------------          1                2          1                2          1                4          4                2          4                2          5                2          5                2          5                2          8                2 9 rows selected.

From the output, it appears that you are trying to obtain a count of orders for each customer. However, there are multiple rows in the output for some cust_nbr values. The fact that you have included order_dt in the GROUP BY clause, and therefore generated a summary result for each combination of cust_nbr and order_dt, is missing from the output. You can't make sense of the output unless the output and the SQL statement are looked at together. You can't expect all readers of SQL output to understand SQL syntax, can you? Therefore, we always recommend maintaining consistency between the nonaggregate expressions in the SELECT list and the expressions in the GROUP BY clause. A more meaningful version of the previous SQL statement would be as follows:

SELECT cust_nbr, order_dt, COUNT(order_nbr)  FROM cust_order GROUP BY cust_nbr, order_dt;   CUST_NBR ORDER_DT  COUNT(ORDER_NBR) ---------- --------- ----------------          1 18-JUL-01                2          1 19-JUL-01                2          1 22-JUL-01                4          4 16-JUL-01                2          4 18-JUL-01                2          5 12-JUL-01                2          5 20-JUL-01                2          5 21-JUL-01                2          8 22-JUL-01                2 9 rows selected.

This output is consistent with the GROUP BY clause in the query. Readers of the report are more likely to make the correct assumption about what this output represents.

4.2.2 GROUP BY Clause and NULL Values

When you GROUP BY a column that contains NULL values for some rows, all the rows with NULL values are placed into a single group and presented as one summary row in the output. For example:

SELECT sale_price, COUNT(order_nbr)  FROM cust_order  GROUP BY sale_price; SALE_PRICE COUNT(ORDER_NBR) ---------- ----------------         25                4         34                2         56                4         99                4                           6

Notice that the last row in the output consists of a NULL value for the column sale_price. If you want the row containing the NULL value to be the first row in the output, you can perform an ORDER BY on sale_price in descending order:

SELECT sale_price, COUNT(order_nbr)  FROM cust_order  GROUP BY sale_price ORDER BY sale_price DESC; SALE_PRICE COUNT(ORDER_NBR) ---------- ----------------                           6         99                4         56                4         34                2         25                4

Whether you are using a GROUP BY or not, the ORDER BY clause can have an optional NULLS FIRST or NULLS LAST option to put the NULLs either at the beginning or at the end of the result set, respectively. For example, to sort NULLs first:

SELECT sale_price, COUNT(order_nbr)  FROM cust_order  GROUP BY sale_price ORDER BY sale_price NULLS FIRST; SALE_PRICE COUNT(ORDER_NBR) ---------- ----------------                           6         25                4         34                2         56                4         99                4

Or, to sort NULLs last:

SELECT sale_price, COUNT(order_nbr)  FROM cust_order  GROUP BY sale_price ORDER BY sale_price NULLS LAST; SALE_PRICE COUNT(ORDER_NBR) ---------- ----------------         25                4         34                2         56                4         99                4                           6

4.2.3 GROUP BY Clause with WHERE Clause

While producing summary results using the GROUP BY clause, you can filter records from the table based on a WHERE clause, as in the following example, which produces a count of orders in which the sale price exceeds $25 for each customer:

SELECT cust_nbr, COUNT(order_nbr) FROM cust_order WHERE sale_price > 25 GROUP BY cust_nbr;   CUST_NBR COUNT(ORDER_NBR) ---------- ----------------          1                4          4                4          8                2

While executing a SQL statement with a WHERE clause and a GROUP BY clause, Oracle first applies the WHERE clause and filters out the rows that don't satisfy the WHERE condition. The rows that satisfy the WHERE clause are then grouped using the GROUP BY clause.

SQL syntax requires that the WHERE clause must come before the GROUP BY clause. Otherwise, the following error is returned:

SELECT cust_nbr, COUNT(order_nbr) FROM cust_order GROUP BY cust_nbr WHERE sale_price > 25; WHERE sale_price > 25 * ERROR at line 4: ORA-00933: SQL command not properly ended



Mastering Oracle SQL
Mastering Oracle SQL, 2nd Edition
ISBN: 0596006322
EAN: 2147483647
Year: 2003
Pages: 154

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net