The ROLLUP clause extends GROUP BY to return a row containing a subtotal for each group along with a total for all groups. As you saw in Chapter 3, you use GROUP BY to group rows into blocks with a common column value. For example, the following query uses GROUP BY to group the rows from the employees2 table by department_id and uses SUM() to get the sum of the salaries for each division_id :
SELECT division_id, SUM(salary) FROM employees2 GROUP BY division_id; DIV SUM(SALARY) --- ----------- BUS 1610000 OPE 1320000 SAL 4936000 SUP 1015000
The following query rewrites the previous example to use ROLLUP . Notice the additional row at the end that contains the total salaries for all groups:
SELECT division_id, SUM(salary) FROM employees2 GROUP BY ROLLUP(division_id); DIV SUM(SALARY) --- ----------- BUS 1610000 OPE 1320000 SAL 4936000 SUP 1015000 8881000
You can pass multiple columns to ROLLUP , which then groups the rows into blocks with the same column values. The following example passes the division_id and job_id columns of the employees2 table to ROLLUP , which groups the rows by identical values in those columns. Notice the salaries are summed by division_id and job_id , and ROLLUP returns a row with the sum of the salaries in each division_id along with another row at the very end with the total salaries for all groups:
SELECT division_id, job_id, SUM(salary) FROM employees2 GROUP BY ROLLUP(division_id, job_id); DIV JOB SUM(SALARY) --- --- ----------- BUS MGR 530000 BUS PRE 800000 BUS WOR 280000 BUS 1610000 OPE ENG 245000 OPE MGR 805000 OPE WOR 270000 OPE 1320000 SAL MGR 4446000 SAL WOR 490000 SAL 4936000 SUP MGR 465000 SUP TEC 115000 SUP WOR 435000 SUP 1015000 8881000
The next example switches division_id and job_id so that job_id is listed before division_id . This causes ROLLUP to return a row with the sum of the salaries in each job_id :
SELECT job_id, division_id, SUM(salary) FROM employees2 GROUP BY ROLLUP(job_id, division_id); JOB DIV SUM(SALARY) --- --- ----------- ENG OPE 245000 ENG 245000 MGR BUS 530000 MGR OPE 805000 MGR SAL 4446000 MGR SUP 465000 MGR 6246000 PRE BUS 800000 PRE 800000 TEC SUP 115000 TEC 115000 WOR BUS 280000 WOR OPE 270000 WOR SAL 490000 WOR SUP 435000 WOR 1475000 8881000
You can use any of the aggregate functions with ROLLUP (for a list of the aggregate functions, see Table 3-5 in Chapter 3). The following example uses AVG() to compute average salaries:
SELECT division_id, job_id, AVG(salary) FROM employees2 GROUP BY ROLLUP(division_id, job_id); DIV JOB AVG(SALARY) --- --- ----------- BUS MGR 176666.667 BUS PRE 800000 BUS WOR 280000 BUS 322000 OPE ENG 245000 OPE MGR 201250 OPE WOR 135000 OPE 188571.429 SAL MGR 261529.412 SAL WOR 245000 SAL 259789.474 SUP MGR 232500 SUP TEC 115000 SUP WOR 145000 SUP 169166.667 240027.027
The CUBE clause extends GROUP BY to return rows containing a subtotal for all combinations of columns included in the CUBE clause along with a total at the end. The following example passes division_id and job_id to CUBE , which groups the rows by identical values in those columns. Notice the salaries are summed by division_id and job_id , and CUBE returns a row with the sum of the salaries in each division_id , along with the sum of all salaries in each job_id near the end, followed by another row at the very end with the total salaries:
SELECT division_id, job_id, SUM(salary) FROM employees2 GROUP BY CUBE(division_id, job_id); DIV JOB SUM(SALARY) --- --- ----------- BUS MGR 530000 BUS PRE 800000 BUS WOR 280000 BUS 1610000 OPE ENG 245000 OPE MGR 805000 OPE WOR 270000 OPE 1320000 SAL MGR 4446000 SAL WOR 490000 SAL 4936000 SUP MGR 465000 SUP TEC 115000 SUP WOR 435000 SUP 1015000 ENG 245000 MGR 6246000 PRE 800000 TEC 115000 WOR 1475000 8881000
Note | Oracle9 i puts the extra rows from the CUBE at the end, but Oracle Database 10g puts the extra rows at the start. So, depending on the version of the database you are using, you may see the rows at the end or the beginning. You can always put in an ORDER BY clause if you must have the rows in a specific order. For example, you could add ORDER BY division_id, job_id to the previous query. |
The next example switches division_id and job_id so that job_id is listed before division_id . This still results in CUBE returning the sums of the salaries by job_id and division _id, but the sums by job_id are returned first:
SELECT job_id, division_id, SUM(salary) FROM employees2 GROUP BY CUBE(job_id, division_id); JOB DIV SUM(SALARY) --- --- ----------- ENG OPE 245000 ENG 245000 MGR BUS 530000 MGR OPE 805000 MGR SAL 4446000 MGR SUP 465000 MGR 6246000 PRE BUS 800000 PRE 800000 TEC SUP 115000 TEC 115000 WOR BUS 280000 WOR OPE 270000 WOR SAL 490000 WOR SUP 435000 WOR 1475000 BUS 1610000 OPE 1320000 SAL 4936000 SUP 1015000 8881000
The GROUPING() function accepts a column and returns 0 or 1. GROUPING() returns 1 when the column value is null, and returns 0 when the column value is non-null. GROUPING() is only used in queries that use ROLLUP or CUBE . GROUPING() is useful when you want to display a value when a null would otherwise be returned.
As you saw earlier in the section Passing a Single Column to ROLLUP, the last row in the example s result set contained a total of the salaries:
SELECT division_id, SUM(salary) FROM employees2 GROUP BY ROLLUP(division_id); DIV SUM(SALARY) --- ----------- BUS 1610000 OPE 1320000 SAL 4936000 SUP 1015000 8881000
The division_id column for the last row is null. You can use the GROUPING() function to determine whether this column is null, as shown in the following query. Notice GROUPING() returns 0 for the rows that have non-null division_id values, and 1 for the last row that has a null division_id :
SELECT GROUPING(division_id), division_id, SUM(salary) FROM employees2 GROUP BY ROLLUP(division_id); GROUPING(DIVISION_ID) DIV SUM(SALARY) --------------------- --- ----------- 0 BUS 1610000 0 OPE 1320000 0 SAL 4936000 0 SUP 1015000 1 8881000
You can use the DECODE() function to convert the 1 in the previous example to a meaningful value. The following example uses DECODE() to convert 1 to the string All divisions :
SELECT DECODE(GROUPING(division_id), 1, 'All divisions', division_id) AS div, SUM(salary) FROM employees2 GROUP BY ROLLUP(division_id); DIV SUM(SALARY) ------------- ----------- BUS 1610000 OPE 1320000 SAL 4936000 SUP 1015000 All divisions 8881000
The next example extends the idea of replacing null values to a ROLLUP containing both the division_id and job_id . Notice null division_id values are replaced with the string All divisions , and null job_id values are replaced with All jobs :
SELECT DECODE(GROUPING(division_id), 1, 'All divisions', division_id) AS div, DECODE(GROUPING(job_id), 1, 'All jobs', job_id) AS job, SUM(salary) FROM employees2 GROUP BY ROLLUP(division_id, job_id); DIV JOB SUM(SALARY) ------------- -------- ----------- BUS MGR 530000 BUS PRE 800000 BUS WOR 280000 BUS All jobs 1610000 OPE ENG 245000 OPE MGR 805000 OPE WOR 270000 OPE All jobs 1320000 SAL MGR 4446000 SAL WOR 490000 SAL All jobs 4936000 SUP MGR 465000 SUP TEC 115000 SUP WOR 435000 SUP All jobs 1015000 All divisions All jobs 8881000
You can use the GROUPING() function with CUBE . For example:
SELECT DECODE(GROUPING(division_id), 1, 'All divisions', division_id) AS div, DECODE(GROUPING(job_id), 1, 'All jobs', job_id) as job, SUM(salary) FROM employees2 GROUP BY CUBE(division_id, job_id); DIV JOB SUM(SALARY) ------------- -------- ----------- BUS MGR 530000 BUS PRE 800000 BUS WOR 280000 BUS All jobs 1610000 OPE ENG 245000 OPE MGR 805000 OPE WOR 270000 OPE All jobs 1320000 SAL MGR 4446000 SAL WOR 490000 SAL All jobs 4936000 SUP MGR 465000 SUP TEC 115000 SUP WOR 435000 SUP All jobs 1015000 All divisions ENG 245000 All divisions MGR 6246000 All divisions PRE 800000 All divisions TEC 115000 All divisions WOR 1475000 All divisions All jobs 8881000
You use the GROUPING SETS clause to just get the subtotal rows. The following example uses GROUPING SETS to get the subtotals for salaries by division_id and job_id :
SELECT division_id, job_id, SUM(salary) FROM employees2 GROUP BY GROUPING SETS(division_id, job_id); DIV JOB SUM(SALARY) --- --- ----------- BUS 1610000 OPE 1320000 SAL 4936000 SUP 1015000 ENG 245000 MGR 6246000 PRE 800000 TEC 115000 WOR 1475000
Notice only subtotals for the division_id and job_id columns are returned: the total for all salaries is not returned. You ll see how you can get the total as well as the subtotals using the GROUPING_ID() function in the next section.
You can use the GROUPING_ID() function to filter rows using a HAVING clause to exclude rows that don t contain a subtotal or total. The GROUPING_ID() function accepts one or more columns and returns the decimal equivalent of the GROUPING bit vector. The GROUPING bit vector is computed by combining the results of a call to the GROUPING() function for each column in order.
Earlier in the section Using the GROUPING() Function, you saw that GROUPING() returns 1 when the column value is null, and returns 0 when the column value is non-null.
For example:
If both division_id and job_id are non-null, GROUPING() would return 0 for both columns. The result for division_id is combined with the result for job_id , giving a bit vector of 00, whose decimal equivalent is 0. GROUPING_ID() therefore returns 0 when division_id and job_id are non-null.
If division_id is non-null (the GROUPING bit is 0), but job_id is null (the GROUPING bit is 1), the resulting bit vector is 01 and GROUPING_ID() returns 1.
If division_id is null (the GROUPING bit is 1), but job_id is non-null (the GROUPING bit is 0), the resulting bit vector is 10 and GROUPING_ID() returns 2.
If both division_id and job_id are null (both GROUPING bits are 0), the bit vector is 11 and GROUPING_ID() returns 3.
The following table summarizes these results.
division_id | job_id | Bit Vector | GROUPING_ID() Return Value |
---|---|---|---|
non-null | non-null | 00 |
|
non-null | null | 01 | 1 |
null | non-null | 10 | 2 |
null | null | 11 | 3 |
The following example passes division_id and job_id to GROUPING_ID() . Notice the output from the GROUPING_ID() function agrees with the expected returned values documented in the previous section:
SELECT division_id, job_id, GROUPING(division_id) AS DIV_GRP, GROUPING(job_id) AS JOB_GRP, GROUPING_ID(division_id, job_id) AS grp_id, SUM(salary) FROM employees2 GROUP BY CUBE(division_id, job_id); DIV JOB DIV_GRP JOB_GRP GRP_ID SUM(SALARY) --- --- ---------- ---------- ---------- ----------- BUS MGR 0 0 0 530000 BUS PRE 0 0 0 800000 BUS WOR 0 0 0 280000 BUS 0 1 1 1610000 OPE ENG 0 0 0 245000 OPE MGR 0 0 0 805000 OPE WOR 0 0 0 270000 OPE 0 1 1 1320000 SAL MGR 0 0 0 4446000 SAL WOR 0 0 0 490000 SAL 0 1 1 4936000 SUP MGR 0 0 0 465000 SUP TEC 0 0 0 115000 SUP WOR 0 0 0 435000 SUP 0 1 1 1015000 ENG 1 0 2 245000 MGR 1 0 2 6246000 PRE 1 0 2 800000 TEC 1 0 2 115000 WOR 1 0 2 1475000 1 1 3 8881000
One useful application of GROUPING_ID() is to filter rows using a HAVING clause. Your HAVING clause can exclude rows that don t contain a subtotal or total by simply checking if GROUPING_ID() returns a value greater than 0. For example:
SELECT division_id, job_id, GROUPING_ID(division_id, job_id) AS grp_id, SUM(salary) FROM employees2 GROUP BY CUBE(division_id, job_id) HAVING GROUPING_ID(division_id, job_id) > 0; DIV JOB GRP_ID SUM(SALARY) --- --- ---------- ----------- BUS 1 1610000 OPE 1 1320000 SAL 1 4936000 SUP 1 1015000 ENG 2 245000 MGR 2 6246000 PRE 2 800000 TEC 2 115000 WOR 2 1475000 3 8881000
You can use a column multiple times in a GROUP BY clause, which can enable you to reorganize your data or report on different groupings of data. For example, the following query contains a GROUP BY clause that uses division_id twice, once to group by division_id and the second in a ROLLUP :
SELECT division_id, job_id, SUM(salary) FROM employees2 GROUP BY division_id, ROLLUP(division_id, job_id); DIV JOB SUM(SALARY) --- --- ----------- BUS MGR 530000 BUS PRE 800000 BUS WOR 280000 OPE ENG 245000 OPE MGR 805000 OPE WOR 270000 SAL MGR 4446000 SAL WOR 490000 SUP MGR 465000 SUP TEC 115000 SUP WOR 435000 BUS 1610000 OPE 1320000 SAL 4936000 SUP 1015000 BUS 1610000 OPE 1320000 SAL 4936000 SUP 1015000
Notice, however, that the last four rows are duplicates of the previous four rows. You can eliminate these duplicates using the GROUP_ID() function, which you ll learn about next.
You can use the GROUP_ID() function to remove duplicate rows returned by a GROUP BY clause. GROUP_ID() doesn t accept any parameters. If n duplicates exist for a particular grouping, GROUP_ID returns numbers in the range 0 to n -1.
The following example rewrites the query shown in the previous section to include the output from GROUP_ID() . Notice GROUP_ID() returns 0 for all rows except the last four, which are duplicates of the previous four rows and GROUP_ID() returns 1:
SELECT division_id, job_id, GROUP_ID(), SUM(salary) FROM employees2 GROUP BY division_id, ROLLUP(division_id, job_id); DIV JOB GROUP_ID() SUM(SALARY) --- --- ---------- ----------- BUS MGR 0 530000 BUS PRE 0 800000 BUS WOR 0 280000 OPE ENG 0 245000 OPE MGR 0 805000 OPE WOR 0 270000 SAL MGR 0 4446000 SAL WOR 0 490000 SUP MGR 0 465000 SUP TEC 0 115000 SUP WOR 0 435000 BUS 0 1610000 OPE 0 1320000 SAL 0 4936000 SUP 0 1015000 BUS 1 1610000 OPE 1 1320000 SAL 1 4936000 SUP 1 1015000
You can eliminate duplicate rows using a HAVING clause that only allows rows whose GROUP_ID() is 0; for example:
SELECT division_id, job_id, GROUP_ID(), SUM(salary) FROM employees2 GROUP BY division_id, ROLLUP(division_id, job_id) HAVING GROUP_ID() = 0; DIV JOB GROUP_ID() SUM(SALARY) --- --- ---------- ----------- BUS MGR 0 530000 BUS PRE 0 800000 BUS WOR 0 280000 OPE ENG 0 245000 OPE MGR 0 805000 OPE WOR 0 270000 SAL MGR 0 4446000 SAL WOR 0 490000 SUP MGR 0 465000 SUP TEC 0 115000 SUP WOR 0 435000 BUS 0 1610000 OPE 0 1320000 SAL 0 4936000 SUP 0 1015000
This concludes the discussion of the extended GROUP BY clauses.