Recipe12.18.Performing Aggregations over Different GroupsPartitions Simultaneously

Recipe 12.18. Performing Aggregations over Different Groups/Partitions Simultaneously

Problem

You want to aggregate over different dimensions at the same time. For example, you want to return a result set that lists each employee's name, his department, the number of employees in his department (himself included), the number of employees that have the same job as he does (himself included in this count as well), and the total number of employees in the EMP table. The result set should look like the following:

 ENAME  DEPTNO DEPTNO_CNT JOB       JOB_CNT   TOTAL ------ ------ ---------- --------- -------- ------ MILLER     10          3 CLERK            4     14 CLARK      10          3 MANAGER          3     14 KING       10          3 PRESIDENT        1     14 SCOTT      20          5 ANALYST          2     14 FORD       20          5 ANALYST          2     14 SMITH      20          5 CLERK            4     14 JONES      20          5 MANAGER          3     14 ADAMS      20          5 CLERK            4     14 JAMES      30          6 CLERK            4     14 MARTIN     30          6 SALESMAN         4     14 TURNER     30          6 SALESMAN         4     14 WARD       30          6 SALESMAN         4     14 ALLEN      30          6 SALESMAN         4     14 BLAKE      30          6 MANAGER          3     14

Solution

Window functions make this problem quite easy to solve. If you do not have window functions available to you, you can use scalar subqueries.

DB2, Oracle, and SQL Server

Use the COUNT OVER window function while specifying different partitions, or groups of data on which to perform aggregation:

 select ename,        deptno,        count(*)over(partition by deptno) deptno_cnt,        job,        count(*)over(partition by job) job_cnt,        count(*)over() total   from emp

PostgreSQL and MySQL

Use scalar subqueries in your SELECT list to perform the aggregate count operations on different groups of rows:

  1 select e.ename,  2        e.deptno,  3        (select count(*) from emp d  4          where d.deptno = e.deptno) as deptno_cnt,  5        job,  6        (select count(*) from emp d  7          where d.job = e.job) as job_cnt,  8        (select count(*) from emp) as total  9   from emp e

Discussion

DB2, Oracle, and SQL Server

This example really shows off the power and convenience of window functions. By simply specifying different partitions or groups of data to aggregate, you can create immensely detailed reports without having to self join over and over, and without having to write cumbersome and perhaps poorly performing subqueries in your SELECT list. All the work is done by the window function COUNT OVER. To understand the output, focus on the OVER clause for a moment for each COUNT operation:

 count(*)over(partition by deptno) count(*)over(partition by job) count(*)over()

Remember the main parts of the OVER clause: the partition, specified by PARTITION BY: and the frame or window, specified by ORDER BY. Look at the first COUNT, which partitions by DEPTNO. The rows in table EMP will be grouped by DEPTNO and the COUNT operation will be performed on all the rows in each group. Since there is no frame or window clause specified (no ORDER BY), all the rows in the group are counted. The PARTITION BY clause finds all the unique DEPTNO values, and then the COUNT function counts the number of rows having each value. In the specific example of COUNT(*)OVER(PARTITION BY DEPTNO), The PARTITION BY clause identifies the partitions or groups to be values 10, 20, and 30.

The same processing is applied to the second COUNT, which partitions by JOB. The last count does not partition by anything, and simply has an empty parenthesis. An empty parenthesis implies "the whole table." So, whereas the two prior COUNTs aggregate values based on the defined groups or partitions, the final COUNT counts all rows in table EMP.

Keep in mind that window functions are applied after the WHERE clause. If you were to filter the result set in some way, for example, excluding all employees in DEPTNO 10, the value for TOTAL would not be 14, it would be 11. To filter results after window functions have been evaluated, you must make your windowing query into an inline view and then filter on the results from that view.

PostgreSQL and MySQL

For every row returned by the main query (rows from EMP E), use multiple scalar subqueries in the SELECT list to perform different counts for each DEPTNO and JOB. To get the TOTAL, simply use another scalar subquery to get the count of all employees in table EMP.