Recipe 12.18. Performing Aggregations over Different Groups/Partitions SimultaneouslyProblemYou want to aggregate over different dimensions at the same time. For example, you want to return a result set that lists each employee's name, his department, the number of employees in his department (himself included), the number of employees that have the same job as he does (himself included in this count as well), and the total number of employees in the EMP table. The result set should look like the following: ENAME DEPTNO DEPTNO_CNT JOB JOB_CNT TOTAL ------ ------ ---------- --------- -------- ------ MILLER 10 3 CLERK 4 14 CLARK 10 3 MANAGER 3 14 KING 10 3 PRESIDENT 1 14 SCOTT 20 5 ANALYST 2 14 FORD 20 5 ANALYST 2 14 SMITH 20 5 CLERK 4 14 JONES 20 5 MANAGER 3 14 ADAMS 20 5 CLERK 4 14 JAMES 30 6 CLERK 4 14 MARTIN 30 6 SALESMAN 4 14 TURNER 30 6 SALESMAN 4 14 WARD 30 6 SALESMAN 4 14 ALLEN 30 6 SALESMAN 4 14 BLAKE 30 6 MANAGER 3 14 SolutionWindow functions make this problem quite easy to solve. If you do not have window functions available to you, you can use scalar subqueries. DB2, Oracle, and SQL ServerUse the COUNT OVER window function while specifying different partitions, or groups of data on which to perform aggregation: select ename, deptno, count(*)over(partition by deptno) deptno_cnt, job, count(*)over(partition by job) job_cnt, count(*)over() total from emp PostgreSQL and MySQLUse scalar subqueries in your SELECT list to perform the aggregate count operations on different groups of rows: 1 select e.ename, 2 e.deptno, 3 (select count(*) from emp d 4 where d.deptno = e.deptno) as deptno_cnt, 5 job, 6 (select count(*) from emp d 7 where d.job = e.job) as job_cnt, 8 (select count(*) from emp) as total 9 from emp e DiscussionDB2, Oracle, and SQL ServerThis example really shows off the power and convenience of window functions. By simply specifying different partitions or groups of data to aggregate, you can create immensely detailed reports without having to self join over and over, and without having to write cumbersome and perhaps poorly performing subqueries in your SELECT list. All the work is done by the window function COUNT OVER. To understand the output, focus on the OVER clause for a moment for each COUNT operation: count(*)over(partition by deptno) count(*)over(partition by job) count(*)over() Remember the main parts of the OVER clause: the partition, specified by PARTITION BY: and the frame or window, specified by ORDER BY. Look at the first COUNT, which partitions by DEPTNO. The rows in table EMP will be grouped by DEPTNO and the COUNT operation will be performed on all the rows in each group. Since there is no frame or window clause specified (no ORDER BY), all the rows in the group are counted. The PARTITION BY clause finds all the unique DEPTNO values, and then the COUNT function counts the number of rows having each value. In the specific example of COUNT(*)OVER(PARTITION BY DEPTNO), The PARTITION BY clause identifies the partitions or groups to be values 10, 20, and 30. The same processing is applied to the second COUNT, which partitions by JOB. The last count does not partition by anything, and simply has an empty parenthesis. An empty parenthesis implies "the whole table." So, whereas the two prior COUNTs aggregate values based on the defined groups or partitions, the final COUNT counts all rows in table EMP.
PostgreSQL and MySQLFor every row returned by the main query (rows from EMP E), use multiple scalar subqueries in the SELECT list to perform different counts for each DEPTNO and JOB. To get the TOTAL, simply use another scalar subquery to get the count of all employees in table EMP. |