Recipe 3.9. Performing Joins when Using AggregatesProblemYou want to perform an aggregation but your query involves multiple tables. You want to ensure that joins do not disrupt the aggregation. For example, you want to find the sum of the salaries for employees in department 10 along with the sum of their bonuses. Some employees have more than one bonus and the join between table EMP and table EMP_BONUS is causing incorrect values to be returned by the aggregate function SUM. For this problem, table EMP_BONUS contains the following data: select * from emp_bonus EMPNO RECEIVED TYPE ----- ----------- ---------- 7934 17-MAR-2005 1 7934 15-FEB-2005 2 7839 15-FEB-2005 3 7782 15-FEB-2005 1 Now, consider the following query that returns the salary and bonus for all employees in department 10. Table BONUS.TYPE determines the amount of the bonus. A type 1 bonus is 10% of an employee's salary, type 2 is 20%, and type 3 is 30%. select e.empno, e.ename, e.sal, e.deptno, e.sal*case when eb.type = 1 then .1 when eb.type = 2 then .2 else .3 end as bonus from emp e, emp_bonus eb where e.empno = eb.empno and e.deptno = 10 EMPNO ENAME SAL DEPTNO BONUS ------- ---------- ---------- ---------- --------- 7934 MILLER 1300 10 130 7934 MILLER 1300 10 260 7839 KING 5000 10 1500 7782 CLARK 2450 10 245 So far, so good. However, things go awry when you attempt a join to the EMP_ BONUS table in order to sum the bonus amounts: select deptno, sum(sal) as total_sal, sum(bonus) as total_bonus from ( select e.empno, e.ename, e.sal, e.deptno, e.sal*case when eb.type = 1 then .1 when eb.type = 2 then .2 else .3 end as bonus from emp e, emp_bonus eb where e.empno = eb.empno and e.deptno = 10 ) x group by deptno DEPTNO TOTAL_SAL TOTAL_BONUS ------ ----------- ----------- 10 10050 2135 While the TOTAL_BONUS is correct, the TOTAL_SAL is incorrect. The sum of all salaries in department 10 is 8750, as the following query shows: select sum(sal) from emp where deptno=10 SUM(SAL) ---------- 8750 Why is TOTAL_SAL incorrect? The reason is the duplicate rows in the SAL column created by the join. Consider the following query, which joins table EMP and EMP_ BONUS: select e.ename, e.sal from emp e, emp_bonus eb where e.empno = eb.empno and e.deptno = 10 ENAME SAL ---------- ---------- CLARK 2450 KING 5000 MILLER 1300 MILLER 1300 Now it is easy to see why the value for TOTAL_SAL is incorrect: MILLER's salary is counted twice. The final result set that you are really after is: DEPTNO TOTAL_SAL TOTAL_BONUS ------ --------- ----------- 10 8750 2135 SolutionYou have to be careful when computing aggregates across joins. Typically when duplicates are returned due to a join, you can avoid miscalculations by aggregate functions in two ways: you can simply use the keyword DISTINCT in the call to the aggregate function, so only unique instances of each value are used in the computation; or you can perform the aggregation first (in an inline view) prior to joining, thus avoiding the incorrect computation by the aggregate function because the aggregate will already be computed before you even join, thus avoiding the problem altogether. The solutions that follow use DISTINCT. The "Discussion" section will discuss the technique of using an inline view to perform the aggregation prior to joining. MySQL and PostgreSQLPerform a sum of only the DISTINCT salaries: 1 select deptno, 2 sum(distinct sal) as total_sal, 3 sum(bonus) as total_bonus 4 from ( 5 select e.empno, 6 e.ename, 7 e.sal, 8 e.deptno, 9 e.sal*case when eb.type = 1 then .1 10 when eb.type = 2 then .2 11 else .3 12 end as bonus 13 from emp e, emp_bonus eb 14 where e.empno = eb.empno 15 and e.deptno = 10 16 ) x 17 group by deptno DB2, Oracle, and SQL ServerThese platforms support the preceding solution, but they also support an alternative solution using the window function SUM OVER: 1 select distinct deptno,total_sal,total_bonus 2 from ( 3 select e.empno, 4 e.ename, 5 sum(distinct e.sal) over 6 (partition by e.deptno) as total_sal, 7 e.deptno, 8 sum(e.sal*case when eb.type = 1 then .1 9 when eb.type = 2 then .2 10 else .3 end) over 11 (partition by deptno) as total_bonus 12 from emp e, emp_bonus eb 13 where e.empno = eb.empno 14 and e.deptno = 10 15 ) x DiscussionMySQL and PostgreSQLThe second query in the "Problem" section of this recipe joins table EMP and table EMP_BONUS and returns two rows for employee "MILLER", which is what causes the error on the sum of EMP.SAL (the salary is added twice). The solution is to simply sum the distinct EMP.SAL values that are returned by the query. The following query is an alternative solution. The sum of all salaries in department 10 is computed first and that row is then joined to table EMP, which is then joined to table EMP_BONUS. The following query works for all DBMSs: select d.deptno, d.total_sal, sum(e.sal*case when eb.type = 1 then .1 when eb.type = 2 then .2 else .3 end) as total_bonus from emp e, emp_bonus eb, ( select deptno, sum(sal) as total_sal from emp where deptno = 10 group by deptno ) d where e.deptno = d.deptno and e.empno = eb.empno group by d.deptno,d.total_sal DEPTNO TOTAL_SAL TOTAL_BONUS --------- ---------- ------------ 10 8750 2135 DB2, Oracle, and SQL ServerThis alternative solution takes advantage of the window function SUM OVER. The following query is taken from lines 314 in "Solution" and returns the following result set: select e.empno, e.ename, sum(distinct e.sal) over (partition by e.deptno) as total_sal, e.deptno, sum(e.sal*case when eb.type = 1 then .1 when eb.type = 2 then .2 else .3 end) over (partition by deptno) as total_bonus from emp e, emp_bonus eb where e.empno = eb.empno and e.deptno = 10 EMPNO ENAME TOTAL_SAL DEPTNO TOTAL_BONUS ----- ---------- ---------- ------ ----------- 7934 MILLER 8750 10 2135 7934 MILLER 8750 10 2135 7782 CLARK 8750 10 2135 7839 KING 8750 10 2135 The windowing function, SUM OVER, is called twice, first to compute the sum of the distinct salaries for the defined partition or group. In this case, the partition is DEPTNO 10 and the sum of the distinct salaries for DEPTNO 10 is 8750. The next call to SUM OVER computes the sum of the bonuses for the same defined partition. The final result set is produced by taking the distinct values for TOTAL_SAL, DEPTNO, and TOTAL_BONUS. |