Recipe3.9.Performing Joins when Using Aggregates


Recipe 3.9. Performing Joins when Using Aggregates

Problem

You want to perform an aggregation but your query involves multiple tables. You want to ensure that joins do not disrupt the aggregation. For example, you want to find the sum of the salaries for employees in department 10 along with the sum of their bonuses. Some employees have more than one bonus and the join between table EMP and table EMP_BONUS is causing incorrect values to be returned by the aggregate function SUM. For this problem, table EMP_BONUS contains the following data:

  select * from emp_bonus EMPNO RECEIVED          TYPE ----- ----------- ----------  7934 17-MAR-2005          1  7934 15-FEB-2005          2  7839 15-FEB-2005          3  7782 15-FEB-2005          1 

Now, consider the following query that returns the salary and bonus for all employees in department 10. Table BONUS.TYPE determines the amount of the bonus. A type 1 bonus is 10% of an employee's salary, type 2 is 20%, and type 3 is 30%.

  select e.empno,        e.ename,        e.sal,        e.deptno,        e.sal*case when eb.type = 1 then .1                   when eb.type = 2 then .2                   else .3              end as bonus  from emp e, emp_bonus eb where e.empno  = eb.empno   and e.deptno = 10   EMPNO ENAME             SAL     DEPTNO     BONUS ------- ---------- ---------- ---------- ---------    7934 MILLER           1300         10       130    7934 MILLER           1300         10       260    7839 KING             5000         10      1500    7782 CLARK            2450         10       245 

So far, so good. However, things go awry when you attempt a join to the EMP_ BONUS table in order to sum the bonus amounts:

  select deptno,        sum(sal) as total_sal,        sum(bonus) as total_bonus   from ( select e.empno,        e.ename,        e.sal,        e.deptno,        e.sal*case when eb.type = 1 then .1                   when eb.type = 2 then .2                   else .3              end as bonus   from emp e, emp_bonus eb  where e.empno  = eb.empno    and e.deptno = 10        ) x  group by deptno DEPTNO   TOTAL_SAL  TOTAL_BONUS ------ -----------  -----------     10       10050         2135 

While the TOTAL_BONUS is correct, the TOTAL_SAL is incorrect. The sum of all salaries in department 10 is 8750, as the following query shows:

  select sum(sal) from emp where deptno=10   SUM(SAL) ----------       8750 

Why is TOTAL_SAL incorrect? The reason is the duplicate rows in the SAL column created by the join. Consider the following query, which joins table EMP and EMP_ BONUS:

  select e.ename,        e.sal   from emp e, emp_bonus eb  where e.empno  = eb.empno    and e.deptno = 10 ENAME             SAL ---------- ---------- CLARK            2450 KING             5000 MILLER           1300 MILLER           1300 

Now it is easy to see why the value for TOTAL_SAL is incorrect: MILLER's salary is counted twice. The final result set that you are really after is:

 DEPTNO TOTAL_SAL TOTAL_BONUS ------ --------- -----------     10      8750        2135 

Solution

You have to be careful when computing aggregates across joins. Typically when duplicates are returned due to a join, you can avoid miscalculations by aggregate functions in two ways: you can simply use the keyword DISTINCT in the call to the aggregate function, so only unique instances of each value are used in the computation; or you can perform the aggregation first (in an inline view) prior to joining, thus avoiding the incorrect computation by the aggregate function because the aggregate will already be computed before you even join, thus avoiding the problem altogether. The solutions that follow use DISTINCT. The "Discussion" section will discuss the technique of using an inline view to perform the aggregation prior to joining.

MySQL and PostgreSQL

Perform a sum of only the DISTINCT salaries:

  1 select deptno,  2        sum(distinct sal) as total_sal,  3        sum(bonus) as total_bonus  4   from (  5 select e.empno,  6        e.ename,  7        e.sal,  8        e.deptno,  9        e.sal*case when eb.type = 1 then .1 10                   when eb.type = 2 then .2 11                   else .3 12               end as bonus 13   from emp e, emp_bonus eb 14   where e.empno = eb.empno 15    and e.deptno = 10 16        ) x 17  group by deptno 

DB2, Oracle, and SQL Server

These platforms support the preceding solution, but they also support an alternative solution using the window function SUM OVER:

  1 select distinct deptno,total_sal,total_bonus  2   from (  3 select e.empno,  4        e.ename,  5        sum(distinct e.sal) over  6        (partition by e.deptno) as total_sal,  7         e.deptno,  8         sum(e.sal*case when eb.type = 1 then .1  9                        when eb.type = 2 then .2 10                        else .3 end) over 11         (partition by deptno) as total_bonus 12    from emp e, emp_bonus eb 13   where e.empno = eb.empno 14     and e.deptno = 10 15         ) x 

Discussion

MySQL and PostgreSQL

The second query in the "Problem" section of this recipe joins table EMP and table EMP_BONUS and returns two rows for employee "MILLER", which is what causes the error on the sum of EMP.SAL (the salary is added twice). The solution is to simply sum the distinct EMP.SAL values that are returned by the query. The following query is an alternative solution. The sum of all salaries in department 10 is computed first and that row is then joined to table EMP, which is then joined to table EMP_BONUS. The following query works for all DBMSs:

  select d.deptno,        d.total_sal,        sum(e.sal*case when eb.type = 1 then .1                       when eb.type = 2 then .2                       else .3 end) as total_bonus   from emp e,        emp_bonus eb,        ( select deptno, sum(sal) as total_sal   from emp  where deptno = 10  group by deptno         ) d  where e.deptno = d.deptno    and e.empno = eb.empno  group by d.deptno,d.total_sal    DEPTNO  TOTAL_SAL  TOTAL_BONUS --------- ---------- ------------        10       8750         2135 

DB2, Oracle, and SQL Server

This alternative solution takes advantage of the window function SUM OVER. The following query is taken from lines 314 in "Solution" and returns the following result set:

  select e.empno,           e.ename,           sum(distinct e.sal) over           (partition by e.deptno) as total_sal,           e.deptno,           sum(e.sal*case when eb.type = 1 then .1                          when eb.type = 2 then .2                         else .3 end) over          (partition by deptno) as total_bonus     from emp e, emp_bonus eb    where e.empno  = eb.empno      and e.deptno = 10 EMPNO ENAME        TOTAL_SAL   DEPTNO  TOTAL_BONUS ----- ----------  ----------   ------  -----------  7934 MILLER            8750       10         2135  7934 MILLER            8750       10         2135  7782 CLARK             8750       10         2135  7839 KING              8750       10         2135 

The windowing function, SUM OVER, is called twice, first to compute the sum of the distinct salaries for the defined partition or group. In this case, the partition is DEPTNO 10 and the sum of the distinct salaries for DEPTNO 10 is 8750. The next call to SUM OVER computes the sum of the bonuses for the same defined partition. The final result set is produced by taking the distinct values for TOTAL_SAL, DEPTNO, and TOTAL_BONUS.




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net