Recipe 12.19. Performing Aggregations over a Moving Range of ValuesProblemYou want to compute a moving aggregation, such as a moving sum on the salaries in table EMP. You want to compute a sum for every 90 days, starting with the HIREDATE of the first employee. You want to see how spending has fluctuated for every 90-day period between the first and last employee hired. You want to return the following result set: HIREDATE SAL SPENDING_PATTERN ----------- ------- ---------------- 17-DEC-1980 800 800 20-FEB-1981 1600 2400 22-FEB-1981 1250 3650 02-APR-1981 2975 5825 01-MAY-1981 2850 8675 09-JUN-1981 2450 8275 08-SEP-1981 1500 1500 28-SEP-1981 1250 2750 17-NOV-1981 5000 7750 03-DEC-1981 950 11700 03-DEC-1981 3000 11700 23-JAN-1982 1300 10250 09-DEC-1982 3000 3000 12-JAN-1983 1100 4100 SolutionBeing able to specify a moving window in the framing or windowing clause of window functions makes this problem very easy to solve, if your RDBMS supports such functions. The key is to order by HIREDATE in your window function and then specify a window of 90 days starting from the earliest employee hired. The sum will be computed using the salaries of employees hired up to 90 days prior to the current employee's HIREDATE (the current employee is included in the sum). If you do not have window functions available, you can use scalar subqueries, but the solution will be more complex. DB2 and OracleFor DB2 and Oracle, use the window function SUM OVER and order by HIREDATE. Specify a range of 90 days in the window or "framing" clause to allow the sum to be computed for each employee's salary and to include the salaries of all employees hired up to 90 days earlier. Because DB2 does not allow you to specify HIREDATE in the ORDER BY clause of a window function (line 3 below), you can order by DAYS(HIREDATE) instead: 1 select hiredate, 2 sal, 3 sum(sal)over(order by days(hiredate) 4 range between 90 preceding 5 and current row) spending_pattern 6 from emp e The Oracle solution is more straightforward than DB2's, because Oracle allows window functions to order by datetime types: 1 select hiredate, 2 sal, 3 sum(sal)over(order by hiredate 4 range between 90 preceding 5 and current row) spending_pattern 6 from emp e MySQL, PostgreSQL, and SQL ServerUse a scalar subquery to sum the salaries of all employees hired up to 90 days prior to the day each employee was hired: 1 select e.hiredate, 2 e.sal, 3 (select sum(sal) from emp d 4 where d.hiredate between e.hiredate-90 5 and e.hiredate) as spending_pattern 6 from emp e 7 order by 1 DiscussionDB2 and OracleDB2 and Oracle share the same solution. The only difference, and it's minor between the two solutions, lies in how you specify HIREDATE in the ORDER BY clause of the window function. At the time of this book's writing, DB2 doesn't allow a DATE value in such an ORDER BY clause if you are using a numeric value to set the window's range. (For example, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW allows you to order by a date, but RANGE BETWEEN 90 PRECEDING AND CURRENT ROW does not.) To understand what the solution query is doing, you simply need to understand what the window clause is doing. The window you are defining orders the salaries for all employees by HIREDATE. Then the function computes a sum. The sum is not computed for all salaries. Instead, the processing is as follows:
The HIREDATE of the first employee is December 17, 1980, and the HIREDATE of the next hired employee is February 20, 1981. The second employee was hired less than 90 days after the first employee, and thus the moving sum for the second employee is 2400 (1600 + 800). If you are having trouble understanding where the values in SPENDING_PATTERN come from, examine the following query and result set: select distinct dense_rank( )over(order by e.hiredate) window, e.hiredate current_hiredate, d.hiredate hiredate_within_90_days, d.sal sals_used_for_sum from emp e, emp d where d.hiredate between e.hiredate-90 and e.hiredate WINDOW CURRENT_HIREDATE HIREDATE_WITHIN_90_DAYS SALS_USED_FOR_SUM ------ ---------------- ----------------------- ----------------- 1 17-DEC-1980 17-DEC-1980 800 2 20-FEB-1981 17-DEC-1980 800 2 20-FEB-1981 20-FEB-1981 1600 3 22-FEB-1981 17-DEC-1980 800 3 22-FEB-1981 20-FEB-1981 1600 3 22-FEB-1981 22-FEB-1981 1250 4 02-APR-1981 20-FEB-1981 1600 4 02-APR-1981 22-FEB-1981 1250 4 02-APR-1981 02-APR-1981 2975 5 01-MAY-1981 20-FEB-1981 1600 5 01-MAY-1981 22-FEB-1981 1250 5 01-MAY-1981 02-APR-1981 2975 5 01-MAY-1981 01-MAY-1981 2850 6 09-JUN-1981 02-APR-1981 2975 6 09-JUN-1981 01-MAY-1981 2850 6 09-JUN-1981 09-JUN-1981 2450 7 08-SEP-1981 08-SEP-1981 1500 8 28-SEP-1981 08-SEP-1981 1500 8 28-SEP-1981 28-SEP-1981 1250 9 17-NOV-1981 08-SEP-1981 1500 9 17-NOV-1981 28-SEP-1981 1250 9 17-NOV-1981 17-NOV-1981 5000 10 03-DEC-1981 08-SEP-1981 1500 10 03-DEC-1981 28-SEP-1981 1250 10 03-DEC-1981 17-NOV-1981 5000 10 03-DEC-1981 03-DEC-1981 950 10 03-DEC-1981 03-DEC-1981 3000 11 23-JAN-1982 17-NOV-1981 5000 11 23-JAN-1982 03-DEC-1981 950 11 23-JAN-1982 03-DEC-1981 3000 11 23-JAN-1982 23-JAN-1982 1300 12 09-DEC-1982 09-DEC-1982 3000 13 12-JAN-1983 09-DEC-1982 3000 13 12-JAN-1983 12-JAN-1983 1100 If you look at the WINDOW column, only those rows with the same WINDOW value will be considered for each sum. Take for example, WINDOW 3. The salaries used for the sum for that window are 800, 1600, and 1250, which total 3650. If you look at the final result set in the "Problem" section, you'll see the SPENDING_PATTERN for February 22, 1981 (WINDOW 3) is 3650. As proof, to verify that the above self join includes the correct salaries for the windows defined, simply sum the values in SALS_USED_FOR_SUM and group by CURRENT_DATE. The result should be the same as the result set shown in the "Problem" section (with the duplicate row for December 3, 1981, filtered out): select current_hiredate, sum(sals_used_for_sum) spending_pattern from ( select distinct dense_rank()over(order by e.hiredate) window, e.hiredate current_hiredate, d.hiredate hiredate_within_90_days, d.sal sals_used_for_sum from emp e, emp d where d.hiredate between e.hiredate-90 and e.hiredate ) x group by current_hiredate CURRENT_HIREDATE SPENDING_PATTERN ---------------- ---------------- 17-DEC-1980 800 20-FEB-1981 2400 22-FEB-1981 3650 02-APR-1981 5825 01-MAY-1981 8675 09-JUN-1981 8275 08-SEP-1981 1500 28-SEP-1981 2750 17-NOV-1981 7750 03-DEC-1981 11700 23-JAN-1982 10250 09-DEC-1982 3000 12-JAN-1983 4100 MySQL, PostgreSQL, and SQL ServerThe key to this solution is to use a scalar subquery (a self join will work as well) while using the aggregate function SUM to compute a sum for every 90 days based on HIREDATE. If you are having trouble seeing how this works, simply convert the solution to a self join and examine which rows are included in the computations. Consider the result set below, which returns the same result set as that in the solution: select e.hiredate, e.sal, sum(d.sal) as spending_pattern from emp e, emp d where d.hiredate between e.hiredate-90 and e.hiredate group by e.hiredate,e.sal order by 1\ HIREDATE SAL SPENDING_PATTERN ----------- ----- ---------------- 17-DEC-1980 800 800 20-FEB-1981 1600 2400 22-FEB-1981 1250 3650 02-APR-1981 2975 5825 01-MAY-1981 2850 8675 09-JUN-1981 2450 8275 08-SEP-1981 1500 1500 28-SEP-1981 1250 2750 17-NOV-1981 5000 7750 03-DEC-1981 950 11700 03-DEC-1981 3000 11700 23-JAN-1982 1300 10250 09-DEC-1982 3000 3000 12-JAN-1983 1100 4100 If it is still unclear, simply remove the aggregation and start with the Cartesian product. The first step is to generate a Cartesian product using table EMP so that each HIREDATE can be compared with all the other HIREDATEs. [Only a snippet of the result set is shown below because there are 196 rows (14x14) returned by a Cartesian of EMP.] select e.hiredate, e.sal, d.sal, d.hiredate from emp e, emp d HIREDATE SAL SAL HIREDATE ----------- ----- ----- ----------- 17-DEC-1980 800 800 17-DEC-1980 17-DEC-1980 800 1600 20-FEB-1981 17-DEC-1980 800 1250 22-FEB-1981 17-DEC-1980 800 2975 02-APR-1981 17-DEC-1980 800 1250 28-SEP-1981 17-DEC-1980 800 2850 01-MAY-1981 17-DEC-1980 800 2450 09-JUN-1981 17-DEC-1980 800 3000 09-DEC-1982 17-DEC-1980 800 5000 17-NOV-1981 17-DEC-1980 800 1500 08-SEP-1981 17-DEC-1980 800 1100 12-JAN-1983 17-DEC-1980 800 950 03-DEC-1981 17-DEC-1980 800 3000 03-DEC-1981 17-DEC-1980 800 1300 23-JAN-1982 20-FEB-1981 1600 800 17-DEC-1980 20-FEB-1981 1600 1600 20-FEB-1981 20-FEB-1981 1600 1250 22-FEB-1981 20-FEB-1981 1600 2975 02-APR-1981 20-FEB-1981 1600 1250 28-SEP-1981 20-FEB-1981 1600 2850 01-MAY-1981 20-FEB-1981 1600 2450 09-JUN-1981 20-FEB-1981 1600 3000 09-DEC-1982 20-FEB-1981 1600 5000 17-NOV-1981 20-FEB-1981 1600 1500 08-SEP-1981 20-FEB-1981 1600 1100 12-JAN-1983 20-FEB-1981 1600 950 03-DEC-1981 20-FEB-1981 1600 3000 03-DEC-1981 20-FEB-1981 1600 1300 23-JAN-1982 If you examine the result set above, you'll notice that there is no HIREDATE 90 days earlier or equal to December 17, except for December 17. So, the sum for that row should be only 800. If you examine the next HIREDATE, February 20, you'll notice that there is one HIREDATE that falls within the 90-day window (within 90 days prior), and that is December 17. If you sum the SAL from December 17 with the SAL from February 20 (because we are looking for HIREDATEs equal to each HIREDATE or within 90 days earlier) you get 2400, which happens to be the final result for that HIREDATE. Now that you know how it works, use a filter in the WHERE clause to return for each HIREDATE and HIREDATE that is equal to it or is no more than 90 days earlier: select e.hiredate, e.sal, d.sal sal_to_sum, d.hiredate within_90_days from emp e, emp d where d.hiredate between e.hiredate-90 and e.hiredate order by 1 HIREDATE SAL SAL_TO_SUM WITHIN_90_DAYS ----------- ----- ---------- -------------- 17-DEC-1980 800 800 17-DEC-1980 20-FEB-1981 1600 800 17-DEC-1980 20-FEB-1981 1600 1600 20-FEB-1981 22-FEB-1981 1250 800 17-DEC-1980 22-FEB-1981 1250 1600 20-FEB-1981 22-FEB-1981 1250 1250 22-FEB-1981 02-APR-1981 2975 1600 20-FEB-1981 02-APR-1981 2975 1250 22-FEB-1981 02-APR-1981 2975 2975 02-APR-1981 01-MAY-1981 2850 1600 20-FEB-1981 01-MAY-1981 2850 1250 22-FEB-1981 01-MAY-1981 2850 2975 02-APR-1981 01-MAY-1981 2850 2850 01-MAY-1981 09-JUN-1981 2450 2975 02-APR-1981 09-JUN-1981 2450 2850 01-MAY-1981 09-JUN-1981 2450 2450 09-JUN-1981 08-SEP-1981 1500 1500 08-SEP-1981 28-SEP-1981 1250 1500 08-SEP-1981 28-SEP-1981 1250 1250 28-SEP-1981 17-NOV-1981 5000 1500 08-SEP-1981 17-NOV-1981 5000 1250 28-SEP-1981 17-NOV-1981 5000 5000 17-NOV-1981 03-DEC-1981 950 1500 08-SEP-1981 03-DEC-1981 950 1250 28-SEP-1981 03-DEC-1981 950 5000 17-NOV-1981 03-DEC-1981 950 950 03-DEC-1981 03-DEC-1981 950 3000 03-DEC-1981 03-DEC-1981 3000 1500 08-SEP-1981 03-DEC-1981 3000 1250 28-SEP-1981 03-DEC-1981 3000 5000 17-NOV-1981 03-DEC-1981 3000 950 03-DEC-1981 03-DEC-1981 3000 3000 03-DEC-1981 23-JAN-1982 1300 5000 17-NOV-1981 23-JAN-1982 1300 950 03-DEC-1981 23-JAN-1982 1300 3000 03-DEC-1981 23-JAN-1982 1300 1300 23-JAN-1982 09-DEC-1982 3000 3000 09-DEC-1982 12-JAN-1983 1100 3000 09-DEC-1982 12-JAN-1983 1100 1100 12-JAN-1983 Now that you know which SALs are to be included in the moving window of summation, simply use the aggregate function SUM to produce a more expressive result set: select e.hiredate, e.sal, sum(d.sal) as spending_pattern from emp e, emp d where d.hiredate between e.hiredate-90 and e.hiredate group by e.hiredate,e.sal order by 1 If you compare the result set for the query above and the result set for the query below (which is the original solution presented), you will see they are the same: select e.hiredate, e.sal, (select sum(sal) from emp d where d.hiredate between e.hiredate-90 and e.hiredate) as spending_pattern from emp e order by 1 HIREDATE SAL SPENDING_PATTERN ----------- ----- ---------------- 17-DEC-1980 800 800 20-FEB-1981 1600 2400 22-FEB-1981 1250 3650 02-APR-1981 2975 5825 01-MAY-1981 2850 8675 09-JUN-1981 2450 8275 08-SEP-1981 1500 1500 28-SEP-1981 1250 2750 17-NOV-1981 5000 7750 03-DEC-1981 950 11700 03-DEC-1981 3000 11700 23-JAN-1982 1300 10250 09-DEC-1982 3000 3000 12-JAN-1983 1100 4100 |