Recipe7.7.Generating a Running Product | SQL Cookbook (Cookbooks (OReilly))

Recipe 7.7. Generating a Running Product

Problem

You want to compute a running product on a numeric column. The operation is similar to "Calculating a Running Total," but using multiplication instead of addition.

Solution

By way of example, the solutions all compute running products of employee salaries. While a running product of salaries may not be all that useful, the technique can easily be applied to other, more useful domains.

DB2 and Oracle

Use the windowing function SUM OVER and take advantage of the fact that you can simulate multiplication by adding logarithms:

  1 select empno,ename,sal, 2        exp(sum(ln(sal))over(order by sal,empno)) as running_prod 3   from emp 4  where deptno = 10 EMPNO ENAME          SAL            RUNNING_PROD ----- ----------    ----    --------------------  7934 MILLER        1300                    1300  7782 CLARK         2450                 3185000  7839 KING          5000             15925000000

It is not valid in SQL to compute logarithms of values less than or equal to zero. If you have such values in your tables you need to avoid passing those invalid values to SQL's LN function. Precautions against invalid values and NULLs are not provided in this solution for the sake of readability, but you should consider whether to place such precautions in production code that you write. If you absolutely must work with negative and zero values, then this solution may not work for you.

An alternative, Oracle-only solution is to use the MODEL clause that became available in Oracle Database 10g. In the following example, each SAL is returned as a negative number to show that negative values will not cause a problem for the running product:

  1 select empno, ename, sal, tmp as running_prod 2   from ( 3 select empno,ename,-sal as sal 4   from emp 5  where deptno=10 6        ) 7 model 8   dimension by(row_number()over(order by sal desc) rn ) 9   measures(sal, 0 tmp, empno, ename) 10  rules ( 11    tmp[any] = case when sal[cv()-1] is null then sal[cv()] 12                    else tmp[cv()-1]*sal[cv()] 13               end 14  ) EMPNO ENAME       SAL         RUNNING_PROD ----- ---------- ---- --------------------  7934 MILLER     -1300               -1300  7782 CLARK      -2450             3185000  7839 KING       -5000        -15925000000

MySQL, PostgreSQL, and SQL Server

You still use the approach of summing logarithms, but these platforms do not support windowing functions, so use a scalar subquery instead:

  1 select e.empno,e.ename,e.sal, 2        (select exp(sum(ln(d.sal))) 3           from emp d 4          where d.empno <= e.empno 5            and e.deptno=d.deptno) as running_prod 6 from emp e 7 where e.deptno=10 EMPNO  ENAME         SAL            RUNNING_PROD -----  ----------   ----    --------------------  7782  CLARK        2450                    2450  7839  KING         5000                12250000  7934  MILLER       1300             15925000000

SQL Server users use LOG instead of LN.

Discussion

Except for the MODEL clause solution, which is only usable with Oracle Database 10g or later, all the solutions take advantage of the fact that you can sum two numbers by:

Computing their respective natural logarithms
Summing those logarithms
Raising the result to the power of the mathematical constant e (using the EXP function)

The one caveat when using this approach is that it doesn't work for summing zero or negative values, because any value less than or equal to zero is out of range for an SQL logarithm.

DB2 and Oracle

For an explanation of how the window function SUM OVER works, see the previous recipe "Generating a Running Total."

In Oracle Database 10g and later, you can generate running products via the MODEL clause. Using the MODEL clause along with the window function ROW_NUMBER allows you to easily access prior rows. Each item in the MEASURES list can be accessed like an array. The arrays can then be searched by using the items in the DIMENSIONS list (which are the values returned by ROW_NUMBER, alias RN):

  select empno, ename, sal, tmp as running_prod,rn   from ( select empno,ename,-sal as sal   from emp  where deptno=10        )  model    dimension by(row_number()over(order by sal desc) rn )    measures(sal, 0 tmp, empno, ename)   rules () EMPNO   ENAME              SAL  RUNNING_PROD            RN -----   ----------  ----------  ------------    ----------  7934   MILLER           -1300             0             1  7782   CLARK            -2450             0             2  7839   KING             -5000             0             3

Observe that SAL[1] has a value of1300. Because the numbers are increasing by one with no gaps, you can reference prior rows by subtracting one. The RULES clause:

 rules (    tmp[any] = case when sal[cv()-1] is null then sal[cv()]                    else tmp[cv()-1]*sal[cv()]               end )

uses the built-in operator, ANY, to work through each row without hard-coding. ANY in this case will be the values 1, 2, and 3. TMP[n] is initialized to zero. A value is assigned to TMP[n] by evaluating the current value (the function CV returns the current value) of the corresponding SAL row. TMP[1] is initially zero and SAL[1] is1300. There is no value for SAL[0] so TMP[1] is set to SAL[1]. After TMP[1] is set, the next row is TMP[2]. First SAL[1] is evaluated (SAL[CV( )1] is SAL[1] because the current value of ANY is now 2). SAL[1] is not null, it is1300, so TMP[2] is set to the product of TMP[1] and SAL[2]. This is continued for all the rows.

MySQL, PostgreSQL, and SQL Server

See "Generating a Running Total" earlier in this chapter for an explanation of the subquery approach used for the MySQL, PostgreSQL, and SQL Server solutions.

Be aware that the output of the subquery-based solution is slightly different from that of the Oracle and DB2 solutions due to the EMPNO comparison (the running product is computed in a different order). Like a running total, the summation is driven by the predicate of the scalar subquery; the ordering of rows is by EMPNO for this solution whereas the Oracle/DB2 solution order is by SAL.