Recipe14.13.Creating CSV Output from Oracle | SQL Cookbook (Cookbooks (OReilly))

Recipe 14.13. Creating CSV Output from Oracle

Problem

You want to create a delimited list (perhaps comma delimited) from rows in a table. For example, using table EMP, you want to return the following result set:

 DEPTNO LIST ------ --------------------------------------     10 MILLER,KING,CLARK     20 FORD,ADAMS,SCOTT,JONES,SMITH     30 JAMES,TURNER,BLAKE,MARTIN,WARD,ALLEN

You are on an Oracle system (Oracle Database 10g or later) and want to use the MODEL clause.

Solution

This solution takes advantage of the iteration capabilities of Oracle's MODEL clause. The technique is to use the window function ROW_NUMBER OVER to rank each employee (by EMPNO, which is arbitrary) in each DEPTNO. Because MODEL provides array access, you can access prior array elements by subtracting from the rank. So, for each row, create a list that includes each employee's name, plus the name of the employee ranked before the current employee:

  1  select deptno,  2         list  3    from (  4  select *  5   from (  6 select deptno,empno,ename,  7        lag(deptno)over(partition by deptno  8                            order by empno) prior_deptno  9   from emp 10        ) 11  model 12   dimension by 13   ( 14     deptno, 15     row_number()over(partition by deptno order by empno) rn 16   ) 17   measures 18   ( 19     ename, 20     prior_deptno,cast(null as varchar2(60)) list, 21     count(*)over(partition by deptno) cnt, 22     row_number()over(partition by deptno order by empno) rnk 23   ) 24   rules 25   ( 26    list[any,any] 27    order by deptno,rn = case when prior_deptno[cv(),cv( )] is null 28                              then ename[cv( ),cv( )] 29                              else ename[cv( ),cv( )]||','|| 30                                   list[cv(),rnk[cv( ),cv( )]-1] 31                         end 32   ) 33   ) 34  where cnt = rn

Discussion

The first step is to use the window function LAG OVER to return the DEPTNO of the previous employee (sorted by EMPNO). The results are partitioned by DEPTNO, so the return value will be NULL for the first employee (by EMPNO) in the department and DEPTNO for the rest. The results are as follows:

  select deptno,empno,ename,        lag(deptno)over(partition by deptno                            order by empno) prior_deptno   from emp DEPTNO      EMPNO ENAME  PRIOR_DEPTNO ------ ---------- ------ ------------     10       7782 CLARK     10       7839 KING             10     10       7934 MILLER           10     20       7369 SMITH     20       7566 JONES            20     20       7788 SCOTT            20     20       7876 ADAMS            20     20       7902 FORD             20     30       7499 ALLEN     30       7521 WARD             30     30       7654 MARTIN           30     30       7698 BLAKE            30     30       7844 TURNER           30     30       7900 JAMES            30

The next step is to examine the MEASURES subclause of the MODEL clause. The items in the MEASURES list are the arrays:

ENAME: An array of all the ENAMEs in EMP
PRIOR_DEPTNO: An array of the values returned by the LAG OVER window function
CNT: An array of the number of employees in each DEPTNO
RNK: An array of rankings (by EMPNO) for each employee in each DEPTNO

The array indices are DEPTNO and RN (the value returned by the ROW_NUMBER OVER window function in the DIMENSION BY subclause). To see what all these arrays contain, simply comment out the code listed in the RULES subclause of the MODEL clause and execute the query, as follows:

  select *   from ( select deptno,empno,ename,        lag(deptno)over(partition by deptno                            order by empno) prior_deptno   from emp        )  model    dimension by    (    deptno,    row_number()over(partition by deptno order by empno) rn    )    measures    (      ename,      prior_deptno,cast(null as varchar2(60)) list,      count(*)over(partition by deptno) cnt,      row_number()over(partition by deptno order by empno) rnk    )    rules    ( /*     list[any,any]     order by deptno,rn = case when prior_deptno[cv(),cv()] is null                               then ename[cv(),cv()]                               else ename[cv(),cv()]||','||                                    list[cv(),rnk[cv(),cv( )]-1]                          end */    )  order by 1 DEPTNO  RN ENAME  PRIOR_DEPTNO LIST       CNT RNK ------ --- ------ ------------ ---------- --- ----     10   1 CLARK                            3   1     10   2 KING             10              3   2     10   3 MILLER           10              3   3     20   1 SMITH                            5   1     20   2 JONES            20              5   2     20   4 ADAMS            20              5   4     20   5 FORD             20              5   5     20   3 SCOTT            20              5   3     30   1 ALLEN                            6   1     30   6 JAMES            30              6   6     30   4 BLAKE            30              6   4     30   3 MARTIN           30              6   3     30   5 TURNER           30              6   5     30   2 WARD             30              6   2

Now that you know exactly what each item declared in the MODEL clause does, continue on to the RULES subclause. If you look at the CASE expression, you'll see that the current value for PRIOR_DEPTNO is being evaluated. If that value is NULL, it signifies that the first employee in each DEPTNO and ENAME should be returned to that employee's LIST array. If the value for PRIOR_DEPTNO is not NULL, then append the value of the prior employee's LIST to the current employee's name (ENAME array), and then return that result as the current employee's LIST. This CASE expression operation, when performed for each row in DEPTNO, results in an iteratively built comma-separated values (CSV) list. You can see the intermediate results in the following example:

  select deptno,        list   from ( select *   from ( select deptno,empno,ename,        lag(deptno)over(partition by deptno                            order by empno) prior_deptno   from emp        )  model    dimension by    (      deptno,      row_number()over(partition by deptno order by empno) rn    )    measures    (      ename,      prior_deptno,cast(null as varchar2(60)) list,      count(*)over(partition by deptno) cnt,      row_number()over(partition by deptno order by empno) rnk    )    rules    (      list[any,any]      order by deptno,rn = case when prior_deptno[cv(),cv()] is null                                then ename[cv(),cv()]                                else ename[cv(),cv()]||','||                                     list[cv(),rnk[cv(),cv( )]-1]                          end    )    ) DEPTNO LIST ------ ---------------------------------------     10 CLARK     10 KING,CLARK     10 MILLER,KING,CLARK     20 SMITH     20 JONES,SMITH     20 SCOTT,JONES,SMITH     20 ADAMS,SCOTT,JONES,SMITH     20 FORD,ADAMS,SCOTT,JONES,SMITH     30 ALLEN     30 WARD,ALLEN     30 MARTIN,WARD,ALLEN     30 BLAKE,MARTIN,WARD,ALLEN     30 TURNER,BLAKE,MARTIN,WARD,ALLEN     30 JAMES,TURNER,BLAKE,MARTIN,WARD,ALLEN

The last step is to keep only the last employee in each DEPTNO to ensure that you have a complete CSV list for each DEPTNO. Use the values stored in the CNT array and the values stored in the RN array to keep only the completed CSV for each DEPTNO. Because RN represents a ranking of employees in each DEPTNO by EMPNO, the last employee in each DEPTNO will be the one where CNT = RN, as the following example shows:

  select deptno,        list   from ( select *   from ( select deptno,empno,ename,        lag(deptno)over(partition by deptno                            order by empno) prior_deptno   from emp        )  model    dimension by    (      deptno,      row_number()over(partition by deptno order by empno) rn    )    measures    (      ename,      prior_deptno,cast(null as varchar2(60)) list,      count(*)over(partition by deptno) cnt,      row_number()over(partition by deptno order by empno) rnk    )    rules    (      list[any,any]      order by deptno,rn = case when prior_deptno[cv(),cv()] is null                                then ename[cv(),cv()]                                else ename[cv(),cv()]||','||                                     list[cv(),rnk[cv(),cv( )]-1]                           end    )    )  where cnt = rn DEPTNO LIST ------ ----------------------------------------     10 MILLER,KING,CLARK     20 FORD,ADAMS,SCOTT,JONES,SMITH     30 JAMES,TURNER,BLAKE,MARTIN,WARD,ALLEN