Recipe6.10.Creating a Delimited List from Table Rows | SQL Cookbook (Cookbooks (OReilly))

Recipe 6.10. Creating a Delimited List from Table Rows

Problem

You want to return table rows as values in a delimited list, perhaps delimited by commas, rather than in vertical columns as they normally appear. You want to convert a result set from this:

 DEPTNO EMPS ------ ----------     10 CLARK     10 KING     10 MILLER     20 SMITH     20 ADAMS     20 FORD     20 SCOTT     20 JONES     30 ALLEN     30 BLAKE     30 MARTIN     30 JAMES     30 TURNER     30 WARD

to this:

  DEPTNO EMPS ------- ------------------------------------      10 CLARK,KING,MILLER      20 SMITH,JONES,SCOTT,ADAMS,FORD      30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

Solution

Each DBMS requires a different approach to this problem. The key is to take advantage of the built-in functions provided by your DBMS. Understanding what is available to you will allow you to exploit your DBMS's functionality and come up with creative solutions for a problem that is typically not solved in SQL.

DB2

Use recursive WITH to build the delimited list:

  1   with x (deptno, cnt, list, empno, len)  2     as (  3 select deptno, count(*) over (partition by deptno),  4        cast(ename as varchar(100)), empno, 1  5   from emp  6  union all  7  select x.deptno, x.cnt, x.list ||','|| e.ename, e.empno, x.len+1  8   from emp e, x  9  where e.deptno = x.deptno 10    and e.empno > x. empno 11        ) 12 select deptno,list 13   from x 14  where len = cnt

MySQL

Use the built-in function GROUP_CONCAT to build the delimited list:

 1 select deptno, 2        group_concat(ename order by empno separator, ',') as emps 3   from emp 4  group by deptno

Oracle

Use the built-in function SYS_CONNECT_BY_PATH to build the delimited list:

  1 select deptno,  2        ltrim(sys_connect_by_path(ename,','),',') emps  3   from (  4 select deptno,  5        ename,  6        row_number() over  7                 (partition by deptno order by empno) rn,  8        count(*) over  9                 (partition by deptno) cnt 10   from emp 11        ) 12  where level = cnt 13  start with rn = 1 14 connect by prior deptno = deptno and prior rn = rn-1

PostgreSQL

PostgreSQL does not offer a standard built-in function for creating a delimited list, so it is necessary to know how many values will be in the list in advance. Once you know the size of the largest list, you can determine the number of values to append to create your list by using standard transposition and concatenation:

  1 select deptno,  2        rtrim(  3            max(case when pos=1 then emps else '' end)||  4            max(case when pos=2 then emps else '' end)||  5            max(case when pos=3 then emps else '' end)||  6            max(case when pos=4 then emps else '' end)||  7            max(case when pos=5 then emps else '' end)||  8            max(case when pos=6 then emps else '' end),','  9        ) as emps 10   from ( 11 select a.deptno, 12        a.ename||',' as emps, 13        d.cnt, 14        (select count(*) from emp b 15           where a.deptno=b.deptno and b.empno <= a.empno) as pos 16   from emp a, 17        (select deptno, count(ename) as cnt 18           from emp 19           group by deptno) d 20  where d.deptno=a.deptno 21        ) x 22  group by deptno 23  order by 1

SQL Server

Use recursive WITH to build the delimited list:

  1  with x (deptno, cnt, list, empno, len)  2     as (  3 select deptno, count(*) over (partition by deptno),  4        cast(ename as varchar(100)),  5        empno,  6        1  7   from emp  9  union all  9 select x.deptno, x.cnt, 10        cast(x.list + ',' + e.ename as varchar(100)), 11        e.empno, x.len+1 12   from emp e, x 13 where e.deptno = x.deptno 14   and e.empno > x. empno 15            ) 16 select deptno,list 17   from x 18  where len = cnt 19  order by 1

Discussion

Being able to create delimited lists in SQL is useful because it is a common requirement. Yet each DBMS offers a unique method for building such a list in SQL. There's very little commonality between the vendor-specific solutions; the techniques vary from using recursion, to hierarchal functions, to classic transposition, to aggregation.

DB2 and SQL Server

The solution for these two databases differ slightly in syntax (the concatenation operators are "||" for DB2 and "+" for SQL Server), but the technique is the same. The first query in the WITH clause (upper portion of the UNION ALL) returns the following information about each employee: the department, the number of employees in that department, the name, the ID, and a constant 1 (which at this point doesn't do anything). Recursion takes place in the second query (lower half of the UNION ALL) to build the list. To understand how the list is built, examine the following excerpts from the solution: first, the third SELECT-list item from the second query in the union:

 x.list ||','|| e.ename

and then the WHERE clause from that same query:

 where e.deptno = x.deptno   and e.empno > x.empno

The solution works by first ensuring the employees are in the same department. Then, for every employee returned by the upper portion of the UNION ALL, append the name of the employees who have a greater EMPNO. By doing this, you ensure that no employee will have his own name appended. The expression

 x.len+1

increments LEN (which starts at 1) every time an employee has been evaluated. When the incremented value equals the number of employees in the department:

 where len = cnt

you know you have evaluated all the employees and have completed building the list. That is crucial to the query as it not only signals when the list is complete, but also stops the recursion from running longer than necessary.

MySQL

The function GROUP_CONCAT does all the work. It concatenates the values found in the column passed to it, in this case ENAME. It's an aggregate function, thus the need for GROUP BY in the query.

Oracle

The first step to understanding the Oracle query is to break it down. Running the inline view by itself (lines 410), you generate a result set that includes the following for each employee: her department, her name, a rank within her respective department that is derived by an ascending sort on EMPNO, and a count of all employees in her department. For example:

  select deptno,         ename,         row_number() over                   (partition by deptno order by empno) rn,         count(*) over (partition by deptno) cnt   from emp DEPTNO ENAME      RN CNT ------ ---------- -- ---     10 CLARK       1   3     10 KING        2   3     10 MILLER      3   3     20 SMITH       1   5     20 JONES       2   5     20 SCOTT       3   5     20 ADAMS       4   5     20 FORD        5   5     30 ALLEN       1   6     30 WARD        2   6     30 MARTIN      3   6     30 BLAKE       4   6     30 TURNER      5   6     30 JAMES       6   6

The purpose of the rank (aliased RN in the query) is to allow you to walk the tree. Since the function ROW_NUMBER generates an enumeration starting from one with no duplicates or gaps, just subtract one (from the current value) to reference a prior (or parent) row. For example, the number prior to 3 is 3 minus 1, which equals 2. In this context, 2 is the parent of 3; you can observe this on line 12. Additionally, the lines

 start with rn = 1 connect by prior deptno = deptno

identify the root for each DEPTNO as having RN equal to 1 and create a new list whenever a new department is encountered (whenever a new occurrence of 1 is found for RN).

At this point, it's important to stop and look at the ORDER BY portion of the ROW_NUMBER function. Keep in mind the names are ranked by EMPNO and the list will be created in that order. The number of employees per department is calculated (aliased CNT) and is used to ensure that the query returns only the list that has all the employee names for a department. This is done because SYS_CONNECT_ BY_PATH builds the list iteratively, and you do not want to end up with partial lists.

For heirarchical queries, the pseudocolumn LEVEL starts with 1 (for queries not using CONNECT BY, LEVEL is 0, unless you are on 10g and later when LEVEL is only available when using CONNECT BY) and increments by one after each employee in a department has been evaluated (for each level of depth in the hierarchy). Because of this, you know that once LEVEL reaches CNT, you have reached the last EMPNO and will have a complete list.

The SYS_CONNECT_BY_PATH function prefixes the list with your chosen delimiter (in this case, a comma). You may or may not want that behavior. In this recipe's solution, the call to the function LTRIM removes the leading comma from the list.

PostgreSQL

PostgreSQL's solution requires you to know in advance the maximum number of employees in any one department. Running the inline view by itself (lines 1118) generates a result set that includes (for each employee) his department, his name with a comma appended, the number of employees in his department, and the number of employees who have an EMPNO that is less than his:

 deptno  |  emps    | cnt | pos --------+----------+-----+-----     20  |  SMITH,  |   5 |   1     30  |  ALLEN,  |   6 |   1     30  |  WARD,   |   6 |   2     20  |  JONES,  |   5 |   2     30  |  MARTIN, |   6 |   3     30  |  BLAKE,  |   6 |   4     10  |  CLARK,  |   3 |   1     20  |  SCOTT,  |   5 |   3     10  |  KING,   |   3 |   2     30  |  TURNER, |   6 |   5     20  |  ADAMS,  |   5 |   4     30  |  JAMES,  |   6 |   6     20  |  FORD,   |   5 |   5     10  |  MILLER, |   3 |   3

The scalar subquery, POS (lines 14-15), is used to rank each employee by EMPNO. For example, the line

 max(case when pos = 1 then ename else '' end)||

evaluates whether or not POS equals 1. The CASE expression returns the employee name when POS is 1, and otherwise returns NULL.

You must query your table first to find the largest number of values that could be in any one list. Based on the EMP table, the largest number of employees in any one department is six, so the largest number of items in a list is six.

The next step is to begin creating the list. Do this by performing some conditional logic (in the form of CASE expressions) on the rows returned from the inline view.

You must write as many CASE expressions as there are possible values to be concatenated together.

If POS equals one, the current name is added to the list. The second CASE expression evaluates whether or not POS equals two; if it does, then the second name is appended to the first. If there is no second name, then an additional comma is appended to the first name (this process is repeated for each distinct value of POS until the last one is reached).

The use of the MAX function is necessary because you want to build only one list per department (you can also use MIN; it makes no difference in this case, since POS returns only one value for each case evaluation). Whenever an aggregate function is used, any items in the SELECT list not acted upon by the aggregate must be specified in the GROUP BY clause. This guarantees you will have only one row per item in the SELECT list not acted upon by the aggregate function.

Notice that you also need the function RTRIM to remove trailing commas; the number of commas will always be equal to the maximum number of values that could potentially be in a list (in this case, six).