Recipe12.2.Pivoting a Result Set into Multiple Rows | SQL Cookbook (Cookbooks (OReilly))

Recipe 12.2. Pivoting a Result Set into Multiple Rows

Problem

You want to turn rows into columns by creating a column corresponding to each of the values in a single given column. However, unlike in the previous recipe, you need multiple rows of output.

For example, you want to return each employee and their position (JOB), and you currently use a query that returns the following result set:

 JOB        ENAME ---------  ---------- ANALYST    SCOTT ANALYST    FORD CLERK      SMITH CLERK      ADAMS CLERK      MILLER CLERK      JAMES MANAGER    JONES MANAGER    CLARK MANAGER    BLAKE PRESIDENT  KING SALESMAN   ALLEN SALESMAN   MARTIN SALESMAN   TURNER SALESMAN   WARD

You would like to format the result set such that each job gets its own column:

 CLERKS  ANALYSTS  MGRS   PREZ  SALES ------  --------  -----  ----  ------ MILLER  FORD      CLARK  KING  TURNER JAMES   SCOTT     BLAKE        MARTIN ADAMS             JONES        WARD SMITH                          ALLEN

Solution

Unlike the first recipe in this chapter, the result set for this recipe consists of more than one row. Using the previous recipe's technique will not work for this recipe, as the MAX(ENAME) for each JOB would be returned, which would result in one ENAME for each JOB (i.e., one row will be returned as in the first recipe). To solve this problem, you must make each JOB/ENAME combination unique. Then, when you apply an aggregate function to remove NULLs, you don't lose any ENAMEs.

DB2, Oracle, and SQL Server

Use the window function ROW_NUMBER OVER to make each JOB/ENAME combination unique. Pivot the result set using a CASE expression and the aggregate function MAX while grouping on the value returned by the window function:

  1  select max(case when job='CLERK'  2                  then ename else null end) as clerks,  3         max(case when job='ANALYST'  4                  then ename else null end) as analysts,  5         max(case when job='MANAGER'  6                  then ename else null end) as mgrs,  7         max(case when job='PRESIDENT'  8                  then ename else null end) as prez,  9         max(case when job='SALESMAN' 10                  then ename else null end) as sales 11   from ( 12 select job, 13        ename, 14        row_number()over(partition by job order by ename) rn 15   from emp 16        ) x 17  group by rn

PostgreSQL and MySQL

Use a scalar subquery to rank each employee by EMPNO. Pivot the result set using a CASE expression and the aggregate function MAX while grouping on the value returned by the scalar subquery:

  1  select max(case when job='CLERK'  2                  then ename else null end) as clerks,  3         max(case when job='ANALYST'  4                  then ename else null end) as analysts,  5         max(case when job='MANAGER'  6                  then ename else null end) as mgrs,  7         max(case when job='PRESIDENT'  8                  then ename else null end) as prez,  9         max(case when job='SALESMAN' 10                  then ename else null end) as sales 11    from ( 12  select e.job, 13         e.ename, 14         (select count(*) from emp d 15           where e.job=d.job and e.empno < d.empno) as rnk 16    from emp e 17         ) x 18   group by rnk

Discussion

DB2, Oracle, and SQL Server

The first step is to use the window function ROW_NUMBER OVER to help make each JOB/ENAME combination unique:

  select job,        ename,        row_number()over(partition by job order by ename) rn   from emp   JOB       ENAME              RN   --------- ---------- ----------   ANALYST   FORD                1   ANALYST   SCOTT               2   CLERK     ADAMS               1   CLERK     JAMES               2   CLERK     MILLER              3   CLERK     SMITH               4   MANAGER   BLAKE               1   MANAGER   CLARK               2   MANAGER   JONES               3   PRESIDENT KING                1   SALESMAN  ALLEN               1   SALESMAN  MARTIN              2   SALESMAN  TURNER              3   SALESMAN  WARD                4

Giving each ENAME a unique "row number" within a given job prevents any problems that might otherwise result from two employees having the same name and job. The goal here is to be able to group on row number (on RN) without dropping any employees from the result set due to the use of MAX. This step is the most important step in solving the problem. Without this first step, the aggregation in the outer query will remove necessary rows. Consider what the result set would look like without using ROW_NUMBER OVER, using the same technique as seen in the first recipe:

  select max(case when job='CLERK'                 then ename else null end) as clerks,        max(case when job='ANALYST'                 then ename else null end) as analysts,        max(case when job='MANAGER'                 then ename else null end) as mgrs,        max(case when job='PRESIDENT'                 then ename else null end) as prez,        max(case when job='SALESMAN'                 then ename else null end) as sales   from emp CLERKS      ANALYSTS    MGRS        PREZ        SALES ----------  ----------  ----------  ----------  ---------- SMITH       SCOTT       JONES       KING        WARD

Unfortunately, only one row is returned for each JOB: the employee with the MAX ENAME. When it comes time to pivot the result set, using MIN or MAX should serve as a means to remove NULLs from the result set, not restrict the ENAMEs returned. How this works will be come clearer as you continue through the explanation.

The next step uses a CASE expression to organize the ENAMEs into their proper column (JOB):

  select rn,        case when job='CLERK'             then ename else null end as clerks,        case when job='ANALYST'             then ename else null end as analysts,        case when job='MANAGER'             then ename else null end as mgrs,        case when job='PRESIDENT'             then ename else null end as prez,        case when job='SALESMAN'             then ename else null end as sales   from ( Select job,        ename,        row_number()over(partition by job order by ename) rn   from emp        ) x RN  CLERKS      ANALYSTS    MGRS        PREZ        SALES --  ----------  ----------  ----------  ----------  ----------  1              FORD  2              SCOTT  1  ADAMS  2  JAMES  3  MILLER  4  SMITH  1                          BLAKE  2                          CLARK  3                          JONES  1                                      KING  1                                                  ALLEN  2                                                  MARTIN  3                                                  TURNER  4                                                  WARD

At this point, the rows are transposed into columns and the last step is to remove the NULLs to make the result set more readable. To remove the NULLs use the aggregate function MAX and group by RN. (You can use the function MIN as well. The choice to use MAX is arbitrary, as you will only ever be aggregating one value per group.) There is only one value for each RN/JOB/ENAME combination. Grouping by RN in conjunction with the CASE expressions embedded within the calls to MAX ensures that each call to MAX results in picking only one name from a group of otherwise NULL values:

  select max(case when job='CLERK'                 then ename else null end) as clerks,        max(case when job='ANALYST'                 then ename else null end) as analysts,        max(case when job='MANAGER'                 then ename else null end) as mgrs,        max(case when job='PRESIDENT'                 then ename else null end) as prez,        max(case when job='SALESMAN'                 then ename else null end) as sales   from ( Select job,        ename,        row_number()over(partition by job order by ename) rn   from emp        ) x group by rn CLERKS  ANALYSTS  MGRS   PREZ  SALES ------  --------  -----  ----  ------ MILLER  FORD      CLARK  KING  TURNER JAMES   SCOTT     BLAKE        MARTIN ADAMS             JONES        WARD SMITH                          ALLEN

The technique of using ROW_NUMBER OVER to create unique combinations of rows is extremely useful for formatting query results. Consider the query below that creates a sparse report showing employees by DEPTNO and JOB:

  select deptno dno, job,        max(case when deptno=10                 then ename else null end) as d10,        max(case when deptno=20                 then ename else null end) as d20,        max(case when deptno=30                 then ename else null end) as d30,        max(case when job='CLERK'                 then ename else null end) as clerks,        max(case when job='ANALYST'                 then ename else null end) as anals,        max(case when job='MANAGER'                 then ename else null end) as mgrs,        max(case when job='PRESIDENT'                 then ename else null end) as prez,        max(case when job='SALESMAN'                 then ename else null end) as sales   from ( Select deptno,        job,        ename,        row_number()over(partition by job order by ename) rn_job,        row_number()over(partition by job order by ename) rn_deptno   from emp        ) x  group by deptno, job, rn_deptno, rn_job  order by 1 DNO JOB       D10    D20   D30    CLERKS ANALS MGRS  PREZ SALES --- --------- ------ ----- ------ ------ ----- ----- ---- ------  10 CLERK     MILLER              MILLER  10 MANAGER   CLARK                            CLARK  10 PRESIDENT KING                                   KING  20 ANALYST          FORD                FORD  20 ANALYST          SCOTT               SCOTT  20 CLERK            ADAMS        ADAMS  20 CLERK            SMITH        SMITH  20 MANAGER          JONES                     JONES  30 CLERK                  JAMES  JAMES  30 MANAGER                BLAKE               BLAKE  30 SALESMAN               ALLEN                          ALLEN  30 SALESMAN               MARTIN                         MARTIN  30 SALESMAN               TURNER                         TURNER  30 SALESMAN               WARD                           WARD

By simply modifying what you group by (hence the nonaggregate items in the SELECT list above), you can produce reports with different formats. It is worth the time of changing things around to understand how these formats change based on what you include in your GROUP BY clause.

PostgreSQL and MySQL

The technique for these RDBMSs is the same as for the others once a method of creating unique JOB/ENAME combinations is established. The first step is to use a scalar subquery to provide a "row number" or "rank" for each JOB/ENAME combination:

  select e.job,        e.ename,        (select count(*) from emp d          where e.job=d.job and e.empno < d.empno) as rnk   from emp e JOB       ENAME             RNK --------- ---------- ---------- CLERK     SMITH               3 SALESMAN  ALLEN               3 SALESMAN  WARD                2 MANAGER   JONES               2 SALESMAN  MARTIN              1 MANAGER   BLAKE               1 MANAGER   CLARK               0 ANALYST   SCOTT               1 PRESIDENT KING                0 SALESMAN  TURNER              0 CLERK     ADAMS               2 CLERK     JAMES               1 ANALYST   FORD                0 CLERK     MILLER              0

Giving each JOB/ENAME combination a unique "rank" makes each row unique. Even if there are employees with the same name working the same job, no two employees will share the same rank within a job. This step is the most important step in solving the problem. Without this first step, the aggregation in the outer query will remove necessary rows. Consider what the result set would look like without applying a rank to each JOB/ENAME combination, using the same technique as seen in the first recipe:

  select max(case when job='CLERK'                 then ename else null end) as clerks,        max(case when job='ANALYST'                 then ename else null end) as analysts,        max(case when job='MANAGER'                 then ename else null end) as mgrs,        max(case when job='PRESIDENT'                 then ename else null end) as prez,        max(case when job='SALESMAN'                 then ename else null end) as sales   from emp CLERKS     ANALYSTS   MGRS       PREZ       SALES ---------- ---------- ---------- ---------- ---------- SMITH      SCOTT      JONES      KING       WARD

Now, that you see the purpose of applying a rank, you can move on to the next step. The next step uses a CASE expression to organize the ENAMEs into their proper column (JOB):

  select rnk,        case when job='CLERK'             then ename else null end as clerks,        case when job='ANALYST'             then ename else null end as analysts,        case when job='MANAGER'             then ename else null end as mgrs,        case when job='PRESIDENT'             then ename else null end as prez,        case when job='SALESMAN'             then ename else null end as sales   from ( Select e.job,        e.ename,        (select count(*) from emp d          where e.job=d.job and e.empno < d.empno) as rnk   from emp e        ) x RNK CLERKS ANALYSTS MGRS  PREZ SALES --- ------ -------- ----- ---- ----------   3 SMITH   3                             ALLEN   2                             WARD   2                 JONES   1                             MARTIN   1                 BLAKE   0                 CLARK   1        SCOTT   0                        KING   0                             TURNER   2 ADAMS   1 JAMES   0        FORD   0 MILLER

At this point, the rows are transposed into columns and the last step is to remove the NULLs to make the result set more readable. To remove the NULLs use the aggregate function MAX and group by RNK. (MAX is an arbitrary choice. You can use the function MIN as well.) There is only one value for each RN/JOB/ENAME combination, so the application of the aggregate function is simply to remove NULLs:

  select max(case when job='CLERK'                 then ename else null end) as clerks,        max(case when job='ANALYST'                 then ename else null end) as analysts,        max(case when job='MANAGER'                 then ename else null end) as mgrs,        max(case when job='PRESIDENT'                 then ename else null end) as prez,        max(case when job='SALESMAN'                 then ename else null end) as sales   from ( Select e.job,        e.ename,        (select count(*) from emp d          where e.job=d.job and e.empno < d.empno) as rnk   from emp e        ) x  group by rnk CLERKS ANALYSTS MGRS  PREZ SALES ------ -------- ----- ---- ------ MILLER FORD     CLARK KING TURNER JAMES  SCOTT    BLAKE      MARTIN ADAMS           JONES      WARD SMITH                      ALLEN