Recipe13.5.Determining Which Rows Are Leaf, Branch, or Root Nodes


Recipe 13.5. Determining Which Rows Are Leaf, Branch, or Root Nodes

Problem

You want to determine what type of node a given row is: a leaf, branch, or root. For this example, a leaf node is an employee who is not a manager. A branch node is an employee who is both a manager and also has a manager. A root node is an employee without a manager. You want to return 1 (TRUE) or 0 (FALSE) to reflect the status of each row in the hierarchy. You want to return the following result set:

 ENAME          IS_LEAF   IS_BRANCH     IS_ROOT ----------  ----------  ----------  ---------- KING                 0           0           1 JONES                0           1           0 SCOTT                0           1           0 FORD                 0           1           0 CLARK                0           1           0 BLAKE                0           1           0 ADAMS                1           0           0 MILLER               1           0           0 JAMES                1           0           0 TURNER               1           0           0 ALLEN                1           0           0 WARD                 1           0           0 MARTIN               1           0           0 SMITH                1           0           0 

Solution

It is important to realize that the EMP table is modeled in a tree hierarchy, not a recursive hierarchy, the value for MGR for root nodes is NULL. If EMP was modeled to use a recursive hierarchy, root nodes would be self-referencing (i.e., the value for MGR for employee KING would be KING's EMPNO). I find self-referencing to be counterintuitive and thus am using NULL values for root nodes' MGR. For Oracle users using CONNECT BY and DB2/SQL Server users using WITH, you'll find tree hierarchies easier to work with and potentially more efficient than recursive hierarchies. If you are in a situation where you have a recursive hierarchy and are using CONNECT BY or WITH, watch out: you can end up with a loop in your SQL. You need to code around such loops if you are stuck with recursive hierarchies.

DB2, PostgreSQL, MySQL, and SQL Server

Use three scalar subqueries to determine the correct "Boolean" value (either a 1 or a 0) to return for each node type:

  1 select e.ename,  2        (select sign(count(*)) from emp d  3          where 0 =  4            (select count(*) from emp f  5              where f.mgr = e.empno)) as is_leaf,  6        (select sign(count(*)) from emp d  7          where d.mgr = e.empno  8           and e.mgr is not null) as is_branch,  9        (select sign(count(*)) from emp d 10          where d.empno = e.empno 11            and d.mgr is null) as is_root 12   from emp e 13 order by 4 desc,3 desc 

Oracle

The scalar subquery solution will work for Oracle as well, and should be used if you are on a version of Oracle prior to Oracle Database 10g. The following solution highlights built-in functions provided by Oracle (that were introduced in Oracle Database 10g) to identify root and leaf rows. The functions are CONNECT_BY_ROOT and CONNECT_BY_ISLEAF, respectively:

  1  select ename,  2         connect_by_isleaf is_leaf,  3         (select count(*) from emp e  4           where e.mgr = emp.empno  5             and emp.mgr is not null  6             and rownum = 1) is_branch,  7         decode(ename,connect_by_root(ename),1,0) is_root  8    from emp  9   start with mgr is null 10 connect by prior empno = mgr 11 order by 4 desc, 3 desc 

Discussion

DB2, PostgreSQL, MySQL, and SQL Server

This solution simply applies the rules defined in the "Problem" section to determine leaves, branches, and roots. The first step is to find determine whether an employee is a leaf node. If the employee is not a manager (no one works under her), then she is a leaf node. The first scalar subquery, IS_LEAF, is shown below:

  select e.ename,        (select sign(count(*)) from emp d          where 0 =            (select count(*) from emp f              where f.mgr = e.empno)) as is_leaf   from emp e order by 2 desc ENAME        IS_LEAF ----------- -------- SMITH              1 ALLEN              1 WARD               1 ADAMS              1 TURNER             1 MARTIN             1 JAMES              1 MILLER             1 JONES              0 BLAKE              0 CLARK              0 FORD               0 SCOTT              0 KING               0 

Because the output for IS_LEAF should be a 0 or 1, it is necessary to take the SIGN of the COUNT(*) operation. Otherwise you would get 14 instead of 1 for leaf rows. As an alternative, you can use a table with only one row to count against, because you only want to return 0 or 1. For example:

  select e.ename,        (select count(*) from t1 d          where not exists            (select null from emp f              where f.mgr = e.empno)) as is_leaf   from emp e order by 2 desc ENAME         IS_LEAF ---------- ---------- SMITH               1 ALLEN               1 WARD                1 ADAMS               1 TURNER              1 MARTIN              1 JAMES               1 MILLER              1 JONES               0 BLAKE               0 CLARK               0 FORD                0 SCOTT               0 KING                0 

The next step is to find branch nodes. If an employee is a manager (someone works for them), and they also happen to work for someone else, then the employee is a branch node. The results of the scalar subquery IS_BRANCH are shown below:

  select e.ename,        (select sign(count(*)) from emp d          where d.mgr = e.empno           and e.mgr is not null) as is_branch   from emp e order by 2 desc ENAME       IS_BRANCH ----------- --------- JONES               1 BLAKE               1 SCOTT               1 CLARK               1 FORD                1 SMITH               0 TURNER              0 MILLER              0 JAMES               0 ADAMS               0 KING                0 ALLEN               0 MARTIN              0 WARD                0 

Again, it is necessary to take the SIGN of the COUNT(*) operation. Otherwise you will get (potentially) values greater than 1 when a node is a branch. Like scalar subquery IS_LEAF, you can use a table with one row to avoid using SIGN. The following solution uses a one-row table named dual:

  select e.ename,       (select count(*) from t1 t         where exists (          select null from emp f           where f.mgr = e.empno             and e.mgr is not null)) as is_branch   from emp e order by 2 desc ENAME            IS_BRANCH --------------- ---------- JONES                    1 BLAKE                    1 SCOTT                    1 CLARK                    1 FORD                     1 SMITH                    0 TURNER                   0 MILLER                   0 JAMES                    0 ADAMS                    0 KING                     0 ALLEN                    0 MARTIN                   0 WARD                     0 

The last step is to find the root nodes. A root node is defined as an employee who is a manager but who does not work for anyone else. In table EMP, only KING is a root node. Scalar subquery IS_ROOT is shown below:

  select e.ename,        (select sign(count(*)) from emp d          where d.empno = e.empno            and d.mgr is null) as is_root   from emp e order by 2 desc ENAME         IS_ROOT ----------  --------- KING                1 SMITH               0 ALLEN               0 WARD                0 JONES               0 TURNER              0 JAMES               0 MILLER              0 FORD                0 ADAMS               0 MARTIN              0 BLAKE               0 CLARK               0 SCOTT               0 

Because EMP is a small 14-row table, it is easy to see that employee KING is the only root node, so in this case taking the SIGN of the COUNT(*) operation is not strictly necessary. If there can be multiple root nodes, then you can use SIGN, or you can use a one-row table in the scalar subquery as is shown earlier for IS_BRANCH and IS_LEAF.

Oracle

For those of you on versions of Oracle prior to Oracle Database 10g, you can follow the discussion for the other RDBMSs, as that solution will work (without modifications) in Oracle. If you are on Oracle Database 10g or later, you may want to take advantage of two functions to make identifying root and leaf nodes a simple task: they are CONNECT_BY_ROOT and CONNECT_BY_ISLEAF, respectively. As of the time of this writing, it is necessary to use CONNECT BY in your SQL statement in order for you to be able to use CONNECT_BY_ROOT and CONNECT_BY_ISLEAF. The first step is to find the leaf nodes by using CONNECT_BY_ISLEAF as follows:

  select ename,         connect_by_isleaf is_leaf   from emp  start with mgr is null connect by prior empno = mgr order by 2 desc ENAME          IS_LEAF ----------  ---------- ADAMS                1 SMITH                1 ALLEN                1 TURNER               1 MARTIN               1 WARD                 1 JAMES                1 MILLER               1 KING                 0 JONES                0 BLAKE                0 CLARK                0 FORD                 0 SCOTT                0 

The next step is to use a scalar subquery to find the branch nodes. Branch nodes are employees who are managers but who also work for someone else:

  select ename,         (select count(*) from emp e           where e.mgr = emp.empno             and emp.mgr is not null             and rownum = 1) is_branch   from emp  start with mgr is null connect by prior empno = mgr order by 2 desc ENAME       IS_BRANCH ---------- ---------- JONES               1 SCOTT               1 BLAKE               1 FORD                1 CLARK               1 KING                0 MARTIN              0 MILLER              0 JAMES               0 TURNER              0 WARD                0 ADAMS               0 ALLEN               0 SMITH               0 

The filter on ROWNUM is necessary to ensure that you return a count of 1 or 0, and nothing else.

The last step is to identify the root nodes by using the function CONNECT_BY_ROOT. The solution finds the ENAME for the root node and compares it with all the rows returned by the query. If there is a match, that row is the root node:

  select ename,         decode(ename,connect_by_root(ename),1,0) is_root   from emp  start with mgr is null connect by prior empno = mgr order by 2 desc ENAME          IS_ROOT ----------  ---------- KING                 1 JONES                0 SCOTT                0 ADAMS                0 FORD                 0 SMITH                0 BLAKE                0 ALLEN                0 WARD                 0 MARTIN               0 TURNER               0 JAMES                0 CLARK                0 MILLER               0 

If using Oracle9i Database or later, you can use the SYS_CONNECT_BY_PATH function as an alternative to CONNECT_BY_ROOT. The Oracle9i Database version of the preceding would be:

  select ename,        decode(substr(root,1,instr(root,',')-1),NULL,1,0) root   from ( select ename,        ltrim(sys_connect_by_path(ename,','),',') root   from emp start with mgr is null connect by prior empno=mgr       ) ENAME       ROOT ----------  ---- KING           1 JONES          0 SCOTT          0 ADAMS          0 FORD           0 SMITH          0 BLAKE          0 ALLEN          0 WARD           0 MARTIN         0 TURNER         0 JAMES          0 CLARK          0 MILLER         0 

The SYS_CONNECT_BY_PATH function rolls up a hierarchy starting from the root value as is shown below:

  select ename,        ltrim(sys_connect_by_path(ename,','),',') path   from emp start with mgr is null connect by prior empno=mgr ENAME      PATH ---------- ---------------------------- KING       KING JONES      KING,JONES SCOTT      KING,JONES,SCOTT ADAMS      KING,JONES,SCOTT,ADAMS FORD       KING,JONES,FORD SMITH      KING,JONES,FORD,SMITH BLAKE      KING,BLAKE ALLEN      KING,BLAKE,ALLEN WARD       KING,BLAKE,WARD MARTIN     KING,BLAKE,MARTIN TURNER     KING,BLAKE,TURNER JAMES      KING,BLAKE,JAMES CLARK      KING,CLARK MILLER     KING,CLARK,MILLER 

To get the root row, simply substring out the first ENAME in PATH:

  select ename,        substr(root,1,instr(root,',')-1) root   from ( select ename,        ltrim(sys_connect_by_path(ename,','),',') root   from emp start with mgr is null connect by prior empno=mgr        ) ENAME      ROOT ---------- ---------- KING JONES      KING SCOTT      KING ADAMS      KING FORD       KING SMITH      KING BLAKE      KING ALLEN      KING WARD       KING MARTIN     KING TURNER     KING JAMES      KING CLARK      KING MILLER     KING 

The last step is to flag the result from the ROOT column if it is NULL; that is your root row.




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net