You ll quite often encounter data that is organized into a hierarchy, such as the people who work in a company, a family tree, or the parts that make up an engine. In this section, you ll learn how to perform queries against a table that stores the employees who work for the example store.
You ll see the use of a table named more_employees , which is created by the store_schema.sql script as follows :
CREATE TABLE more_employees (employee_id INTEGER CONSTRAINT more_employees_pk PRIMARY KEY, manager_id INTEGER CONSTRAINT more_empl_fk_fk_more_empl REFERENCES more_employees(employee_id), first_name VARCHAR2(10) NOT NULL, last_name VARCHAR2(10) NOT NULL, title VARCHAR2(20), salary NUMBER(6, 0));
The manager_id column is a self-reference back to the employee_id column of the more_employees table; manager_id indicates the manager of an employee (if any). The more _employees table contains the following rows:
EMPLOYEE_ID MANAGER_ID FIRST_NAME LAST_NAME TITLE SALARY ----------- ---------- ---------- ---------- ------------- ---------- 1 James Smith CEO 800000 2 1 Ron Johnson Sales Manager 600000 3 2 Fred Hobbs Sales Person 200000 4 1 Susan Jones Support Manager 500000 5 2 Rob Green Sales Person 40000 6 4 Jane Brown Support Person 45000 7 4 John Grey Support Manager 30000 8 7 Jean Blue Support Person 29000 9 6 Henry Heyson Support Person 30000 10 1 Kevin Black Ops Manager 100000 11 10 Keith Long Ops Person 50000 12 10 Frank Howard Ops Person 45000 13 10 Doreen Penn Ops Person 47000
As you can see, it is a little difficult to pick out the various employee relationships from this data. Figure 7-1 shows the relationships in a graphical form.
As you can see from Figure 7-1, the elements ”or nodes ”form a tree. Trees of nodes have some technical terms associated with them, as follows:
Root node The root is the node at the top of the tree. In the example shown in Figure 7-1, the root node is James Smith, the CEO.
Parent node A parent is a node that has one or more nodes beneath it. For example, James Smith is the parent to the following nodes: Ron Johnson, Susan Jones, and Kevin Black.
Child node A child is a node that has one parent node above it. For example, Ron Johnson s parent is James Smith.
Leaf node A leaf is a node that has no children. For example, Fred Hobbs and Rob Green are leaf nodes.
You can use the CONNECT BY and START WITH clauses of a SELECT statement to perform hierarchical queries.
The syntax for the CONNECT BY and START WITH clauses of a SELECT statement is as follows:
SELECT [LEVEL], column , expression , ... FROM table [WHERE where_clause ] [[START WITH start_condition ] [CONNECT BY PRIOR prior_condition ]];
where
LEVEL is a pseudo-column that tells you how far into a tree you are. LEVEL returns 1 for a root node, 2 for a child of the root, and so on.
start_condition specifies where to start the hierarchical query from. You must specify a START WITH clause when writing a hierarchical query. An example start_condition is employee_id = 1 , which specifies the query starts from employee #1.
prior_condition specifies the relationship between the parent and child rows. You must specify a CONNECT BY PRIOR clause when writing a hierarchical query. An example prior_condition is employee_id = manager_id , which specifies the relationship is between the parent employee_id and the child manager_id ”that is, the child s manager_id points to the parent s employee_id .
The following query illustrates the use of the START WITH and CONNECT BY PRIOR clauses. Notice the first row contains the details of James Smith (employee #1), the second row contains the details of Ron Johnson, whose manager_id is 1, and so on down the tree:
SELECT employee_id, manager_id, first_name, last_name FROM more_employees START WITH employee_id = 1 CONNECT BY PRIOR employee_id = manager_id; EMPLOYEE_ID MANAGER_ID FIRST_NAME LAST_NAME ----------- ---------- ---------- --------- 1 James Smith 2 1 Ron Johnson 3 2 Fred Hobbs 5 2 Rob Green 4 1 Susan Jones 6 4 Jane Brown 9 6 Henry Heyson 7 4 John Grey 8 7 Jean Blue 10 1 Kevin Black 11 10 Keith Long 12 10 Frank Howard 13 10 Doreen Penn
The next query illustrates the use of the LEVEL pseudo-column to display the level in the tree:
SELECT LEVEL, employee_id, manager_id, first_name, last_name FROM more_employees START WITH employee_id = 1 CONNECT BY PRIOR employee_id = manager_id ORDER BY LEVEL; LEVEL EMPLOYEE_ID MANAGER_ID FIRST_NAME LAST_NAME ---------- ----------- ---------- ---------- --------- 1 1 James Smith 2 2 1 Ron Johnson 2 4 1 Susan Jones 2 10 1 Kevin Black 3 3 2 Fred Hobbs 3 7 4 John Grey 3 12 10 Frank Howard 3 13 10 Doreen Penn 3 11 10 Keith Long 3 5 2 Rob Green 3 6 4 Jane Brown 4 9 6 Henry Heyson 4 8 7 Jean Blue
The next query uses the COUNT() function and LEVEL to get the number of levels in the tree:
SELECT COUNT(DISTINCT LEVEL) FROM more_employees START WITH employee_id = 1 CONNECT BY PRIOR employee_id = manager_id; COUNT(DISTINCTLEVEL) -------------------- 4
You can format the results from a hierarchical query using LEVEL and the LPAD() function, which left-pads values with characters . The following query uses LPAD(' ', 2 * LEVEL - 1) to left-pad with 2 * LEVEL - 1 spaces. This is used to indent an employee s name with spaces based on their LEVEL ( LEVEL 1 isn t padded , LEVEL 2 is padded by two spaces, LEVEL 3 by four spaces, and so on).
SET PAGESIZE 999 COLUMN employee FORMAT A25 SELECT LEVEL, LPAD(' ', 2 * LEVEL - 1) first_name ' ' last_name AS employee FROM more_employees START WITH employee_id = 1 CONNECT BY PRIOR employee_id = manager_id; LEVEL EMPLOYEE ---------- ------------------ 1 James Smith 2 Ron Johnson 3 Fred Hobbs 3 Rob Green 2 Susan Jones 3 Jane Brown 4 Henry Heyson 3 John Grey 4 Jean Blue 2 Kevin Black 3 Keith Long 3 Frank Howard 3 Doreen Penn
You don t have to start at the root node when traversing a tree: you can start at any node using the START WITH clause. The following query starts with Susan Jones; notice LEVEL returns 1 for Susan Jones, 2 for Jane Brown, and so on:
SELECT LEVEL, LPAD(' ', 2 * LEVEL - 1) first_name ' ' last_name AS employee FROM more_employees START WITH last_name = 'Jones' CONNECT BY PRIOR employee_id = manager_id; LEVEL EMPLOYEE ---------- ----------------- 1 Susan Jones 2 Jane Brown 3 Henry Heyson 2 John Grey 3 Jean Blue
If the store had more than one employee with the same name, you could simply use the employee_id in the query s START WITH clause. The following query uses Susan Jones employee_id of 4:
SELECT LEVEL, LPAD(' ', 2 * LEVEL - 1) first_name ' ' last_name AS employee FROM more_employees START WITH employee_id = 4 CONNECT BY PRIOR employee_id = manager_id;
This query returns the same rows as the previous one.
You can use a subquery in a START WITH clause. For example, the following query uses a subquery to select the employee_id whose name is Kevin Black. This employee_id is passed to the START WITH clause:
SELECT LEVEL, LPAD(' ', 2 * LEVEL - 1) first_name ' ' last_name AS employee FROM more_employees START WITH employee_id = ( SELECT employee_id FROM more_employees WHERE first_name = 'Kevin' AND last_name = 'Black' ) CONNECT BY PRIOR employee_id = manager_id; LEVEL EMPLOYEE ---------- --------------- 1 Kevin Black 2 Keith Long 2 Frank Howard 2 Doreen Penn
You don t have to traverse a tree downward from parents to children: you can start at a child and traverse upward. You do this by switching child and parent columns in the CONNECT BY PRIOR clause. For example, CONNECT BY PRIOR manager_id = employee_id connects the child s manager_id to the parent s employee_id .
The following query starts with Jean Blue and traverses upward all the way to James Smith; notice LEVEL returns 1 for Jean Blue, 2 for John Grey, and so on:
SELECT LEVEL, LPAD(' ', 2 * LEVEL - 1) first_name ' ' last_name AS employee FROM more_employees START WITH last_name = 'Blue' CONNECT BY PRIOR manager_id = employee_id; LEVEL EMPLOYEE ---------- ------------------ 1 Jean Blue 2 John Grey 3 Susan Jones 4 James Smith
You can eliminate a particular node from a query tree using a WHERE clause. The following query eliminates Ron Johnson from the results using WHERE last_name != 'Johnson' :
SELECT LEVEL, LPAD(' ', 2 * LEVEL - 1) first_name ' ' last_name AS employee FROM more_employees WHERE last_name != 'Johnson' START WITH employee_id = 1 CONNECT BY PRIOR employee_id = manager_id; LEVEL EMPLOYEE ---------- ------------------ 1 James Smith 3 Fred Hobbs 3 Rob Green 2 Susan Jones 3 Jane Brown 4 Henry Heyson 3 John Grey 4 Jean Blue 2 Kevin Black 3 Keith Long 3 Frank Howard 3 Doreen Penn
You ll notice although Ron Johnson is eliminated from the results, his employees Fred Hobbs and Rob Green are still included. To eliminate an entire branch of nodes from the results of a query, you add an AND clause to your CONNECT BY PRIOR clause. The following query uses AND last_name != 'Johnson' to eliminate Ron Johnson and all his employees from the results:
SELECT LEVEL, LPAD(' ', 2 * LEVEL - 1) first_name ' ' last_name AS employee FROM more_employees START WITH employee_id = 1 CONNECT BY PRIOR employee_id = manager_id AND last_name != 'Johnson'; LEVEL EMPLOYEE ---------- ------------------- 1 James Smith 2 Susan Jones 3 Jane Brown 4 Henry Heyson 3 John Grey 4 Jean Blue 2 Kevin Black 3 Keith Long 3 Frank Howard 3 Doreen Penn
You can include other conditions in a hierarchical query using a WHERE clause. The following example uses a WHERE clause to only show employees whose salaries are less than or equal to $50,000:
SELECT LEVEL, LPAD(' ', 2 * LEVEL - 1) first_name ' ' last_name AS employee, salary FROM more_employees WHERE salary < = 50000 START WITH employee_id = 1 CONNECT BY PRIOR employee_id = manager_id; LEVEL EMPLOYEE SALARY ---------- ------------------------- ---------- 3 Rob Green 40000 3 Jane Brown 45000 4 Henry Heyson 30000 3 John Grey 30000 4 Jean Blue 29000 3 Keith Long 50000 3 Frank Howard 45000 3 Doreen Penn 47000
This concludes the discussion of hierarchical queries. In the next section, you ll learn about advanced group clauses.