Table Joins | A Lightning SQL Tutorial

Table of contents:

Old Join Syntax

It's often the case that the data you want to return from a query are spread across more than one table. Commonly this occurs as a result of master-detail relationships such as the one between the project and project_hours tables. The project table contains one row describing each project, whereas, for each project, project_hours contains many rows detailing time charged to that project. To link such master and detail data together in the results from a SELECT statement, you use a relational operation called a join .

4.4.1 Inner Joins

Imagine that you need to generate a detailed listing of time charged to each of your projects. You want that listing to include project name , the date on which work was performed, and the number of hours involved. What makes this request interesting is that you need to list two tables in your FROM clause. Project names come from the project table, while dates and hours come from the project_hours table. How do you SELECT from two tables at once? The answer is to use a JOIN clause such as the one in Example 4-25.

Example 4-25. Selecting related rows from two tables

SELECT p.project_name, ph. employee_id, 

 ph.time_log_date, ph.hours_logged

FROM project p JOIN project_hours ph

 ON p.project_id = ph.project_id;

The JOIN clause, which is a subclause of the FROM clause, is new in Oracle9 i Database and specifies that data are to be drawn from not one table, but from two, in this case from the project and project_hours tables. The ON clause, subsidiary to JOIN, defines the conditions that rows of combined data must meet to be returned from the join operation. This will make more sense as you scrutinize Figures Figure 4-2 through Figure 4-4.

Conceptually, all joins begin as a set of all possible combinations of rows from the two tables involved. Such a set is known as a Cartesian product . Figure 4-2 shows the beginning of a join between a three-row project table and a four-row project_hours table.

Databases implement all kinds of optimizations to avoid generating Cartesian products. However, understanding the conceptual process defining the join operation is critical to writing accurate SQL queries.

Each row from project has been combined with each row from project_hours . I've highlighted the JOIN clause in the query to show how much of the query has been executed so far.

Looking at Figure 4-2, you can see that not all combinations make sense. Detail from project 1004 has no business being associated with project 1002, nor does the reverse make sense. Project 1003 should have no detail at all. This is where the join conditions in the ON clause come into play. When you write a join, you should write join conditions to identify those rows in the Cartesian product that make sense, that you wish returned from the join operation. Rows not matching the join conditions are filtered out. Think of a sieve filtering out large gravel and passing only the sand. Figure 4-3 shows the results of evaluating the join condition in the example query.

Figure 4-2. All joins begin as a Cartesian product

figs/sqp2_0402.gif

Figure 4-3. Join conditions pass only those row combinations that make sense, eliminating all the others

figs/sqp2_0403.gif

As Figure 4-3 illustrates, the next step after forming the Cartesian product is to apply the conditions in the ON clause, using those to eliminate row combinations that make no business sense. Look at the project ID numbers in Figure 4-3s results. Each row output from the join operation corresponds to a row from project_hours . Each row contains corresponding project information (name and ID) from project . At this intermediate stage, you can think of all columns from both tables as being present in each row. Thus, each row has two copies of the project ID number, one from each table, and those two values are used (see the ON clause) to identify those row combinations that make business sense.

The join illustrated in Figure 4-3 is known as an inner join . In fact, you can write it using the keywords INNER JOIN instead of just JOIN. The key characteristic of an inner join is that each row of output from a join operation contains values from both tables. The choice of the term inner join is unfortunate because that term is not at all evocative of the results produced.

It's entirely possible to place WHERE conditions in a query containing a join and this is often done. Such WHERE conditions are evaluated after all the joins. Figure 4-4 shows the same query as Figure 4-3 but with the addition of a WHERE clause. You can see the effect of WHERE and when that clause is evaluated. You can see that the generation of columns in the SELECT list is the final operation in the query's execution.

Figure 4-4. WHERE clause conditions are evaluated following all joins

figs/sqp2_0404.gif

The process I've described in this section is conceptual. It's a good way to think about joins that will help you write better queries and to write them more easily. However, this conceptual process is seldom efficient when applied in real life. Instead, databases implement many techniques to optimize joins and query executions. For example, with respect to the WHERE clause in Figure 4-4, a database's query optimizer may "decide" to evaluate that condition early, eliminating rows from project_hours for February before the join, not after it.

Don't get caught up in thinking about query optimizations when writing join queries. Think conceptually until you've written a query that yields the correct results. No matter how a database optimizes a query, especially a join query, in the end the results must match the conceptual model I've just described. All joins begin with a Cartesian product of all possible combinations of rows from two tables. Join conditions then eliminate unwanted rows. It's that simple.

If you're interested in the topic of join optimization, particularly for queries containing large numbers of joins, I heartily recommend and endorse Dan Tow's book, SQL Tuning (O'Reilly).

4.4.2 Outer Joins

Go back to Figure 4-3 and look at the rows feeding into and out of the ON clause evaluation. What happened to project 1003? Rows referencing that project existed in the project table, were part of the Cartesian product, but were completely eliminated by the ON clause. Why? Because the project rows for 1003 had no counterparts in the project_hours table. As a result, no rows were in the Cartesian product for project 1003 in which both project_id values matched, and thus, project 1003 completely vanished from the query's result set. This behavior is not always what you want.

To join project to project_hours using the data shown in Figure 4-3, yet preserving projects having no corresponding detail in project_hours , you need to use an outer join . An outer join designates one table as optional and the other as an anchor. Rows from the anchor table in an outer join are preserved even when no corresponding detail rows exist.

Figure 4-5 shows a left outer join , designating the table on the left-hand side of the JOIN keyword as the anchor table. The project table is to the left, so project rows are preserved.

Figure 4-5. An outer join preserving a row with no detail

figs/sqp2_0405.gif

Look carefully at what goes on in Figure 4-5. The condition in the ON clause results in the elimination of all rows in the Cartesian product associated with the row from the project table for project 1003. This is when the outer join logic kicks in. The four combination rows for project 1003 are, in fact, removed from the join, but a new row is slipped in for project 1003 containing values only from the project table. Column values that would otherwise come from project_hours are set to null.

Remember that my explanation is conceptual, a useful way of visualizing the outer join process. How the database engine implements this conceptual process may be quite different from what I've described.

Another type of join is the right outer join . Right and left outer joins are the same fundamental operation. The only difference lies in which side of the JOIN keyword you place the anchor table. Example 4-26 shows semantically identical left and right outer joins.

Example 4-26. Left and right outer joins are fundamentally the same operation

SELECT p.project_name, ph.time_log_date, ph.hours_logged

FROM project p LEFT OUTER JOIN project_hours ph

 ON p.project_id = ph.project_id;



SELECT p.project_name, ph.time_log_date, ph.hours_logged

FROM project_hours ph RIGHT OUTER JOIN project p

 ON p.project_id = ph.project_id;

Example 4-26s first query is a left outer join: project is to the left, so project is the anchor table. The second query is a right outer join, but I've moved project to the right-hand side of the JOIN keyword, so it is still the anchor table. Both queries generate the same result.

Old Join Syntax

The join syntax I show in this chapter is that introduced in the 1992 ANSI/ISO SQL standard. You'll frequently encounter an older syntax in which JOIN and ON clauses were not used and which placed join conditions in the WHERE clause. For example, the following two inner joins are semantically equivalent:

SELECT p.project_name, ph.time_log_date, ph.hours_logged

FROM project p JOIN project_hours ph

 ON p.project_id = ph.project_id;

SELECT p.project_name, ph.time_log_date, ph.hours_logged

FROM project p, project_hours ph

WHERE p.project_id = ph.project_id;

In the second join, a comma delimits the table names in the FROM clause. That comma notation generates the Cartesian product. The conditions in the WHERE clause reduce that Cartesian product ( assuming you've written your conditions properly) to a set of rows that makes business sense.

Oracle supports an older outer join syntax in which the optional table is designated by placing a (+) following each reference to a column from that table in the WHERE clause:

SELECT p.project_name, ph.time_log_date, ph.hours_logged

FROM project p, project_hours ph

WHERE p.project_id = ph.project_id(+);

You'll see these older forms of join often, especially in older code. Avoid them in new code that you write. The newer notation is much clearer, and it almost eliminates the possibility of accidentally returning the entire Cartesian product as the result from a join.

4.4.3 Full Outer Joins

The final type of outer join that I want to discuss is the full outer join . Think of a full outer join as the combination of all three types of joins so far: inner, left, and right:

You get all the rows that you would get from an INNER JOIN (e.g., projects and their attendant detail):
```
project p INNER JOIN project_hours ph
```

Plus you get the additional rows that would result from making that inner join into a LEFT OUTER JOIN (e.g., projects without any detail):
```
project p LEFT OUTER JOIN project_hours ph
```

Plus you further get those additional rows that would result from RIGHT OUTER JOIN, but keeping the same ordering of the tables (e.g., detail without any projects):
```
project p RIGHT OUTER JOIN project_hours ph
```

A full outer join will return at least one of each row from both tables. You would typically then have WHERE clause conditions to further filter the results.

Oracle Database 10 g introduced yet another type of join, the partition outer join . Read about that at http:// gennick .com/partition.html.

Introduction to SQL*Plus

Command-Line SQL*Plus