Joins


The heart of SQL in a relational system is the ability to perform relational joins. Joins are specified by using more than one table name in the FROM clause of a query. A joined table specifies an intermediate-result table that is the result of either an inner join or an outer join.

NOTE

Until version 8, the maximum number of tables that could be coded in a single FROM clause was 15, but up to 225 table references can be in an SQL statement. To increase the limit of 15, update the macro SPRMMXT to a value greater than 15. Also, the 15-tables limit does not apply to star join queries. Version 8 of DB2 fully supports 225 tables in a query, including subqueries.


Inner Joins

The join operation used most often is the one we have been using in the book exercises, which is known as an inner join. An inner join combines each row of the left table with each row of the right table, keeping only the rows where the join condition is trUE. The result table may be missing rows from either or both of the joined tables. The following example shows an inner join without using the INNER JOIN syntax, which means that the inner join is implicitly defined. The tables in the FROM clause are separated by commas to indicate the join. This type of syntax is valid only for inner joins.

In the example following, we wish to find the scores of all DB2 tests taken by U.S. citizens in the small test centers year by year, in order of ascending year and descending score:

 SELECT T.NAME, YEAR(TT.DATE_TAKEN) AS YEAR,    TT.SCORE FROM TEST T, TEST_TAKEN TT, TEST_CENTER TC,    CANDEDATE C WHERE C.CID = TT.CID AND    TC.TCID = TT.TCID AND    T.NUMBER = TT.NUMBER AND    T.NAME LIKE 'DB2%' AND    C.COUNTRY='USA' AND    TC.NOSEATS < 10    ORDER BY YEAR(TT.DATE_TAKEN), TT.SCORE DESC 

If the inner join used the explicit syntax, it would appear as follows:

 SELECT T.NAME, YEAR(TT.DATE_TAKEN) AS YEAR,    TT.SCORE FROM TEST T INNER JOIN TEST_TAKEN TT ON T.NUMBER = TT.NUMBER INNER JOIN TEST_CENTER TC ON TC.TCID = TT.TCID  INNER JOIN CANDIDATE C ON C.CID = TT.CID WHERE T.NAME LIKE 'DB2%'   AND C.COUNTRY='USA'   AND TC.NOSEATS < 10 ORDER BY YEAR(TT.DATE_TAKEN), TT.SCORE DESC 

For the explicit join syntax, the keyword INNER is optional and, if omitted, implies an inner join. The ON clause determines the join predicates between the tables, and the WHERE clause applies the local or table access predicates. For an inner join, the result is not impacted if the local predicates appear in the WHERE or ON clause. It is recommended that, for consistency with outer joins, join predicates appear only in the ON clause; local predicates, in the WHERE clause. For outer joins, the placement of predicates in either the ON or WHERE clause can dramatically change the result. This is demonstrated in the section on outer joins.

Each join must be followed by its ON clause, which dictates the two tables that are being joined. The joined tables must appear in the FROM clause before they are referenced by join predicates in the ON clause. In the preceding example, the first join is between TEST and TEST_TAKEN. The second join is between TEST_CENTER and TEST_TAKEN as dictated by the ON clause. The third join is between CANDIDATE and TEST_TAKEN.

One exception to the rule that the ON clause must follow each join is when nesting joins within other joins. For simplicity, this syntax is not recommended and can easily be recoded to suit the rule just cited.

Deciding whether to code the explicit join syntax or to separate the joined tables with commas is purely a matter of preference and is not related to performance. However, the explicit join syntax has the benefits of compatibility with the outer join syntax; improved readability of large SQL statements, as each join is followed by its join predicate; and increased likelihood that all join predicates will be coded correctly.

Outer Joins

A different kind of join operation is the outer join. The result set of an inner join consists only of those matched rows that are present in both joined tables. What happens when we need to include those values that are present in one or another joined table but not in both of them? In this case, an OUTER JOIN operation is needed. Outer joins are designed to generate an answer set that includes those values that are present in joined tables and those that are not. This type of optional relationship between tables is very common. Figure 6-2 shows the best way to view outer joins, representing the data from each table that is included/excluded in each type of join.

Figure 6-2. Outer joins


Outer joins are of three types: left outer join, right outer join, and full outer join.

Left Outer Join

A LEFT OUTER JOIN operation, or left join, produces an answer set that includes the matching values of both joined tablesinner join rowsand those values present only in the left joined tablerows exclusive to the left table. The left joined table is the one used in the left part of the LEFT OUTER JOIN operator when coding the join operation in the FROM clause.

In a left outer join, the left table is referred to as the preserved row table. If a row is unmatched between the two joined tables, it is still returned, or preserved, from the left table. The opposite, or right, table is the null-supplying table. When a row is unmatched from the left to the right table in the join, the row is supplied with nulls from the right table.

NOTE

With left and right outer joins, reversing the order of tables in the FROM clause may alter the result.


Suppose that we have been requested to generate a report that includes the first name, the phone number, and the highest score for all the candidates present in the CANDIDATE table. If an inner join is used, the report will include data for only those candidates present in the TEST_TAKEN table.

Without using the outer-join syntax, this request could be solved using some SQL statements already discussed; however, the construction would be complex. Instead, we will use the left outer join to satisfy the request, as the following example shows:

 SELECT C.FNAME, C.WPHONE, MAX(TT.SCORE) FROM CANDIDATE C    LEFT OUTER JOIN TEST_TAKEN TT ON C.CID=TT.CID GROUP BY FNAME, WPHONE; 

NOTE

The keyword OUTER is optional. Provided that the JOIN is preceded by LEFT, RIGHT, or FULL, OUTER is implied.


Note the syntax used to indicate a left outer join. The LEFT OUTER JOIN operator is used to indicate the left outer join operation. In this example, the answer set includes those candidates not present in the TEST_TAKEN table. The MAX(TT.SCORE) column will show nulls for those candidates.

When coding outer joins, it is important to note the difference between WHERE and ON clause predicates. From the previous example, we may want to retrieve only those candidates with a CID of 111, 333, or 444:

 SELECT C.CID, C.FNAME, C.WPHONE, MAX(TT.SCORE) AS MAX_SCORE FROM CANDIDATE C    LEFT OUTER JOIN TEST_TAKEN TT ON C.CID=TT.CID WHERE C.CID IN (111, 333, 444) GROUP BY C.CID, C.FNAME, C.WPHONE CID FNAME WPHONE     MAX_SCORE --- ----- ---------- --------- 111 SUSAN  1115551234        92 333 DAN    4442314244        73 444 JOHN   5552143244         - 3 record(s) selected. 

The WHERE clause predicate in this example is applied to the left, or preserved, row table. DB2 is able, therefore, to apply this predicate before the join to the CANDIDATE table and join only candidates 111, 333, and 444 to the TEST_TAKEN table. CID 444 in this example returns a MAX_SCORE of null because the candidate has not taken any tests and is therefore not found in the TEST_TAKEN table.

Consider the result if the query is written to also exclude those rows with a score less than 75, if found:

 SELECT C.CID, C.FNAME, C.WPHONE, MAX(TT.SCORE) AS MAX_SCORE FROM CANDIDATE C    LEFT OUTER JOIN TEST_TAKEN TT ON C.CID=TT.CID WHERE C.CID IN (111, 333, 444) AND   TT.SCORE >= 75 GROUP BY C.CID, C.FNAME, C.WPHONE CID FNAME WPHONE     MAX_SCORE --- ----- ---------- --------- 111 SUSAN 1115551234        92 1 record(s) selected. 

Only one row qualifies from this query. The WHERE clause predicate on the preserved row table (WHERE C.CID IN (111, 333, 444)) can be applied before the join to limit the number of rows that must be joined. The additional WHERE clause predicate (AND TT.SCORE >= 75) is applied after the join because it is not known until after the join what the value of tt.score will be. This predicate is applied to each row, with the following result:

  • Is Dan's score of 73 >= 75? FALSE

  • Is Susan's score of 92 >= 75? TRUE

  • Is John's score of NULL >= 75? UNKNOWN

Remember that for a row to be returned, the WHERE clause must evaluate to TRUE. Because the nulls introduced by the outer join are discarded by the WHERE clause applied to the null-supplying table, DB2 will rewrite the left join to an INNER JOIN, thus allowing the WHERE clause predicates to be applied to both tables before the joining of the rows occurs:

 SELECT C.CID, C.FNAME, C.WPHONE, MAX(TT.SCORE) AS MAX_SCORE FROM CANDIDATE C    INNER JOIN TEST_TAKEN TT ON C.CID=TT.CID WHERE C.CID IN (111, 333, 444) AND   TT.SCORE >= 75 GROUP BY C.CID, C.FNAME, C.WPHONE 

NOTE

The outer join simplification will be recorded in the JOIN_TYPE column of the plan table. If a left, right, or full join has been simplified to an inner join, the JOIN_TYPE value will be blank.


If you wish to maintain the null-supplied rows, you must include an additional IS NULL predicate, such as:

 SELECT C.CID, C.FNAME, C.WPHONE, MAX(TT.SCORE) AS MAX_SCORE FROM CANDIDATE C    LEFT OUTER JOIN TEST_TAKEN TT ON C.CID=TT.CID WHERE C.CID IN (111, 333, 444) AND   (TT.SCORE >= 75 OR TT.SCORE IS NULL) GROUP BY C.CID, C.FNAME, C.WPHONE CID FNAME WPHONE     MAX_SCORE --- ----- ---------- --------- 333 SUSAN 4442314244        92 444 JOHN  5552143244         - 2 record(s) selected. 

In this case, DB2 will not simplify the left outer join to an inner join; because the null-supplied row is not discarded by the WHERE clause predicate. Thus, the row for Glen is included in the result set because it is null.

NOTE

WHERE clause predicates applied to the preserved row table can be applied before the join. Predicates applied to the null-supplying table will cause DB2 to simplify the join unless a predicate of the form OR COL IS NULLor equivalent expression that retains nullsis coded.


We've looked at WHERE clause predicates applied to either the preserved row or null-supplied tables. The next objective is to demonstrate the impact of coding local predicates in the ON clause. The following example displays the output if we specify the candidate predicate (111, 333, or 444) in the ON clause rather than in the WHERE clause:

 SELECT C.CID, C.FNAME, C.WPHONE, MAX(TT.SCORE) AS MAX_SCORE FROM CANDIDATE C    LEFT OUTER JOIN TEST_TAKEN TT ON C.CID=TT.CID AND C.CID IN (111, 333, 444) GROUP BY C.CID, C.FNAME, C.WPHONE CID FNAME  WPHONE     MAX_SCORE --- -----  ---------- --------- 111 BILL   1115551234        73 222 GEORGE 2226543455         - 333 SUSAN  4442314244        92 444 GLEN   5552143244         - 555 JIM    6662341234         - 666 KEVIN  7773142134         - 777 BERT   8886534534         - 888 PAUL   9992112212         - 8 record(s) selected. 

Because no WHERE clause predicates restrict the result, this query returns all rows of the CANDIDATE table. Columns from the TEST_TAKEN table are supplied only when the ON clause is TRUE. Remember, WHERE clause predicates restrict the result, whereas ON clause predicates determine the criteria for the join. For the rows not matching the full ON clause (CID NOT IN 111, 333note that 444 is not included, because it will not be found in the TEST_TAKEN table), DB2 will preserve the row from the left table and supply nulls from the right table. Rows matching the ON clause (CID IN 111, 333) will match and thus return columns from both tables.

NOTE

If it can determine before joining the rows that the ON clause will fail, DB2 will automatically supply nulls for the null-supplied columns without attempting the join. This improves performance, as rows are not joined unnecessarily if the final result will not match.


Right Outer Join

A RIGHT OUTER JOIN operation, or right join, produces an answer set that includes the matching values of both joined tables and those values present only in the right joined table. The right joined table is the one used in the right part of the RIGHT OUTER JOIN operator when coding the join operation.

For a right join, the right table is the preserved row table, and the left table is the null-supplied table for unmatched rows. This is the opposite of a left join. An example using a right outer join follows:

 SELECT NAME, COUNT(DISTINCT TT.CID) FROM TEST_TAKEN TT RIGHT OUTER JOIN TEST T ON TT.NUMBER = T.NUMBER 

This example requests all test names present in the TEST table and the number of candidates who scheduled or took each test. Note that there may be some tests for which no candidate was scheduled. You cannot report such tests using an inner join statement; however, you can do so using a right outer join.

NOTE

The COUNT(DISTINCT TT.CID) will count unique occurrences of TT.CID after the exclusion of nulls. Thus, nulls introduced by the outer join will not be counted. This is intentional, as the addition of a null will falsely add 1 to the count. The right outer join is required, however, to also report those tests that have not been taken by any candidates.


As with a left join, the order of the tables in the FROM clause for a right join can impact the result. However, a right join can be converted to a left join by reversing the order of the tables in the FROM clause and by changing the RIGHT keyword to LEFT.

NOTE

DB2 converts all right joins to left joins at bind time. The JOIN_TYPE column of the plan table will show an L for a right or a left join unless the join has been simplified to an inner. For this reason, you should avoid coding right joins, to minimize confusion when evaluating the access path.


Full Outer Join

The FULL OUTER JOIN operation produces an answer set that includes the matching values of both joined tables and those values not present in one or the other of the tables. Thus, it returns the inclusive inner join rows plus the exclusive left and right outer rows.

To show a FULL OUTER JOIN operation, we will create two sample tables: CITY and COUNTRY. They show the relationship between a city and a country (CITY table) and a country and a continent (COUNTRY table). The CITY table is designed to have countries that are not in the COUNTRY table. The COUNTRY table is also designed to have countries that are not in the city table. The contents of both tables are as follows:

 TABLE:CITY CITY_NAME           COUNTRY_NAME ---------------     --------------- Sydney              Australia London              England Dublin              Ireland Firenze             Italy Milano              Italy Mexico City         Mexico Lima                Peru Toronto             Canada Vienna              Austria Hannover            Germany 10 record(s) selected. TABLE:COUNTRY COUNTRY_NAME        CONTINENT ---------------     -------------------------- Australia           Australian Continent England             European Continent Ireland             European Continent Italy               European Continent Mexico              American Continent Austria             European Continent South Africa        African Continent Spain               European Continent 8 record(s) selected. 

We want to show all the countries, cities, and continents that are in the tables. Therefore, we are using a full outer join, as the following example shows:

 SELECT CTRY.CONTINENT, CTRY.COUNTRY_NAME, CTY.COUNTRY_NAME, CTY.CITY_NAME FROM COUNTRY CTRY    FULL OUTER JOIN CITY CTY ON CTY.COUNTRY_NAME=CTRY.COUNTRY_NAME ORDER BY CTRY.CONTINENT,    CTY.COUNTRY_NAME,    CTY.CITY_NAME CONTINENT            COUNTRY_NAME     COUNTRY_NAME     CITY_NAME --------------       -------------    -------------- ------------ African Continent    South Africa     -              - American Continent   Mexico           Mexico         Mexico City Australian Continent Australia        Australia      Sydney European Continent   Austria          Austria        Vienna European Continent   England          England        London European Continent   Ireland          Ireland        Dublin European Continent   Italy            Italy          Firenze European Continent   Italy            Italy          Milano European Continent   Spain            -              - -                    -                Canada         Toronto -                    -                Germany        Hannover -                    -                Peru           Lima 12 record(s) selected. 

The rows that have a null value were added by the outer join operation. The COUNTRY_NAME column is shown twice to see those countries present in the COUNTRY table but not in the CITY table, and vice versa.

As shown by the full join example, both tables can preserve rows and also supply nulls for unmatched rows. Because WHERE clause predicates applied to null-supplied tables may cause DB2 to perform simplification of the outer join, predicates that are to be applied before the join to limit the result set must be coded within nested table expressions.

NOTE

DB2 can simplify a full join to either a left or an inner join if the WHERE clause predicates discard nulls from one or both tables, respectively. This is shown in the JOIN_TYPE column of the plan table as either an L for left, or blank, for an inner join rather than an F for a full join.


Consider the following example, which lists the information for only South Africa and Peru:

 SELECT CTRY.CONTINENT, CTRY.COUNTRY_NAME, CTY.COUNTRY_NAME, CTY.CITY_NAME FROM (SELECT CONTINENT, COUNTRY_NAME       FROM COUNTRY       WHERE COUNTRY_NAME IN ('SOUTH AFRICA', 'PERU')) AS CTRY    FULL OUTER JOIN      (SELECT COUNTRY_NAME, CITY_NAME       FROM CITY       WHERE COUNTRY_NAME IN ('SOUTH AFRICA', 'PERU')) AS CTY ON CTY.COUNTRY_NAME=CTRY.COUNTRY_NAME ORDER BY CTRY.CONTINENT, CTY.COUNTRY_NAME,    CTY.CITY_NAME CONTINENT          COUNTRY_NAME   COUNTRY_NAME   CITY_NAME -----------------  -------------  -------------  --------- African Continent  South Africa   -              - -                  -              Peru           Lima 2 record(s) selected. 

Because the country name can return null from either table, it is common to code the COALESCE or VALUE function to ensure that the value is always supplied and is not null. The select list for this example would be as follows:

 SELECT CTRY.CONTINENT, COALESCE(CTRY.COUNTRY_NAME,CTY.COUNTRY_NAME)     , CTY.CITY_NAME 

NOTE

COALESCE returns the first list element that is not null.


Combining Outer Joins

To this point, we have discussed each outer join operation separately. Now, we will show a more complex example: combining two outer joins in a single query. Let's display all the candidates and all the tests with their respective scores. To create this query, we need two outer joins. The first outer join will obtain all candidates and their scores, including candidates who did not schedule or take any tests. The second outer join will retrieve all the tests present in the TEST table, even if no candidate scheduled or took those tests:

 SELECT C.CID, T.NAME, SCORE FROM (CANDIDATE C LEFT OUTER JOIN TEST_TAKEN TT ON TT.CID=C.CID)    FULL OUTER JOIN TEST T    ON TT.NUMBER = T.NUMBER ORDER BY C.CID 

The first outer join is enclosed in parentheses, which are used for readability and to indicate that the left outer join will be resolved first. In reality, the join sequence is determined by the precedence of the join predicates. For example, because the full join in this example joins to a column from the null-supplied table of the previous left join, the left join must precede the full join.

NOTE

When the join sequence is not dictated by the join predicates, DB2 will evaluate the most cost-effective join sequence.


The left outer join gathers all the candidate IDs. We need only a left outer join here. Because of referential integrity constraints, table TEST_TAKEN can have candidates present only in the CANDIDATE table.

The second part is a full outer join. With this outer join, we take all the tests taken by the candidates and the result of the left outer join, and we join them with all the tests in the TEST table. We need a full outer join this time, even though the TEST_TAKEN table can have only test numbers that are present in the TEST table; the left table of this outer join, which is the result of the first join, may include null values as a test number. The TEST table does not have null values, so we need to use a full outer join. Even if the TEST table did contain nulls, a null will never equal another null unless the predicates of the form IS NULL are included in the join criteria. The results are as follows:

 CID NAME                           SCORE --- ------------------------------ ----- 111 DB2 Fundamentals               65 111 DB2 Administration             73 111 DB2 Application Development    67 222 DB2 Fundamentals               55 222 DB2 Application Development    53 222 DB2 Application Development    75 333 DB2 Fundamentals               82 333 DB2 Administration             - 333 DB2 Application Development    92 444 -                              - 555 -                              - 666 -                              - 777 -                              - 888 -                              - 14 record(s) selected. 

NOTE

Care must be taken when joining the result of an outer join to subsequent tables. If the join predicate refers to columns from a null-supplied table, the join predicate must also cater for the occurrence of nulls. Where possible, code the join predicate from the preserved row table; if the preceding join was a full join, use the COALESCE function to ensure that the join predicate cannot be null.


Joins versus Denormalization

One issue that needs to be examined when discussing joins is that of denormalization, the process of pulling normalized tables back together into a single table to avoid joins. This is often done in an effort to minimize the joins and to improve performance. Because this is generally applied to tables in a one-to-many relationship, it results in multiple occurrences of the same data, introducing update anomalies, or a single piece of data occurring multiple times. When an attempt to update the data is made, all occurrences must be found, and there is no fixed way to determine the number or location of the multiple occurrences. Some denormalization in some environments, such as read-only data warehouses, may be legitimate, but each situation must be evaluated carefully. For additional information about normalization/denormalization and update anomalies, refer to Chapter 4.



DB2 for z. OS Version 8 DBA Certification Guide
DB2 for z/OS Version 8 DBA Certification Guide
ISBN: 0131491202
EAN: 2147483647
Year: 2003
Pages: 175
Authors: Susan Lawson

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net