Joins

Team-Fly    

 
DB2 Universal Database for OS/390 v7.1 Application Certification Guide
By Susan Lawson
Table of Contents
Chapter 4.  Advanced SQL Coding


The heart of SQL in a relational system is the ability to perform relational joins. Joins are specified by using more than one table name in the FROM clause of a query. A joined table specifies an intermediate result table that is the result of either an inner join or an outer join.

NOTE

graphics/note_icon.jpg

The maximum number of tables that can be coded in a single FROM clause is 15, but there can be up to 225 table references in an SQL statement. The 15 table limit can be increased by updating the macro SPRMMXT to a value greater than 15. Also, the 15 table limit does not apply to star join queries.


Inner Joins

The join operation used most often is the one we have been using in the book exercises, which is known as an inner join. An inner join combines each row of the left table with each row of the right table, keeping only the rows where the join condition is true. The result table may be missing rows from either or both of the joined tables. The following example shows an inner join without using the INNER JOIN syntax, which means that the inner join is implicitly defined. The tables in the FROM clause are separated by commas to indicate the join. This type of syntax is valid only for inner joins.

In the example following, we wish to find the scores of all DB2 tests taken by U.S. citizens in the small test centers year by year, in order of ascending year and descending score.

 SELECT t.name, YEAR(tt.date_taken) AS year,     tt.score FROM test t, test_taken tt, test_center tc,    candidate c WHERE c.cid = tt.cid AND    tc.tcid = tt.tcid AND    t.number = tt.number AND    t.name LIKE 'DB2%' AND    c.country='USA' AND    tc.noseats < 10    ORDER BY YEAR(tt.date_taken), TT.SCORE DESC 

If the inner join used the explicit syntax, it would appear as

 SELECT t.name, YEAR(tt.date_taken) AS year,     tt.score FROM test t INNER JOIN test_taken tt on t.number = tt.number INNER JOIN test_center tcon tc.tcid = tt.tcid  INNER JOIN candidate c ON c.cid = tt.cid WHERE t.name LIKE 'DB2%'   AND c.country='USA'   AND tc.noseats < 10 ORDER BY YEAR(tt.date_taken), TT.SCORE DESC 

For the explicit join syntax, the keyword INNER is optional and, if omitted, implies an inner join. The ON clause determines the join predicates between the tables, and the WHERE clause applies the local or table access predicates. For an inner join, it does not impact the result if the local predicates appear in the WHERE or ON clause. For consistency with outer joins, it is recommended that join predicates appear only in the ON clause, and local predicates in the WHERE clause. For outer joins, the placement of predicates in either the ON or WHERE clause can dramatically change the result. This is demonstrated in the next section on outer joins.

Each join must be followed by its ON clause. The ON clause dictates the two tables that are being joined. The joined tables must appear in the FROM clause before they are referenced by join predicates in the ON clause. In the above example, the first join is between TEST and TEST_TAKEN. The second join is between TEST_CENTER and TEST_TAKEN as dictated by the ON clause. And the third is between CANDIDATE and TEST_TAKEN.

There is one exception to the rule that the ON clause must follow each join, and that is when nesting joins within other joins. For simplicity, this syntax is not recommended and can easily be recoded to suit the rule just mentioned. Deciding whether to code the explicit join syntax or to separate the joined tables with commas is purely a matter of preference and is not related to performance.

NOTE

graphics/note_icon.jpg

The explicit join syntax has the following benefits: (1) compatibility with the outer join syntax; (2) improves readability of large SQL statements, as each join is followed by its join predicate; and (3) increases the likelihood that all join predicates will be coded correctly.


Outer Joins

Now, we will talk about a different kind of join operation, the outer join. The result set of an inner join consists only of those matched rows that are present in both joined tables. What happens when we need to include those values that are present in one or another joined table, but not in both of them? This is the case when we need to use an OUTER JOIN operation. Outer joins are designed to generate an answer set that includes those values that are present in joined tables and those that are not. This type of optional relationship between tables is very common. There are three types of outer joins: left outer join, right outer join, and full outer join.

Left Outer Join

A LEFT OUTER JOIN operation, also known as left join, produces an answer set that includes the matching values of both joined tables (inner join rows) and those values only present in the left joined table (rows exclusive to the left table). The left joined table is the one used in the left part of the LEFT OUTER JOIN operator when coding the join operation in the FROM clause.

In a left outer join, the left table is referred to as the preserved row table. If a row is unmatched between the two joined tables, it is still returned (preserved) from the left table. The opposite (right) table is the null-supplying table. When a row is unmatched from the left to the right table in the join, the row is supplied with nulls from the right table.

NOTE

graphics/note_icon.jpg

With left and right outer joins, reversing the order of tables in the FROM clause may alter the result.


We have been requested to generate a report that includes the first name, the phone number, and the highest score for all the candidates present in the candidate table. If an inner join is used, the report will include data only for those candidates present in the TEST_TAKEN table.

Without utilizing the outer join syntax, this request could be solved using some SQL statements already discussed; however, the construction will be complex. Instead, we will use the left outer join to satisfy the request, as the following example shows.

 SELECT C.fname, C.wphone, MAX(TT.score)  FROM candidate c    LEFT OUTER JOIN test_taken tt ON c.cid=tt.cid GROUP BY fname, wphone 

NOTE

graphics/note_icon.jpg

The keyword OUTER is optional. Provided the JOIN is preceded by either LEFT, RIGHT, or FULL, then OUTER is implied .


Observe the syntax used to indicate a left outer join. The LEFT OUTER JOIN operator is used to indicate the left outer join operation. In this example, the answer set includes those candidates not present in the TEST_TAKEN table. The MAX(TT.SCORE) column will show nulls for those candidates.

It is important to note when coding outer joins the difference between WHERE and on clause predicates. From the previous example, say we want to retrieve only those candidates with a CID of 111, 333, or 444.

 SELECT C.CID, C.fname, C.wphone, MAX(TT.score) AS MAX_SCORE  FROM candidate c    LEFT OUTER JOIN test_taken tt ON c.cid=tt.cid WHERE c.cid IN (111, 333, 444) GROUP BY C.CID, C.fname, C.wphone CID FNAME WPHONE     MAX_SCORE --- ----- ---------- --------- 111 Bill  1115551234        73 333 Susan 4442314244        92 444 Glen  5552143244         - 3 record(s) selected. 

The WHERE clause predicate in this example is applied to the left or preserved row table. DB2 is able, therefore, to apply this predicate before the join to the CANDIDATE table and join only candidates 111, 333, and 444 to the TEST_TAKEN table. CID 444 in this example returns a MAX_SCORE of null because they have not taken any tests and are therefore not found in the TEST_TAKEN table.

Consider the result if the query was written to also exclude those rows with a score less than 75 if found.

 SELECT C.CID, C.fname, C.wphone, MAX(TT.score) AS MAX_SCORE  FROM candidate c    LEFT OUTER JOIN test_taken tt ON c.cid=tt.cid WHERE c.cid IN (111, 333, 444) AND   tt.score >= 75 GROUP BY C.CID, C.fname, C.wphone CID FNAME WPHONE     MAX_SCORE --- ----- ---------- --------- 333 Susan 4442314244        92 1 record(s) selected. 

There is only one row that qualifies from this query. The WHERE clause predicate on the preserved row table (WHERE c.cid IN (111, 333, 444)) can be applied before the join to limit the number of rows that must be joined. The additional WHERE clause predicate (AND tt.score >= 75) is applied after the join because it is not known until after the join what the value of tt.score will be. This predicate is applied to each row, with the following result:

  • Is Bill's score of 73 >= 75? FALSE

  • Is Susan's score of 92 >= 75? TRUE

  • Is Glen's score of NULL >= 75? UNKNOWN

Remember that for a row to be returned, the WHERE clause must evaluate to TRUE. Since the nulls introduced by the outer join are discarded by the WHERE clause applied to the null-supplying table, DB2 will actually rewrite the left join to an INNER JOIN, thus allowing the WHERE clause predicates to be applied to both tables before the joining of the rows occurs.

 SELECT C.CID, C.fname, C.wphone, MAX(TT.score) AS MAX_SCORE  FROM candidate c    INNER JOIN test_taken tt ON c.cid=tt.cid WHERE c.cid IN (111, 333, 444) AND   tt.score >= 75 GROUP BY C.CID, C.fname, C.wphone 

NOTE

graphics/note_icon.jpg

The outer join simplification will be recorded in the JOIN_TYPE column of the plan table. If a left, right, or full join has been simplified to an inner join, the JOIN_TYPE value will be blank.


If you wish to maintain the null-supplied rows, then you must include an additional IS NULL predicate, such as:

 SELECT C.CID, C.fname, C.wphone, MAX(TT.score) AS MAX_SCORE  FROM candidate c    LEFT OUTER JOIN test_taken tt ON c.cid=tt.cid WHERE c.cid IN (111, 333, 444) AND   (tt.score >= 75 OR tt.score IS NULL) GROUP BY C.CID, C.fname, C.wphone CID FNAME WPHONE     MAX_SCORE --- ----- ---------- --------- 333 Susan 4442314244        92 444 Glen  5552143244         - 2 record(s) selected. 

In this case, DB2 will not simplify the left outer join to an inner join because the null-supplied row is not discarded by the WHERE clause predicate. Thus, the row for Glen is included in the result set because it is null.

NOTE

graphics/note_icon.jpg

WHERE clause predicates applied to the preserved row table can be applied before the join. Predicates applied to the null-supplying table will cause DB2 to simplify the join unless a predicate of the form OR COL IS NULL (or equivalent expression that retains nulls) is coded.


We've looked at WHERE clause predicates applied to either the preserved row or null-supplied tables. The next objective is to demonstrate the impact of coding local predicates in the ON clause. The following example displays the output if we specify the candidate predicate (111, 333, or 444) in the ON clause rather than in the WHERE clause.

 SELECT C.CID, C.fname, C.wphone, MAX(TT.score) AS MAX_SCORE  FROM candidate c    LEFT OUTER JOIN Test_taken tt ON c.cid=tt.cid AND c.cid IN (111, 333, 444) GROUP BY C.CID, C.fname, C.wphone CID FNAME  WPHONE     MAX_SCORE --- -----  ---------- --------- 111 Bill   1115551234        73 222 George 2226543455         - 333 Susan  4442314244        92 444 Glen   5552143244         - 555 Jim    6662341234         - 666 Kevin  7773142134         - 777 Bert   8886534534         - 888 Paul   9992112212         - 8 record(s) selected. 

Since there are no WHERE clause predicates to restrict the result, this query returns all rows of the CANDIDATE table. Columns from the TEST_TAKEN table are supplied only when the ON clause is true. Remember, WHERE clause predicates restrict the result, whereas ON clause predicates determine the criteria for the join. For the rows not matching the full ON clause (CID NOT IN (111, 333)note that 444 is not included because it will not be found in the TEST_TAKEN table), DB2 will preserve the row from the left table and supply nulls from the right table. Rows matching the ON clause (CID IN (111, 333)) will match and thus return columns from both tables.

NOTE

graphics/note_icon.jpg

If DB2 can determine that the ON clause will fail before joining the rows, then it will automatically supply nulls for the null-supplied columns without attempting the join. This improves performance, since rows are not joined unnecessarily if the final result will not match regardless.


Right Outer Join

A RIGHT OUTER JOIN operation, also known as right join, produces an answer set that includes the matching values of both joined tables and those values only present in the right joined table. The right joined table is the one used in the right part of the RIGHT OUTER JOIN operator when coding the join operation.

For a right join, the right table is the preserved row table, and the left table is the null-supplied table for unmatched rows. This is the opposite of a left join. The example using a right outer join is shown below:

 SELECT name, count(DISTINCT tt.cid)  FROM test_taken tt RIGHT OUTER JOIN test t ON tt.number = t.number 

In this example, all test names present in the TEST table and the number of candidates who scheduled or took each test are requested. Notice there may be some tests for which no candidate was scheduled. You cannot report such tests using an inner join statement; however, you can do it using right outer join.

NOTE

graphics/note_icon.jpg

The COUNT(DISTINCT TT.CID) will count unique occurrences of TT.CID after the exclusion of nulls. Thus, nulls introduced by the outer join will not be counted. This is intentional, as the addition of a null will falsely add one to the count. The right outer join is required, however, to also report those tests that have not been taken by any candidates.


As with a left join, the order of the tables in the FROM clause for a right join can impact the result. However, a right join can be converted to a left join by reversing the order of the tables in the FROM clause, and also by changing the RIGHT keyword to LEFT.

NOTE

graphics/note_icon.jpg

DB2 converts all right joins to left joins at bind time. In the JOIN_TYPE column of the plan table will show an L for a right or a left join unless the join has been simplified to an inner. For this reason, it is recommended to avoid coding right joins to minimize confusion when evaluating the access path .


Full Outer Join

The FULL OUTER JOIN operation produces an answer set that includes the matching values of both joined tables and those values not present in one or the other of the tables. Thus, it returns the inclusive inner join rows plus the exclusive left and right outer rows.

To show a FULL OUTER JOIN operation, we will create two sample tables: CITY and COUNTRY. They show the relationship between a city and a country (CITY table) and a country and a continent (COUNTRY table). The CITY table is designed to have countries that are not in the COUNTRY table. The COUNTRY table is also designed to have countries that are not in the city table. The contents of both tables are shown below.

 CITY_NAME            COUNTRY_NAME  ---------------      --------------- Sydney               Australia London               England Dublin               Ireland Firenze              Italy Milano               Italy Mexico               Mexico Lima                 Peru Toronto              Canada Vienna               Austria Hannover             Germany 10 record(s) selected. COUNTRY_NAME         CONTINENT ---------------      -------------------------- Australia            Australian Continent England              European Continent Ireland              European Continent Italy                European Continent Mexico               American Continent Austria              European Continent South Africa         African Continent Spain                European Continent 8 record(s) selected. 

We want to show all the countries, cities, and continents that are in the tables. Therefore, we are using a full outer join, as the following example shows:

 SELECT ctry.continent, ctry.country_name,  cty.country_name, cty.city_name FROM country ctry    FULL OUTER JOIN city cty ON cty.country_name=ctry.country_name ORDER BY ctry.continent,    cty.country_name,    cty.city_name CONTINENT              COUNTRY_NAME       COUNTRY_NAME     CITY_NAME --------------         -------------      -------------    ---------- African Continent      South Africa       -                - American Continent     Mexico             Mexico           Mexico Australian Continent   Australia          Australia        Sydney European Continent     Austria            Austria          Vienna European Continent     England            England          London European Continent     Ireland            Ireland          Dublin European Continent     Italy              Italy            Firenze European Continent     Italy              Italy            Milano European Continent     Spain              -                - -                      -                  Canada           Toronto -                      -                  Germany          Hannover -                      -                  Peru             Lima 12 record(s) selected. 

As shown in the above example, the rows that have a null value were added by the outer join operation. The COUNTRY_NAME column is shown twice to see those countries present in the COUNTRY table that are not present in the CITY table and vice versa.

As shown by the full join example, both tables can preserve rows and also supply nulls for unmatched rows. Since WHERE clause predicates applied to null-supplied tables may cause DB2 to perform simplification of the outer join, predicates that are to be applied before the join to limit the result set must be coded within nested table expressions.

NOTE

graphics/note_icon.jpg

DB2 can simplify a full join to either a left or inner join if the WHERE clause predicates discard nulls from one or both tables respectively. This is shown in the JOIN_TYPE column of the plan table as either an L for left or blank for inner join rather than F for a full join.


Consider the following example where we wish to list the information for only South Africa and Peru.

 SELECT ctry.continent, ctry.country_name,  cty.country_name, cty.city_name FROM (SELECT CONTINENT, COUNTRY_NAME       FROM country       WHERE COUNTRY_NAME IN ('South Africa', 'Peru')) as ctry    FULL OUTER JOIN      (SELECT COUNTRY_NAME, CITY_NAME       FROM city       WHERE COUNTRY_NAME IN ('South Africa', 'Peru')) as cty ON cty.country_name=ctry.country_name ORDER BY ctry.continent, cty.country_name,    cty.city_name CONTINENT              COUNTRY_NAME       COUNTRY_NAME      CITY_NAME -----------------      -------------      -------------     --------- African Continent      South Africa       -                 - -                      -                  Peru              Lima 2 record(s) selected. 

Since the country name can return null from either table, it is common to code the COALESCE or VALUE function to ensure that the value is always supplied (and not null). The select list for this example would be.

 SELECT ctry.continent, COALESCE(ctry.country_name,cty.country_name)      , cty.city_name 

NOTE

graphics/note_icon.jpg

COALESCE returns the first element in the list that is not null.


Combining Outer Joins

Up until now, we have discussed each outer join operation separately. Now, we will show a more complex example combining two outer joins in a single query. Let's display all the candidates and all the tests with their respective scores. To create this query, we need two outer joins. The first outer join will obtain all candidates and their scores, including candidates who did not schedule or take any tests. The second outer join will retrieve all the tests present in the TEST table even if no candidate scheduled or took those tests.

 SELECT c.cid, t.name, score  FROM (candidate c LEFT OUTER JOIN test_taken tt ON tt.cid=c.cid)    FULL OUTER JOIN test t    ON tt.number = t.number ORDER BY c.cid 

The first outer join is enclosed in parentheses. The parentheses are used for readability and to indicate that the left outer join will be resolved first. In reality, the join sequence is determined by the precedence of the join predicates. For example, because the full join in this example joins to a column from the null-supplied table of the previous left join, the left join must precede the full join.

NOTE

graphics/note_icon.jpg

When the join sequence is not dictated by the join predicates, DB2 will evaluate the most cost-effective join sequence.


The left outer join gathers all of the candidate IDs. We only need a left outer join here. Because of referential integrity constraints, table TEST_TAKEN can have candidates present only in the CANDIDATE table.

The second part is a full outer join. With this outer join, we take all the tests taken by the candidates, the result of the left outer join, and join them with all the tests in the TEST table. We need a full outer join this time, even though the TEST_TAKEN table can have only test numbers that are present in the TEST table, because the left table of this outer join, which is the result of the first join, may include null values as a test number. The TEST table does not have null values, so we need to use a full outer join. Even if the TEST table did contain nulls, a null will never equal another null unless the predicates of the form IS NULL are included in the join criteria. The results are shown below.

 CID NAME                               SCORE  --- ------------------------------     ----- 111 DB2 Fundamentals                   65 111 DB2 Administration                 73 111 DB2 Application Development        67 222 DB2 Fundamentals                   55 222 DB2 Application Development        53 222 DB2 Application Development        75 333 DB2 Fundamentals                   82 333 DB2 Administration                 - 333 DB2 Application Development        92 444 -                                  - 555 -                                  - 666 -                                  - 777 -                                  - 888 -                                  - 14 record(s) selected. 

NOTE

graphics/note_icon.jpg

Care must be taken when joining the result of an outer join to subsequent tables. If the join predicate refers to columns from a null-supplied table, then the join predicate must also cater for the occurrence of nulls. Where possible, code the join predicate from the preserved row table, and if the preceding join was a full join, then utilize the COALESCE function to ensure that the join predicate cannot be null.


Joins Versus Denormalization

One issue that needs to be examined when discussing joins is that of denormalization. Sometimes there is the perceived need to take tables that are always joined together and put them into a single table, in an effort to minimize the joins and improve performance. Since this is generally applied to tables in a one-to-many relationship, it results in multiple occurrences of the same data. This introduces update anomalies, the situation where a single piece of data occurs multiple times. When there is the attempt to update it, all occurrences must be found, and there is no fixed way to determine the number or location of the multiple occurrences.


Team-Fly    
Top


DB2 Universal Database for OS. 390 v7. 1 Application Certification Guide
DB2(R) Universal Database for OS/390 V7.1 Application Certification Guide (IBM DB2 Certification Guide Series)
ISBN: 0131007718
EAN: 2147483647
Year: 2002
Pages: 163
Authors: Susan Lawson

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net