The heart of SQL in a relational system is the ability to perform relational joins. Joins are specified by using more than one table name in the FROM clause of a query. A joined table specifies an intermediate result table that is the result of either an inner join or an outer join. NOTE | The maximum number of tables that can be coded in a single FROM clause is 15, but there can be up to 225 table references in an SQL statement. The 15 table limit can be increased by updating the macro SPRMMXT to a value greater than 15. Also, the 15 table limit does not apply to star join queries. | Inner Joins The join operation used most often is the one we have been using in the book exercises, which is known as an inner join. An inner join combines each row of the left table with each row of the right table, keeping only the rows where the join condition is true. The result table may be missing rows from either or both of the joined tables. The following example shows an inner join without using the INNER JOIN syntax, which means that the inner join is implicitly defined. The tables in the FROM clause are separated by commas to indicate the join. This type of syntax is valid only for inner joins. In the example following, we wish to find the scores of all DB2 tests taken by U.S. citizens in the small test centers year by year, in order of ascending year and descending score. SELECT t.name, YEAR(tt.date_taken) AS year, tt.score FROM test t, test_taken tt, test_center tc, candidate c WHERE c.cid = tt.cid AND tc.tcid = tt.tcid AND t.number = tt.number AND t.name LIKE 'DB2%' AND c.country='USA' AND tc.noseats < 10 ORDER BY YEAR(tt.date_taken), TT.SCORE DESC If the inner join used the explicit syntax, it would appear as SELECT t.name, YEAR(tt.date_taken) AS year, tt.score FROM test t INNER JOIN test_taken tt on t.number = tt.number INNER JOIN test_center tcon tc.tcid = tt.tcid INNER JOIN candidate c ON c.cid = tt.cid WHERE t.name LIKE 'DB2%' AND c.country='USA' AND tc.noseats < 10 ORDER BY YEAR(tt.date_taken), TT.SCORE DESC For the explicit join syntax, the keyword INNER is optional and, if omitted, implies an inner join. The ON clause determines the join predicates between the tables, and the WHERE clause applies the local or table access predicates. For an inner join, it does not impact the result if the local predicates appear in the WHERE or ON clause. For consistency with outer joins, it is recommended that join predicates appear only in the ON clause, and local predicates in the WHERE clause. For outer joins, the placement of predicates in either the ON or WHERE clause can dramatically change the result. This is demonstrated in the next section on outer joins. Each join must be followed by its ON clause. The ON clause dictates the two tables that are being joined. The joined tables must appear in the FROM clause before they are referenced by join predicates in the ON clause. In the above example, the first join is between TEST and TEST_TAKEN. The second join is between TEST_CENTER and TEST_TAKEN as dictated by the ON clause. And the third is between CANDIDATE and TEST_TAKEN. There is one exception to the rule that the ON clause must follow each join, and that is when nesting joins within other joins. For simplicity, this syntax is not recommended and can easily be recoded to suit the rule just mentioned. Deciding whether to code the explicit join syntax or to separate the joined tables with commas is purely a matter of preference and is not related to performance. NOTE | The explicit join syntax has the following benefits: (1) compatibility with the outer join syntax; (2) improves readability of large SQL statements, as each join is followed by its join predicate; and (3) increases the likelihood that all join predicates will be coded correctly. | Outer Joins Now, we will talk about a different kind of join operation, the outer join. The result set of an inner join consists only of those matched rows that are present in both joined tables. What happens when we need to include those values that are present in one or another joined table, but not in both of them? This is the case when we need to use an OUTER JOIN operation. Outer joins are designed to generate an answer set that includes those values that are present in joined tables and those that are not. This type of optional relationship between tables is very common. There are three types of outer joins: left outer join, right outer join, and full outer join. Left Outer Join A LEFT OUTER JOIN operation, also known as left join, produces an answer set that includes the matching values of both joined tables (inner join rows) and those values only present in the left joined table (rows exclusive to the left table). The left joined table is the one used in the left part of the LEFT OUTER JOIN operator when coding the join operation in the FROM clause. In a left outer join, the left table is referred to as the preserved row table. If a row is unmatched between the two joined tables, it is still returned (preserved) from the left table. The opposite (right) table is the null-supplying table. When a row is unmatched from the left to the right table in the join, the row is supplied with nulls from the right table. NOTE | With left and right outer joins, reversing the order of tables in the FROM clause may alter the result. | We have been requested to generate a report that includes the first name, the phone number, and the highest score for all the candidates present in the candidate table. If an inner join is used, the report will include data only for those candidates present in the TEST_TAKEN table. Without utilizing the outer join syntax, this request could be solved using some SQL statements already discussed; however, the construction will be complex. Instead, we will use the left outer join to satisfy the request, as the following example shows. SELECT C.fname, C.wphone, MAX(TT.score) FROM candidate c LEFT OUTER JOIN test_taken tt ON c.cid=tt.cid GROUP BY fname, wphone NOTE | The keyword OUTER is optional. Provided the JOIN is preceded by either LEFT, RIGHT, or FULL, then OUTER is implied . | Observe the syntax used to indicate a left outer join. The LEFT OUTER JOIN operator is used to indicate the left outer join operation. In this example, the answer set includes those candidates not present in the TEST_TAKEN table. The MAX(TT.SCORE) column will show nulls for those candidates. It is important to note when coding outer joins the difference between WHERE and on clause predicates. From the previous example, say we want to retrieve only those candidates with a CID of 111, 333, or 444. SELECT C.CID, C.fname, C.wphone, MAX(TT.score) AS MAX_SCORE FROM candidate c LEFT OUTER JOIN test_taken tt ON c.cid=tt.cid WHERE c.cid IN (111, 333, 444) GROUP BY C.CID, C.fname, C.wphone CID FNAME WPHONE MAX_SCORE --- ----- ---------- --------- 111 Bill 1115551234 73 333 Susan 4442314244 92 444 Glen 5552143244 - 3 record(s) selected. The WHERE clause predicate in this example is applied to the left or preserved row table. DB2 is able, therefore, to apply this predicate before the join to the CANDIDATE table and join only candidates 111, 333, and 444 to the TEST_TAKEN table. CID 444 in this example returns a MAX_SCORE of null because they have not taken any tests and are therefore not found in the TEST_TAKEN table. Consider the result if the query was written to also exclude those rows with a score less than 75 if found. SELECT C.CID, C.fname, C.wphone, MAX(TT.score) AS MAX_SCORE FROM candidate c LEFT OUTER JOIN test_taken tt ON c.cid=tt.cid WHERE c.cid IN (111, 333, 444) AND tt.score >= 75 GROUP BY C.CID, C.fname, C.wphone CID FNAME WPHONE MAX_SCORE --- ----- ---------- --------- 333 Susan 4442314244 92 1 record(s) selected. There is only one row that qualifies from this query. The WHERE clause predicate on the preserved row table (WHERE c.cid IN (111, 333, 444)) can be applied before the join to limit the number of rows that must be joined. The additional WHERE clause predicate (AND tt.score >= 75) is applied after the join because it is not known until after the join what the value of tt.score will be. This predicate is applied to each row, with the following result: -
Is Bill's score of 73 >= 75? FALSE -
Is Susan's score of 92 >= 75? TRUE -
Is Glen's score of NULL >= 75? UNKNOWN Remember that for a row to be returned, the WHERE clause must evaluate to TRUE. Since the nulls introduced by the outer join are discarded by the WHERE clause applied to the null-supplying table, DB2 will actually rewrite the left join to an INNER JOIN, thus allowing the WHERE clause predicates to be applied to both tables before the joining of the rows occurs. SELECT C.CID, C.fname, C.wphone, MAX(TT.score) AS MAX_SCORE FROM candidate c INNER JOIN test_taken tt ON c.cid=tt.cid WHERE c.cid IN (111, 333, 444) AND tt.score >= 75 GROUP BY C.CID, C.fname, C.wphone NOTE | The outer join simplification will be recorded in the JOIN_TYPE column of the plan table. If a left, right, or full join has been simplified to an inner join, the JOIN_TYPE value will be blank. | If you wish to maintain the null-supplied rows, then you must include an additional IS NULL predicate, such as: SELECT C.CID, C.fname, C.wphone, MAX(TT.score) AS MAX_SCORE FROM candidate c LEFT OUTER JOIN test_taken tt ON c.cid=tt.cid WHERE c.cid IN (111, 333, 444) AND (tt.score >= 75 OR tt.score IS NULL) GROUP BY C.CID, C.fname, C.wphone CID FNAME WPHONE MAX_SCORE --- ----- ---------- --------- 333 Susan 4442314244 92 444 Glen 5552143244 - 2 record(s) selected. In this case, DB2 will not simplify the left outer join to an inner join because the null-supplied row is not discarded by the WHERE clause predicate. Thus, the row for Glen is included in the result set because it is null. NOTE | WHERE clause predicates applied to the preserved row table can be applied before the join. Predicates applied to the null-supplying table will cause DB2 to simplify the join unless a predicate of the form OR COL IS NULL (or equivalent expression that retains nulls) is coded. | We've looked at WHERE clause predicates applied to either the preserved row or null-supplied tables. The next objective is to demonstrate the impact of coding local predicates in the ON clause. The following example displays the output if we specify the candidate predicate (111, 333, or 444) in the ON clause rather than in the WHERE clause. SELECT C.CID, C.fname, C.wphone, MAX(TT.score) AS MAX_SCORE FROM candidate c LEFT OUTER JOIN Test_taken tt ON c.cid=tt.cid AND c.cid IN (111, 333, 444) GROUP BY C.CID, C.fname, C.wphone CID FNAME WPHONE MAX_SCORE --- ----- ---------- --------- 111 Bill 1115551234 73 222 George 2226543455 - 333 Susan 4442314244 92 444 Glen 5552143244 - 555 Jim 6662341234 - 666 Kevin 7773142134 - 777 Bert 8886534534 - 888 Paul 9992112212 - 8 record(s) selected. Since there are no WHERE clause predicates to restrict the result, this query returns all rows of the CANDIDATE table. Columns from the TEST_TAKEN table are supplied only when the ON clause is true. Remember, WHERE clause predicates restrict the result, whereas ON clause predicates determine the criteria for the join. For the rows not matching the full ON clause (CID NOT IN (111, 333)note that 444 is not included because it will not be found in the TEST_TAKEN table), DB2 will preserve the row from the left table and supply nulls from the right table. Rows matching the ON clause (CID IN (111, 333)) will match and thus return columns from both tables. NOTE | If DB2 can determine that the ON clause will fail before joining the rows, then it will automatically supply nulls for the null-supplied columns without attempting the join. This improves performance, since rows are not joined unnecessarily if the final result will not match regardless. | Right Outer Join A RIGHT OUTER JOIN operation, also known as right join, produces an answer set that includes the matching values of both joined tables and those values only present in the right joined table. The right joined table is the one used in the right part of the RIGHT OUTER JOIN operator when coding the join operation. For a right join, the right table is the preserved row table, and the left table is the null-supplied table for unmatched rows. This is the opposite of a left join. The example using a right outer join is shown below: SELECT name, count(DISTINCT tt.cid) FROM test_taken tt RIGHT OUTER JOIN test t ON tt.number = t.number In this example, all test names present in the TEST table and the number of candidates who scheduled or took each test are requested. Notice there may be some tests for which no candidate was scheduled. You cannot report such tests using an inner join statement; however, you can do it using right outer join. NOTE | The COUNT(DISTINCT TT.CID) will count unique occurrences of TT.CID after the exclusion of nulls. Thus, nulls introduced by the outer join will not be counted. This is intentional, as the addition of a null will falsely add one to the count. The right outer join is required, however, to also report those tests that have not been taken by any candidates. | As with a left join, the order of the tables in the FROM clause for a right join can impact the result. However, a right join can be converted to a left join by reversing the order of the tables in the FROM clause, and also by changing the RIGHT keyword to LEFT. NOTE | DB2 converts all right joins to left joins at bind time. In the JOIN_TYPE column of the plan table will show an L for a right or a left join unless the join has been simplified to an inner. For this reason, it is recommended to avoid coding right joins to minimize confusion when evaluating the access path . | Full Outer Join The FULL OUTER JOIN operation produces an answer set that includes the matching values of both joined tables and those values not present in one or the other of the tables. Thus, it returns the inclusive inner join rows plus the exclusive left and right outer rows. To show a FULL OUTER JOIN operation, we will create two sample tables: CITY and COUNTRY. They show the relationship between a city and a country (CITY table) and a country and a continent (COUNTRY table). The CITY table is designed to have countries that are not in the COUNTRY table. The COUNTRY table is also designed to have countries that are not in the city table. The contents of both tables are shown below. CITY_NAME COUNTRY_NAME --------------- --------------- Sydney Australia London England Dublin Ireland Firenze Italy Milano Italy Mexico Mexico Lima Peru Toronto Canada Vienna Austria Hannover Germany 10 record(s) selected. COUNTRY_NAME CONTINENT --------------- -------------------------- Australia Australian Continent England European Continent Ireland European Continent Italy European Continent Mexico American Continent Austria European Continent South Africa African Continent Spain European Continent 8 record(s) selected. We want to show all the countries, cities, and continents that are in the tables. Therefore, we are using a full outer join, as the following example shows: SELECT ctry.continent, ctry.country_name, cty.country_name, cty.city_name FROM country ctry FULL OUTER JOIN city cty ON cty.country_name=ctry.country_name ORDER BY ctry.continent, cty.country_name, cty.city_name CONTINENT COUNTRY_NAME COUNTRY_NAME CITY_NAME -------------- ------------- ------------- ---------- African Continent South Africa - - American Continent Mexico Mexico Mexico Australian Continent Australia Australia Sydney European Continent Austria Austria Vienna European Continent England England London European Continent Ireland Ireland Dublin European Continent Italy Italy Firenze European Continent Italy Italy Milano European Continent Spain - - - - Canada Toronto - - Germany Hannover - - Peru Lima 12 record(s) selected. As shown in the above example, the rows that have a null value were added by the outer join operation. The COUNTRY_NAME column is shown twice to see those countries present in the COUNTRY table that are not present in the CITY table and vice versa. As shown by the full join example, both tables can preserve rows and also supply nulls for unmatched rows. Since WHERE clause predicates applied to null-supplied tables may cause DB2 to perform simplification of the outer join, predicates that are to be applied before the join to limit the result set must be coded within nested table expressions. NOTE | DB2 can simplify a full join to either a left or inner join if the WHERE clause predicates discard nulls from one or both tables respectively. This is shown in the JOIN_TYPE column of the plan table as either an L for left or blank for inner join rather than F for a full join. | Consider the following example where we wish to list the information for only South Africa and Peru. SELECT ctry.continent, ctry.country_name, cty.country_name, cty.city_name FROM (SELECT CONTINENT, COUNTRY_NAME FROM country WHERE COUNTRY_NAME IN ('South Africa', 'Peru')) as ctry FULL OUTER JOIN (SELECT COUNTRY_NAME, CITY_NAME FROM city WHERE COUNTRY_NAME IN ('South Africa', 'Peru')) as cty ON cty.country_name=ctry.country_name ORDER BY ctry.continent, cty.country_name, cty.city_name CONTINENT COUNTRY_NAME COUNTRY_NAME CITY_NAME ----------------- ------------- ------------- --------- African Continent South Africa - - - - Peru Lima 2 record(s) selected. Since the country name can return null from either table, it is common to code the COALESCE or VALUE function to ensure that the value is always supplied (and not null). The select list for this example would be. SELECT ctry.continent, COALESCE(ctry.country_name,cty.country_name) , cty.city_name NOTE | COALESCE returns the first element in the list that is not null. | Combining Outer Joins Up until now, we have discussed each outer join operation separately. Now, we will show a more complex example combining two outer joins in a single query. Let's display all the candidates and all the tests with their respective scores. To create this query, we need two outer joins. The first outer join will obtain all candidates and their scores, including candidates who did not schedule or take any tests. The second outer join will retrieve all the tests present in the TEST table even if no candidate scheduled or took those tests. SELECT c.cid, t.name, score FROM (candidate c LEFT OUTER JOIN test_taken tt ON tt.cid=c.cid) FULL OUTER JOIN test t ON tt.number = t.number ORDER BY c.cid The first outer join is enclosed in parentheses. The parentheses are used for readability and to indicate that the left outer join will be resolved first. In reality, the join sequence is determined by the precedence of the join predicates. For example, because the full join in this example joins to a column from the null-supplied table of the previous left join, the left join must precede the full join. NOTE | When the join sequence is not dictated by the join predicates, DB2 will evaluate the most cost-effective join sequence. | The left outer join gathers all of the candidate IDs. We only need a left outer join here. Because of referential integrity constraints, table TEST_TAKEN can have candidates present only in the CANDIDATE table. The second part is a full outer join. With this outer join, we take all the tests taken by the candidates, the result of the left outer join, and join them with all the tests in the TEST table. We need a full outer join this time, even though the TEST_TAKEN table can have only test numbers that are present in the TEST table, because the left table of this outer join, which is the result of the first join, may include null values as a test number. The TEST table does not have null values, so we need to use a full outer join. Even if the TEST table did contain nulls, a null will never equal another null unless the predicates of the form IS NULL are included in the join criteria. The results are shown below. CID NAME SCORE --- ------------------------------ ----- 111 DB2 Fundamentals 65 111 DB2 Administration 73 111 DB2 Application Development 67 222 DB2 Fundamentals 55 222 DB2 Application Development 53 222 DB2 Application Development 75 333 DB2 Fundamentals 82 333 DB2 Administration - 333 DB2 Application Development 92 444 - - 555 - - 666 - - 777 - - 888 - - 14 record(s) selected. NOTE | Care must be taken when joining the result of an outer join to subsequent tables. If the join predicate refers to columns from a null-supplied table, then the join predicate must also cater for the occurrence of nulls. Where possible, code the join predicate from the preserved row table, and if the preceding join was a full join, then utilize the COALESCE function to ensure that the join predicate cannot be null. | Joins Versus Denormalization One issue that needs to be examined when discussing joins is that of denormalization. Sometimes there is the perceived need to take tables that are always joined together and put them into a single table, in an effort to minimize the joins and improve performance. Since this is generally applied to tables in a one-to-many relationship, it results in multiple occurrences of the same data. This introduces update anomalies, the situation where a single piece of data occurs multiple times. When there is the attempt to update it, all occurrences must be found, and there is no fixed way to determine the number or location of the multiple occurrences. |