Joins | Inside Microsoft SQL Server 7.0 (Mps)

You gain much more power when you join tables, which typically results in combining columns of matching rows to project and return a virtual table . Usually, joins are based on the primary and foreign keys of the tables involved, although the tables aren't required to explicitly declare keys. The pubs sample database contains a table of authors ( authors ) and a table of book titles ( titles ). An obvious query would be, "Show me the titles that each author has written, and sort the results alphabetically by author. I'm interested only in authors who live outside California." Neither the authors table nor the titles table alone has all this information. Furthermore, a many-to-many relationship exists between authors and titles; an author might have written several books, and a book might have been written by multiple authors. So an intermediate table, titleauthor , exists expressly to associate authors and titles, and this table is necessary to correctly join the information from authors and titles . To join these tables, you must include all three tables in the FROM clause of the SELECT statement, specifying that the columns that make up the keys have the same values. Figure 7-2 shows you how.

 SELECT  'Author'=RTRIM(au_lname) + ', ' + au_fname, 'Title'=title FROM authors AS A JOIN titleauthor AS TA      ON A.au_id=TA.au_id          -- JOIN CONDITION JOIN titles AS T     ON T.title_id=TA.title_id    -- JOIN CONDITION WHERE A.state <> 'CA' ORDER BY 1

Figure 7-2. Joining two tables via a third, intermediate table.

Here's the output:

 Author                      Title ------------------------    ---------------------------------- Blotchet-Halls, Reginald    Fifty Years in Buckingham Palace                              Kitchens DeFrance, Michel            The Gourmet Microwave del Castillo, Innes         Silicon Valley Gastronomic Treats Panteley, Sylvia            Onions, Leeks, and Garlic: Cooking                              Secrets of the Mediterranean Ringer, Albert              Is Anger the Enemy? Ringer, Albert              Life Without Fear Ringer, Anne                The Gourmet Microwave Ringer, Anne                Is Anger the Enemy?

Before discussing join operations further, let's study the example in Figure 7-2. The author's last and first names have been concatenated into one field. The RTRIM (right trim) function is used to strip off any trailing whitespace from the au_lname column. Then we add a comma and a space and concatenate on the au_fname column. This column is then aliased as simply Author and is returned to the calling application as a single column.

NOTE
The RTRIM function isn't needed for this example. Because the column is of type varchar , trailing blanks wouldn't be present. RTRIM is shown for illustration purposes only.

Another important point is that the ORDER BY 1 clause indicates that the results should be sorted by the first column in the result set. It's more typical to use the column name rather than its number, but using the column number provides a convenient shorthand when the column is derived and hence, isn't present in the base table or view (virtual table) being queried.

We could have specified the same expression of the column using ORDER BY RTRIM(au_lname) + ', ' + au_fname instead. SQL Server provides an extension to the ANSI standard that allows sorting by columns that aren't included in the select list. So even though we don't individually select the columns au_lname , au_fname , or state , we could nonetheless choose to order the query based on these columns by specifying columns ORDER BY au_lname , au_fname , state. We'll see this in the next example (Figure 7-3). Notice also that the query contains comments ( -- JOIN CONDITION ). A double hyphen (--) signifies that the rest of the line is a comment (similar to // in C++). You can also use the C-style /* comment block */ , which allows blocks of comment lines.

TIP
Comments can be nested, but you should generally try to avoid this. You can easily introduce a bug by not realizing that a comment is nested within another comment and misreading the code.

Now let's examine the join in Figure 7-2: The ON clauses specify how the tables relate and set the join criteria, stating that au_id in authors must equal au_id in titleauthor , and title_id in titles must equal title_id in titleauthor . This type of join is referred to as an equijoin , and it's the most common join operation. To remove ambiguity, you must qualify the columns. You can do this by specifying the columns in the form table.column , such as authors.au_id = title_author.au_id . The more compact and common way to do this, however, is by specifying a table alias in the FROM clause, as was done in this figure. By following the titles table with the word AS and the letter T, the titles table will be referred to as T from that point on. Typically, such an alias consists of one or two letters , although it can be much longer (following the same rules as identifiers).

After a table is aliased, it must be referred to by the alias, so now we can't refer to authors.au_id because the authors table has been aliased as A . We must use A.au_id . Note also that we refer to the state column of authors as A.state . The other two tables don't contain a state column, so qualifying it with the A. prefix isn't necessary; however, doing so makes the query more readable ”and less prone to subsequent bugs .

The join in Figure 7-2 is accomplished using the ANSI JOIN SQL syntax, introduced in SQL Server version 6.5. Many examples and applications still continue to use an old-style JOIN syntax, which is shown below. (The term "old-style JOIN" is actually used by the SQL-92 specification.) The ANSI JOIN syntax is based on ANSI SQL-92. The main differences between the two types of join formulations are:

The ANSI JOIN actually uses the keyword JOIN.
The ANSI JOIN segregates join conditions from search conditions.

The ANSI JOIN syntax specifies the JOIN conditions in the ON clauses (one for each pair of tables), and the search conditions are specified in the WHERE clause ”for example, WHERE state <> 'CA'. Although slightly more verbose, the explicit JOIN syntax is more readable. There's no difference in performance; behind the scenes, the operations are the same. Figure 7-3 shows how the query in Figure 7-2 can be equivalently respecified using the old-style JOIN syntax.

 SELECT  'Author'=RTRIM(au_lname) + ', ' + au_fname, 'Title'=title FROM authors A, titles T, titleauthor TA WHERE  A.au_id=TA.au_id AND T.title_id=TA.title_id  -- JOIN CONDITIONS AND A.state <> 'CA' ORDER BY 1

Figure 7-3. Explicit JOIN syntax that produces the same result yielded by Figure 7-2.

The query in Figure 7-3 produces the same output and the same execution plan as the query in Figure 7-2.

One of the most common errors new SQL users make when using the old-style JOIN syntax is not specifying the join condition. Omitting the WHERE clause is still a valid SQL request and causes an answer set to be returned. However, that result set is likely not what the user wanted. In Figure 7-3, omitting the WHERE clause would return the Cartesian product of the three tables: it would generate every possible combination of rows between them. Although in a few unusual cases you might want all permutations , usually this is just a user error. The number of rows returned is huge and typically doesn't represent anything meaningful. For example, the Cartesian product of the three small tables here ( authors , titles , and titleauthor each have less than 26 rows) generates 10,350 rows of (probably) meaningless output.

Using the ANSI JOIN syntax, it's impossible to accidentally return a Cartesian product of the tables ”one reason to use ANSI JOINs almost exclusively. The ANSI JOIN syntax requires an ON clause for specifying how the tables are related . In cases where you actually do want a Cartesian product, the ANSI JOIN syntax allows a CROSS JOIN operator to be used, which we'll see in more detail later in this chapter.

NOTE
In other examples throughout this book, we'll use the ANSI JOIN syntax almost exclusively. You might want to try translating some of them into queries using old-style JOIN syntax, because you should be able to recognize both forms. If you have to read SQL code written earlier than version 7, you're bound to come across queries using this older syntax.

The most common form of joins is an equijoin, sometimes also referred to as an inner join to differentiate it from an outer join, which we'll discuss shortly. Strictly speaking, an inner join isn't quite the same as an equijoin, which by definition means that the condition is based on equality; an inner join can use an operator such as less than (>) or greater than (<), although this is relatively unusual and esoteric. To make this distinction clear, you can use the INNER JOIN syntax in place of JOIN, as in this example:

 INNER JOIN titleauthor TA ON A.au_id=TA.au_id

Other than making the syntax more explicit, there's no difference in the semantics or the execution. By convention, the modifier INNER generally isn't used.

NOTE
ANSI SQL-92 also specifies the natural join operation, in which you don't have to specify the tables' column names. By specifying syntax such as FROM authors NATURAL JOIN titleauthor , the system automatically knows how to join the tables without your specifying the column names to match. SQL Server doesn't yet support this feature.
The ANSI specification calls for the natural join to be resolved based on identical column names between the tables. Perhaps a better way to do this would be based on a declared primary key_foreign key relationship, if it exists . Admittedly, declared key relationships have issues, too, because there's no restriction that only one such foreign key relationship is set up. Also, if the natural join were limited to only such relationships, all joins would have to be known in advance ”as in the old CODASYL days.

The FROM clause in Figure 7-3 shows an alternative way to specify a table alias, by omitting the word "AS". The use of the word "AS" preceding the table alias, as used in Figure 7-2, conforms to ANSI SQL-92. From SQL Server's standpoint, the two methods are equivalent (stating FROM authors AS A is identical to stating FROM authors A ). Commonly, the AS formulation is used when doing ANSI SQL-92 join operations, and the formulation that omits AS is used with the old-style join formulation. However, you can use either alias formulation ”it's strictly a matter of preference.

Outer Joins

Equijoins choose only rows from the respective tables that match the equality condition. In other words, if a row in the first table doesn't match any rows in the second table, that row won't be returned in the result set. In contrast, outer joins preserve some or all of the unmatched rows. To illustrate how easily subtle semantic errors can be introduced, let's refer back to the queries in Figures 7-2 and 7-3, in which we wanted to see the titles written by all authors not living in California. The result omitted two writers who do not, in fact, live in California. Was the query wrong? No! The query performed exactly as written ”we didn't specify that authors who currently have no titles in the database should be included. The results of the query were as we requested . In fact, the authors table has four rows that have no related row in the titleauthor table, which means these "authors" actually haven't written any books. Two of these authors don't contain the value CA in the state column, as the code below shows.

 au_id          Author                 state -----------    -------------------    ----- 341-22-1782    Smith, Meander         KS 527-72-3246    Greene, Morningstar    TN 724-08-9931    Stringer, Dirk         CA 893-72-1158    McBadden, Heather      CA

The titles table has one row for which there is no author in our database, as shown below. (Later in this chapter, you'll see the types of queries used to produce these two result sets.)

 title_id    title --------    ---------------------------------- MC3026      The Psychology of Computer Cooking

If the queries in Figures 7-2 and 7-3 were meant to be "Show me the titles that each author has written; include authors even if there's no title by that author currently in our database; if there's a title for which there is no author, show me that as well; sort the results alphabetically by the author; I'm interested only in authors who live outside of California," the query would use an outer join so that authors with no matching titles would be selected. Figure 7-4 shows this outer-join query.

 SELECT  'Author'=RTRIM(au_lname) + ', ' + au_fname, 'Title'=title FROM      (JOIN CONDITIONS     -- FIRST join authors and titleauthor         (authors AS A                 FULL OUTER JOIN titleauthor AS TA ON A.au_id=TA.au_id)     -- The result of the previous join is then joined to titles          FULL OUTER JOIN titles AS T ON TA.title_id=T.title_id) WHERE  state <> 'CA' OR state IS NULL ORDER BY 1

Figure 7-4. An outer-join query, which preserves rows with no matches.

Here's the output:

 Author                      Title ------------------------    ---------------------------------- NULL                        The Psychology of Computer Cooking Blotchet-Halls, Reginald    Fifty Years in Buckingham Palace                             Kitchens DeFrance, Michel            The Gourmet Microwave del Castillo, Innes         Silicon Valley Gastronomic Treats Greene, Morningstar         NULL Panteley, Sylvia            Onions, Leeks, and Garlic: Cooking                             Secrets of the Mediterranean Ringer, Albert              Is Anger the Enemy? Ringer, Albert              Life Without Fear Ringer, Anne                The Gourmet Microwave Ringer, Anne                Is Anger the Enemy? Smith, Meander              NULL

The query in Figure 7-4 demonstrates a full outer join . Rows in the authors and titles tables that don't have a corresponding entry in titleauthor are still presented, but with a NULL entry for the title or author column. (The data from the authors table is requested as a comma between the last names and the first names. Because we have the option CONCAT_NULL_YIELDS_NULL set to ON, the result is NULL for the author column. If we didn't have that option set to ON, SQL Server would have returned the single comma between the nonexistent last and first names.) A full outer join preserves nonmatching rows from both the lefthand and righthand tables. In Figure 7-4, the authors table is presented first, so it is the lefthand table when joining to titleauthor . The result of that join is the lefthand table when joining to titles .

ANSI Outer Joins
At the beginning of this chapter, we promised not to go into much detail about generic SQL operations. The outer join formulations here are standard ANSI SQL-92. However, this area is one in which there aren't enough good examples; using OUTER JOIN with more than two tables introduces some obscure issues. Few products provide support for full outer joins, and frankly, the current SQL Server documentation is thin in its discussion of outer joins. Because you must understand outer joins thoroughly if you are to use OUTER JOIN, we'll go into a bit more detail in this section.

You can generate missing rows from one or more of the tables by using either a left outer join or a right outer join . So if we want to preserve all authors and generate a row for all authors who have a missing title, but we don't want to preserve titles that have no author, we can reformulate the query using LEFT OUTER JOIN, as shown in Figure 7-5. This join preserves entries only on the lefthand side of the join. Note that the left outer join of the authors and titleauthor columns generates two such rows (for Greene and Smith). The result of the join in Figure 7-5 is the lefthand side of the join to titles ; therefore, the LEFT OUTER JOIN must be specified again to preserve these rows with no matching titles rows.

 FROM      ((authors as A         LEFT OUTER JOIN titleauthor AS TA ON A.au_id=TA.au_id)         LEFT OUTER JOIN titles AS T ON TA.title_id=T.title_id)

Figure 7-5. Using LEFT OUTER JOIN to preserve rows on only the lefthand side of the joins.

Here's the output:

 Author                        Title ------------------------    ---------------------------------- Blotchet-Halls, Reginald      Fifty Years in Buckingham Palace                               Kitchens DeFrance, Michel              The Gourmet Microwave del Castillo, Innes           Silicon Valley Gastronomic Treats Greene, Morningstar           NULL Panteley, Sylvia              Onions, Leeks, and Garlic: Cooking                               Secrets of the Mediterranean Ringer, Albert                Is Anger the Enemy? Ringer, Albert                Life Without Fear Ringer, Anne                  The Gourmet Microwave Ringer, Anne                  Is Anger the Enemy? Smith, Meander                NULL

The query in Figure 7-5 produces the same rows as the full outer join, except for the row for The Psychology of Computer Cooking. Because only LEFT OUTER JOIN was specified, there was no request to preserve titles (righthand) rows that have no matching rows in the result of the join of authors and titleauthor .

You must use care with OUTER JOIN operations, because the order in which tables are joined affects which rows are preserved and which aren't. In an equijoin, the associative property holds (if A equals B, then B equals A) and no difference results, whether something is on the left or the right side of the equation, and no difference results in the order in which joins are specified. This is definitely not the case for OUTER JOIN operations. For example, consider the following two queries and their results:

QUERY 1

 SELECT  'Author'= RTRIM(au_lname) + ', ' + au_fname, 'Title'=title FROM (titleauthor AS TA  RIGHT OUTER JOIN authors AS A ON (A.au_id=TA.au_id)) FULL OUTER JOIN titles AS T ON (TA.title_id=T.title_id) WHERE  A.state <> 'CA' or A.state is NULL ORDER BY 1 Author                      Title ------------------------    ---------------------------------- NULL                        The Psychology of Computer Cooking Blotchet-Halls, Reginald    Fifty Years in Buckingham Palace                             Kitchens DeFrance, Michel            The Gourmet Microwave del Castillo, Innes         Silicon Valley Gastronomic Treats Greene, Morningstar         NULL Panteley, Sylvia            Onions, Leeks, and Garlic: Cooking                             Secrets of the Mediterranean Ringer, Albert              Is Anger the Enemy? Ringer, Albert              Life Without Fear Ringer, Anne                The Gourmet Microwave Ringer, Anne                Is Anger the Enemy? Smith, Meander              NULL

This query produces results semantically equivalent to the previous FULL OUTER JOIN formulation in Figure 7-4, although we've switched the order of the authors and titleauthor tables. This query and the previous one preserve both authors with no matching titles and titles with no matching authors. This might not be obvious, because RIGHT OUTER JOIN is clearly different than FULL OUTER JOIN. However, in this case we know it's true because FOREIGN KEY and NOT NULL constraints exist on the titleauthor table to ensure that there can never be a row in the titleauthor table that doesn't match a row in the authors table. So we can be confident that the titleauthor RIGHT OUTER JOIN to authors can't produce any fewer rows than would a FULL OUTER JOIN.

But if we modify the query ever so slightly by changing the join order again, look what happens:

QUERY 2

 SELECT  'Author'=rtrim(au_lname) + ', ' + au_fname, 'Title'=title FROM (titleauthor AS TA  FULL OUTER JOIN titles AS T ON TA.title_id=T.title_id) RIGHT OUTER JOIN authors AS A ON A.au_id=TA.au_id WHERE  A.state <> 'CA' or A.state is NULL ORDER BY 1 Author                      Title ------------------------    ---------------------------------- Blotchet-Halls, Reginald    Fifty Years in Buckingham Palace                             Kitchens DeFrance, Michel            The Gourmet Microwave del Castillo, Innes         Silicon Valley Gastronomic Treats Greene, Morningstar         NULL Panteley, Sylvia            Onions, Leeks, and Garlic: Cooking                             Secrets of the Mediterranean Ringer, Albert              Is Anger the Enemy? Ringer, Albert              Life Without Fear Ringer, Anne                The Gourmet Microwave Ringer, Anne                Is Anger the Enemy? Smith, Meander              NULL

At a glance, Query 2 looks equivalent to Query 1, although the join order is slightly different. But notice how the results differ . This query didn't achieve the goal of preserving the titles rows without corresponding authors , and the row for The Psychology of Computer Cooking is again excluded. This row would have been preserved in the first join operation:

 FULL OUTER JOIN titles AS T ON TA.title_id=T.title_id

But then this row is discarded, because the second join operation,

 RIGHT OUTER JOIN authors AS A ON A.au_id=TA.au_id

preserves only authors without matching titles. Because the title row for The Psychology of Computer Cooking is on the lefthand side of this join operation and only a RIGHT OUTER JOIN operation is specified, this title is discarded.

NOTE
Just as INNER is an optional modifier, so is OUTER. Hence, LEFT OUTER JOIN can be equivalently specified as LEFT JOIN, and FULL OUTER JOIN can be equivalently expressed as FULL JOIN. However, although INNER is seldom used by convention and is usually only implied , OUTER is almost always used by convention when specifying any type of outer join.

Because join order matters, we urge you to use parentheses and indentation carefully when specifying OUTER JOIN operations. Indentation, of course, is always optional, and use of parentheses is often optional. But as this example shows, it's easy to make mistakes that result in your queries returning the wrong answers. As is true with almost all programming, simply getting into the habit of using comments, parentheses, and indentation often results in such bugs being noticed and fixed by a developer or database administrator before they make their way into your applications.

Who's Writing the Queries?
Notice that the terms "developer" or "database administrator" are used above, not "end user." SQL is often regarded as a language suitable for end users as well as database professionals. To an extent, such as with straightforward single table queries, this is true. But if you expect end users to understand the semantics of outer joins or to deal correctly with NULL, you might as well just give them a random data generator. These concepts are tricky, and it's your job to insulate your users from them. Later in this chapter, you'll see how you can use views to help accomplish this.

The Obsolete *= OUTER JOIN Operator

Prior to version 6.5, SQL Server had limited outer-join support in the form of the special operators *= and =*. Many people assume that the LEFT OUTER JOIN syntax is simply a synonym for *=, but this isn't the case. LEFT OUTER JOIN is semantically different from and superior to *=.

For equijoins, the associative property holds, so the issues with old-style JOIN syntax don't exist. You can use either new-style or old-style syntax with equijoins. For outer joins, you should consider the *= operator obsolete and move to the OUTER JOIN syntax as quickly as possible ”the *= operator might be dropped entirely in future releases of SQL Server.

ANSI's OUTER JOIN syntax, which was adopted in SQL Server version 6.5, recognizes that for outer joins, the conditions of the join must be evaluated separately from the criteria applied to the rows that are joined. ANSI gets it right by separating the JOIN criteria from the WHERE criteria. The old SQL Server *= and =* operators are prone to ambiguities , especially when three or more tables, views, or subqueries are involved. Often the results aren't what you'd expect, even though you might be able to explain them. But sometimes you simply can't get the result you want. These aren't implementation bugs; more accurately, these are inherent limitations in trying to apply the outer-join criteria in the WHERE clause.

When *= was introduced, no ANSI specification existed for OUTER JOIN, or even for INNER JOIN. Just the old-style join existed, with operators like =* in the WHERE clause. So the designers quite reasonably tried to fit an outer-join operator into the WHERE clause, which was the only place that joins were ever stated. Other products, notably Oracle with its plus (+) syntax, also followed this approach; consequently, these products suffer from similar ambiguities and limitations. However, efforts to resolve this situation helped spur the specification for proper OUTER JOIN syntax and semantics within the ANSI SQL committee. Using outer joins correctly is difficult, and SQL Server is one of the few mainstream products that has done so.

To illustrate the semantic differences and problems with the old *= syntax, we'll walk through a series of examples using both new and old outer-join syntax. The following is essentially the same outer-join query shown in Figure 7-4, but this one returns only a count. It correctly finds the 11 rows, preserving both authors with no titles and titles with no authors.

 New style OUTER JOIN correctly finds 11 rows SELECT COUNT(*) FROM      (-- FIRST join authors and titleauthor         (authors AS A         LEFT OUTER JOIN titleauthor AS TA ON A.au_id=TA.au_id)     -- The result of the previous join is then joined to titles          FULL OUTER JOIN titles AS T ON TA.title_id=T.title_id) WHERE state <> 'CA' OR state IS NULL ORDER BY 1

There's really no way to write this query ”which does a full outer join ”using the old syntax, because the old syntax simply isn't expressive enough. Here's what looks to be a reasonable try ”but it generates several rows that you wouldn't expect.

 Attempt with old-style join. Really no way to do FULL OUTER  JOIN. This query finds 144 rows_-WRONG!! SELECT COUNT(*) FROM      authors A, titleauthor TA, titles T WHERE  A.au_id *= TA.au_id AND TA.title_id =* T.title_id AND (state <> 'CA' OR state IS NULL) ORDER BY 1

Now let's examine some issues with the old-style outer join using a simple example of only two tables, customers and orders :

 CREATE TABLE Customers (Cust_ID      int  PRIMARY KEY, Cust_Name    char(20)) CREATE TABLE Orders (OrderID    int    PRIMARY KEY, Cust_ID    int    REFERENCES Customers(Cust_ID)) GO INSERT Customers VALUES (1, 'Cust 1') INSERT Customers VALUES (2, 'Cust 2') INSERT Customers VALUES (3, 'Cust 3') INSERT Orders VALUES (10001, 1) INSERT Orders VALUES (20001, 2) GO

At a glance, in the simplest case, the new-style and old-style syntax appear to work the same. Here's the new syntax:

 SELECT  'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name,     'Orders.Cust_ID'=Orders.Cust_ID FROM Customers LEFT JOIN Orders     ON Customers.Cust_ID=Orders.Cust_ID

Here's the output:

 Customers.Cust_ID        Cust_Name        Orders.Cust_ID -----------------        ---------        -------------- 1                        Cust 1           1 2                        Cust 2           2 3                        Cust 3           NULL

And here's the old-style syntax:

 SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name,     'Orders.Cust_ID'=Orders.Cust_ID FROM Customers, Orders WHERE Customers.Cust_ID *= Orders.Cust_ID

Here's the output:

 Customers.Cust_ID        Cust_Name        Orders.Cust_ID -----------------        ---------        -------------- 1                        Cust 1           1 2                        Cust 2           2 3                        Cust 3           NULL

But as soon as you begin to add restrictions, things get tricky. What if you want to filter out Cust 2 ? With the new syntax it's easy, but remember not to filter out the row with NULL that the outer join just preserved!

 SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name, 'Orders.Cust_ID'=Orders.Cust_ID FROM Customers LEFT JOIN Orders     ON Customers.Cust_ID=Orders.Cust_ID  WHERE Orders.Cust_ID <> 2 OR Orders.Cust_ID IS NULL

Here's the output:

 Customers.Cust_ID        Cust_Name        Orders.Cust_ID -----------------        ---------        -------------- 1                        Cust 1           1 3                        Cust 3           NULL

Now try to do this query using the old-style syntax and filter out Cust 2 :

 SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name, 'Orders.Cust_ID'=Orders.Cust_ID FROM Customers, Orders  WHERE Customers.Cust_ID *= Orders.Cust_ID AND (Orders.Cust_ID <> 2 OR Orders.Cust_ID IS NULL)

Here's the output:

 Customers.Cust_ID        Cust_Name        Orders.Cust_ID -----------------        ---------        -------------- 1                        Cust 1           1 2                        Cust 2           NULL 3                        Cust 3           NULL

Notice that this time, we don't get rid of Cust 2 . The check for NULL occurs before the JOIN, so the outer-join operation puts Cust 2 back. This result might be less than intuitive, but at least we can explain and defend it. That's not always the case, as you'll see in a moment.

If you look at the preceding query, you might think that we should have filtered out Customers.Cust_ID rather than Orders.Cust_ID . How did we miss that? Surely this query will fix the problem:

 SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name, 'Orders.Cust_ID'=Orders.Cust_ID FROM Customers, Orders  WHERE Customers.Cust_ID *= Orders.Cust_ID AND (Customers.Cust_ID <> 2 OR Orders.Cust_ID IS NULL)

Here's the output:

 Customers.Cust_ID        Cust_Name        Orders.Cust_ID -----------------        ---------        -------------- 1                        Cust 1           1 2                        Cust 2           NULL 3                        Cust 3           NULL

Oops! Same result. The problem here is that Orders.Cust_ID IS NULL is now being applied after the outer join, so the row is presented again. If we're careful and understand exactly how the old outer join is processed , we can get the results we want with the old-style syntax for this query. We need to understand that the OR Orders.Cust_ID IS NULL puts back Cust_ID 2 , so just take that out. The code appears below.

 SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name,     'Orders.Cust_ID'=Orders.Cust_ID FROM Customers, Orders  WHERE Customers.Cust_ID *= Orders.Cust_ID AND Customers.Cust_ID <> 2

Here's the output:

 Customers.Cust_ID        Cust_Name        Orders.Cust_ID -----------------        ---------        -------------- 1                        Cust 1           1 3                        Cust 3           NULL

Finally! This result is what we want. And if we really think about it, the semantics are even understandable (although different from the new style). So maybe this is much ado about nothing; all we have to do is understand how it works, right? Wrong. Besides the issues of joins with more than two tables and the lack of a full outer join, we also can't effectively deal with subqueries and views (virtual tables). For example, let's try creating a view with the old-style outer join:

 CREATE VIEW Cust_old_OJ AS (SELECT Orders.Cust_ID, Customers.Cust_Name FROM Customers, Orders  WHERE Customers.Cust_ID *= Orders.Cust_ID)

A simple select from the view looks fine:

 SELECT * FROM Cust_old_OJ

And it gives this output:

 Cust_ID        Cust_Name -------        --------- 1              Cust 1 2              Cust 2 NULL           Cust 3

But restricting from this view doesn't seem to make sense:

 SELECT * FROM Cust_old_OJ WHERE Cust_ID <> 2     AND Cust_ID IS NOT NULL

The output shows NULLs in the Cust_ID column, even though we tried to filter them out:

 Cust_ID        Cust_Name -------        --------- 1              Cust 1 NULL           Cust 2 NULL           Cust 3

If we expand the view to the full select and we realize that Cust_ID is Orders.Cust_ID , not Customers.Cust_ID , perhaps we can understand why this happened . But we still can't filter out those rows!

In contrast, if we create the view with the new syntax and correct semantics, it works exactly as expected:

 CREATE VIEW Cust_new_OJ AS (SELECT Orders.Cust_ID, Customers.Cust_Name FROM Customers LEFT JOIN Orders      ON Customers.Cust_ID=Orders.Cust_ID) GO SELECT * FROM Cust_new_OJ WHERE Cust_ID <> 2 AND Cust_ID IS NOT NULL

Here's what was expected:

 Cust_ID        Cust_Name -------        --------- 1              Cust 1

In these examples, the new syntax performed the outer join and then applied the restrictions in the WHERE clause to the result. In contrast, the old style applied the WHERE clause to the tables being joined and then performed the outer join, which can reintroduce NULL rows. This is why the results often seemed bizarre. However, if that behavior is what you want, you could apply the criteria in the JOIN clause instead of in the WHERE clause.

The following example uses the new syntax to mimic the old behavior. The WHERE clause is shown here simply as a placeholder to make it clear that the statement Cust_ID <> 2 is in the JOIN section, not in the WHERE section.

 SELECT 'Customers.Cust_ID'=Customers.Cust_ID, Customers.Cust_Name,     'Orders.Cust_ID'=Orders.Cust_ID FROM Customers LEFT JOIN Orders     ON Customers.Cust_ID=Orders.Cust_ID      AND Orders.Cust_ID <> 2 WHERE 1=1

Here's the output:

 Customers.Cust_ID        Cust_Name        Orders.Cust_ID -----------------        ---------        -------------- 1                        Cust 1           1 2                        Cust 2           NULL 3                        Cust 3           NULL

As you can see, the row for Cust 2 was filtered out from the Orders table before the join, but because it was NULL, it was reintroduced by the OUTER JOIN operation.

With the improvements in outer-join support, you can now use outer joins where you couldn't previously. A bit later, you'll see how to use an outer join instead of a correlated subquery in a common type of query.

Cross Joins

In addition to INNER JOIN, OUTER JOIN, and FULL JOIN, the ANSI JOIN syntax allows a CROSS JOIN. Earlier, you saw that the advantage of using the ANSI JOIN syntax was that you wouldn't accidentally create a Cartesian product. However, in some cases, this behavior might be exactly what you want. SQL Server allows you to specify a CROSS JOIN with no ON clause to produce a Cartesian product. So when would you really want to get a Cartesian product?

One possible use for CROSS JOINs is to generate sample or test data. For example, to generate 10,000 names for a sample employees table, you don't have to come up with 10,000 individual INSERT statements. All you need to do is build a first_names table and a last_names table with 26 names each (perhaps one starting with each letter of the English alphabet), and a middle_initials table with the 26 letters. When these three small tables are joined using the CROSS JOIN operator, the result will be well over 10,000 unique names to insert into the employee table. The SELECT statement used to generate these names would look like this:

 SELECT first_name, middle_initial, last_name FROM first_names CROSS JOIN middle_initials      CROSS JOIN last_names

To summarize the five types of ANSI JOIN operations, consider two tables, TableA and TableB :

INNER JOIN (default)

 TableA INNER JOIN TableB ON join_condition

The INNER JOIN returns rows from either table only if they have a corresponding row in the other table. In other words, the INNER JOIN disregards any rows in which the specific join condition, as specified in the ON clause, isn't met.

LEFT OUTER JOIN

 TableA LEFT OUTER JOIN TableB ON join_condition

The LEFT OUTER JOIN returns all rows for a connection that exists between TableA and TableB ; in addition, it returns all rows from TableA for which no corresponding row exists in TableB . In other words, it preserves unmatched rows from TableA . TableA is sometimes called the preserved table. When returning the unmatched rows from TableA , any columns selected from TableB are returned as NULL.

RIGHT OUTER JOIN

 TableA RIGHT OUTER JOIN TableB ON join_condition

The RIGHT OUTER JOIN returns all rows for which a connection exists between TableA and TableB ; in addition, it returns all rows from TableB for which no corresponding row exists in TableA . In other words, it preserves unmatched rows from TableB , and in this case, TableB is the preserved table. When returning the unmatched rows from TableB , any columns selected from TableA are returned as NULL.

FULL OUTER JOIN

 TableA FULL OUTER JOIN TableB ON join_condition

The FULL OUTER JOIN returns all rows for which a connection exists between TableA and TableB . In addition, it returns all rows from TableA for which no corresponding row exists in TableB , with any values selected from TableB returned as NULL. In addition, it returns all rows from TableB for which no corresponding row exists in TableA , with any values selected from TableA returned as NULL. In other words, FULL OUTER JOIN acts as a combination of LEFT OUTER JOIN and RIGHT OUTER JOIN.

CROSS JOIN

 TableA CROSS JOIN TableB

The CROSS JOIN returns all rows from TableA combined with all rows from TableB . No ON clause exists to indicate any connecting column between the tables. A CROSS JOIN returns a Cartesian product of the two tables.