As I discussed in the opening to this chapter, it is rare, especially with SQL Server, to encase all of your columns in one table. Such a scenario certainly means denormalized data and poor performance; you would rarely escape serious data redundancy and duplication. When you have more than one table and you need to combine data from the collection into a result set, this is known as a join. For some reason, joins strike fear into the hearts of new converts to client/server databases and SQL Server. But we can throw this animal into the same fish stew that boiled down SELECT.
The definition of a join is as follows: “A join is the result set that is the product of two or more tables.” A join can be a simple referencing of two or more tables as illustrated, using a copy of the Northwind database (Orders cut to 20 rows), in the following statement:
The result set is as follows:
Customer | Order Numbers |
---|---|
Berglunds snabbköp | 10278 |
Berglunds snabbköp | 10280 |
Chop-suey Chinese | 10254 |
Frankenversand | 10267 |
Frankenversand | 10791 |
Gourmet Lanchonetes | 10790 |
GROSELLA-Restaurante | 10268 |
Hanari Carnes | 10250 |
Hanari Carnes | 10253 |
Lehmanns Marktstand | 10279 |
LILA-Supermercado | 10283 |
Richter Supermarkt | 10255 |
Romero y tomillo | 10281 |
Romero y tomillo | 10282 |
Suprêmes délices | 10252 |
Toms Spezialitäten | 10249 |
Victuailles en stock | 10251 |
Vins et alcools Chevalier | 10248 |
Wellington Importadora | 10256 |
Wolski Zajazd | 10792 |
The statement that produced the preceding result set is a legacy inner join statement, which is still supported in SQL Server 2005 (and I don’t see “Sequelists” parting with it for some time). The ANSI/ISO SQL-92 version supported in SQL Server 2005 looks like this:
SELECT Customers.CompanyName AS 'Customer', Orders. OrderID AS 'Order Numbers' FROM Customers, Orders WHERE Customers.CustomerID = Orders. CustomerID
The result set is identical, but notice the addition of the JOIN and ON keywords and the absence of the WHERE keyword used in the legacy code. Incidentally, if you look at the query plans for both queries, you will see that they are identical in every way The old method was more limiting because it trapped the “join” in a WHERE clause and produced ambiguities in complex operations.
The JOIN keyword specifies that the tables to the left and right of it should be joined. Additional tables can then be joined to the result set to produce an additional result set. The ON < search_condition > specifies the search condition on which the join is based. The condition can specify any predicate, although columns and comparison operators are often used as demonstrated earlier.
However, when the condition specifies columns, the columns do not have to have the same name or same data type. If the data types are not identical, they must be either compatible or types that SQL Server 2005 can implicitly convert. If the data types cannot be implicitly converted, the condition must explicitly convert the data type using the CAST or CONVERT functions.
Let’s look at the syntax that uses joins as a table source in the SELECT:
| joined_table = < table_source > < join_type > < table_source > ON < search_condition > | < table_source > CROSS JOIN < table_source > | < table_source >
In the preceding syntax, the <join_type> placeholder (SQL-92 compliant) can represent a simple join as described earlier or a more complex left, right, or full inner or outer join. This might become more clear to you if you study the table source syntax for joins as follows:
[ INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } ] [ < join_hint > ]
The INNER join is the default join type if the type is NOT specified in your query. The keyword INNER specifies to the query that all rows that have matching values specified in the search condition (Customers.CustomerID=Orders. CustomerID) are returned. All unmatched rows from both tables are ignored (which was our first example where the orders table contained only 20 rows).
The LEFT OUTER join specifies to the query that all rows from the table to the left of the join type declaration that do not meet the specified condition are included in the result set in addition to all rows returned by the inner join. Output columns from the left table are set to NULL. The LEFT [OUTER] of our first INNER example now looks like this after I abridged it from 95 to 20 rows:
SELECT Customers.CompanyName AS 'Customer', Orders. OrderID AS 'Order Numbers' FROM Customers LEFT OUTER JOIN Orders ON (Customers.CustomerID = Orders. CustomerID)
Customers | Order Numbers |
---|---|
Alfreds Futterkiste | NULL |
Ana Trujillo Emparedados y helados | NULL |
Antonio Moreno Taquería | NULL |
Around the Horn | NULL |
Berglunds snabbköp | 10278 |
Berglunds snabbköp | 10280 |
Blauer See Delikatessen | NULL |
Blondesddsl père et fils | NULL |
Bólido Comidas preparadas | NULL |
Bon app’ | NULL |
Bottom-Dollar Markets | NULL |
B’s Beverages | NULL |
Cactus Comidas para llevar | NULL |
Centro comercial Moctezuma | NULL |
Chop-suey Chinese | 10254 |
Comércio Mineiro | NULL |
Consolidated Holdings | NULL |
Drachenblut Delikatessen | NULL |
Du monde entire | NULL |
Eastern Connection | NULL |
If we now tag an ORDER BY on the end of the query, we can see the 20 records returned in the default INNER join shown earlier. The following table has been abridged from 95 records for paper conservation:
SELECT Customers.CompanyName AS 'Customer', Orders. OrderID AS 'Order Numbers' FROM Customers LEFT OUTER JOIN Orders ON (Customers.CustomerID = Orders .CustomerID) ORDER BY 2 DESC
Chapter 16: Working with Tabular Data 473
Customers | Order Numbers |
---|---|
Wolski Zajazd | 10792 |
Frankenversand | 10791 |
Gourmet Lanchonetes | 10790 |
LILA-Supermercado | 10283 |
Romero y tomillo | 10282 |
Romero y tomillo | 10281 |
Berglunds snabbköp | 10280 |
Lehmanns Marktstand | 10279 |
Berglunds snabbköp | 10278 |
GROSELLA-Restaurante | 10268 |
Frankenversand | 10267 |
Wellington Importadora | 10256 |
Richter Supermarkt | 10255 |
Chop-suey Chinese | 10254 |
Hanari Carnes | 10253 |
Suprêmes délices | 10252 |
Victuailles en stock | 10251 |
Hanari Carnes | 10250 |
Toms Spezialitäten | 10249 |
Vins et alcools Chevalier | 10248 |
Die Wandernde Kuh | NULL |
Wartian Herkku | NULL |
White Clover Markets | NULL |
Wilman Kala | NULL |
Tortuga Restaurante | NULL |
The RIGHT OUTER join specifies to the query that all rows from the table to the right of the join type declaration that do not meet the specified condition are included in the result set in addition to all rows returned by the inner join. Output columns from the right table are set to NULL. The RIGHT OUTER looks like this:
SELECT Customers.CompanyName AS 'Customer', Orders. OrderID AS 'Order Numbers' FROM Customers RIGHT OUTER JOIN Orders ON (Customers.CustomerID = Orders. CustomerID)
It returns the following result set, where all rows in the right table (consisting of 20 rows) were returned (in my example all 20 records met the search condition).
Customers | Order Numbers |
---|---|
Vins et alcools Chevalier | 10248 |
Toms Spezialitäten | 10249 |
Hanari Carnes | 10250 |
Victuailles en stock | 10251 |
Suprêmes délices | 10252 |
Hanari Carnes | 10253 |
Chop-suey Chinese | 10254 |
Richter Supermarkt | 10255 |
Wellington Importadora | 10256 |
Frankenversand | 10267 |
GROSELLA-Restaurante | 10268 |
Berglunds snabbköp | 10278 |
Lehmanns Marktstand | 10279 |
Berglunds snabbköp | 10280 |
Romero y tomillo | 10281 |
Romero y tomillo | 10282 |
LILA-Supermercado | 10283 |
Gourmet Lanchonetes | 10790 |
Frankenversand | 10791 |
Wolski Zajazd | 10792 |
Note | Your results may differ because cut the number of rows from my Northwind tables for illustration. |
The FULL OUTER join specifies to the query that, if a row from either the left or right table does not match the selection criteria, the row be included in the result set, and any output columns that correspond to the other table be set to NULL. This is in addition to all rows usually returned by the inner join. The statement for the FULL OUTER JOIN looks like this:
SELECT Customers.CompanyName AS 'Customer', Orders.OrderID AS 'Order Numbers' FROM Customers FULL OUTER JOIN Orders ON (Customers.CustomerID = Orders.CustomerID)
I am sure by now you can figure out what the result set looks like.
The CROSS JOIN specifies to the query to return the cross-product (the Cartesian product) of two tables. This join returns the same rows as if the tables to be joined were simply listed in the FROM clause and no WHERE clause was specified. For example, both of the following queries return a result set that is a cross-join of all the rows in Customers and Orders:
SELECT ColA, ColB FROM Customers, Orders SELECT ColA, ColB FROM Customers CROSS JOIN Orders
As you can tell, joining tables can be very useful for producing test data.
In the preceding join syntax, the placeholder < join_hint > specifies a join hint or execution algorithm to be followed by SQL Server. If the join hint is specified, INNER, LEFT, RIGHT, or FULL must also be explicitly specified.