Working with SQL JOIN Statements and Other Multiple-table Queries

Working with Tables from Multiple MS SQL Server Databases

Although the example SELECT statements in all of the previous Tips reference tables in single database (SQLTips), MS-SQL Server lets you execute SQL statements that work with tables from multiple databases.

If you refer to a table using only its name, the DBMS assumes that you want to use a table you own in the current database. For example, if you log in to the SQLTips database as user-name FRANK and execute the query

 SELECT * FROM customers

the DBMS will show you the contents of the CUSTOMERS table if CUSTOMERS exists in the SQLTips database and is owned by FRANK.

If you want to work with a table that exists in the current database but is owned by a different username, your table reference must include the name of the table's owner using the syntax:

 .

As such, if you log in to the SQLTips database as username FRANK and want to query the CUSTOMERS table owned by MARY, for example, you would execute a SELECT statement similar to:

 SELECT * FROM mary.customers

To work with a table in another database, the table reference must use the syntax

..

to specify the name of the database in which the table is located, the username of the table's owner, and the name of the table.

For example, if you are logged into the SQLTips database and want to query the AUTHORS table, owned by DBO in the PUBS database, you would execute a SELECT statement similar to:

 SELECT * FROM pubs.dbo.authors

As mentioned at the beginning of this tip, you can use tables from multiple databases in the same SQL statement—simply use the syntax necessary to tell the DBMS where it can find the table you want to use. For example, if you want to cross-reference the EMPLOYEES table in the current database ( SQLTips) with the AUTHORS table in the PUBS database (to see if you have any published authors working for you), execute a SELECT statement similar to:

 SELECT emp_ID, au_lname, au_fname
 FROM employees, pubs.dbo.authors
 WHERE employees.SSAN = pubs.dbo.authors.au_ID

Because the query omits the database name and owner name in its reference to the CUSTOMERS table, the DBMS knows the table must exist in the current ( SQLTips) database. Conversely, because the query's reference to the AUTHORS table includes both the database name and the owner's name, the DBMS knows to work with the AUTHORS table owned by DBO in the PUBS database.

  Note 

The DBMS will, of course, check its system tables to ensure that your username has the access rights necessary to execute the SQL statement you submit for processing. If you do not, the DBMS will abort execution of the statement with an error message similar to:

Server: Msg 229, Level 14, State 5, Line 1
SELECT permission denied on object 'authors', database
 'pubs', owner 'dbo'.

In short, DBMS will not allow you to circumvent system security by using a fully qualified table name (..

).

Understanding Cartesian Products

The Cartesian product of two tables is a (third) table that contains all of the possible pairs of rows from the two source tables. Each row in the table produced by a Cartesian product consists of the columns from the first table followed by columns from the second table.

Understanding Cartesian products is important because whenever you execute an SQL query that has multiple tables in its FROM clause, the DBMS creates the Cartesian product of the source tables as an interim virtual table. The system then uses the interim table as the single source table for the column data values referenced by column name in the query's expressions.

Suppose, for example, that you have a CUSTOMERS table and an INVOICES table with the following data:

CUSTOMERS table INVOICES table
==================== ===================================
cust_ID cust_name inv_date inv_no cust_ID inv_total
------- ------------ ---------- ------ ------- ---------
101 Customer 101 01/02/2000 1 101 15874
202 Customer 202 01/05/2000 2 202 6859
205 Customer 205 03/05/2000 3 101 20225
 09/05/2000 4 101 30228
 09/27/2000 5 202 7400

If you execute the SELECT statement

 SELECT * FROM customers, invoices

' cust_ID cust_name inv_date inv_no cust_ID inv_total
 ------- ------------ ---------- ------ ------- ---------
 101 Customer 101 01/02/2000 1 101 15874.0000
 101 Customer 101 01/05/2000 2 202 6859.0000
 101 Customer 101 03/05/2000 3 101 20225.0000
 101 Customer 101 09/05/2000 4 101 30228.0000
 101 Customer 101 09/27/2000 5 202 7400.0000
 202 Customer 202 01/02/2000 1 101 15874.0000
 202 Customer 202 01/05/2000 2 202 6859.0000
 202 Customer 202 03/05/2000 3 101 20225.0000
 202 Customer 202 09/05/2000 4 101 30228.0000
 202 Customer 202 09/27/2000 5 202 7400.0000
 205 Customer 205 01/02/2000 1 101 15874.0000
 205 Customer 205 01/05/2000 2 202 6859.0000
 205 Customer 205 03/05/2000 3 101 20225.0000
 205 Customer 205 09/05/2000 4 101 30228.0000
 205 Customer 205 09/27/2000 5 202 7400.0000

Because the SELECT statement in the current example has no WHERE clause, its final results table includes all of the rows produced by the Cartesian product of the CUSTOMERS table and the INVOICES table in the query's FROM clause. (The "display all column values," [that is, the asterisk "*"] within the query's select clause tells the DBMS to include all the Cartesian product's columns in the results table as well.)

When you review the contents of the results table in the current example, you will see that computing the Cartesian product of the tables in a query's FROM clause produces a lot of unwanted rows. For example, customer 205 has no invoices in the INVOICES table, yet the results table has five invoice detail lines for customer 205. Similarly, customer 202 made two purchases (INV_NO 2 and INV_NO 5), yet the results table has detail lines showing that customer 202 is also responsible for INV_NO 1 and 4.

The Cartesian product of the tables in a SELECT statement's FROM clause is normally used as an interim table (albeit virtual) and is almost never intended as a query's final results table. Real-world multi-table SELECT statements almost always include a WHERE clause with search criteria that filter out the unwanted (nonsense) rows in the interim (Cartesian product) table. (You will learn all about using WHERE clause filters in multi-table queries in Tips 284–287.)

  Note 

Although the query in the current example shows the Cartesian product from a SELECT statement with two tables in its FROM clause, the DBMS will also generate a Cartesian product when executing a query involving three (or more) tables. For example, if you submit the SELECT statement

 SELECT * FROM table_a, table_b, table_c

the DBMS will create the Cartesian product of TABLE_A and TABLE_B as a product table (TABLE_AB). It then creates the Cartesian product of the product table (TABLE_AB) and TABLE_C as a product table (TABLE_ABC). (If there were a fourth table (TABLE_D), the DBMS would create the Cartesian product of TABLE_ABC and TABLE_D to produce a product table [TABLE_ABCD], and so on.) As is the case with a two-table query, the DBMS uses the final Cartesian product (such as TABLE_ABC, in a three-table query) as the source table for the column data values referenced by column name in the query's expressions.

Using the FROM Clause to Perform a Multi table Query

A FROM clause, as its name implies, lists the tables from which the DBMS is to get the data values to use in a query. For example, a query with a single table in its FROM clause, such as

 SELECT * FROM customers

is easy to understand—retrieve each row from the CUSTOMERS table and supply its column values to the "display all column values" (the asterisk (*)) expression in the query's SELECT clause.

If a query needs data from columns in multiple tables, the FROM clause (which always lists all of the sources of a query's data), will have more than one table. However, whether the FROM clause lists a single table or several tables, the various clauses in a SELECT statement act on data one row at a time. Therefore, if a FROM clause has multiple tables, the DBMS will compute the Cartesian product of the tables listed in the FROM clause to create a single table from which it can draw data one row at a time.

As you learned in Tip 281, "Understanding Cartesian Products," the interim (Cartesian product) table consists of all of the possible combinations of rows in the FROM clause tables. Each row is made up of all of the columns from all of the tables listed in the FROM clause.

For example, suppose that you have a STUDENTS table with 12,000 rows, each of which has 15 columns (attributes), and a CLASSES table with 1,000 rows with 5 columns per row. When you execute the query

 SELECT * FROM students, classes

the DBMS creates a virtual table with 12,000,000 rows (12,000 × 1,000) and 20 columns per row (15 from STUDENTS plus 5 from CLASSES). If you add a third table, TEACHERS, which has 1,000 rows of 10 columns each, the DBMS will combine the three tables in the FROM clause of the query

SELECT * FROM students, classes, teachers

to produce an interim virtual table with 12,000,000,000 rows (12,000 × 1,000 × 1,000) and 30 columns (15 + 5 + 10) per row. As you can see, the Cartesian product of the tables listed in the FROM clause of a multi-table SELECT can become very large.

Of course, no DBMS product actually creates a physical table from the Cartesian product of the tables listed a query's FROM clause. However, multi-table queries involving large tables will sometimes take hours to run because the DBMS creates and stores in memory the portions of the virtual interim table that it needs to process data from multiple tables one composite row at a time.

The important thing to understand is that a multi-table query actually works with a single (albeit large) virtual table that the DBMS creates by taking the Cartesian product of the tables listed in the SELECT statement's FROM clause. This concept is important because it helps to explain why a multi-table query needs a WHERE clause to filter out unwanted rows.

Understanding Joins and Multi table Queries

In Tip 281, "Understanding Cartesian Products," and Tip 282, "Using the FROM Clause to Perform a Multi-table Query," you learned that the DBMS joins multiple tables listed in a SELECT statement's FROM clause by generating the Cartesian product of the tables.

SQL-92 refers to a Cartesian product as a CROSS JOIN. As such, the SELECT statement

 SELECT * FROM students, classes

is equivalent to:

SELECT * FROM students CROSS JOIN classes

Each of the two example queries joins two tables (STUDENTS and CLASSES) to produce a third table that contains all of the possible pairs or rows from each of the original tables. As you learned in Tip 281, a CROSS JOIN (or Cartesian product) is seldom (if ever) the desired result when you pose a query-there are simply too many unwanted (and perhaps nonsensical) rows. Therefore, multi-table queries almost always include a WHERE clause to filter out rows whose column values do not represent the attributes of valid physical objects or concepts-such as the invoice detail rows for CUST_ID 205 in the interim source table of the example in Tip 281.

When executing a multi-table query, you normally want the DBMS to perform a natural join (or equi-join) instead of a cross join of the source tables. In an equi-join the DBMS filters out interim (joint) table rows that do not have matching values in one (or more) columns common to both tables. You learned how to set up a shared column relationship between tables when you read about the FOREIGN KEY constraint in Tip 173, "Understanding Foreign Keys."

In short, a FOREIGN KEY lets you set up a parent/child relationship between tables by duplicating the PRIMARY KEY (see Tip 171, "Understanding Primary Keys") column value from the parent table in a FOREIGN KEY (see Tip 173) column of the child table. Then you can use the syntax

 SELECT { | *}
 FROM ,
WHERE =

to write a two-table query that joins the row with the attributes (column values) from

that partially describe an object or concept to the row(s) in
with additional attributes (column values) pertaining to the same object or concept.

You will learn more about the various SQL joins (CROSS JOIN, NATURAL JOIN, INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, and UNION JOIN) in Tips 296-309. For now, the important thing to understand is that a join is used by a multi-table SELECT statement to combine related rows of values from different tables into a single virtual table. The DBMS then works with the joined rows in a single interim virtual table to produce the final query results table.

Using a WHERE Clause to Join Two Tables Related by a Single Column PRIMARY KEY FOREIGN KEY Pair

As you learned in Tip 283, "Understanding Joins and Multi-table Queries," when executing a multi-table query, the DBMS joins the rows in one table with the rows in another table to create an interim virtual table that has all of the data values from both of the source tables. Then the system uses the SELECT statement's clauses to filter and display the data values in the interim (joined) table.

The simplest form of a multi-table query is an equi-join (or natural join) based on a parent/child relationship between pairs of rows in two tables. The DBMS joins rows in the parent table with rows in the child table by matching PRIMARY KEY column values in the parent table with FOREIGN KEY column values in the child table. Figure 284.1 shows the relationship between a CUSTOMERS (parent) table and an INVOICES (child) table as defined by the values in the parent table's PRIMARY KEY column and the child table's FOREIGN KEY column.

click to expand Figure 284.1: CUSTOMERS (parent) table and INVOICES (child) table related by common columns-ID in CUSTOMERS and INV_NUM in INVOICES

To display a list of invoices and the name of the customer that placed each order, the DBMS must join each child row in the INVOICES table to its parent row in the CUSTOMERS table. For example, a query similar to

 SELECT f_name, l_name, inv_num, inv_date,
 (inv_total - amt_paid) AS 'Balance Due'
 FROM invoices, customers
 WHERE CID = ID

will perform an equi-join of the CUSTOMERS and INVOICES table. The query's FROM clause tells the DBMS to create an interim virtual table by joining the rows in the INVOICES table with the rows in the CUSTOMERS table. Next, the system uses the search criteria in the WHERE clause to filter out unwanted interim table rows. In an equi-join involving a parent and a child table, the DBMS passes to the SELECT clause only those interim (joined) table rows in which the FOREIGN KEY column matches the value of the PRIMARY KEY column. The equi-join query in the current example will produce a results table similar to:

 f_name l_name inv_num inv_date Balance Due
 ------ -------- ------- ----------------------- -----------
 Walter Winchell 2001 2000-01-01 00:00:00.000 .0000
 Konrad King 3010 2000-07-01 00:00:00.000 3200.0000
 Walter Winchell 2730 2000-05-01 00:00:00.000 23750.0000
 Walter Winchell 9050 2000-09-29 00:00:00.000 19300.0000

SQL does not require that the SELECT clause include columns referenced in WHERE clause expressions. In fact, key columns (used to set up parent/child relationships between tables) are often ID numbers. Although numeric key values make it easy for computers to uniquely identify and pair related rows, they are of little value in a results table. After all, a person (vs. a machine) using query results is more likely to know and refer to a customer, employee, inventory item, and so on by name or description rather than by number.

Using a WHERE Clause to Join Two Tables Related by a Composite PRIMARY KEY FOREIGN KEY Pair

In Tip 174, "Understanding Referential Integrity Checks and Foreign Keys," you learned that the DBMS lets you INSERT a row in a child table only if the row's FOREIGN KEY column value matches the value in the PRIMARY KEY column of an existing row in a parent table. However, as you learned in Tip 284, "Using a WHERE Clause to Join Two Tables Related by a Single-Column PRIMARY KEY/FOREIGN KEY Pair," the system does not automatically use the matching PRIMARY KEY/FOREIGN KEY column value requirement to filter out unwanted rows in a query involving a parent/child table pair.

To exploit the parent/child relationship between two tables, you must perform an equi-join query. The search condition in the SELECT statement's WHERE clause tells the DBMS you want to work with only joined rows in which the PRIMARY KEY column value from the row in the parent table matches the FOREIGN KEY column value in the row from the child table.

For example, if you have an INVENTORY (child) table with a single-column FOREIGN KEY and an ITEM_MASTER (parent) table with a single-column PRIMARY KEY, you can use a SELECT statement similar to

 SELECT inventory.item_number, description, qty_on_hand
 FROM inventory, item_master
 WHERE inventory.item_number = item_master.item_number
 ORDER BY description

to display an item's description (from the ITEM_MASTER table) along with the quantity of the item you have in inventory (from the INVENTORY table) as a single, joined row in a results table.

As you learned in Tip 168, "Understanding Single-Column and Composite Keys," you sometimes have composite key values-tables in which the PRIMARY KEY (or FOREIGN KEY) value consists of multiple-column values. For example, suppose you purchase items with the same item number from different vendors and want to find out the quantity of each vendor's products you have on hand. If you use a composite PRIMARY KEY consisting of (ITEM_NUMBER,VENDOR_CODE), the preceding query with a single search criterion will produce an incorrect results table similar to

 item_number description qty_on_hand
 ----------- -------------------- -----------
 1 Item 1 from Vendor 1 111
 2 Item 2 from Vendor 1 222
 2 Item 2 from Vendor 1 111
 2 Item 2 from Vendor 1 111
 2 Item 2 from Vendor 2 222
 2 Item 2 from Vendor 2 111
 2 Item 2 from Vendor 2 111
 3 Item 3 from Vendor 1 333
 3 Item 3 from Vendor 3 333

when you actually have in inventory:

 item_number description qty_on_hand
 ----------- -------------------- -----------
 1 Item 1 from Vendor 1 111
 2 Item 2 from Vendor 1 111
 3 Item 3 from Vendor 1 111
 2 Item 2 from Vendor 2 222
 3 Item 3 from Vendor 3 333

To perform an equi-join query using the parent/child relationships defined by matching composite (multi-column) FOREIGN KEY/PRIMARY KEY values in related tables, the SELECT statement's WHERE clause must have a search condition that matches each pair of columns that make up the composite key. For example, if you have two tables related by a composite key such as (ITEM_NUMBER,VENDOR_CODE), execute a query similar to

 SELECT inventory.item_number, description, qty_on_hand
 FROM inventory, item_master
 WHERE (inventory.item_number = item_master.item_number)
 AND (inventory.vendor_code = item_master.vendor_code)
 ORDER BY description

in which the WHERE clause search condition tells the DBMS to filter out all joined rows except those in which both pairs of columns that make up the composite FOREIGN KEY and PRIMARY KEY values have matching values. (If the composite keys consisted of three columns each, the WHERE clause would have three equality expressions [one for each matching column pair] joined by AND operators.)

Using a WHERE Clause to Join Three or More Tables Based on Parent Child Relationships

As you learned in Tip 284, "Using a WHERE Clause to join Two Tables Related by a Single-Column PRIMARY KEY/FOREIGN KEY Pair," and Tip 285, "Using a WHERE Clause to Join Two Tables Related by a Composite PRIMARY KEY/FOREIGN KEY Pair," the DBMS generates the Cartesian product of the tables listed in the SELECT statement's FROM clause each time the system executes a two table query. (You learned about Cartesian products in Tip 281, "Understanding Cartesian Products.") The DBMS then uses one or more search conditions in the query's WHERE clause to filter out joined yet unrelated pairs of rows. If you increase the number of tables joined in the query from two to three (or more), the DBMS still goes through the same process of generating the Cartesian product of the tables and then filtering out the joined yet unrelated rows from the interim virtual (Cartesian product) table.

Each search condition in the WHERE clause that is used to filter out unrelated joined rows must identify a pair of columns (one column from each pair of related tables) whose values must match if the joined row in the interim table expresses a valid parent/child relationship. For example, the WHERE clause in the two-table query

 SELECT f_name, l_name, inv_num, inv_date,
 (inv_total - amt_paid) AS 'Balance Due'
 FROM invoices, customers
 WHERE invoices.CID = customers.ID

filters out invoice detail rows joined to unrelated customer detail rows (and vice versa).

A parent/child relationship between two tables requires that a child's FOREIGN KEY value (INVOICES.CID) must match the parent's PRIMARY KEY value (CUTOMERS.ID). Therefore, the WHERE clause filter in the current example removes all joined rows in which the values in the PRIMARY KEY/FOREIGN KEY column pair do not match because these rows do not express a valid parent/child relationship between the two tables.

Similarly, to filter unwanted "garbage" rows out of the Cartesian product in a query on three (or more) tables, the WHERE clause must test pairs of column values that include at least one column from each of the source tables. For example, to get a list of customers including the invoice balances and salesperson's name, you can submit a three table query similar to:

 SELECT f_name, l_name, inv_no, inv_date,
 (inv_total - amt_paid) AS 'Balance Due',
 RTRIM(first_name)+' '+last_name AS 'Salesperson'
 FROM invoices, customers, employees
 WHERE (customers.ID = invoices.CID)
 AND (invoices.salesrep = employees.emp_ID)

The search condition in the WHERE clause filters out joined rows in which the ID (the PRIMARY KEY) from the CUSTOMERS (parent) table does not match the CID (the FOREIGN KEY) from the INVOICES (child) table, and joined rows in which the SALESREP (the FOREIGN KEY) from the INVOICES (child) table does not match the EMP_ID (the PRIMARY KEY) from the EMPLOYEES (parent) table.

As shown by the current example, the WHERE clause in a query on three (or more) tables does not have to test the same matching pair of column values for each table. The only requirement is that the WHERE clause must check the equality of at least one pair of FOREIGN KEY/PRIMARY KEY values from each set of parent/child tables listed in the query's FROM clause.

Using a WHERE Clause to Join Tables Based on Nonkey Columns

All of the WHERE clauses in the example multi-table queries in Tips 284-286 filtered out joined rows in which the FOREIGN KEY value from a child table did not match the PRIMARY KEY value from a parent table. However, SQL also lets you join rows based on matching values in other than PRIMARY KEY and FOREIGN KEY columns. Suppose, for example, that your company has a different set of training manuals for employees in each if its departments. The WHERE clause in the query

 SELECT first_name, last_name, title
 FROM manuals, employees
 WHERE manuals.for_dept = employees.dept

will filter out those rows that join employees in one department with manuals for another department and will produce a results table similar to:

 first_name last_name title
 ---------- --------- ----------------------------
 Richard Kimbal Handling Complaints
 Richard Kimbal Efficient Order Taking
 Richard Kimbal Frequently Asked Questions
 Hellen Waters Handling Complaints
 Hellen Waters Efficient Order Taking
 Hellen Waters Frequently Asked Questions
 Ed Norton Prospecting
 Ed Norton Working Callbacks
 Ed Norton Making Referral Calls
 Steve Forbes Prospecting
 Steve Forbes Working Callbacks
 Steve Forbes Making Referral Calls
 Charles Coulter Mechanics of the Pre-Close
 Charles Coulter Successful Closing Strategies
 Charles Coulter Proper Menu Planning
 Ralph Cramden Mechanics of the Pre-Close
 Ralph Cramden Successful Closing Strategies
 Ralph Cramden Proper Menu Planning

Joining tables based on matching nonkey column values generates a results table with joined rows that show many-to-many relationships between the two tables. For example, the results table in the current example shows that based on matching department values, each row from the MANUALS table is related to several rows from the EMPLOYEES table. Moreover, each row from the EMPLOYEES table has the same department as (and is therefore related to) several rows from the MANUALS table.

Joining tables based on matching pairs of PRIMARY KEY/FOREIGN KEY values, on the other hand, generates a results table that shows the one-to-many relationship between each row from the parent table and its children (related rows) in the child table. Or, looking at the relationships in reverse, the results table shows the many-to-one relationship between children (rows in the child table) related by a matching key value to a single (parent) row in the parent table.

Suppose, for example, that you have an EMPLOYEES parent table and a TIMECARDS child table. The SELECT statement

SELECT first_name, last_name, card_date, start_time,
 stop time
FROM employees, timecards
WHERE employees.emp_ID = timecards.emp_ID

will join parent rows (from the EMPLOYEES table) with related child rows (from the TIMECARDS table, based on matching pairs of PRIMARY KEY (EMPLOYEES.EMP_ID) and FOREIGN KEY (TIMECARDS.EMP_ID) values. After the WHERE clause filters out joined, unrelated rows, the results table will show each row from the EMPLOYEES table (parent) joined (by a matching pair of key column values) to zero or more rows from the TIMECARDS (child) table. Or, said another way, the results table will show one or more rows from the TIMECARDS (child) table joined (by a matching pair of key column values) to one and only one of the rows from the EMPLOYEES (parent) table.

The important thing to understand from the preceding discussion of many-to-one and one-to-many relationships is that you write the SELECT statement's WHERE clause the same way whether you are joining tables based on matching pairs of key column values or matching pairs of nonkey column values. In both cases, the WHERE clause includes a comparison test that filters out joined rows in which the values in the pair of columns that join related rows in the two tables do not match.

Understanding Non Equi Joins

Although all of the example multi-table queries in Tips 284-287 joined tables based on the equality of pairs of columns common to both tables, SQL also lets you join tables based on nonequality relationships between pairs of columns. Suppose, for example, that you want to generate a list of employees and the company benefits to which each is entitled based on length of employment. Given an EMPLOYEES table and an unrelated BENEFITS table, you can execute the SELECT statement

 SELECT first_name, last_name,
 CAST((GETDATE() - date_hired) AS INTEGER)
 AS 'Days Employed',
 description AS 'Eligible For'
 FROM employees, benefits
 WHERE CAST((GETDATE() - date_hired) AS INTEGER) >=
 days_on_job_required
 ORDER BY emp_ID

to produce a results table similar to:

 first_name last_name Days Employed Eligible For
 ---------- ---------- ------------- ---------------
 Robert Cunningham 3440 Retirement Plan
 Robert Cunningham 3440 Paid Vacation
 Robert Cunningham 3440 Paid Sick Days
 Robert Cunningham 3440 Paid Dental
 Robert Cunningham 3440 Paid Medical
 Lori Swenson 153 Paid Dental
 Lori Swenson 153 Paid Medical
 Richard Kimbal 93 Paid Dental
 Richard Kimbal 93 Paid Medical
 Glenda Widmark 32 Paid Medical

Each row in the results table joins an employee's name and length of employment (from the EMPLOYEES table) with each of the benefits to which the employee is entitled (from the BENEFITS table) based on length of employment.

Whether the multi-table query is based on an equi-join or a non-equi-join, a comparison test in the query's WHERE clause filters out the joined rows in which values in the pair of columns used to relate the two tables fail to satisfy the condition that defines the relationship between the tables.

In an equi-join, the WHERE clause uses the equality operator to compare values in the pair of columns used to relate the tables and filters out joined rows in which the paired columns have different values. Similarly, in the non-equi-join shown in the current example, the WHERE clauses uses the greater than or equal to (>=) comparison operator to filter out joined rows in which the value of the "Days Employed" expression is less than the value of the DAYS_ON_JOB_REQUIRED column (from the BENEFITS table).

The important thing to understand is that when executing a multi-table query, the DBMS always generates the Cartesian product of the tables listed in the SELECT statement's FROM clause. It then uses one or more search conditions in the WHERE clause to filter out the rows with values in related (paired) columns that do not satisfy the conditions of the comparison operator used to define the relationship between the two tables.

Using Qualified Column Names in Multi table Queries to Join Tables That Have the Same Names for One or More Columns

When writing a multi-table query, you can retrieve data from a table by using the name of the column in the SELECT statement if the name of the column with the data you want is unique to one of the tables joined in the query. For example, if you have a CUSTOMERS table and an EMPLOYEES as defined by the CREATE statements

 CREATE TABLE customers CREATE TABLE employees
 (cust_ID INTEGER, (emp_ID INTEGER,
 cust_f_name VARCHAR(30), emp_f_name VARCHAR(30),
 cust_l_name VARCHAR(30), emp_l_name VARCHAR(30))
 salesperson INTEGER)

you can execute a SELECT that retrieves column data values by name alone, such as:

 SELECT RTRIM(cust_f_name)+' '+cust_l_name AS 'Customer',
 RTRIM(emp_f_name)+' '+emp_l_name AS 'Salesperson'
 FROM customers, employees
 WHERE salesperson = emp_ID

The DBMS automatically knows to retrieve the customer's first and last names (CUST_F_NAME and CUST_L_NAME) from the CUSTOMERS table, and the employee's first and last names (EMP_F_NAME and EMP_L_NAME) from the EMPLOYEES table. After all, the DBMS can find each of the column names used in the query in one and only one of the query's source tables.

If, on the other hand, you execute a multi-table query in which you need data from a column that has the same name in more than one of the tables joined in the query, you must use a qualified column name in the SELECT statement. A qualified column name, as you learned in Tip 228, "Understanding Column References," tells the DBMS both the name of the table and the name of the column from which it is to retrieve a data value.

For example, if the CUSTOMERS and EMPLOYEES tables were created with the CREATE statements

 CREATE TABLE customers CREATE TABLE employees
 (cust_ID INTEGER, (emp_ID INTEGER,
 cust_f_name VARCHAR(30), f_name VARCHAR(30),
 cust_l_name VARCHAR(30), l_name VARCHAR(30))
 emp_ID INTEGER)

the DBMS would abort the execution of the SELECT statement

 SELECT RTRIM(cust_f_name)+' '+cust_l_name AS 'Customer',
 RTRIM(f_name)+' '+l_name AS 'Salesperson'
 FROM customers, employees
 WHERE emp_ID = emp_ID

and display an error message similar to

Server: Msg 209, Level 16 State 1, Line 1
Ambiguous column name 'emp_ID'.
Server: Msg 209, Level 16 State 1, Line 1
Ambiguous column name 'emp_ID'.

because the system cannot determine whether you want to use data values from the EMP_ID column of the CUSTOMERS table, or from the EMP_ID column of the EMPLOYEES table, or both in the query's WHERE clause.

To use data from a column whose name appears in more than one of the query's source tables, you must use a qualified column in the form:


 
.

Therefore, to correct the ambiguous reference to the EMP_ID column in the preceding example, rewrite the query as:

 SELECT RTRIM(cust_f_name)+' '+cust_l_name AS 'Customer',
 RTRIM(f_name)+' '+l_name AS 'Salesperson'
 FROM customers, employees
 WHERE customers.emp_ID = employees.emp_ID

When executing the revised (corrected) query, the DBMS knows to retrieve the EMP_ID value from the CUSTOMERS table for the expression on the left side of the equals (=) sign and to retrieve the EMP_ID value from the EMPLOYEES table for the expression on the right side of the equals (=) sign.

Using the ALL Keyword with an INTERSECT Operation to Include Duplicate Rows in the Query Results Table

As you learned in Tip 237, "Understanding the UNION, INTERSECT, and EXCEPT Operators," and Tip 238, "Using the INTERSECT Operator to Select Rows That Appear in All of Two or More Source Tables," you can use the INTERSECT operator to get a list of rows that appear in all of the results tables from two or more queries. For example, if you keep the list of your auto insurance customers in a table named AUTO_INS_CUSTOMERS and you keep the list of your home insurance customers in a union-compatible table named HOME_INS_CUSTOMERS, executing the query

 (SELECT * FROM auto_ins_customers)
 INTERSECT
 (SELECT * FROM home_ins_customers)

will produce a results table that lists all customers that have both an auto insurance and a home insurance policy.

The INTERSECT operator, like the UNION operator, eliminates duplicate rows from its results table. As such, if you want to get a list of 18- to 21-year-old auto insurance customers who have had an accident and a traffic ticket within the past year, execute a query similar to:

 (SELECT cust_ID FROM auto_ins_customers
 WHERE age BETWEEN 18 AND 21)
INTERSECT
 (SELECT cust_ID FROM traffic_violations
 WHERE date_of_infraction >= (GETDATE() - 365))
INTERSECT
 (SELECT cust_ID FROM auto_claims
 WHERE date_of_claim >= (GETDATE() - 365))

Because the DBMS eliminates duplicate rows of query results, the results table for the current query will list a particular CUST_ID once and only once-whether the 18- to 21-year-old customer has had one ticket and one claim or five tickets and three claims within the past year.

If you do not want the DBMS to eliminate duplicate rows from the results table generated from the INTERSECT of two or more sets of query results, use the ALL keyword in conjunction with the INTERSECT operator. For example, to list the CUST_ID in the results table once for each traffic violation or auto claim for 18- to 21-year-old customers who have had both a traffic violation and an accident claim within the past year, execute the INTERSECT query:

 (SELECT cust_ID FROM auto_ins_customers
 WHERE age BETWEEN 18 AND 21)
 INTERSECT ALL
 (SELECT cust_ID FROM traffic_violations
 WHERE date_of_infraction >= (GETDATE() - 365))
 INTERSECT ALL
 (SELECT cust_ID FROM auto_claims
 WHERE date_of_claim >= (GETDATE() - 365))
  Note 

If your DBMS, like MS-SQL Server, does not support the INTERSECT operator, you can use an AND Boolean operator to add a subquery that tests set membership to the query's WHERE clause in place of each INTERSECT pair. For example, the SELECT statement

SELECT cust_ID FROM auto_ins_customers
WHERE age BETWEEN 18 and 21
AND cust_ID IN (SELECT cust_ID FROM traffic_violations
 WHERE date_of_infraction >=
 (GETDATE() - 365))
AND cust_ID IN (SELECT cust_ID FROM auto_claims
 WHERE date_of_claim >= (GETDATE() - 365)

will generate the list of 18- to 21-year-old customers who have had both a traffic citation and an auto insurance claim within the past year-just like the second INTERSECT query in the current tip.

Generating the results table from the INTERSECT ALL query in the current tip requires the UNION of two non-INTERSECT queries such as:

 SELECT cust_ID FROM traffic_violations
 WHERE date_of_infraction >= (GETDATE() - 365)
 AND cust_ID IN (SELECT cust_ID FROM auto_ins_customers
 WHERE age BETWEEN 18 and 21)
 AND cust_ID IN (SELECT cust_ID FROM auto_claims
 WHERE date_of_claim >= (GETDATE() - 365)
 UNION ALL
 SELECT cust_ID FROM auto_claims
 WHERE date_of_claim >= (GETDATE() - 365)
 AND cust_ID IN (SELECT cust_ID FROM auto_ins_customers
 WHERE age BETWEEN 18 and 21)
 AND cust_ID IN (SELECT cust_ID FROM traffic_violations
 WHERE date_of_infraction >=
 (GETDATE() - 365)
 ORDER BY cust_ID

Using the CORRESPONDING Keyword in an INTERSECT Query on Non union Compatible Tables

By definition, two tables are union-compatible if both have the same number of columns and if the data type of each column in one table is the same as the data type of its corresponding column (by ordinal position) in the other table. If two tables are union-compatible, you can use the INTERSECT operator (which you learned about in Tip 238, "Using the INTERSECT Operator to Select Rows That Appear in All of Two or More Source Tables") on the rows returned by two SELECT statements such as

 SELECT * FROM table_a
 INTERSECT
 SELECT * FROM table_b

to generate a results table that has all of the rows from TABLE_A that are also in TABLE_B.

If, on the other hand, you have two tables that are not union-compatible, you can still use the INTERSECT operator to find sets of data values common to both tables by adding the CORRESPONDING keyword to the INTERSECT query. Suppose, for example, that you want a list of your vendors that contribute to both the Republican and Democratic parties. As long as each of the columns with the same name in all of the tables also has an identical data type across the tables, a query such as

 SELECT * FROM vendors
 INTERSECT CORRESPONDING
 SELECT * FROM republican_contributors
 INTERSECT CORRESPONDING
 SELECT * FROM democrat_contributors

will produce a results table similar to:

 tax_ID vendor_name phone_number
 ---------- ----------------- --------------
 88-5481815 'ABC Corporation' (748)-254-5565
 88-5107204 'XYZ Corporation' (754)-875-5648

In the current example, the three tables have three columns with the same name (TAX_ID, VENDOR_NAME, PHONE_NUMBER), and only two rows have matching values in the three columns across the three tables. (When you execute an INTERSECT CORRESPONDING query, the DBMS checks for matching data values in [and displays] only the columns with identical names in all of the tables.)

When you want to display the data values from only some of the matching columns while still requiring that all matching corresponding columns have the same value, list the columns you want to display after one of the CORRESPONDING keywords in the INTERSECT CORRESPONDING query. For example, if you want to display only the VENDOR_NAME, change the query in the current example to:

 SELECT * FROM vendors
 INTERSECT CORRESPONDING
 SELECT * FROM republican_contributors
 INTERSECT CORRESPONDING (vendor_name)
 SELECT * FROM democrat_contributors

The DBMS will then display only the VENDOR_NAME for those rows with the same combination of TAX_ID, VENDOR_NAME, and PHONE_NUMBER.

Using a Multi table JOIN Without a WHERE Clause to Generate a Cartesian Product

Whenever you tell the DBMS to execute a multi-table query, the system first generates the Cartesian product of the source tables listed in the SELECT statement's FROM clause. It then uses the search condition in the WHERE clause to filter out unwanted rows. Therefore, if you want to generate the Cartesian product of two or more tables, simply omit the WHERE clause normally present in a multi-table query.

For example, if you want to get a list of teachers and the classes they teach, execute a query similar to

 SELECT class_ID,
 RTRIM(first_name)+' '+last_name AS 'Instructor'
 FROM classes, teachers
 WHERE classes.instructor = teachers.ID

which joins each row of class information from the CLASSES table with the name of the instructor for the class from the TEACHERS table to produce a results table similar to:

 class_ID Instructor
 --------------- -------------
 English 101 Ishud Reedmour
 Composition 101 Wanda Wright
 Math 101 Mathew Mattick

If you want the list of all possible combinations of classes and teachers (that is, the Cartesian product of selected columns from the CLASSES table and the TEACHERS table) instead, rewrite the query as

 SELECT class_ID,
 RTRIM(first_name)+' '+last_name AS 'Instructor'
 FROM classes, teachers

to produce a (Cartesian product) results table similar to:

 class_ID Instructor
 --------------- --------------
 English 101 Ishud Reedmour
 Composition 101 Ishud Reedmour
 Math 101 Ishud Reedmour
 English 101 Wanda Wright
 Composition 101 Wanda Wright
 Math 101 Wanda Wright
 English 101 Mathew Mattick
 Composition 101 Mathew Mattick
 Math 101 Mathew Mattick

Moreover, if you have a third table—STUDENTS, for example—simply add its name to the list of source tables in the query's FROM clause, and add the STUDENTS columns you want to see in the results table to the query's SELECT clause. For example, when the DBMS executes the query

 SELECT class_ID,
 RTRIM(first_name)+' '+last_name AS 'Instructor',
 RTRIM(students.f_name)+' '+students.l_name AS 'Student'
 FROM classes, teachers, students

it will first generate the Cartesian product of the CLASSES table and the TEACHERS table. Next, the system will generate the Cartesian product of that (Cartesian) product table and the STUDENTS table, to produce a results table similar to

 class_ID Instructor Student
 --------------- -------------- -------------
 English 101 Ishud Reedmour Ima Pupil
 Composition 101 Ishud Reedmour Ima Pupil
 Math 101 Ishud Reedmour Ima Pupil
 English 101 Wanda Wright Ima Pupil
 Composition 101 Wanda Wright Ima Pupil
 Math 101 Wanda Wright Ima Pupil
 English 101 Mathew Mattick Ima Pupil
 Composition 101 Mathew Mattick Ima Pupil
 Math 101 Mathew Mattick Ima Pupil
 English 101 Ishud Reedmour Uhara Student
 Composition 101 Ishud Reedmour Uhara Student
 Math 101 Ishud Reedmour Uhara Student
 English 101 Wanda Wright Uhara Student
 Composition 101 Wanda Wright Uhara Student
 Math 101 Wanda Wright Uhara Student
 English 101 Mathew Mattick Uhara Student
 Composition 101 Mathew Mattick Uhara Student
 Math 101 Mathew Mattick Uhara Student

given that Ima Pupil and Uhara Student are the only two names in the STUDENTS table.

  Note 

The Cartesian product of two (or more) tables actually consists of all possible combinations of rows from all of the tables with each row made up of the columns from all of the tables as well. Therefore, the preceding examples do not generate a true Cartesian product, since they show all possible combinations of rows but not the combination of all of the columns from all of the tables in each row. To get a true Cartesian product, use the "all columns" operator (the asterisk (*)) or list all (vs. some) of the column names in the query's SELECT clause. For example, rewrite the preceding example query as

 SELECT * FROM classes, teachers, students

to generate the true Cartesian product of the CLASSES, TEACHERS, and STUDENTS tables.

Using Aliases (Correlation Names) as Shorthand for Table Names

If a column name used in an SQL statement has the same name in two or more of the statement's source tables, you must use a qualified column name. As you learned in Tip 289, "Using Qualified Column Names in Multi-table Queries That Join Tables That Have the Same Names for One or More Columns," a qualified column name includes both the name of the column and the name of the table whose data the query is to use.

Moreover, to reference a table owned by another user, you must use a fully qualified column name, which means your column reference must include the column name, the table name, and the username of the table's owner. Therefore, if you want to generate a list of birthdays and anniversaries using tables owned by another user (such ANNIVERSARY_BIRTHDAY owned by KONRAD, and EMPLOYEES owned by HR_ADMIN, for example), the column references in the query will be rather long:

 SELECT
 CONVERT(CHAR(12,konrad.anniversary_birthday.next_date,107)
 AS 'Date', hr_admin.employees.emp_ID,
 RTRIM(hr_admin.employees.first_name)+
 ' '+hr_admin.employees.last_name AS 'Employee Name',
 konrad anniversary_birthday.relationship,
 RTRIM(konrad.anniversary_birthday.first_name)+' '+
 konrad.anniversary_birthday.last_name AS
 'Family Member', konrad.anniversary_birthday.event,
 CONVERT(INTEGER,DATENAME(year,
 konrad.anniversary_birthday.next_date)) -
 CONVERT(INTEGER,DATENAME(year,
 konrad.anniversary_birthday.first_date)) AS 'Years'
 FROM hr_admin.employees, konrad.anniversary_birthday
 WHERE konrad.anniversary_birthday.next_date
 BETWEEN GETDATE() AND (GETDATE() + 30)
 AND hr_admin.employees.emp_ID =
 konrad.anniversary_birthday.emp_ID
 ORDER BY konrad.anniversary_birthday.next_date,
 hr_admin.employees.emp_ID

Because typing a query with long qualified column names or several references to columns common to multiple tables can get quite tedious, SQL lets you use an alias (or correlation name) in place of any or all of the table names used in a statement. To define an alias you can use in place of a table name, simply type the alias after the name of the table in the statement's FROM clause. Thus, in the current example you would replace the FROM clause

 FROM hr_admin.employees, konrad.anniversary_birthday

with the FROM clause

 FROM hr_admin.employees e, konrad.anniversary_birthday ab

if you want to use the letter e as an alias for HR_ADMIN.EMPLOYEES table and the letters ab to mean KONRAD.ANNIVERSARY_BIRTHDAY table in the query. You could then rewrite the query in the current example as:

 SELECT
 CONVERT(CHAR(12,ab.next_date,107) AS 'Date', e.emp_ID,
 RTRIM(e.first_name)+' '+e.last_name AS 'Employee Name',
 ab.relationship,
 RTRIM(ab.first_name)+' '+ab.last_name AS 'Family Member',
 ab.event, CONVERT(INTEGER,DATENAME(year,ab.next_date)) -
 CONVERT(INTEGER,DATENAME(year,ab.first_date)) AS 'Years'
 FROM hr_admin.employees e, konrad.anniversary_birthday ab
 WHERE ab.next_date BETWEEN GETDATE() AND (GETDATE() + 30)
 AND e.emp_ID = ab.emp_ID
 ORDER BY ab.next_date, e.emp_ID

The only restrictions on correlation names (aliases) are that you cannot use the same alias to refer to more than one of the tables listed in the FROM clause, and you cannot use both an alias and the table's long (full) name in the same statement.

Understanding One to Many and Many to One Joins

In a relational database, a one-to-many relationship is most often called a parent-child relationship because one parent can have many children, while a child can have only one parent. Arguably (in the real world), a child has two parents. However, only one of the parents brings the child into the world. Similarly, think of child rows as being the offspring from a single parent row. Figure 294.1 illustrates the one-to-many relationship (parent-to-children) between a row in the EMPLOYEES (parent) table and several rows in the TIMECARDS (child) table.

click to expand
Figure 294.1: One-to-many relationships between (parent) rows in an EMPLOYEES table and (child) rows in a TIMECARDS table

The relationship between parent and child is represented by the value of a column (or set of columns) common to both tables. In the current example, a parent-child relationship exists between those rows in which the value in the EMP_ID column of the EMPLOYEES table matches the value in the EMP_ID column of the TIMECARDS table.

  Note 

SQL does not require that the column used to create the one-to-many relationship between parent and child tables have the same name in both tables. In the current example, the EMP_ID column in the EMPLOYEES table could have had another name (such as EMP_NUM, for example). The value in the column, not its name, defines the relational link between rows in one table and rows in another.

To exploit (or exercise) the one-to-many relationship, execute a query that equates the related column in one table with its counterpart in the other table. For example, the query

 SELECT * FROM employees, timecards
 WHERE employees.emp_ID = timecards.emp_ID

will produce a results table that joins each row in the parent table (EMPLOYEES) with one or more rows from the child table (TIMECARDS) based on a matching value in the pair of EMP_ID columns-one from each table.

Actually, as you learned in Tip 283, "Understanding Joins and Multi-table Queries," the DBMS first generates the Cartesian product of the TIMECARDS table and the EMPLOYEES table. It then filters out all rows except those in which the value in the EMP_ID column from EMPLOYEES matches the value in the EMP_ID column from TIMECARDS.

The end result of a one-to-many query is a results table similar to

emp_ID f_name l_name emp_ID date time_in time_out
------ ------ ------ ------ ---------- ------- --------
101 Konrad King 101 2000-01-01 800 1600
101 Konrad King 101 2000-01-02 830 1630
101 Konrad King 101 2000-01-03 900 1600
502 Sally Fields 502 2000-01-02 615 1630
502 Sally Fields 502 2000-01-03 700 1700
609 Wally Wells 609 2000-01-03 800 1200
609 Wally Wells 609 2000-01-03 1230 1630

which has only parent rows joined with one or more child rows. Therefore, you may find it easier to conceptualize a multi-table query as joining parent and child rows versus generating all possible combinations of joins and then filtering out unwanted rows that combine a parent row with a child row that belongs to a different parent.

If you reverse the links in Figure 294.1 (from TIMECARDS to EMPLOYEES), you can see that the child-to-parent relationship is many-to-one. Many (that is, one or more) rows in the child table relate (again by matching column value) to one and only one row in the parent table.

  Note 

Figure 294.1 shows the one-to-many relationship between EMPLOYEES and TIMECARDS rows by relating the PRIMARY KEY value in the parent table (EMPLOYEES) with a FOREIGN KEY value in the child table (TIMECARDS). However, SQL does not require that you use a pair of key columns to relate to tables. As long as one of the columns in the two related tables is constrained as UNIQUE or has no duplicate values, performing an equi-join on the source tables will produce a results table showing one-to-many relationships. If both tables have duplicate values in the column used to relate rows in an equi-join, the results table will show many-to-many relationships—which you will learn about in Tip 295, "Understanding Many-to-Many Joins."

Understanding Many to Many Joins

As you learned in Tip 294, "Understanding One-to-Many and Many-to-One Joins," a join (or multi-table query) is a two-step process. First, generate all possible pairs of rows from two related tables. Second, use the Boolean expression in the statement's WHERE clause to filter out the "garbage" rows. (Garbage rows are interim [virtual] table rows that join a row from one table with an unrelated row from the other table.)

In a one-to-many join, the WHERE clause filters out joined rows such that only those that join each row from the first table with one or more rows from the second table are left. Or, looking at the relationship in reverse, the WHERE clause filter leaves only joined rows in which each row from the second table is joined with one (and only one) row from the first table.

In a many-to-many join, on the other hand, the WHERE clause filter leaves not only rows in which each row from the first table is joined with one or more rows from the second table, but also rows in which each row from the second table is joined with one or more rows from the first table.

The decision whether or not to include a joined row from the interim (virtual) table in the final results table is based on the values in one or more pairs of columns common to both tables. When the DBMS executes either an equi-join (which you learned about in Tip 283, "Understanding Joins and Multi-table Queries") or a non—equi-join (which you learned about in Tip 288, "Understanding Non—Equi-Joins), the system filters out joined rows in which the values in the matching pair (or pairs) of columns used to relate the tables fails to satisfy the condition of the relational operator in the query's WHERE clause.

For example, when executing an equi-join such as

 SELECT * FROM customers, autos_sold
 WHERE customers.cust_ID = autos_sold.sold_to

the DBMS will generate an interim (virtual) table that has all possible pairs of rows from the CUSTOMERS table and the AUTOS_SOLD table. Then the system will use the WHERE clause to filter out all joined rows that do not have matching values in the CUST_ID and SOLD_TO columns.

Similarly, when executing a non—equi-join such as

 SELECT * FROM customers, auto_inventory
 WHERE customers.max_price >= auto_inventory.price

the DBMS will generate an interim (virtual) table that has all possible pairs of rows from the CUSTOMERS table and the AUTO_INVENTORY table. Next, the system will use the WHERE clause to filter out all joined rows in which the value in the MAX_PRICE column is less than the value in the PRICE column.

In both of these queries (or joins), the DBMS uses the value in a pair of columns (one column from each of the related tables) to decide which of the joined rows to keep in the results table. (In the first example, an equi-join, the pair of columns has to have matching values; in the second example, a non—equi-join, the value in one column has to be greater than or equal to the value in the other.)

Non—equi-joins (queries in which the Boolean operator in the WHERE clause is other than an equals [=] sign) always produce a results table with many-to-many relationships. For example, Figure 295.1 shows some of the joined rows that will pass through the WHERE clause filter into the results table of the second query in the current tip.

click to expand
Figure 295.1: Many-to-many relationships between rows in a CUSTOMERS table and rows in an AUTO_INVENTORY table

As shown in Figure 295.1, the results table will have rows in which each row from the CUSTOMERS table is joined with one or more rows from the AUTO_INVENTORY table. Moreover, the results table will also have rows in which each row from the AUTO_INVENTORY table is joined with one or more rows from the CUSTOMERS table.

  Note 

The equi-join used as the first example in the current tip will perform a one-to-many join of the CUSTOMERS table and the AUTOS_SOLD table only if the values in either the CUST_ID column or the SOLD_TO column are unique. If you allow the user to add duplicate CUST_ID and SOLD_TO values, the equi-join query

SELECT * FROM customers, autos_sold
WHERE customers.cust_ID = autos_sold.sold_to

like the non—equi-join query

 SELECT * FROM customers, auto_inventory
 WHERE customers.max_price >= auto_inventory.price

will generate a results table that shows many-to-many join relationships in the joined rows.

Understanding the NATURAL JOIN

A NATURAL JOIN is a special type of equi-join with an implied WHERE clause that compares all columns in one table with corresponding columns that have the same name in another table for equality. Therefore, after the DBMS filters the product of the source tables through the natural join's implied WHERE clause, the final results table will have only joined rows in which all pairs of columns that have the same name in both two tables also have matching values.

Suppose, for example, that you have an EMPLOYEES table and a SALES table created by:

 CREATE TABLE employees CREATE TABLE sales
 (emp_ID INTEGER, (sales_date DATETIME,
 f_name VARCHAR(30), amount_sold MONEY,
 l_name VARCHAR(30)) emp_ID INTEGER,
 office_ID INTEGER)

To get a list of employees and their sales for each office, you can submit a NATURAL JOIN similar to:

 SELECT employees.emp_ID, sales.office_ID, f_name, l_name,
 sales_date, amount_sold
 FROM employees NATURAL JOIN sales

When the DBMS executes the query, it will join rows from the EMPLOYEES table with rows from the SALES table that have matching values in the pair of EMP_ID columns—the two columns that have the same name in both tables. Similarly, if the EMPLOYEES table also had an OFFICE_ID column, the NATURAL JOIN query in the current example would join rows in which both pairs of same name columns (EMP_ID and OFFICE_ID) had matching values in the two tables.

In effect, a NATURAL JOIN is equivalent to an equi-join with a WHERE clause that equates each pair of columns with the same name in both source tables. As such, you can rewrite the NATURAL JOIN in the current example as:

SELECT employees.emp_ID, sales.office_ID, f_name, l_name,
 sales_date, amount_sold
FROM employees, sales
WHERE employees.emp_ID = sales.emp_ID

Or, if the EMPLOYEES table also had an OFFICE_ID column, you could rewrite the NATURAL JOIN of EMPLOYEES and SALES in the current example as:

 SELECT employees.emp_ID, sales.office_ID, f_name, l_name,
 sales_date, amount_sold
 FROM employees, sales
 WHERE employees.emp_ID = sales.emp_ID
 AND employees.office_ID = sales.office_ID
  Note 

If your DBMS product, like MS-SQL Server, does not support the NATURAL JOIN operator, simply use an equi-join with a WHERE clause that uses AND operators to combine search conditions that equate each pair or columns with the same names in both tables. Whatever its form, the important thing to remember is that a NATURAL JOIN is a query that joins rows only if all pairs of columns with the same name in both source tables have matching values. Therefore, if you use a NATURAL JOIN, make sure that all related (joinable) columns have the same name in both tables and that all unrelated columns have names unique to each table.

Understanding the Condition JOIN

A condition join is a multi-table query that can use any of the relational operators (>, <, >=, <=, <>, and =) to relate a column in one table with the column value of a corresponding (related) column in another table. In short, a condition join is like an equi-join, except you can use any relational operator in a condition join, while you can use only the equality operator (=) in an equi-join. The only difference between a condition join and a multi-table query with a WHERE clause is that you will find the search condition used to relate the tables in the condition join's ON clause instead of a WHERE clause.

For example, the condition join

 SELECT * FROM employees JOIN customers
 ON (salesperson_ID = emp_ID)

will generate a results table with each row from the CUSTOMERS table joined to the row in the EMPLOYEES table in which the value in the EMP_ID column (from the row in the EMPLOYEES table) matches the SALESPERSON_ID (from the row in the CUSTOMERS table). As such, the condition join in the current example is functionally equivalent to the multi-table join:

 SELECT * FROM employees, customers
 WHERE salesperson_ID = emp_ID

Similarly, you can use a condition join such as

SELECT DISTINCT e.emp_ID e.f_name, e.l_name, e.total_sales
FROM NV_employees e JOIN AZ_employees
ON (e.total_sales > AZ_employees.total_sales)

to generate a list of Nevada office salespeople (from rows in the NV_EMPLOYEES table) whose TOTAL_SALES are greater than the TOTAL_SALES of at least one of the Arizona office salespeople (found in the AZ_EMPLOYEES table). Or, you could use the multi-table query

 SELECT DISTINCT e.emp_ID e.f_name, e.l_name, total_sales
 FROM NV_employees e, AZ_employees
 WHERE e.total_sales > AZ_employees.total_sales

to produce the same result.

  Note 

If your DBMS product does not support a condition join, simply rewrite the condition join as a multi-table query with a WHERE clause. Put the search condition from the condition join's ON clause into the query's WHERE clause. The important thing to remember is that a condition join, like every other multi-table query or join, uses the search condition (whether in a WHERE clause or in an ON clause) to filter out any joined yet unrelated rows. The DBMS rejects joined, unrelated rows because their column values fail to satisfy the search condition in the WHERE clause (or in the ON clause).

Using the CROSS JOIN to Create a Cartesian Product

A CROSS JOIN (also called a cross product or a Cartesian product) of two tables is a third table that contains all possible pairs of rows from the two cross joined source tables. For example, if you have two tables, TABLE_1 and TABLE_2, each of which has two columns and three rows, the CROSS JOIN

 SELECT * FROM table_1 CROSS JOIN table_2

will pair each of the rows in TABLE_1 with each of the rows in TABLE_2 to produce a results table with four columns and nine rows, as shown in Figure 298.1.

click to expand
Figure 298.1: The CROSS JOIN of TABLE_ 1 and TABLE_2

  Note 

You can determine the number of columns in a CROSS JOIN (or Cartesian product) of two tables by adding the number of columns in the first table to the number of columns in the second table. Moreover, the number of rows in a CROSS JOINs results table will always be the number of rows in the first table multiplied by the number of rows in the second table.

If your DBMS product does not support the CROSS JOIN operator, you can still generate the Cartesian product of two tables by executing a multi-table query without a WHERE clause. Suppose, for example, that you have an ATHLETES table and want to pair each athlete with each row of events in a DECATHLON_EVENTS table. If your DBMS supports the CROSS JOIN operator, you can write the query as:

SELECT * FROM athletes CROSS JOIN decathlon_events

Or, you can generate the same results table with the multi-table query:

 SELECT * FROM athletes, decathlon_events

As you learned from the multi-table queries in Tips 282–289, the CROSS JOIN (or Cartesian product) is rarely the final result you want from a query. However, the DBMS normally generates the CROSS JOIN of the tables in a SELECT statement's FROM clause as a first step in every query. The system then performs a sequence of steps that manipulate and filter the interim (virtual) Cartesian product table (of the source tables) to produce the joined rows in the final results table.

Understanding the Column Name JOIN

A column name join is very much like the NATURAL JOIN you learned about in Tip 296, "Understanding the NATURAL JOIN." While a NATURAL JOIN requires that all pairs of columns with the same name in both tables have matching values, a column name join lets you specify the pairs of same-name column values the DBMS is to compare. As a result, you can use the column name join to join not only tables that are joinable with a NATURAL JOIN, but also tables that are not. After all, if you write a column name join that requires a match of all pairs of columns with the same name in both tables, the column name join is in effect, a NATURAL JOIN. However, if you have a CUSTOMERS table and an EMPLOYEES table created by

 CREATE TABLE customers CREATE TABLE employees
 (cust_ID INTEGER, (emp_ID INTEGER,
 f_name VARCHAR(30), f_name VARCHAR(30),
 l_name VARCHAR(30)) l_name VARCHAR(30))
 emp_ID INTEGER)

you cannot use the NATURAL JOIN

 SELECT RTRIM(c.f_name)+' '+c.l_name AS 'Customer',
 RTRIM(e.f_name)+' '+e.l_name AS 'Salesperson'
 FROM customers c NATURAL JOIN employees e

to get a combined list showing each customer's name and the name of the customer's salesperson. As you learned in Tip 296, the NATURAL JOIN requires that all pairs of same-name columns have matching values in joined rows. Therefore, the NATURAL JOIN of CUSTOMERS and EMPLOYEES in the current example will show only the customers that happen to have a salesperson whose first and last names match the customer's first and last name, respectively.

However, you can use a column name join such as

 SELECT RTRIM(c.f_name)+' '+c.l_name AS 'Customer',
 RTRIM(e.f_name)+' '+e.l_name AS 'Salesperson'
 FROM customers c JOIN employees e
 USING (emp_ID)

to join the CUSTOMERS and EMPLOYEES tables in the current example. Instead of requiring matching values in all pairs of same-name columns, the USING clause in the current example query tells the DBMS to join rows that have matching values in the EMP_ID column (from each table)—whether the F_NAME and L_NAME column values match or not.

  Note 

If your DBMS, like MS-SQL Server, does not support the USING clause, use a condition join instead of a column name join. The only difference between the two types of joins is that the column name join implicitly specifies that the same name column(s) named in the USING clause must have matching values, while the condition join explicitly equates the pair of columns in an ON clause. As such, to convert the column name join in the current example to a condition join, simply rewrite the query as:

SELECT RTRIM(c.f_name)+' '+c.l_name AS 'Customer',
 RTRIM(e.f_name)+' '+e.l_name AS 'Salesperson'
FROM customers c JOIN employees e
ON (c.emp_ID = e.emp_ID)

Using an INNER JOIN to Select All Rows in One Table That Relate to Rows in Another Table

An INNER JOIN is a multi-table query in which the DBMS returns only related pairs of rows from the source tables—that is, the query's results table will contain only joined rows that satisfy the search condition in the query's ON clause. Conversely, if a row in either source table does not have a corresponding (related) row in the other table, the row is filtered out and therefore not included in the results table.

Suppose, for example, that you have stocks lists from two analysts, and you want to create a list of the stock recommendations they have in common. If one analyst's list is in table STOCK_LIST_A and the other analyst's list is in table STOCK_LIST_B, then an INNER JOIN of the two tables, such as

 SELECT a.symbol,
 a.buy_at AS 'Buy Price A', a.sell_at AS 'Sell Price A',
 b.buy_at AS 'Buy Price B', b.sell_at AS 'Sell Price B'
 FROM stock_list_a a INNER JOIN stock_list_b b
 ON (a.symbol = b.symbol)

will display a results table similar to

symbol Buy Price A Sell Price A Buy Price B Sell Price B
------ --------- ------------ ----------- --------------
CSCO 50 60 55 70
LU 32 40 30 45
F 26 32 27 40
GM 60 69 58 63
VTSS 86 92 82 89
LEN 28 32 30 34

which lists only stock symbols and price information for those stocks in both STOCK_LIST_A and STOCK_LIST_B. Therefore, any row with a SYMBOL column value in either table that does not match the value in the SYMBOL column of a row in the other table will be filtered out and therefore not included as one of the joined rows in the final results table.

If you rewrite the INNER JOIN in the current example as an equivalent multi-table query such as

SELECT a.symbol,
 a.buy_at AS 'Buy Price A', a.sell_at AS 'Sell Price A',
 b.buy_at AS 'Buy Price B', b.sell_at AS 'Sell Price B'
FROM stock_list_a a, stock_list_b b
WHERE a.symbol = b.symbol

you can see that an INNER JOIN is just another syntax (or way of writing) the multi-table equi-join queries you learned about in Tips 282–289.

  Note 

By default, the DBMS will execute a multi-table query as an INNER JOIN unless you specify one of the OUTER JOIN queries you will learn about in Tips 302–305. Therefore, the query

SELECT * FROM table_a JOIN table_b
ON (table_a.column_to_relate = table_b.column_to_relate)

is equivalent to:

SELECT * FROM table_a INNER JOIN table_b
ON (table_a.column_to_relate = table_b.column_to_relate)

Understanding the Role of the USING Clause in an INNER JOIN

The USING clause in an INNER JOIN lists the pairs of same name columns whose values must match in order for the DBMS to include a joined row in the query's results table. In other words, the system generates the Cartesian product of the tables listed in the SELECT statement's FROM clause and then filters out any joined rows in which the pairs of same name columns listed in the USING clause do not have matching values.

For example, the USING clause in the query

 SELECT class, section, description, title
 FROM curriculum INNER JOIN book_list
 USING (class)

tells the DBMS to generate a results table with only joined rows in which the value in the CLASS column from the CURRICULUM table matches the value in the CLASS column from the BOOK_LIST table.

  Note 

The column name(s) listed in a USING clause must appear in both of the tables listed in the FROM clause. Moreover, the column(s) must be defined as either of the same data type or of compatible data types.

If the relationship between the tables in a query is based on matching values in multiple same-name columns, the USING clause will have more than one column name. For example, if the required reading list for each section of a class is unique, then the query in the current example must be rewritten as

 SELECT class, section, description, title
 FROM curriculum INNER JOIN book_list
 USING (class, section)

to indicate that both the SECTION column pair and the CLASS column pair must have matching values in order for the DBMS to include the joined row in the results table.

The important thing to remember is that the USING clause can specify table row relationships based only on the equality of values in same name column pairs. Therefore, the USING clause in the preceding query is equivalent to the ON clause in

 SELECT class, section, description, title
 FROM curriculum INNER JOIN book_list
 ON ((curriculum.class = book_list.class) AND
 (curriculum.section = book_list.section)

and the WHERE clause in:

 SELECT class, section, description, title
 FROM curriculum, book_list
 WHERE (curriculum.class = book_list.class)
 AND (curriculum.section = book_list.section)
  Note 

The USING clause really does not add any functionality to SQL. As a result, many DBMS products, such as MS-SQL Server, do not support it. After all, as you can see from the final two example queries in the current tip, you can use an ON clause or a WHERE clause to perform the same function as a USING clause. However, if your DBMS supports it, you may want to use the USING clause because it lets you write queries that are more compact and (perhaps) easier to understand.

Understanding the OUTER JOIN

Both inner joins and multi-table queries with a WHERE clause combine rows from multiple tables (two tables at a time) and generate results tables that contain only pairs of rows. In other words, if a row in either of the source tables does not have a match (or related row) in the other source table, the DBMS will not place the row in the final results table. As a result, unmatched rows from both tables appear to "vanish" when you execute a multi-table query or INNER JOIN.

An OUTER JOIN, on the other hand, tells the DBMS to generate a results table with not only pairs of related rows, but also unmatched rows from either of the two source tables. For example, if you have a STUDENTS table and a FACULTY table with the following data

 STUDENTS table FACULTY table
 f_name l_name major f_name l_name dept_head
 ------ -------- ----------- ------ -------- ---------
 Sally Smith English Lori Raines English
 Allen Winchell Mathematics Marcus Elliot Engineering
 Bruce Dern Business Kelly Wells Mathematics
 Susan Smith NULL Kris Matthews NULL
 Howard Baker NULL Linda Price NULL

then the single-table queries

 SELECT * FROM students
 SELECT * FROM faculty

will generate results tables with five rows of data. However if you join rows from the STUDENTS table with rows from the FACULTY table using a multi-table query such as

SELECT RTRIM(s.f_name+' '+s.l_name AS 'Student', major,
 dept_head, RTRIM(f.f_name)+' '+f.l_name AS 'Professor'
FROM students s, faculty f
WHERE major = dept_head
ORDER BY major, dept_head

then the DBMS will generate a results table with only two rows:

 Student major dept_head Professor
 -------------- ----------- ----------- -----------
Sally Smith English English Lori Raines
Allen Winchell Mathematics Mathematics Kelly Wells

The remaining three rows in the STUDENTS table seem to "vanish" because the DBMS cannot pair with rows in the FACULTY table by finding a matching DEPT_HEAD column value (from the FACULTY table) for the MAJOR column value (from the STUDENTS table). Similarly, the system filters out three unmatched rows from the FACULTY table for the same reason—no matching value in the MAJOR column (from the STUDENTS table) for the value in the DEPT_HEAD column (from the FACULTY table).

  Note 

The SQL standard specifies that the predicate NULL = NULL is FALSE. As a result, the WHERE clause filters out any rows that join a student without a major (that is, a row from the STUDENTS table with a NULL in the MAJOR column) with a faculty member who is not the head of a department (that is, a row from the FACULTY table with a NULL in the DEPT_HEAD column).

To list not only paired (related) rows but also unmatched rows from either table, rewrite the multi-table query (which is, in effect, an INNER JOIN based on the columns listed in the WHERE clause) with a FULL OUTER JOIN such as

 SELECT RTRIM(s.f_name+' '+s.l_name AS 'Student', major,
 dept_head, RTRIM(f.f_name)+' '+f.l_name AS 'Professor'
 FROM students s FULL OUTER JOIN faculty f
 ON (major = dept_head)
 ORDER BY major, dept_head

to generate a results table similar to:

 Student major dept_head Professor
 -------------- ----------- ----------- -------------
 NULL NULL NULL Kris Mathews
 NULL NULL NULL Linda Price
 Susan Smith NULL NULL NULL
 Howard Baker NULL NULL NULL
 NULL NULL Engineering Marcus Elliot
 Bruce Dern Business NULL NULL
 Sally Smith English English Lori Raines
 Allen Winchell Mathematics Mahtematics Kelly Wells

The results table from the FULL OUTER JOIN has eight rows:

  • Two joined (related) rows (Sally Smith and Allen Winchell) produced by the multi-table query (the INNER JOIN)
  • Two unmatched rows (from the STUDENTS table) for students without a major (Susan Smith and Howard Baker)
  • One unmatched row (from the STUDENTS table) for a student (Bruce Dern) with a MAJOR (Business) without a department head in the FACULTY table
  • Two unmatched rows (from the FACULTY) table for professors who are not department heads (Kris Mathews and Linda Price)
  • One unmatched row (from the FACULTY table) for a professor (Marcus Elliot) who is head of a department (Engineering) in which no students are majoring

Notice that the DBMS fills the remaining results table columns from the "other" table with NULL values in each of the unmatched rows. For example, Bruce Dern is a business major. However, since there is no DEPT_HEAD for the business department in the FACULTY table, the DBMS puts a NULL into the DEPT_HEAD and PROFESSOR columns (from the FACULTY table) in the joined row in the results table. Similarly, Kris Mathews is not the head of a department, so the DBMS puts a NULL in the STUDENT and MAJOR columns (from the STUDENTS table) in the joined row in the results table.

Understanding the LEFT OUTER JOIN

As you learned in Tip 302, "Understanding the OUTER JOIN," the results table from an INNER JOIN includes only pairs of related rows, while the results table from an OUTER JOIN includes both matched (related) rows and unmatched rows. The FULL OUTER JOIN in Tip 302 included unmatched rows from both tables in the final results table. If you want to include unmatched rows from only one of the two source tables involved in a join (or multi-table query), use either a LEFT OUTER JOIN or a RIGHT OUTER JOIN instead of a FULL OUTER JOIN.

A LEFT OUTER JOIN tells the DBMS to generate a results table that includes joined rows and any unmatched rows from the table listed to the left (that is, listed before) the keyword JOIN in the query's FROM clause.

Suppose, for example, that you want to join the list of all customer names and total purchases from a CUSTOMERS table with the name of salesperson to whom each customer is assigned from the EMPLOYEES table. However, since you want a list of all customers, you want the DBMS to include any customers not currently assigned to an active salesperson as well.

If you execute an INNER JOIN such as

 SELECT cust_ID, RTRIM(c.f_name)+' '+c.l_name AS 'Customer',
 total_purchases, emp_ID,
 RTRIM(e.f_name)+' '+e.l_name AS 'Salesperson'
 FROM customers c JOIN employees e
 ON (salesperson = emp_ID)
 ORDER BY emp_ID DESC

the results table will display only the name, total purchases, and salesperson's name for customers currently assigned to a salesperson (that is, rows from the CUSTOMERS customer table with a SALESPERSON value that the DBMS can find in the EMP_ID column of the EMPLOYEES table). As a result, any CUSTOMERS table rows with a NULL SALESPERSON or an invalid SALESPERSON will not appear in the SELECT statement's results table.

However, if you execute the same basic query as a LEFT OUTER JOIN

 SELECT cust_ID, RTRIM(c.f_name)+' '+c.l_name AS 'Customer',
 total_purchases, emp_ID,
 RTRIM(e.f_name)+' '+e.l_name AS 'Salesperson'
 FROM customers c LEFT OUTER JOIN employees e
 ON (salesperson = emp_ID)
 ORDER BY emp_ID DESC

the DBMS will generate a results table similar to

cust_ID Customer total_purchases emp_ID Salesperson
------- --------------- --------------- ------ -----------
6753 Sally Brown 95658.0000 NULL NULL
3758 Richard Stewart 15425.0000 NULL NULL
1001 Linda Reed 158112.0000 101 Konrad King
7159 Walter Fields 96835.0000 101 Konrad King
4859 Sue Coulter 45412.0000 101 Konrad King
2158 Jimmy Tyson 754515.0000 201 Kris Jamsa
5159 James Herrera 74856.0000 201 Kris Jamsa

which includes not only joined rows showing each customer and the customer's salesperson's name, but also any customers not yet assigned to a salesperson or that have values in the SALESPERSON column (from the CUSTOMERS table) that do not match any values in the EMP_ID column (from the EMPLOYEES table). Notice that the DBMS puts NULL values into the results table EMP_ID and SALESPERSON columns (from the EMPLOYEES table) in each of the unmatched rows from the CUSTOMERS table.

  Note 

The keyword LEFT in a LEFT OUTER JOIN tells you that the results table will include unmatched rows from the table to the LEFT of the keyword JOIN in the query's FROM clause. As such, if you changed the FROM clause in the current example to

FROM (employees e JOIN customers c)

you would get a different results table. The new results table, like the one in the current example, would still have the same pairs of joined rows. However, instead of unmatched CUSTOMERS table rows, the results table would include all unmatched from the EMPLOYEES table and no unmatched rows from the CUSTOMERS table.

Understanding the RIGHT OUTER JOIN

In Tip 303, "Understanding the LEFT OUTER JOIN," you learned that a LEFT OUTER JOIN tells the DBMS to generate a results table that includes all related rows from the query's source tables, and any unmatched rows from the table to the left of the keyword JOIN. Conversely, a RIGHT OUTER JOIN tells the DBMS to generate a results table that includes all related rows and any unmatched rows from the table to the right (that is, the table that follows) the keyword JOIN in the SELECT statement's FROM clause.

For example, if you work for a food delivery service that lends freezers to its customers, you can get a complete list of freezers the company owns along with the names of customers who have the freezer currently out on loan by executing a RIGHT OUTER JOIN such as

SELECT RTRIM(f_name)+' '+l_name AS 'Customers Name'
 freezer_inventory.freezer_ID, date_purchased AS
 'Purchased', cost, amt_repairs AS 'Repairs'
FROM customers RIGHT OUTER JOIN freezer_inventory
ON (customers.freezer_ID = freezer_inventory.freezer_ID)
ORDER BY freezer_inventory.freezer_ID

which will generate a results table similar to:

 Customer Name freezer_ID Purchased Cost Repairs
 --------------- ---------- ---------- -------- -------
 NULL 11111 2000-10-11 155.9900 10.0000
 Richard Stewart 15425 1999-01-01 179.9400 .0000
 NULL 15915 1998-05-05 133.4500 .0000
 NULL 16426 1998-07-05 100.4500 12.7500
 NULL 21345 1996-09-09 100.4500 12.2300
 NULL 22222 2000-04-07 255.5800 .0000
 Sue Coulter 45412 1995-03-05 179.9400 45.8900
 NULL 45413 1999-01-01 255.2800 .0000
 NULL 74845 1997-04-01 99.9900 .0000
 James Herrera 74856 1999-05-09 185.2500 12.2500
 Sally Brown 95658 2000-06-01 188.8500 15.5500
 Walter Fields 96835 2000-10-15 155.9900 75.5500
 NULL 97999 1996-09-03 75.9800 44.2500

Notice that the DBMS places a NULL in the results table Customer Name column for each unmatched row from the FREEZER_INVENTORY table. Moreover, the system filters out any unmatched rows from the CUSTOMERS table. (Unmatched CUSTOMERS table rows in the results table would have a customer name and NULL values for the results table columns from the FREEZER_INVENTORY table (FREEZER_ID, PURCHASED, COST, and REPAIRS).

  Note 

If you are doing a RIGHT OUTER JOIN and the results table includes the column used to match (join) related rows in the query's source tables, be sure to use the column from the table listed to the right of the keyword JOIN in the SELECT statement's FROM clause. If you use the column from the table listed to the left of the JOIN instead, results table rows for unmatched rows from the right table will have a NULL value in the matching column. For example, if the SELECT clause in the preceding query listed CUSTOMERS.FREEZER_ID (the matching column from the left table) instead of FREEZER_INVENTORY.FREEZER_ID (the matching column from the right table) results table would show a NULL value for the freezer_ID column for freezers 11111, 15915, 16426, 21345, 22222, 45413, 74845, and 97999. As a result, you would not know the serial numbers for the freezers that should be in your warehouse. Moreover, you would not know the purchase date, initial cost, and cost of repairs for a particular freezer not currently on loan.

Understanding the FULL OUTER JOIN

The FULL OUTER JOIN combines the results of the LEFT OUTER JOIN (which you learned about in Tip 303, "Understanding the LEFT OUTER JOIN") and the RIGHT OUTER JOIN (which you learned about in Tip 304, "Understanding the RIGHT OUTER JOIN"). When the DBMS executes a FULL OUTER JOIN, it generates a results table that contains joined (related) rows along with any unmatched rows from both the table to the left and the table to the right of the keyword JOIN in the SELECT statement's FROM clause.

For example, to list all customer names from the CUSTOMERS table along with all freezers from the FREEZER_INVENTORY table, you could execute the FULL OUTER JOIN

 SELECT RTRIM(f_name)+' '+l_name AS 'Customers Name'
 customers.freezer_ID AS 'Cust_FID',
 freezer_inventory.freezer_ID AS 'Inv_FID',
 date_purchased AS 'Purchased', cost,
 amt_repairs AS 'Repairs'
 FROM customers FULL OUTER JOIN freezer_inventory
 ON (customers.freezer_ID = freezer_inventory.freezer_ID)
 ORDER BY freezer_inventory.freezer_ID

which will produce a results table similar to:

 Customer Name Cust_FID Inv_FID Purchased Cost Repairs
 --------------- -------- ------- ---------- -------- ------
 Jimmy Tyson 754515 NULL NULL NULL NULL
 Linda Reed 158112 NULL NULL NULL NULL
 NULL NULL 11111 2000-10-11 155.9900 10.0000
 Richard Stewart 15425 15425 1999-01-01 179.9400 .0000
 NULL NULL 15915 1998-05-05 133.4500 .0000
 NULL NULL 16426 1998-07-05 100.4500 12.7500
 NULL NULL 21345 1996-09-09 100.4500 12.2300
 NULL NULL 22222 2000-04-07 255.5800 .0000
 Sue Coulter 45412 45412 1995-03-05 179.9400 45.8900
 NULL NULL 45413 1999-01-01 255.2800 .0000
 NULL NULL 74845 1997-04-01 99.9900 .0000
 James Herrera 74856 74856 1999-05-09 185.2500 12.2500
 Sally Brown 95658 95658 2000-06-01 188.8500 15.5500
 Walter Fields 96835 96835 2000-10-15 155.9900 75.5500
 NULL NULL 97999 1996-09-03 75.9800 44.2500

If you are using a FULL OUTER JOIN not only to display all rows in both tables (both matched and unmatched) but also to look for inconsistencies in related tables, be sure the results table includes all of the columns in the pair(s) of columns used to relate the tables.

For example, if the results table in the current example displayed only the FREEZER_ID from the FREEZER_INVENTORY table, it would not show that CUSTOMERS table rows for Jimmy Tyson and Linda Reed have invalid freezer ID numbers in the FREEZER_ID column. A results table without the CUST_FID column (the FREEZER_ID from the CUSTOMERS table) would tell you only that the two customers did not have a freezer with a FREEZER_ID that matched one of the FREEZER_ID values in the FREEZER_INVENTORY table.

Conversely, if the results table had only the CUST_FID (the FREEZER_ID from the CUSTOMERS table) and not the INV_FID (the FREEZER_ID column from the FREEZER_INVENTORY table), you could still tell that Jimmy and Linda had invalid freezer ID numbers. (Unmatched rows from the CUSTOMERS table with an invalid FREEZER_ID in the CUSTOMERS table would have a non-NULL value in the CUST_FID column NULL values for the PURCHASED, COSTS, and REPAIRS columns in the results table.) However, without an INV_FID column, the results table would show only the FREEZER_ID values for the freezers currently on loan, since the DBMS would put a NULL in the CUST_FID column for each unmatched row from the FREEZER_INVENTORY table.

Understanding MS SQL Server OUTER JOIN Notation

When you learned how to write LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN queries in Tips 303-305, all of the examples used an ON clause to specify the relationship for joining matching pairs of rows. However, MS-SQL Server also lets you write LEFT and RIGHT OUTER JOIN queries as multi-table SELECT statements with a WHERE clause.

If you attach an asterisk (*) to the comparison operator in a SELECT statement's WHERE clause, the DBMS treats the multi-table query as an OUTER JOIN. As shown in Table 306.1, the position of the asterisk (*) (to the left or to the right of the comparison operator) tells the DBMS the type of OUTER JOIN you want to perform.

Table 306.1: MS-SQL Server WHERE Clause OUTER JOIN Notation

Type of OUTER JOIN

WHERE Clause Operators

LEFT OUTER JOIN

*=, *<, *>, *<=, *>=, *<>

RIGHT OUTER JOIN

=*, <*, >*, <=*, >=*, <>*

For example, you can write the LEFT OUTER JOIN used as an example in Tip 303, "Understanding the LEFT OUTER JOIN," as a multi-table SELECT statement by attaching an asterisk (*) to the left of the equality (=) comparison operator in the query's WHERE clause:

SELECT cust_ID, RTRIM(c.f_name)+' '+c.l_name AS 'Customer',
 total_purchases, emp_ID,
 RTRIM(e.f_name)+' '+e.l_name AS 'Salesperson'
FROM customers c, employees e
WHERE salesperson *= emp_ID
ORDER BY emp_ID DESC

Similarly, if you attach an asterisk (*) to the right of an equality operator (=) operator in a SELECT statement's WHERE, MS-SQL Server will execute a RIGHT OUTER JOIN. Therefore, you can write the RIGHT OUTER JOIN used as an example in Tip 304, "Understanding the RIGHT OUTER JOIN," as:

 SELECT RTRIM(f_name)+' '+l_name AS 'Customers Name'
 freezer_inventory.freezer_ID, date_purchased AS
 'Purchased', cost, amt_repairs AS 'Repairs'
 FROM customers, freezer_inventory
 WHERE customers.freezer_ID =* freezer_inventory.freezer_ID
 ORDER BY freezer_inventory.freezer_ID

You may be surprised to find (as I was) that MS-SQL Server does not have a WHERE clause operator for a FULL OUTER JOIN. I fully expected the DBMS to use an asterisk (*) on both sides of the comparison operator in the WHERE clause to specify a FULL OUTER JOIN. After all, the FULL OUTER JOIN is a combination of the LEFT OUTER JOIN and the RIGHT OUTER JOIN. Therefore, one would think that the "combination" FULL OUTER JOIN notation would be an asterisk (*) on both sides of the comparison operator in the query's WHERE clause. (After all, the LEFT OUTER JOIN has an asterisk [*] to the left of the comparison operator [*=], and the RIGHT OUTER JOIN has an asterisk [*] to the right of the comparison operator [=*]).

Joining More Than Two Tables in a Single Query

Whether executing an OUTER JOIN or an INNER JOIN, the DBMS always performs its joins two tables at a time. Therefore, to JOIN three or more tables in a single query will require multiple JOIN clauses that combine pairs of source tables, pairs of (interim) joined tables, or a single source table with an interim joined table.

Suppose, for example, that you keep information on the stocks you own in a PORTFOLIO table, and you want to know if insider trading activity (noted in an INSIDER_TRADES table) is at all related to the analyst recommendations (stored in an ANALYST_RECOM-MENDATIONS) table. To answer your question, the DBMS must join the three tables, two tables at a time. First, write a two-table INNER JOIN such as

SELECT p.symbol, trade_type, share_ct, position
FROM portfolio p INNER JOIN insider_trades it
ON p.symbol = it.symbol
ORDER BY p.symbol

to get a list of stocks in the PORTFOLIO table that also have insider trading activity in the INSIDER_TRADES table. The INNER JOIN tells the DBMS to filter out any rows from the PORTFOLIO table that do not have a related row (by matching SYMBOL column values) in the INSIDER_TRADES table, and vice versa. Next, add a LEFT OUTER JOIN that will relate the joined (PORTFOLIO/INSIDER_TRADES) rows in the interim table (by matching SYMBOL column values) with rows in the ANALYST_RECOMMENDATIONS table:

SELECT p.symbol, trade_type, share_ct, position,
 recommendation
FROM portfolio p INNER JOIN insider_trades it
ON p.symbol = it.symbol
LEFT OUTER JOIN analyst_recommendations ar
ON p.symbol = ar.symbol
ORDER BY p.symbol
  Note 

The LEFT OUTER JOIN in current example tells the DBMS that you want the query results to include all matched rows from the ANALYST_RECOMMENDATIONS table and any unmatched joined rows from the INNER JOIN of the PORTFOLIO table and the INSIDER_TRADES table. If you used an INNER JOIN instead of the LEFT OUTER JOIN, the final results table would include only stocks from the PORTFOLIO table that had both insider trades (from the INSIDER_TRADES table) and analyst recommendations (from the ANALYST_RECOMMENDATIONS table). However, the original query was "Are analyst recommendations related to insider trades?" Therefore, you really need a final results table that includes not only your insider-traded stocks with (matching) analyst recommendations, but also all of your insider-traded stocks without (matching) analyst recommendations.

After the DBMS puts all of your insider-traded stocks in the final results table, you can compare the number of insider trades matched with analyst recommendations to the number that are not.

Now, suppose you want to submit a query that can be answered only by joining multiple joined tables instead of by joining a source table to a joined table (as shown in the preceding example). For example, to get a list of stocks in your PORTFOLIO table that have matching rows in the INSIDER_TRADES table or matching rows in the ANALYST_REC-OMMENDATIONS, submit a query similar to

SELECT (CASE WHEN p1.symbol IS NULL THEN p2.symbol
 ELSE p1.symbol
 END) AS 'Stock',
 trade_type, share_ct, position, recommendation
FROM (portfolio p1 INNER JOIN insider_trades it
 ON p1.symbol = it.symbol)
FULL OUTER JOIN
 (portfolio p2 INNER JOIN analyst_recommendations ar
 ON p2.symbol = ar.symbol)
ON p1.symbol = p2.symbol
ORDER BY Stock

which performs a FULL OUTER JOIN of the joined rows from the INNER JOIN of rows from the PORTFOLIO table with rows from the INSIDER_TRADES table, and the joined rows from the INNER JOIN of rows from the PORTFOLIO table with rows from the ANALYST_RECOMMENDATIONS table.

The important thing to understand is that the DBMS always executes a JOIN two tables at a time. However, the two tables you tell the DBMS to join can be two individual source tables, an individual source table and a joined table, or two joined tables. As such, your query can join any number of tables—just be aware that the DBMS works its way through joining them, two tables at a time.

Understanding Non equality INNER and OUTER JOIN Statements

All of the INNER JOIN and OUTER JOIN examples in Tips 300–305 used the equality (=) comparison operator to join rows in one table with related rows in another based on matching pairs of column values. However, both INNER JOIN and OUTER JOIN queries also let you join pairs of rows based on column value relationships other than equality.

Suppose, for example, that an airline wants to generate a list of customers and the rewards they have earned based on the balance in each customer's frequent flier account. An INNER JOIN such as

 SELECT member_ID,
 RTRIM(f_name)+' '+l_name AS 'Member Name', miles_earned,
 miles_required, description AS 'Reward Earned'
 FROM frequent_fliers INNER JOIN rewards
 ON miles_earned >= miles_required
 ORDER BY l_name, f_name, member_ID, miles_required

will generate a results table that lists customer IDs, names, account balances, and rewards earned for all customers eligible for at least one reward.

In the current example, the relationship between a row in the FREQUENT_FLIER table and a row in the REWARDS table is not based on matching MILES_EARNED and MILES_REQUIRED column values alone. Rather, the final results table will include not only rows in which the two columns have matching values, but also rows in which the value in the MILES_EARNED column is greater than the value in the MILES_REQUIRED column.

You can also use nonequality comparison operators in OUTER JOIN queries. For example, if a real estate firm wants to prepare a list of its listed properties and prospective buyers, the FULL OUTER JOIN query

SELECT address, RTRIM(f_name)+' '+l_name AS 'Buyer',
 min_sales_price AS 'Seller Minimum',
 max_purchase_price AS 'Buyer Maximum',
 (max_purchase_price - min_sales_price) AS 'Spread'
FROM listings FULL OUTER JOIN buyers
ON (max_purchase_price >= min_sales_price)
AND (size_required <= square_footage)
AND (bedrooms_required <= num_bedrooms)
ORDER BY address DESC, buyer

will generate a results table that not only matches each potential buyer with listings within the buyer's price range and specifications, but also lists properties with no prospects and buyers with requirements not satisfied by any of the listed properties.

The important thing to understand is that you can use any of the comparison operators (=, <, >, <=, >=, and <>) in the ON clause of an INNER and an OUTER join to express the relationship between pairs of columns in related rows.

Understanding the UNION JOIN

The UNION JOIN, unlike the other joins you learned about in Tips 296-305, makes no attempt to match and actually join a row from one source table with one or more rows from the other source table. Instead, the UNION join simply creates a results table that contains the rows of columns from the first table plus the rows of columns from the second table.

For example, the UNION JOIN

 SELECT * FROM portfolio_a a UNION JOIN portfolio_b b

will generate a single results table with all of the rows and columns from both the PORTFOLIO A table and the PORTFOLIO_B table, as shown in Figure 309.1.

click to expand
Figure 309.1: The source and results tables for a two-table UNION JOIN

While an OUTER JOIN supplies only NULL values for the other table's columns in unmatched rows, every row in a UNION JOIN consists of the column values from one table joined to NULL column values for the other table. In the current example of a UNION JOIN, the DBMS inserts all of the rows from PORTFOLIO_A—joined to a row with a NULL value for each of the columns in PORTFOLIO_B—into the results table. Then the system inserts each of the rows from PORTFOLIO_B—joined to a row with a NULL value for each of the columns in PORTFOLIO_A into the results table.

A UNION JOIN is handy when you want to work with all of the rows from two or more (perhaps union-incompatible) tables as if they were a single table—without losing the ability to tell which rows came from which table.

  Note 

The results table from a UNION JOIN of two tables is not the same the results table generated by the DBMS when you use the UNION operator to SELECT all rows that appear in either or both of the same two tables. (You learned about the UNION operator in Tip 216, "Using the UNION Operator to Select All Rows That Appear in Any or All of Two or More Tables.") Unlike the results table for the UNION JOIN shown in Figure 309, a UNION query of the same two tables

 SELECT * FROM portfolio_a
UNION
 SELECT * FROM portfolio_b

generates a results table similar to:

SYM SHARES AVG_COST
--- ------ --------
BAC 200 64 5/16
CMB 400 50 7/8
HCA 1000 33 5/8
ADP 500 21 3/4
BAC 440 53 7/16
CDN 500 33 3/4

While the results table from the UNION JOIN has seven rows of six columns each, the DBMS generates a results table with six rows of three columns each when you use the UNION operator to combine the rows in the two tables instead.

Using the COALESCE Expression to Refine the Results of a UNION JOIN

In Tip 309, "Understanding the UNION JOIN," you learned that the UNION JOIN of two (or more) tables creates a results table with the rows and columns from all of the source tables. Each row in the results table contains the column values from one of the rows in the source table joined with NULL column values for each of the columns in each of the query's other source tables. As such, the UNION JOIN generates a final results table with a lot of null values.

For example, the UNION JOIN query

 SELECT * FROM
 joint_acct j UNION JOIN SEP_acct s UNION JOIN IRA_acct i

will generate a results table similar to:

 sym shares avg_cst sym shares avg_cst sym shares avg_cst
 ---- ------ ------- ---- ------ ------- ---- ------ -------
 BAC 200 64.625 NULL NULL NULL NULL NULL NULL
 CMB 400 50.875 NULL NULL NULL NULL NULL NULL
 HCA 1000 35.625 NULL NULL NULL NULL NULL NULL
 NULL NULL NULL ADP 500 21.75 NULL NULL NULL
 NULL NULL NULL BAC 440 53.875 NULL NULL NULL
 NULL NULL NULL CDN 500 33.75 NULL NULL NULL
 NULL NULL NULL NULL NULL NULL F 500 41.125
 NULL NULL NULL NULL NULL NULL HCA 300 27.375
 NULL NULL NULL NULL NULL NULL CMB 400 50.875

The large number of NULL values in the final results table makes it hard to discern any meaningful information from its contents. The expression "can't see the forest for the trees" comes to mind.

Fortunately, you can use the COALESCE expression (which you learned about in Tip 112, "Using the COALESCE Expression to Replace NULL Values") to filter out the NULL values from the final results table. For example, if you rewrite the preceding UNION JOIN as

 SELECT
 COALESCE (j.symbol, s.symbol, i.symbol) AS 'Symbol',
 COALESCE (j.account, s.account, i.account) AS 'Account',
 COALESCE (j.shares, s.shares, i.shares) AS 'Shares',
 COALESCE (j.avg_cst, s.avg_cst, i.avg_cst) AS 'Avg Cost'
 FROM (SELECT 'Joint Acct' AS 'Account', j.*) j,
 UNION JOIN (SELECT 'SEP Acct' AS 'Account', s.*) s,
 UNION JOIN (SELECT 'IRA Acct' AS 'Account', i.*) i
 ORDER BY symbol, account

the query's COALESCE expressions will select the one non-NULL value from each set of columns—thereby filtering out the NULL values from the (now) interim results table from the UNION JOIN clauses to generate a final results table similar to:

 Symbol Account Shares Avg Cost
 ------ ---------- ------ --------
 ADP SEP Acct 500 21.7500
 BAC Joint Acct 200 64.6250
 BAC SEP Acct 440 53.8750
 CDN SEP Acct 500 33.7500
 CMB IRA Acct 400 50.8750
 CMB Joint Acct 400 50.8750
 F IRA Acct 500 41.1250
 HCA IRA Acct 300 27.3750
 HCA Joint Acct 1000 35.6250

Moreover, you need only change the ORDER BY clause to

 ORDER BY account, symbol

to sort the list of stocks in order by account—thereby making it easier to see which stocks you are holding in which account.

Understanding the Role of the FROM Clause in a JOIN Statement

The FROM clause in a JOIN, or multi-table SELECT statement, is the virtual table from which the DBMS retrieves the rows of column values it uses as input for the remaining clauses in the query. Suppose, for example, that you have a GRADES table and a STUDENTS table created by:

CREATE TABLE grades CREATE TABLE students
course_ID VARCHAR(15), (SID INTEGER,
section SMALLINT, f_name VARCHAR(20),
student_ID INTEGER, l_name VARCHAR(20))
professor_ID INTEGER,
grade NUMERIC)

If you submit an INNER JOIN query such as

SELECT RTRIM(f_name)+' '+l_name AS 'Student', course_ID,
 section, grade
FROM grades JOIN students ON student_ID = SID

the DBMS will generate a results table that lists student names, classes, and grades.

When executing a query, the BMS first creates a virtual table by joining the rows from the tables listed in the FROM clause that satisfy the search condition(s) in the ON clause. In the current example, the DBMS creates a virtual table consisting of rows from the GRADES table joined with rows from the STUDENTS table in which the value in the STUDENT_ID column from the GRADES table matches the value in the SID column of the STUDENTS table.

Next, the system passes column values from each joined row in the virtual table (and not each of the physical source tables) to the SELECT clause, which filters out unwanted columns and displays the rest in the results table. In the current example, the SELECT clause filters out all but the F_NAME, L_NAME, COURSE_ID, SECTION, and GRADE column values from the joined rows in the GRADES+STUDENTS virtual table.

Similarly, if you have the names of the school's teachers in a PROFESSORS table, you can use a query similar to

SELECT RRIM(s.f_name)+' '+s.l_name AS 'Student', course_ID,
 section, grade, RTRIM(p.f_name)+' '+p.l_name AS 'Teacher'
FROM (grades JOIN students s ON student_ID = SID)
JOIN professors p ON professor_ID = PID

to display the name of the professor who taught the class next to the class ID, student name, and grade.

In the current example, the FROM clause tells the DBMS to create a virtual table with joined rows from the STUDENTS and GRADES table, as it did in the preceding example. Next, the DBMS creates another virtual table by joining rows from the PROFESSORS table with joined rows in the STUDENTS+GRADES virtual table in which the value in the PID column from a row in the PROFESSOR table matches the value in the PROFESSOR_ID column from a joined row in the (virtual) STUDENTS+GRADES table.

Therefore, although the FROM clause in a query such as

 SELECT * FROM students, grades
 WHERE student_ID = SID
 ORDER BY student_ID

might give you the impression that a FROM clause simply lists the names of source table(s) used by a SELECT statement, the FROM is actually the virtual table with the joined rows that serve as the data source for the query. The DBMS first creates a virtual table either from the Cartesian product of the tables listed in the FROM clause or by joining matching rows from related tables as specified in each JOIN statement in the FROM clause. Next the system uses the other clauses in the SELECT statement to filter unwanted rows and columns out of the virtual table and adds the remaining column values as rows to the final results table.

Using the * Operator to Specify All Columns in All or Only in Some Tables in a Multiple Table JOIN

As you learned in Tip 90, "Using the SELECT Statement to Display All Column Values," the asterisk (*) "all columns" operator gives you a shortcut way to tell the DBMS to that you want to include all of a table's columns in the query's results table. For example, the results table for the SELECT statement

 SELECT * FROM students

includes all of the columns in the STUDENTS table. Similarly, the results table for the query

 SELECT * FROM students, grades
 WHERE SID = student_ID

includes all of the columns from the STUDENTS table followed by all of the columns from the GRADES table. If you execute a query for three (or more) tables with only the asterisk (*) all columns operator in the SELECT clause, the results table will include all of the columns from the first table listed in the FROM clause, followed by all of the columns from the second table listed in the FROM clause, followed by all of the columns from the third table listed in the FROM clause, and so on.

To display all of the columns from only some of the tables listed in the FROM clause, use a qualified all columns operator. As you learned in Tip 289, "Using Qualified Column Names in Multi-table Queries That Join Tables That Have the Same Names for One or More Columns," a qualified column name is a column name that includes the name of the table in which the column is located. Similarly, a qualified all columns operator includes the name of the table from which the DBMS is to display all column values. Thus, to display all columns from the STUDENTS table, you could execute a SELECT statement such as:

 SELECT students.* FROM students

which uses the qualified all columns operator STUDENTS.* to tell the DBMS to display all of the column values in the STUDENTS table. Similarly, if you want to display all of the columns in the STUDENTS table while displaying only two of the columns from the GRADES table, you can execute a query similar to

SELECT students.*, grade FROM students, grades
WHERE SID = student_ID

which uses the qualified all columns operator to tell the DBMS to display all columns from the STUDENTS table along with only the COURSE_ID and GRADE columns from the GRADES table.

Using a Table Alias to Perform a Single table JOIN (i e Self JOIN)

Strange as it sounds, some multi-table queries (or joins) involve a relationship a table has with itself (vs. with another table). Suppose, for example, that you offer your customers an incentive to recommend your company to people they know. If you keep track of the referring customer's ID number in a column of the CUSTOMERS table (such as REFERRER, for example), you can submit a query similar to

 SELECT cust_ID, RTRIM(f_name)+' '+l_name AS 'Customer',
 referrer AS 'Referred By'
 FROM customers
 WHERE referrer IS NOT NULL
 ORDER BY "Referred By"

to generate a results table that displays the CUST_ID of the referrer next to the name of each customer that was referred by another customer. However, to list the name of each referrer instead of the CUST_ID number, you will need to join each row in the CUSTOMERS table that has a non-NULL value in the REFERRER column with the row in the CUSTOMERS table that has a matching value in the CUST_ID column. In other words, you need to JOIN rows in the CUSTOMERS table with other rows in the CUSTOMERS table based on matching REFERRER and CUST_ID column values.

Based on what you learned (in Tip 282, "Using the FROM Clause to Perform a Multi-table Query," and Tip 311, "Understanding the Role of the FROM Clause in a JOIN Statement") about the role of the FROM clause in multi-table queries and JOIN statements, you might assume you can join rows in a table with other rows in the same table by simply including the same table name twice in the FROM clause of a SELECT statement such as:

SELECT cust_ID, RTRIM(f_name)+' '+l_name AS 'Customer',
 RTRIM(f_name)+' '+l_name AS 'Referred By'
FROM customers, customers
WHERE referrer = cust_ID
ORDER BY "Referred BY"

When executing the query, the DBMS should first CROSS JOIN the rows in the table with itself. Then the system should use the search condition in the WHERE clause to filter out unwanted joined rows from the (virtual) interim table—those in which the value in the CUST_ID column does not match the value in the REFERRER column.

Unfortunately, SQL will not let you list the same table name more than once in a single FROM clause. As a result, when you submit the preceding example query to the DBMS for execution, the system will display an error message similar to:

 Server: Msg 1013, Level 15, State 1, Line 4
 Tables 'customers' and 'customers' have the same exposed
 names. Use correlation names to distinguish them.

Moreover, if you simply drop the second CUSTOMERS reference from the FROM clause, the DBMS will execute the query. However, as the system goes through the CUSTOMERS table one row at a time, it will display the names from CUSTOMERS table rows in which the REFERRER is the same as the CUST_ID. Since a customer cannot referrer himself (or herself), the query (with only a single reference to CUSTOMERS in the FROM clause) will execute successfully, but will not provide the desired customer names.

To join a table with itself, SQL requires that you use an alias or correlation name for the second reference to the same table. In this way, the DBMS can join two tables with different names when processing the query's FROM clause. Thus, a SELECT statement such as

 SELECT customers.cust_ID, RTRIM(customers.f_name)+
 ' '+customers.l_name AS 'Customer',
 RTRIM(referrers.f_name)+' '+referrers.l_name
 AS 'Referred By'
 FROM customers, customers referrers
 WHERE customers.referrer = referrers.CUST_ID
 ORDER BY "Referred By"

which defines the REFERRERS alias for the CUSTOMERS table, will display referrer's name next to the name of each customer your company now has as a result of the referrer's recommendation.

In the current example, the DBMS generates the Cartesian product of the CUSTOMERS table and the (imaginary, duplicate) REFERRERS table. Then the system filters out those joined rows from the interim (virtual) CUSTOMERS+REFERRERS table in which the value of the REFERRER column from the CUSTOMERS table is not equal to the CUST_ID from the (imaginary) REFERRERS table.

Similarly, you can use a query such as

SELECT c.cust_ID, RTRIM(c.f_name)+' '+c.l_name
 AS 'Customer',
 (SELECT count(*) FROM customers r
 WHERE r.referrer = c.cust_ID) AS 'Referral Count'
FROM customers c
WHERE c.cust_ID IN (SELECT referrer FROM customers)
ORDER BY "Referral Count" DESC

to get a count of customer referrals you received from each referrer. (The WHERE clause in the example eliminates from the results table those customers who have not given you any referrals.)

Understanding Table Aliases

In Tip 313, "Using a Table Alias to a Single-table JOIN," you learned that SQL requires that you use a second name, or alias, whenever you want to perform a self-join that relates rows in a table to other rows in the same table. You can, however, use an alias (also called a correlation name) for a table name in any query—whether SQL requires it or not.

Suppose, for example, that you want to list the names and phone numbers from the rows in a PROSPECTS table owned by another user, SUSAN. You can execute a query similar to

 SELECT RTRIM(susan.prospects.f_name)+
 ' '+susan.prospects.l_name AS 'Prospect'
 susan.prospects.phone_number
 FROM susan.prospects
 ORDER BY 'Prospect'

in which you type the fully qualified column reference throughout the query. (As you learned in Tip 228, "Understanding Column References," you have to specify the fully qualified column name [which includes the username of the table owner, the table name, and the column name] when referencing a column in a table owned by another user.)

Alternatively, you can shorten the column name references in the query to

 SELECT RTRIM(sp.f_name)+' '+sp.l_name AS 'Prospect',
 sp.phone_number
 FROM susan.prospects sp
 ORDER BY 'Prospect'

by defining an alias for the table name in the SELECT statement's FROM clause. To define a table alias (or correlation name), type the alias you want to use immediately after the name of the table in the SELECT statement's FROM clause. You can then use the alias in place of the table name (or the fully qualified table name, such as SUSAN.PROSPECTS, in the current example), throughout the query.

The benefit of using table aliases is more readily apparent when you write a query that displays several columns from the JOIN of two or more tables (especially if the tables have long names or are owned by other usernames). Suppose, for example, that you are the sales manager and want to generate a list of the duplicate phone numbers among the PROSPECTS tables belonging to the three salespeople (FRANK, SUSAN, and RODGER) you manage. If you execute a query such as

 SELECT RTRIM(COALESCE(frank.prospects.f_name,
 rodger.prospects.f_name,susan.prospects.f_name))+' '+
 COALESCE(frank.prospects.l_name,rodger.prospects.l_name,
 susan.prospects.l_name) AS 'Prospect',
 COALESCE(frank.prospects.phone_number,
 rodger.prospects.phone_number,
 susan.prospects.phone_number) AS 'Phone Number',
 (CASE WHEN frank.prospects.phone_number IS NOT NULL
 THEN 'Frank ' ELSE '' END)+
 (CASE WHEN rodger.prospects.phone_number IS NOT NULL
 THEN 'Rodger ' ELSE '' END)+
 (CASE WHEN susan.prospects.phone_number IS NOT NULL
 THEN 'Susan' ELSE '' END) AS 'Being Called By'
 FROM ((frank.prospects FULL JOIN rodger.prospects ON
 frank.prospects.phone_number =
 rodger.prospects.phone_number)
 FULL JOIN susan.prospects ON
 frank.prospects.phone_number =
 susan.prospects.phone_number
 OR rodger.prospects.phone_number =
 susan.prospects.phone_number)
 WHERE frank.prospects.phone_number =
 rodger.prospects.phone_number
 OR frank.prospects.phone_number =
 susan.prospects.phone_number
 OR rodger.prospects.phone_number =
 susan.prospects.phone_number
 ORDER BY "Being Called By", 'Prospect'

the DBMS will display a results table similar to:

 Prospect Phone Number Called By
 -------------- -------------- ------------------
 Bill Barteroma (222)-222-2222 Frank Rodger Susan
 Steve Kernin (555)-555-5555 Frank Rodger Susan
 Frank Burns (111)-222-1111 Frank Susan
 Hawkeye Morgan (333)-222-3333 Frank Susan
 Steve Pierce (444)-222-4444 Frank Susan
 Walter Phorbes (666)-666-6666 Rodger Susan

You can use aliases for each of the three table names and then shorten the query to:

SELECT RTRIM(COALESCE(fp.f_name,rp.f_name,sp.f_name))+' '+
 COALESCE(fp.l_name,rp.l_name,sp.l_name) AS 'Prospect',
 COALESCE(fp.phone_number,rp.phone_number,sp.phone_number)
 AS 'Phone Number',
 (CASE WHEN fp.phone_number IS NOT NULL
 THEN 'Frank ' ELSE '' END)+
 (CASE WHEN rp.phone_number IS NOT NULL
 THEN 'Rodger ' ELSE '' END)+
 (CASE WHEN sp.phone_number IS NOT NULL
 THEN 'Susan' ELSE '' END) AS 'Being Called By'
 FROM ((frank.prospects fp FULL JOIN rodger.prospects rp
 ON fp.phone_number = rp.phone_number)
 FULL JOIN susan.prospects sp
 ON fp.phone_number = sp.phone_number
 OR rp.phone_number = sp.phone_number)
 WHERE fp.phone_number = rp.phone_number
 OR fp.phone_number = sp.phone_number
 OR rp.phone_number = sp.phone_number
 ORDER BY "Being Called By", 'Prospect'

As you can see, using FP, RP, and SP (as aliases) in place of the fully qualified table names in the query not only makes the column name references shorter and less tedious to type, but also makes the query easier to read.

Understanding the Ambiguous Nature of ANY and How SQL Implements It to Mean SOME

In the English language, the word any can have one of two different meanings depending on the context in which it is used.

For example, if you ask, "Do any of you know how to write an SQL query?" you are using any as an existential quantifier, in that you want to know if there is at least one person (among those you are addressing) who knows how to write an SQL query. Conversely, if you say, "I have more fun writing SQL statements than any of you," you are using any as a universal quantifier to mean that you have more fun writing SQL statements than every person in your audience.

SQL lets you use the existential connotation of the keyword ANY in conjunction with one of the six comparison operators (=, <>, <, <=, >, and >=) to compare a single value to a column of data values generated by a subquery. If at least one of the comparisons of the single value to a data value in the set data values from the subquery evaluates to TRUE, then the entire ANY test evaluates to TRUE.

For example, if you have a list of stock symbols and prices in a PORTFOLIO table, and a list of symbols and historical price information in a PRICE_HISTORY table, you could execute a query such as

 SELECT symbol, current_price FROM portfolio
 WHERE current_price >= ANY
 (SELECT closing_price 1.5 FROM price_history
 WHERE price_history.symbol = portfolio.symbol
 AND price_history.trade_date >= (GETDATE() - 180))

to get a list of stock symbols and current prices for stocks with a current price that is greater by 50 percent or more than at least one of the same stock's closing prices during the past 6 months (180 days).

When executing the query, the DBMS tests the data values in the CURRENT_PRICE column of the PORTFOLIO one row at a time. The system retrieves a stock symbol from the SYMBOL column and its price from the CURRENT_PRICE column of a row in the PORTFOLIO table. Then the subquery generates a list with the stock's CLOSING_PRICE values multiplied by 1.5 (150 percent) for each day during the past 6 months (180 days). If there is any (that is, at least one) CURRENT_PRICE value that is greater than or equal to a computed value in the CLOSING_PRICE list (generated by the subquery), the DBMS includes the stock SYMBOL and its CURRENT_PRICE in the results table. If the CURRENT_PRICE is less than all of the computed CLOSING PRICE * 150% values in the list generated by the subquery, then the system does not include the stock's SYMBOL and CURRENT_PRICE in the results table.

Another way to write the same query is

 SELECT symbol, current_price FROM portfolio
 WHERE current_price >= SOME
 (SELECT closing_price * 1.5 FROM price_history
 WHERE price_history.symbol = portfolio.symbol AND
 price_history.date >= (GETDATE() - 180))

which reads: "List the stock SYMBOL and CURRENT_PRICE of each stock whose CURRENT_PRICE is 150 percent or more of some of the values in the list of CLOSING_PRICE values for the same stock the stock during the past 6 months (180 days)."

  Note 

Although the SQL-92 standard specifies the keyword SOME as an alternative to the keyword ANY, some DBMS products do not yet support the use of SOME. Check your system documentation. If your DBMS product lets you do so, use the keyword SOME in place of the keyword ANY because SOME is less confusing—unlike "any" it can never mean "all."

Using EXISTS Instead of COUN(*) to Check Whether a Subquery Returns at Least One Row

The COUNT(*) function, as its name implies, returns the count of the rows in a table that satisfy the search condition(s) in a WHERE clause. The EXISTS predicate, meanwhile, is a BOOLEAN expression that evaluates to TRUE if a subquery generates at least one row of query results. As such, the predicate

 WHERE 0 < (SELECT COUNT(*) FROM 
 WHERE )

is equivalent to:

WHERE EXISTS (SELECT * FROM 
WHERE )

When the subquery in the first WHERE clause returns no rows, its COUNT(*) function returns 0 and the WHERE clause evaluates to FALSE. Similarly, if an identical subquery used in the second WHERE clause returns no rows, EXISTS will (by definition) evaluate to FALSE, which will, in turn, cause the WHERE clause to evaluate to FALSE.

Conversely, if the subquery in the first WHERE clause returns one or more rows, its COUNT(*) function will return a value greater than 0, which, in turn, will cause the WHERE clause to evaluate to TRUE. Similarly, if an identical subquery used in the second WHERE clause returns one or more rows, EXISTS will (by definition) evaluate to TRUE; in turn, this will cause the WHERE clause to evaluate to TRUE.

Therefore, if you want a list of policy holders that have had a least one accident during the past year, you can use the COUNT(*) function in a query similar to:

 SELECT cust_ID, f_name, l_name FROM customers
 WHERE 0 < (SELECT COUNT(*) FROM claims
 WHERE claims.date_of_claim >= (GETDATE() - 365)
 AND claims.cust_ID = customers.cust_ID)

Or, you can use a similar query with an EXISTS predicate instead of the COUNT(*) > 0 comparison test:

 SELECT cust_ID, f_name, l_name FROM customers
 WHERE
 EXISTS (SELECT * FROM claims
 WHERE claims.date_of_claim >= (GETDATE() - 365)
 AND claims.cust_ID = customers.cust_ID)

After you submit either query to the DBMS, the system goes through the CUSTOMERS table one row at a time and executes the subquery in the WHERE clause for each row in the CUSTOMERS table.

When executing the subquery in the first SELECT statement, the system counts the number of rows in the CLAIMS table that have both a DATE_OF_CLAIM within the past year and a CUST_ID column value matching the value in the CUST_ID column of the row from the CUSTOMERS table being processed. If the COUNT(*) function returns a value greater than 0, the WHERE clause evaluates to TRUE and the DBMS adds the customer's CUST_ID and name to the results table.

When processing the second SELECT statement, the DBMS can abort execution of the sub-query as soon as it finds the first row in the CLAIMS table that satisfies the search conditions in the subquery's WHERE clause. If the subquery returns any rows at all, the EXISTS predicate evaluates to TRUE and the system adds the customer's CUST_ID and name from the CUSTOMERS table to the query's results table.

So, should you use EXISTS or COUNT to determine whether at least one row in a table satisfies the search condition in the subquery's WHERE clause? The simple answer is: If you need the COUNT(*) of rows that satisfy the search criteria, use COUNT(*). Otherwise, use EXISTS.

Arguably, using EXISTS to check for the existence of a row that satisfies the WHERE clause search criteria makes the query easier to read than using the comparison of 0 < COUNT(*). Moreover, depending on the strength of the optimizer that your DBMS uses, EXISTS may actually execute faster than COUNT(*), especially if the table in the subquery has a lot of rows.

Remember, the purpose of the COUNT(*) function is to count the number of rows in a table that satisfy the search condition in the (sub)query's WHERE clause. As such, when you use COUNT(*) to check for the existence of at least one row that satisfies the WHERE clause search criteria, some DBMS products will read every row in the (sub)query's table, counting those that satisfy the search criteria in the WHERE clause. Only after it completes its (time-consuming) full-table scan does COUNT(*) return the number of rows that satisfy the WHERE clause search criteria. (If COUNT(*) returns a value of 0, then the WHERE clause evaluates FALSE, and vice versa.)

The EXISTS predicate, on the other hand, does not use any data values returned by the sub-query's SELECT * clause. Moreover, the DBMS does not care about the exact number of rows in the subquery's table that satisfy the search condition in its WHERE clause. As such, the DBMS can stop retrieving and filtering rows from the subquery's source table as soon as it finds the first row whose column values satisfy the search conditions in the subquery's WHERE clause.

In short EXISTS, may outperform the COUNT(*) function when used to determine whether at least one row satisfies the criteria in a subquery's WHERE clause. EXISTS needs to process only 2 rows in a 10,000,000-row table if the second row satisfies the search condition in the WHERE clause. The COUNT(*) function, on the other hand, must read all 10,000,000 rows and then compare the count of rows that satisfy the search condition(s) to 0.

Understanding Why the Expression NULL = NULL Evaluates to FALSE

As you learned in Tip 30, "Understanding the Value of NULL," the value of NULL is not 0. In fact, NULL really has no specific value because a NULL in a column (or expression) represents a missing or unknown quantity. Moreover, whenever an expression uses one of the six SQL comparison operators (=,<>,>,>=,<,<=) to compare a NULL value with any other value (including another NULL), the expression always evaluates to NULL.

For example, the results table for the query

 SELECT * FROM customers
 WHERE salesperson = NULL

will always have no rows because the expression in the WHERE clause will never evaluate to TRUE. (Because the WHERE clause search condition uses the equality (=) comparison operator to compare a NULL value to another value, the WHERE clause will always evaluate to NULL, not to TRUE.)

The WHERE clause in the current example evaluates to NULL even if the DBMS encounters a row from the CUSTOMERS table that has a NULL value in its SALESPERSON column. That's strange behavior, considering that the search condition in the WHERE clause becomes

 NULL = NULL

when the value in the SALESPERSON column is NULL. However, because the values on both sides of the equals (=) sign are unknown, the system cannot tell whether one NULL (unknown) value is equal to the other. As such, the expression

 NULL = NULL

evaluates neither to TRUE nor to FALSE, but to NULL.

  Note 

To query a table for rows with a NULL value in a specific column, you must use the IS NULL value test that you learned about in Tip 97, "Understanding NULL Value Conditions When Selecting Rows Using Comparison Predicates." For example, to get a list of CUSTOMERS table rows with a NULL in the SALESPERSON column, rewrite the query in the current example as:

SELECT * FROM customers
WHERE salesperson IS NULL

Understanding When to Use the ON Clause and When to Use the WHERE Clause

An ON clause is always used as a filter for one of the JOIN clauses that you learned about in Tips 300-305. A WHERE clause, meanwhile, is used to filter unwanted rows returned by both single-table and multi-table queries. Therefore, an ON clause (if present) must always follow a JOIN clause. A WHERE clause, on the other hand, can be used in both queries with a JOIN clause and those without.

The ON clause in an INNER JOIN query is functionally equivalent to a WHERE clause with an identical search condition. Therefore, if you want to generate a results table listing the name of each instructor (from a TEACHERS table) next to the description of the class(es) (from the CLASSES table) that each professor teaches, this INNER JOIN (with an ON clause)

 SELECT course_ID, description,
 RTRIM(f_name)+' '+l_name AS Instructor
 FROM classes JOIN teachers
 ON classes.professor_ID = teachers.PID

and this multi-table SELECT statement (with a WHERE clause)

 SELECT course_ID, description,
 RTRIM(f_name)+' '+l_name AS Instructor
 FROM classes, teachers
 WHERE classes.professor_ID = teachers.PID

will produce the same results table.

Conversely, when executing one of the OUTER JOIN queries (LEFT, RIGHT, or FULL OUTER JOIN) using a WHERE clause with identical search conditions instead of an ON clause will not produce the same results table. In an OUTER JOIN, both the ON clause and the WHERE clause filter out rows that do not satisfy one or more of the search conditions in the WHERE clause. However, rows filtered out (that is, rejected) by a WHERE clause in an OUTER JOIN are not included in the results table. Meanwhile, the ON clause in an OUTER JOIN first filters out unwanted rows and then includes some or all of the rejected rows as NULL-valued columns in the results table.

For example, suppose that some professors in the TEACHERS table have not yet been assigned to teach a class. The LEFT OUTER JOIN (without a WHERE clause) shown in the top pane of the MS-SQL Server Query Analyzer window shown in Figure 318.1 will produce the results table shown in Figure 318.1's lower pane.

click to expand
Figure 318.1: The MS-SQL Server Query Analyzer Window for a LEFT OUTER JOIN of two tables

Conversely, this LEFT OUTER JOIN (with a WHERE clause)

 SELECT course_ID, description,
 RTRIM(f_name)+' '+l_name AS Instructor
 FROM teachers LEFT JOIN classes
 ON teachers.PID = classes.professor_ID
 WHERE teachers.PID = classes.professor_ID

will produce a results table similar to this:

course_ID description Instructor
--------- -------------------------------- -----------
CS-101 Introduction To Computer Science Kris Jamsa
COMP101 Beginning Composition Grady Booch

This results table has no NULL values in any of its columns.

Understanding How to Use Nested Queries to Work with Multiple Tables at Once

A subquery or nested query is a SELECT statement within another SQL statement. Although a nested query is always a SELECT statement, the enclosing SQL statement (which contains the nested SELECT statement (or subquery) may be a DELETE, an INSERT, an UPDATE, or another SELECT statement. In fact, a nested query may even be a subquery within another subquery.

Because the enclosing SQL statement and its subquery can each work on different tables, you can sometimes use a (single-table) query that contains a (single table) subquery in place of a (two-table) JOIN. For example, you can use a nested query instead of a JOIN to relate information in two tables when you want to include data values from only one of the tables in the query results. As an example, this nested query

 SELECT CID AS 'Cust ID',
 RTRIM(f_name)+' '+l_name AS 'Customer'
 FROM customers
 WHERE EXISTS (SELECT * FROM trades
 WHERE trade_date >= GETDATE() - 365
 AND cust_ID = CID

will list the names of all customers that made at least one stock trade during the past year.

Conversely, when you need to generate a results table with values from more than one table, you will often have to execute either a JOIN or a multi-table query with a WHERE clause. For example, you would use a query such as

SELECT trade_date, symbol, shares * price AS 'Total Trade',
 CID, RTRIM(f_name)+' '+l_name AS 'Customer'
FROM customers JOIN trades
 ON CID = cust_ID
WHERE shares * price > 100000
 AND trade_date >= GETDATE() -365
ORDER BY Customer

to join rows from the CUSTOMERS table with rows from the TRADES table to generate a list of customer names and trade dollar values of any trades of $100,000 or more made within the past year.

Multi-table queries and nested subqueries are not, however, mutually exclusive. Suppose, for example, that you want to include aggregate totals such as a SUM() of the dollar amount and a COUNT() of the number of trades, along with the other information in the preceding example. A query such as

 SELECT trade_date, symbol, shares * price AS 'Total Trade',
 (SELECT COUNT(*) FROM trades
 WHERE trade_date > GETDATE() - 365
 AND cust_ID = CID) AS 'Count'
 (SELECT SUM(price) * SUM(shares),
 FROM trades
 WHERE trades.trade_date >= GETDATE() - 365
 AND cust_ID = CID) AS 'Total $ Volume),
 CID AS 'Cust ID'-, TRIM(f_name)+' '+l_name AS 'Customer'
 FROM customers JOIN trades
 ON CID = cust_CID
 WHERE shares * price >= 100000
 AND trade_date >= GETDATE() -365
 ORDER BY Customer

will execute the same two-table JOIN found in the preceding example. However, the query's results table includes the values returned by the two aggregate function subqueries (in the SELECT cause).

In short, a nested query works independent of the enclosing SQL statement and can make use of any of the column values from the tables listed in the enclosing statement's FROM clause. You can use nested queries to perform multi-table operations without having to JOIN rows in multiple related tables. However, if you need data values from multiple tables, or if you want individual column values and aggregate function values in the same row in the results table, you can nest a subquery with the aggregate function that you need in the SELECT clause of a multi-table query or JOIN.

Understanding SQL Subqueries



SQL Tips & Techniques
SQL Tips & Techniques (Miscellaneous)
ISBN: 1931841454
EAN: 2147483647
Year: 2000
Pages: 632

Flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net