Advanced Joins

You've already looked at a basic two-table join in Chapter 1. But joins can get much more complicated than that, and badly written joins are the culprits in the majority of serious performance problems.

Let's return to the tables created in the previous chapter. If you skipped that chapter, you can re-create them by running the following statements:

 CREATE TABLE customer (   id int(11) default NULL,   first_name varchar(30) default NULL,   surname varchar(40) default NULL ) TYPE=MyISAM; INSERT INTO customer VALUES (1, 'Yvonne', 'Clegg'); INSERT INTO customer VALUES (2, 'Johnny', 'Chaka-Chaka'); INSERT INTO customer VALUES (3, 'Winston', 'Powers'); INSERT INTO customer VALUES (4, 'Patricia', 'Mankunku'); CREATE TABLE sales (   code int(11) default NULL,   sales_rep int(11) default NULL,   customer int(11) default NULL,   value int(11) default NULL ) TYPE=MyISAM; INSERT INTO sales VALUES (1, 1, 1, 2000); INSERT INTO sales VALUES (2, 4, 3, 250); INSERT INTO sales VALUES (3, 2, 3, 500); INSERT INTO sales VALUES (4, 1, 4, 450); INSERT INTO sales VALUES (5, 3, 1, 3800); INSERT INTO sales VALUES (6, 1, 2, 500); CREATE TABLE sales_rep (   employee_number int(11) default NULL,   surname varchar(40) default NULL,   first_name varchar(30) default NULL,   commission tinyint(4) default NULL,   date_joined date default NULL,   birthday date default NULL ) TYPE=MyISAM; INSERT INTO sales_rep VALUES (1, 'Rive', 'Sol', 10,  '2000-02-15', '1976-03-18'); INSERT INTO sales_rep VALUES (2, 'Gordimer', 'Charlene', 15,  '1998-07-09', '1958-11-30'); INSERT INTO sales_rep VALUES (3, 'Serote', 'Mike', 10,  '2001-05-14', '1971-06-18'); INSERT INTO sales_rep VALUES (4, 'Rive', 'Mongane', 10,  '2002-11-23', '1982-01-04'); 

Let's start with a basic join:

mysql> SELECT sales_rep, customer,value, first_name,surname  FROM sales, sales_rep WHERE code=1 AND  sales_rep.employee_number=sales.sales_rep; +-----------+----------+-------+------------+---------+ | sales_rep | customer | value | first_name | surname | +-----------+----------+-------+------------+---------+ |         1 |        1 |  2000 | Sol        | Rive    | +-----------+----------+-------+------------+---------+

Because the relationship between the sales_rep and sales tables is on the employee_number, or sales_rep field, those two fields form the join condition of the WHERE clause.

To do a more complex join over all three tables is not much more complicated. If you wanted to return the first names and surnames of both the sales rep and the customer, as well as the value of the sale, you'd use this query:

mysql> SELECT sales_rep.first_name,sales_rep.surname,  value,customer.first_name, customer.surname FROM  sales,sales_rep,customer WHERE sales_rep.employee_number =  sales.sales_rep AND customer.id = sales.customer; +------------+----------+-------+------------+-------------+ | first_name | surname  | value | first_name | surname     | +------------+----------+-------+------------+-------------+ | Sol        | Rive     |  2000 | Yvonne     | Clegg       | | Mike       | Serote   |  3800 | Yvonne     | Clegg       | | Sol        | Rive     |   500 | Johnny     | Chaka-Chaka | | Charlene   | Gordimer |   500 | Winston    | Powers      | | Mongane    | Rive     |   250 | Winston    | Powers      | | Sol        | Rive     |   450 | Patricia   | Mankunku    | +------------+----------+-------+------------+-------------+ 

The employee_number field of the sales_rep table is related to the sales_rep field of the sales table. And the id field in the customer table is related to the customer field of the sales table. Hence, these make up the two join conditions of the WHERE clause. There are no other conditions, so this query lists all the sales for which there are corresponding rows in both the sales_rep and customer tables.

Inner Joins

Inner joins are just another way of describing the first kind of join you learned. The following two queries are identical:

mysql> SELECT first_name,surname,value FROM customer,sales WHERE  id=customer; +------------+-------------+-------+ | first_name | surname     | value | +------------+-------------+-------+ | Yvonne     | Clegg       |  2000 | | Winston    | Powers      |   250 | | Winston    | Powers      |   500 | | Patricia   | Mankunku    |   450 | | Yvonne     | Clegg       |  3800 | | Johnny     | Chaka-Chaka |   500 | +------------+-------------+-------+ 6 rows in set (0.00 sec) mysql> SELECT first_name,surname,value FROM customer INNER JOIN sales  ON id=customer; +------------+-------------+-------+ | first_name | surname     | value | +------------+-------------+-------+ | Yvonne     | Clegg       |  2000 | | Winston    | Powers      |   250 | | Winston    | Powers      |   500 | | Patricia   | Mankunku    |   450 | | Yvonne     | Clegg       |  3800 | | Johnny     | Chaka-Chaka |   500 | +------------+-------------+-------+ 

Left Joins (Left Outer Joins)

Let's say you make another sale. Except this time it's a cash sale, and the customer has left with the goods before you realize you forgot to capture their details. There's no problem because you can still add the data to the sales table using a NULL value for the customer:

mysql> INSERT INTO sales(code,sales_rep,customer,value) VALUES  (7, 2,NULL,670); 

Let's run the query that returns the value and the names of the sales reps and customers for each sale again:

mysql> SELECT sales_rep.first_name, sales_rep.surname, value,  customer.first_name, customer.surname FROM sales,sales_rep,  customer WHERE sales_rep.employee_number = sales.sales_rep  AND customer.id = sales.customer; +------------+----------+-------+------------+-------------+ | first_name | surname  | value | first_name | surname     | +------------+----------+-------+------------+-------------+ | Sol        | Rive     |  2000 | Yvonne     | Clegg       | | Mike       | Serote   |  3800 | Yvonne     | Clegg       | | Sol        | Rive     |   500 | Johnny     | Chaka-Chaka | | Charlene   | Gordimer |   500 | Winston    | Powers      | | Mongane    | Rive     |   250 | Winston    | Powers      | | Sol        | Rive     |   450 | Patricia   | Mankunku    | +------------+----------+-------+------------+-------------+

What's going on? Where is the new sale? The problem here is that, because the customer is NULL in the sales table, the join condition is not fulfilled. Remember, when you looked at the operators earlier in this chapter, you saw that the = operator excludes NULL values. The <=> operator won't help because there are no NULL records in the customer table, so even a null-friendly equality check won't help.

The solution here is to do an OUTER JOIN. These return a result for each matching result of the one table, whether or not there is an associated record in the other table. So even though the customer field is NULL in the sales table and has no relation with the customer table, a record will be returned. A LEFT OUTER JOIN is one which returns all matching rows from the left table, regardless of whether there is a corresponding row in the right table. The syntax for a LEFT JOIN (a shorter name for a LEFT OUTER JOIN) is as follows:

SELECT field1, field2 FROM table1 LEFT JOIN table2 ON field1=field2

Let's first try a simple example first, performing a LEFT JOIN on just the customer and sales tables.

mysql> SELECT first_name,surname,value FROM sales LEFT JOIN customer  ON id=customer; +------------+-------------+-------+ | first_name | surname     | value | +------------+-------------+-------+ | Yvonne     | Clegg       |  2000 | | Winston    | Powers      |   250 | | Winston    | Powers      |   500 | | Patricia   | Mankunku    |   450 | | Yvonne     | Clegg       |  3800 | | Johnny     | Chaka-Chaka |   500 | | NULL       | NULL        |   670 | +------------+-------------+-------+ 

All seven records are returned, as expected.

Table order in a LEFT JOIN is important. The table from which all matching rows are returned must be the left table (before the LEFT JOIN keywords). If you'd reversed the order and tried the following:

mysql> SELECT first_name,surname,value FROM customer LEFT JOIN sales  ON id=customer; +------------+-------------+-------+ | first_name | surname     | value | +------------+-------------+-------+ | Yvonne     | Clegg       |  2000 | | Yvonne     | Clegg       |  3800 | | Johnny     | Chaka-Chaka |   500 | | Winston    | Powers      |   250 | | Winston    | Powers      |   500 | | Patricia   | Mankunku    |   450 | +------------+-------------+-------+

then you'd have seen only the six records. Because the left table is the customer table in this query, and the join matches only those records that exist in the left table, the sales record with the NULL customer (meaning there is no relation to the customer table) is not returned.

Note 

A LEFT JOIN was more frequently called a LEFT OUTER JOIN in the past. For the sake of familiarity, MySQL accepts this term, too.

Of course, you can extend this across a third table to answer the original query (names of customers and sales reps, as well as sales values, for each sale). See if you can do it. This is my suggestion:

mysql> SELECT sales_rep.first_name, sales_rep.surname, value,  customer.first_name, customer.surname FROM sales LEFT JOIN  sales_rep ON sales_rep.employee_number = sales.sales_rep  LEFT JOIN customer ON customer.id = sales.customer; +------------+----------+-------+------------+-------------+ | first_name | surname  | value | first_name | surname     | +------------+----------+-------+------------+-------------+ | Sol        | Rive     |  2000 | Yvonne     | Clegg       | | Mongane    | Rive     |   250 | Winston    | Powers      | | Charlene   | Gordimer |   500 | Winston    | Powers      | | Sol        | Rive     |   450 | Patricia   | Mankunku    | | Mike       | Serote   |  3800 | Yvonne     | Clegg       | | Sol        | Rive     |   500 | Johnny     | Chaka-Chaka | | Charlene   | Gordimer |   670 | NULL       | NULL        | +------------+----------+-------+------------+-------------+ 

Right Joins (Right Outer Joins)

Right joins are exactly the same as left joins, except that the order of the join is reversed. To return for each sale the names of all customers, including those sales where there is no corresponding customer data, you have to put the sales table on the right in your join:

mysql> SELECT first_name,surname,value FROM customer RIGHT JOIN  sales ON id=customer; +------------+-------------+-------+ | first_name | surname     | value | +------------+-------------+-------+ | Yvonne     | Clegg       |  2000 | | Winston    | Powers      |   250 | | Winston    | Powers      |   500 | | Patricia   | Mankunku    |   450 | | Yvonne     | Clegg       |  3800 | | Johnny     | Chaka-Chaka |   500 | | NULL       | NULL        |   670 | +------------+-------------+-------+

Tip 

If you get confused by which table to put on which side for left and right joins, remember that a right join reads all records from the right table, including nulls, while a left join reads all records from the left table, including nulls.

Full Outer Joins

At the time of writing this, MySQL does not yet support full outer joins. These are joins where each record from the first table, including those with no match in the second table, is returned along with each record in the second table, including those with no match in the first. This is the equivalent of a left and a right join. MySQL will support this in the near future, so check the latest documentation. The syntax is the same as for the other joins:

SELECT field1,field2 FROM table1 FULL OUTER JOIN table2 

Natural Joins and the USING Keyword

The id field in the customer table and the customer field in the sales table are related. If you have given them the same name, SQL has a few shortcuts that make the JOIN statements less unwieldy. Let's rename sales.customer to sales.id for now, to demonstrate:

mysql> ALTER TABLE sales CHANGE customer id INT; 

Now because the two tables have fields with identical names, you can perform a NATURAL JOIN, which looks for identically named fields on which to perform the JOIN:

mysql> SELECT first_name,surname,value FROM customer NATURAL JOIN  sales; +------------+-------------+-------+ | first_name | surname     | value | +------------+-------------+-------+ | Yvonne     | Clegg       |  2000 | | Winston    | Powers      |   250 | | Winston    | Powers      |   500 | | Patricia   | Mankunku    |   450 | | Yvonne     | Clegg       |  3800 | | Johnny     | Chaka-Chaka |   500 | +------------+-------------+-------+

This is identical to the following:

mysql> SELECT first_name,surname,value FROM customer INNER JOIN  sales ON customer.id=sales.id; +------------+-------------+-------+ | first_name | surname     | value | +------------+-------------+-------+ | Yvonne     | Clegg       |  2000 | | Winston    | Powers      |   250 | | Winston    | Powers      |   500 | | Patricia   | Mankunku    |   450 | | Yvonne     | Clegg       |  3800 | | Johnny     | Chaka-Chaka |   500 | +------------+-------------+-------+

There is only one field that is identical in both tables, but if there were more, each of these would become part of the join condition.

A NATURAL JOIN can also be a LEFT or RIGHT JOIN. The following two statements are identical:

mysql> SELECT first_name,surname,value FROM customer LEFT JOIN sales  ON customer.id=sales.id; +------------+-------------+-------+ | first_name | surname     | value | +------------+-------------+-------+ | Yvonne     | Clegg       |  2000 | | Yvonne     | Clegg       |  3800 | | Johnny     | Chaka-Chaka |   500 | | Winston    | Powers      |   250 | | Winston    | Powers      |   500 | | Patricia   | Mankunku    |   450 | +------------+-------------+-------+ mysql> SELECT first_name,surname,value FROM customer NATURAL LEFT JOIN sales; +------------+-------------+-------+ | first_name | surname     | value | +------------+-------------+-------+ | Yvonne     | Clegg       |  2000 | | Yvonne     | Clegg       |  3800 | | Johnny     | Chaka-Chaka |   500 | | Winston    | Powers      |   250 | | Winston    | Powers      |   500 | | Patricia   | Mankunku    |   450 | +------------+-------------+-------+

The USING keyword allows a bit more control than a NATURAL JOIN. If there is more than one identical field in the two tables, this keyword allows you to specify which of these fields are used as join conditions. For example, taking two tables A and B with identical fields a,b,c,d, the following are equivalent:

SELECT * FROM A LEFT JOIN B USING (a,b,c,d) SELECT * FROM A NATURAL LEFT JOIN B

The USING keyword allows more flexibility because it allows you to use only some of the four identical fields. For example:

SELECT * FROM A LEFT JOIN B USING (a,d)
Note 

For purposes of a NATURAL JOIN, identical means identical in name, not in type. The two id fields could be INT and DECIMAL, or even INT and VARCHAR, as long as they have the same name.

Returning Results Found in One Table, but Not the Other

So far you've returned rows that appear in both tables you're joining with an INNER JOIN. With the OUTER JOIN, you also returned records from one table where wasn't a corresponding match in the other table. Quite often it's useful to do the converse and return results that are found only in one table, but not the other. To demonstrate this, let's first add a new sales_ rep:

mysql> INSERT INTO sales_rep VALUES(5, 'Jomo', 'Ignesund', 10,  '2002-11-29', '1968-12-01'); 

Now, if you do an INNER JOIN, you can return all the sales reps who have made a sale:

mysql> SELECT DISTINCT first_name,surname FROM sales_rep  INNER JOIN sales ON sales_rep=employee_number; +------------+----------+ | first_name | surname  | +------------+----------+ | Sol        | Rive     | | Mongane    | Rive     | | Charlene   | Gordimer | | Mike       | Serote   | +------------+----------+

You use the DISTINCT keyword to avoid duplicates because there are sales reps who have made more than one sale.

But the reverse is as useful. The boss is raging and heads must roll. Which sales reps have not made any sales? You can find this information by seeing which sales reps appear in the sales_rep table but do not have a corresponding entry in the sales table:

mysql> SELECT first_name,surname FROM sales_rep LEFT JOIN sales  ON sales_rep=employee_number WHERE sales_rep IS NULL; +------------+---------+ | first_name | surname | +------------+---------+ | Ignesund   | Jomo    | +------------+---------+

You had to do a LEFT JOIN (an OUTER JOIN, as opposed to an INNER JOIN) because only an OUTER JOIN returns any records that don't correspond (or null values) in the first place.

Joining Results with UNION

MySQL 4 introduced the long-awaited ANSI SQL UNION statement, which combines the results of different SELECT statements into one. Each statement must have the same number of columns.

To see the use of this statement, let's create another table, containing a list of customers handed over from the previous owner of your store:

mysql> CREATE TABLE old_customer(id int, first_name varchar(30),  surname varchar(40)); mysql> INSERT INTO old_customer VALUES (5432, 'Thulani', 'Salie'),  (2342, 'Shahiem', 'Papo'); 

Now, to get a list of all customers, both old and new, you can use the following:

mysql> SELECT id, first_name, surname FROM old_customer UNION SELECT  id, first_name,surname FROM customer; +------+------------+-------------+ | id   | first_name | surname     | +------+------------+-------------+ | 5432 | Thulani    | Salie       | | 2342 | Shahiem    | Papo        | |    1 | Yvonne     | Clegg       | |    2 | Johnny     | Chaka-Chaka | |    3 | Winston    | Powers      | |    4 | Patricia   | Mankunku    | +------+------------+-------------+

You can also order the output as normal. You just need to be careful about whether the ORDER BY clause applies to the entire UNION or just to the one SELECT:

mysql> SELECT id, first_name, surname FROM old_customer UNION SELECT  id, first_name, surname FROM customer ORDER BY surname,first_name; +------+------------+-------------+ | id   | first_name | surname     | +------+------------+-------------+ |    2 | Johnny     | Chaka-Chaka | |    1 | Yvonne     | Clegg       | |    4 | Patricia   | Mankunku    | | 2342 | Shahiem    | Papo        | |    3 | Winston    | Powers      | | 5432 | Thulani    | Salie       | +------+------------+-------------+

The sorting is performed on the entire UNION. If you just want to sort the second SELECT, you'd need to use parentheses:

mysql> SELECT id, first_name, surname FROM old_customer UNION  (SELECT id, first_name, surname FROM customer ORDER BY surname,  first_name); +------+------------+-------------+ | id   | first_name | surname     | +------+------------+-------------+ | 5432 | Thulani    | Salie       | | 2342 | Shahiem    | Papo        | |    2 | Johnny     | Chaka-Chaka | |    1 | Yvonne     | Clegg       | |    4 | Patricia   | Mankunku    | |    3 | Winston    | Powers      | +------+------------+-------------+

Tip 

Whenever there's possible ambiguity, such as where the sorting applies, use parentheses. It ensures the sort is applied to the correct part and also means that anyone else trying to interpret your statements will have an easier time. Don't assume everyone else knows as much as you!

By default, UNION does not return duplicate results (similar to the DISTINCT keyword). You can override this by specifying that all results must be returned with the ALL keyword:

mysql> SELECT id FROM customer UNION ALL SELECT id FROM sales; +------+ | id   | +------+ |    1 | |    2 | |    3 | |    4 | |    1 | |    3 | |    3 | |    4 | |    1 | |    2 | | NULL | +------+

UNION requires some thought. You can quite easily put together unrelated fields as long as the number of fields returned by each SELECT match, and the data types are the same. MySQL will happily return these results to you, even though they are meaningless:

mysql> SELECT id, surname FROM customer UNION ALL SELECT value,  sales_rep FROM sales; +------+-------------+ | id   | surname     | +------+-------------+ |    1 | Clegg       | |    2 | Chaka-Chaka | |    3 | Powers      | |    4 | Mankunku    | | 2000 | 1           | |  250 | 4           | |  500 | 2           | |  450 | 1           | | 3800 | 3           | |  500 | 1           | |  670 | 2           | +------+-------------+

Sub-selects

Many queries make use of a SELECT within a SELECT. Sub-selects are scheduled for implementation in version 4.1. Until now, MySQL did not allow sub-selects, partly by design (they are often less efficient than alternatives, as you'll see later) and partly because it was low on a list of 1,001 other "vital" things to implement. With MySQL about to implement them, you'll need to see how they work.

Rewriting Sub-selects as Joins

Let's take a query where you want to return all the sales reps who have made a sale with a value greater than $1,000. If you can run a sub-select, try the following:

mysql> SELECT first_name,surname FROM sales_rep WHERE  sales_rep.employee_number IN (SELECT code FROM sales WHERE  value>1000); +------------+---------+ | first_name | surname | +------------+---------+ | Sol        | Rive    | +------------+---------+

Only Sol Rive has managed this feat.

The query resolves by first resolving the inner SELECT—that is, performing the following step first:

mysql> SELECT id FROM sales WHERE value>1000; +------+ | id   | +------+ |    1 | |    1 | +------+

and then the remainder:

mysql> SELECT first_name, surname FROM sales_rep WHERE  sales_rep.employee_number IN (1); +------------+---------+ | first_name | surname | +------------+---------+ | Sol        | Rive    | +------------+---------+

But you already know another, better way of doing this, which is the join:

mysql> SELECT DISTINCT first_name,surname FROM sales_rep INNER JOIN sales ON  employee_number=id WHERE value>1000; +------------+---------+ | first_name | surname | +------------+---------+ | Sol        | Rive    | | Sol        | Rive    | +------------+---------+ 

or, alternatively:

mysql> SELECT DISTINCT first_name,surname FROM sales_rep,sales WHERE  sales.id=sales_rep.employee_number AND value>1000; +------------+---------+ | first_name | surname | +------------+---------+ | Sol        | Rive    | | Sol        | Rive    | +------------+---------+

The reason I say better is that often the join is a more efficient way of doing the query and will return the results quicker. It may not make much difference on a tiny database, but in large, heavily used tables where performance is vital, you'll want every extra microsecond you can get out of MySQL.

To return all the sales reps who have not yet made a sale, you could again use a sub-select, if your DBMS allows it, as follows:

mysql> SELECT first_name,surname FROM sales_rep WHERE employee_number  NOT IN (SELECT DISTINCT code from sales); +------------+---------+ | first_name | surname | +------------+---------+ | Ignesund   | Jomo    | +------------+---------+

But you know another way, which is better:

mysql> SELECT DISTINCT first_name,surname FROM sales_rep LEFT JOIN sales ON  sales_rep=employee_number WHERE sales_rep IS NULL; +------------+---------+ | first_name | surname | +------------+---------+ | Ignesund   | Jomo    | +------------+---------+



Mastering MySQL 4
Mastering MySQL 4
ISBN: 0782141625
EAN: 2147483647
Year: 2003
Pages: 230
Authors: Ian Gilfillan

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net