Recipe 12.1. Finding Rows in One Table That Match Rows in Another


Problem

You need to write a query that uses information from more than one table.

Solution

Use a jointhat is, a query that lists multiple tables in its FROM clause and tells MySQL how to match information from them.

Discussion

The essential idea behind a join is that it combines rows in one table with rows in one or more other tables. Joins enable you to combine information from multiple tables when each table contains only part of the information in which you're interested. Output rows from a join contain more information than rows from either table by itself.

A complete join that produces all possible row combinations is called a Cartesian product. For example, joining each row in a 100-row table to each row in a 200-row table produces a result containing 100 x 200, or 20,000 rows. With larger tables, or joins between more than two tables, the result set for a Cartesian product can easily become immense. Because of that, and because you rarely want all the combinations anyway, a join normally includes an ON or USING clause that specifies how to join rows between tables. (This requires that each table have one or more columns of common information that can be used to link them together logically.) You can also include a WHERE clause that restricts which of the joined rows to select. Each of these clauses narrows the focus of the query.

This recipe introduces basic join syntax and demonstrates how joins help you answer specific types of questions when you are looking for matches between tables. Later recipes show how to identify mismatches between tables (Section 12.2) and how to compare a table to itself (Section 12.3). The examples assume that you have an art collection and use the following two tables to record your acquisitions. artist lists those painters whose works you want to collect, and painting lists each painting that you've actually purchased:

CREATE TABLE artist (   a_id  INT UNSIGNED NOT NULL AUTO_INCREMENT, # artist ID   name  VARCHAR(30) NOT NULL,                 # artist name   PRIMARY KEY (a_id),   UNIQUE (name) ); CREATE TABLE painting (   a_id  INT UNSIGNED NOT NULL,                # artist ID   p_id  INT UNSIGNED NOT NULL AUTO_INCREMENT, # painting ID   title VARCHAR(100) NOT NULL,                # title of painting   state VARCHAR(2) NOT NULL,                  # state where purchased   price INT UNSIGNED,                         # purchase price (dollars)   INDEX (a_id),   PRIMARY KEY (p_id) ); 

You've just begun the collection, so the tables contain only the following rows:

mysql> SELECT * FROM artist ORDER BY a_id; +------+----------+ | a_id | name     | +------+----------+ |    1 | Da Vinci | |    2 | Monet    | |    3 | Van Gogh | |    4 | Picasso  | |    5 | Renoir   | +------+----------+ mysql> SELECT * FROM painting ORDER BY a_id, p_id; +------+------+-------------------+-------+-------+ | a_id | p_id | title             | state | price | +------+------+-------------------+-------+-------+ |    1 |    1 | The Last Supper   | IN    |    34 | |    1 |    2 | The Mona Lisa     | MI    |    87 | |    3 |    3 | Starry Night      | KY    |    48 | |    3 |    4 | The Potato Eaters | KY    |    67 | |    3 |    5 | The Rocks         | IA    |    33 | |    5 |    6 | Les Deux Soeurs   | NE    |    64 | +------+------+-------------------+-------+-------+ 

The low values in the price column of the painting table betray the fact that your collection actually contains only cheap facsimiles, not the originals. Well, that's all right: who can afford the originals?

Each table contains partial information about your collection. For example, the artist table doesn't tell you which paintings each artist produced, and the painting table lists artist IDs but not their names. To use the information in both tables, you can ask MySQL to show you various combinations of artists and paintings by writing a query that performs a join. A join names two or more tables after the FROM keyword. In the output column list, you can name columns from any or all the joined tables, or use expressions that are based on those columns, tbl_name .* to select all columns from a given table, or * to select all columns from all tables.

The simplest join involves two tables and selects all columns from each. With no restrictions, the join generates output for all combinations of rows (that is, the Cartesian product). The following complete join between the artist and painting tables shows this:

mysql> SELECT * FROM artist, painting; +------+----------+------+------+-------------------+-------+-------+ | a_id | name     | a_id | p_id | title             | state | price | +------+----------+------+------+-------------------+-------+-------+ |    1 | Da Vinci |    1 |    1 | The Last Supper   | IN    |    34 | |    2 | Monet    |    1 |    1 | The Last Supper   | IN    |    34 | |    3 | Van Gogh |    1 |    1 | The Last Supper   | IN    |    34 | |    4 | Picasso  |    1 |    1 | The Last Supper   | IN    |    34 | |    5 | Renoir   |    1 |    1 | The Last Supper   | IN    |    34 | |    1 | Da Vinci |    1 |    2 | The Mona Lisa     | MI    |    87 | |    2 | Monet    |    1 |    2 | The Mona Lisa     | MI    |    87 | |    3 | Van Gogh |    1 |    2 | The Mona Lisa     | MI    |    87 | |    4 | Picasso  |    1 |    2 | The Mona Lisa     | MI    |    87 | |    5 | Renoir   |    1 |    2 | The Mona Lisa     | MI    |    87 | |    1 | Da Vinci |    3 |    3 | Starry Night      | KY    |    48 | |    2 | Monet    |    3 |    3 | Starry Night      | KY    |    48 | |    3 | Van Gogh |    3 |    3 | Starry Night      | KY    |    48 | |    4 | Picasso  |    3 |    3 | Starry Night      | KY    |    48 | |    5 | Renoir   |    3 |    3 | Starry Night      | KY    |    48 | |    1 | Da Vinci |    3 |    4 | The Potato Eaters | KY    |    67 | |    2 | Monet    |    3 |    4 | The Potato Eaters | KY    |    67 | |    3 | Van Gogh |    3 |    4 | The Potato Eaters | KY    |    67 | |    4 | Picasso  |    3 |    4 | The Potato Eaters | KY    |    67 | |    5 | Renoir   |    3 |    4 | The Potato Eaters | KY    |    67 | |    1 | Da Vinci |    3 |    5 | The Rocks         | IA    |    33 | |    2 | Monet    |    3 |    5 | The Rocks         | IA    |    33 | |    3 | Van Gogh |    3 |    5 | The Rocks         | IA    |    33 | |    4 | Picasso  |    3 |    5 | The Rocks         | IA    |    33 | |    5 | Renoir   |    3 |    5 | The Rocks         | IA    |    33 | |    1 | Da Vinci |    5 |    6 | Les Deux Soeurs   | NE    |    64 | |    2 | Monet    |    5 |    6 | Les Deux Soeurs   | NE    |    64 | |    3 | Van Gogh |    5 |    6 | Les Deux Soeurs   | NE    |    64 | |    4 | Picasso  |    5 |    6 | Les Deux Soeurs   | NE    |    64 | |    5 | Renoir   |    5 |    6 | Les Deux Soeurs   | NE    |    64 | +------+----------+------+------+-------------------+-------+-------+ 

The statement output illustrates why a complete join generally is not useful: it produces a lot of output, and the result is not meaningful. Clearly, you're not maintaining these tables to match every artist with every painting, which is what the preceding statement does. An unrestricted join in this case produces nothing of value.

To answer questions meaningfully, you must combine the two tables in a way that produces only the relevant matches. Doing so is a matter of including appropriate join conditions. For example, to produce a list of paintings together with the artist names, you can associate rows from the two tables using a simple WHERE clause that matches up values in the artist ID column that is common to both tables and that serves as the link between them:

mysql> SELECT * FROM artist, painting     -> WHERE artist.a_id = painting.a_id; +------+----------+------+------+-------------------+-------+-------+ | a_id | name     | a_id | p_id | title             | state | price | +------+----------+------+------+-------------------+-------+-------+ |    1 | Da Vinci |    1 |    1 | The Last Supper   | IN    |    34 | |    1 | Da Vinci |    1 |    2 | The Mona Lisa     | MI    |    87 | |    3 | Van Gogh |    3 |    3 | Starry Night      | KY    |    48 | |    3 | Van Gogh |    3 |    4 | The Potato Eaters | KY    |    67 | |    3 | Van Gogh |    3 |    5 | The Rocks         | IA    |    33 | |    5 | Renoir   |    5 |    6 | Les Deux Soeurs   | NE    |    64 | +------+----------+------+------+-------------------+-------+-------+ 

The column names in the WHERE clause include table qualifiers to make it clear which a_id values to compare. The output indicates who painted each painting, and, conversely, which paintings by each artist are in your collection.

Another way to write the same join is to use INNER JOIN rather than the comma operator and indicate the matching conditions with an ON clause:

mysql> SELECT * FROM artist INNER JOIN painting     -> ON artist.a_id = painting.a_id; +------+----------+------+------+-------------------+-------+-------+ | a_id | name     | a_id | p_id | title             | state | price | +------+----------+------+------+-------------------+-------+-------+ |    1 | Da Vinci |    1 |    1 | The Last Supper   | IN    |    34 | |    1 | Da Vinci |    1 |    2 | The Mona Lisa     | MI    |    87 | |    3 | Van Gogh |    3 |    3 | Starry Night      | KY    |    48 | |    3 | Van Gogh |    3 |    4 | The Potato Eaters | KY    |    67 | |    3 | Van Gogh |    3 |    5 | The Rocks         | IA    |    33 | |    5 | Renoir   |    5 |    6 | Les Deux Soeurs   | NE    |    64 | +------+----------+------+------+-------------------+-------+-------+ 

In the special case that the matched columns have the same name in both tables and are compared using the = operator, you can use an INNER JOIN with a USING clause instead. This requires no table qualifiers, and each join column is named only once:

mysql> SELECT * FROM artist INNER JOIN painting     -> USING(a_id); +------+----------+------+-------------------+-------+-------+ | a_id | name     | p_id | title             | state | price | +------+----------+------+-------------------+-------+-------+ |    1 | Da Vinci |    1 | The Last Supper   | IN    |    34 | |    1 | Da Vinci |    2 | The Mona Lisa     | MI    |    87 | |    3 | Van Gogh |    3 | Starry Night      | KY    |    48 | |    3 | Van Gogh |    4 | The Potato Eaters | KY    |    67 | |    3 | Van Gogh |    5 | The Rocks         | IA    |    33 | |    5 | Renoir   |    6 | Les Deux Soeurs   | NE    |    64 | +------+----------+------+-------------------+-------+-------+ 

Note that when you write a query with a USING clause, SELECT * returns only one instance of each join column (a_id).

Any of ON, USING, or WHERE can include comparisons, so how do you know which join conditions to put in each clause? As a rule of thumb, it's conventional to use ON or USING to specify how to join the tables, and the WHERE clause to restrict which of the joined rows to select. For example, to join tables based on the a_id column, but select only rows for paintings obtained in Kentucky, use an ON (or USING) clause to match the rows in the two tables, and a WHERE clause to test the state column:

mysql> SELECT * FROM artist INNER JOIN painting     -> ON artist.a_id = painting.a_id     -> WHERE painting.state = 'KY'; +------+----------+------+------+-------------------+-------+-------+ | a_id | name     | a_id | p_id | title             | state | price | +------+----------+------+------+-------------------+-------+-------+ |    3 | Van Gogh |    3 |    3 | Starry Night      | KY    |    48 | |    3 | Van Gogh |    3 |    4 | The Potato Eaters | KY    |    67 | +------+----------+------+------+-------------------+-------+-------+ 

The preceding queries use SELECT * to select all columns. To be more selective about which columns a statement should display, provide a list that names only those columns in which you're interested:

mysql> SELECT artist.name, painting.title, painting.state, painting.price     -> FROM artist INNER JOIN painting     -> ON artist.a_id = painting.a_id     -> WHERE painting.state = 'KY'; +----------+-------------------+-------+-------+ | name     | title             | state | price | +----------+-------------------+-------+-------+ | Van Gogh | Starry Night      | KY    |    48 | | Van Gogh | The Potato Eaters | KY    |    67 | +----------+-------------------+-------+-------+ 

You're not limited to two tables when writing joins. Suppose that you prefer to see complete state names rather than abbreviations in the preceding query result. The states table used in earlier chapters maps state abbreviations to names, so you can add it to the previous query to display names:

mysql> SELECT artist.name, painting.title, states.name, painting.price     -> FROM artist INNER JOIN painting INNER JOIN states     -> ON artist.a_id = painting.a_id AND painting.state = states.abbrev; +----------+-------------------+----------+-------+ | name     | title             | name     | price | +----------+-------------------+----------+-------+ | Da Vinci | The Last Supper   | Indiana  |    34 | | Da Vinci | The Mona Lisa     | Michigan |    87 | | Van Gogh | Starry Night      | Kentucky |    48 | | Van Gogh | The Potato Eaters | Kentucky |    67 | | Van Gogh | The Rocks         | Iowa     |    33 | | Renoir   | Les Deux Soeurs   | Nebraska |    64 | +----------+-------------------+----------+-------+ 

Another common use of three-way joins is for enumerating many-to-many relationships. See Section 12.5 for an example.

By including appropriate conditions in your joins, you can answer very specific questions, such as the following:

  • Which paintings did Van Gogh paint? To answer this question, use the a_id value to find matching rows, add a WHERE clause to restrict output to those rows that contain the artist name, and select the title from those rows:

    mysql> SELECT painting.title     -> FROM artist INNER JOIN painting ON artist.a_id = painting.a_id     -> WHERE artist.name = 'Van Gogh'; +-------------------+ | title             | +-------------------+ | Starry Night      | | The Potato Eaters | | The Rocks         | +-------------------+ 

  • Who painted the Mona Lisa? Again you use the a_id column to join the rows, but this time the WHERE clause restricts output to those rows that contain the title, and you select the artist name from those rows:

    mysql> SELECT artist.name     -> FROM artist INNER JOIN painting ON artist.a_id = painting.a_id     -> WHERE painting.title = 'The Mona Lisa'; +----------+ | name     | +----------+ | Da Vinci | +----------+ 

  • Which artists' paintings did you purchase in Kentucky or Indiana? This is somewhat similar to the previous statement, but it tests a different column (a_id) in the painting table to determine which rows to join with the artist table:

    mysql> SELECT DISTINCT artist.name     -> FROM artist INNER JOIN painting ON artist.a_id = painting.a_id     -> WHERE painting.state IN ('KY','IN'); +----------+ | name     | +----------+ | Da Vinci | | Van Gogh | +----------+ 

    The statement also uses DISTINCT to display each artist name just once. Try it without DISTINCT and you'll see that Van Gogh is listed twice; that's because you obtained two Van Goghs in Kentucky.

  • Joins can also be used with aggregate functions to produce summaries. For example, to find out how many paintings you have per artist, use this statement:

    mysql> SELECT artist.name, COUNT(*) AS 'number of paintings'     -> FROM artist INNER JOIN painting ON artist.a_id = painting.a_id     -> GROUP BY artist.name; +----------+---------------------+ | name     | number of paintings | +----------+---------------------+ | Da Vinci |                   2 | | Renoir   |                   1 | | Van Gogh |                   3 | +----------+---------------------+ 

    A more elaborate statement might also show how much you paid for each artist's paintings, in total and on average:

    mysql> SELECT artist.name,     -> COUNT(*) AS 'number of paintings',     -> SUM(painting.price) AS 'total price',     -> AVG(painting.price) AS 'average price'     -> FROM artist INNER JOIN painting ON artist.a_id = painting.a_id     -> GROUP BY artist.name; +----------+---------------------+-------------+---------------+ | name     | number of paintings | total price | average price | +----------+---------------------+-------------+---------------+ | Da Vinci |                   2 |         121 |       60.5000 | | Renoir   |                   1 |          64 |       64.0000 | | Van Gogh |                   3 |         148 |       49.3333 | +----------+---------------------+-------------+---------------+ 

Note that the summary statements produce output only for those artists in the artist table for whom you actually have acquired paintings. (For example, Monet is listed in the artist table but is not present in the summary because you don't have any of his paintings yet.) If you want the summary to include all artists, even if you have none of their paintings yet, you must use a different kind of joinspecifically, an outer join:

  • Joins written with the comma operator or INNER JOIN are inner joins, which means that they produce a result only for values in one table that match values in another table.

  • An outer join can produce those matches as well, but also can show you which values in one table are missing from the other. Section 12.2 introduces outer joins.

Joins and Indexes

Because a join can easily cause MySQL to process large numbers of row combinations, it's a good idea to make sure that the columns you're comparing are indexed. Otherwise, performance can drop off quickly as table sizes increase. For the artist and painting tables, joins are made based on the values in the a_id column of each table. If you look back at the CREATE TABLE statements that were shown for these tables earlier, you'll see that a_id is indexed in each table.


The tbl_name.col_name notation that qualifies a column name with a table name is always allowable in a join but can be shortened to just col_name if the name appears in only one of the joined tables. In that case, MySQL can determine without ambiguity which table the column comes from, and no table name qualifier is necessary. We can't do that for the following join. Both tables have an a_id column, so the column reference is ambiguous:

mysql> SELECT * FROM artist INNER JOIN painting ON a_id = a_id; ERROR 1052 (23000): Column 'a_id' in on clause is ambiguous 

By contrast, the following query is unambiguous. Each instance of a_id is qualified with the appropriate table name, only artist has a name column, and only painting has title and state columns:

mysql> SELECT name, title, state FROM artist INNER JOIN painting     -> ON artist.a_id = painting.a_id; +----------+-------------------+-------+ | name     | title             | state | +----------+-------------------+-------+ | Da Vinci | The Last Supper   | IN    | | Da Vinci | The Mona Lisa     | MI    | | Van Gogh | Starry Night      | KY    | | Van Gogh | The Potato Eaters | KY    | | Van Gogh | The Rocks         | IA    | | Renoir   | Les Deux Soeurs   | NE    | +----------+-------------------+-------+ 

To make the meaning of a statement clearer to human readers, it's often useful to qualify column names even when that's not strictly necessary as far as MySQL is concerned. I tend to use qualified names in join examples for that reason.

If you don't want to write complete table names when qualifying column references, give each table a short alias and refer to its columns using the alias. The following two statements are equivalent:

SELECT artist.name, painting.title, states.name, painting.price   FROM artist INNER JOIN painting INNER JOIN states   ON artist.a_id = painting.a_id AND painting.state = states.abbrev; SELECT a.name, p.title, s.name, p.price   FROM artist AS a INNER JOIN painting AS p INNER JOIN states AS s   ON a.a_id = p.a_id AND p.state = s.abbrev; 

In AS alias_name clauses, the AS is optional.

For complicated statements that select many columns, aliases can save a lot of typing. In addition, aliases are not only convenient but necessary for some types of statements, as will become evident when we get to the topic of self-joins (Section 12.3).




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net