3.6 Before and After: Subselects

 <  Day Day Up  >  

Subselects are a popular database feature that's available in MySQL 4.1. A subselect , or a subquery , is a query that occurs within the context of another query. You then use the subselect's results in the main query.

Many developers like subselects because they allow them to chain queries together to winnow results. It's often possible to rewrite a query to eliminate a subselect; however, this is not always straightforward or efficient. Additionally, sometimes, without a subselect, you'll be forced to make multiple requests .

The following sections contain a few examples that show how a subselect can solve problems. Many of them use the following programs table:

 mysql> DESCRIBE programs; +------------+-----------------+------+-----+---------+----------------+  Field       Type             Null  Key  Default  Extra           +------------+-----------------+------+-----+---------+----------------+  id          int(5) unsigned        PRI  NULL     auto_increment   title       varchar(50)                                           channel_id  int(5) unsigned             0                        +------------+-----------------+------+-----+---------+----------------+ 3 rows in set (0.00 sec) mysql> SELECT * FROM programs; +----+-----------------+------------+  id  title            channel_id  +----+-----------------+------------+   1  Oprah                    60    2  Sex and the City        201    3  The Sopranos            201    4  Frontline                13  +----+-----------------+------------+ 4 rows in set (0.00 sec) 

3.6.1 MySQL 4.0: Finding Related Entries

A common database-related task is finding rows that match a set of criteria. Normally, these specifications are known ahead of time:

 // Find the names and address of all people  // who have an email address ending in "php.net". SELECT name, email FROM users WHERE email LIKE '%.php.net'; // Find the title of all TV programs that air on channel 13 SELECT title   FROM programs, channels  WHERE channel.name = 'HBO'     AND program.channel_id = channel.id; 

Sometimes, as in the first case, you only need to query a single table. Other times, as in the second example, you need to link two tables together based on a common key to gather the information you want.

However, these links aren't always sufficient. For example, you know there's a show called The Sopranos and you want to discover all the other shows that air on the same channel, but you don't know which channel that is. One solution is to use two queries:

 // Get channel_id: $r = $db->query("SELECT channel_id FROM programs                   WHERE title = 'The Sopranos'"); $row = $r->fetch_assoc( ); // Use channel_id to find programs: $s = $db->query("SELECT title FROM programs                    WHERE channel_id = $row[channel_id]"); while ($program = $s->fetch_assoc( )) {     print $program['title'] . "\n"; }  Sex and the City   The Sopranos  

This is easy to understand, but it's a bit wasteful to make two queries for such a simple request. Another solution is to employ a self-join :

 // All-in-one query using self-join: $s = $db->query("SELECT p2.title                     FROM programs AS p1,                         programs AS p2                    WHERE p1.title = 'The Sopranos'                     AND p1.channel_id = p2.channel_id"); while ($program = $s->fetch_assoc( )) {     print $program['title'] . "\n"; }  Sex and the City   The Sopranos  

It can seem a little odd at first to JOIN a table against itself, but it can actually come in quite handy. In this example, you're finding the set of shows that have a title of The Sopranos and cross-referencing them against all the shows have the same channel_id field.

In order to eliminate the ambiguity of which table you're referring to, you're forced to assign table aliases. The table you're using to locate the channel The Sopranos is on is p1 , and the table in which you're searching for matching shows is p2 . That's why you return p2.title .

To return all the rows in p2 , use p2.* . This syntax is similar to * , which returns all rows, but qualifies it with a table name.

Since you're joining on the same column, an equivalent query is:

 // All-in-one query using self-join and USING( ) $s = $db->query("SELECT p2.title                     FROM programs AS p1 INNER JOIN programs AS p2                    USING (channel_id)                   WHERE p1.title = 'The Sopranos'"); while ($program = $s->fetch_assoc( )) {     print $program['title'] . "\n"; }  Sex and the City   The Sopranos  

An INNER JOIN is identical to linking tables with a comma (,), and the USING( ) condition is shorthand for a list of columns to set as equal. Thus, USING(channel_id) is the same as p1.channel_id = p2.channel_id .

3.6.2 MySQL 4.1: Finding Related Entries

Still, that type of query can get hard to handle as the relationships become more complex. Many people find it easier to implement that request using a subselect:

 // All-in-one query using subselect: $s = $db->query("SELECT title FROM programs WHERE channel_id =                      (SELECT channel_id FROM programs                        WHERE title = 'The Sopranos' LIMIT 1)"); while ($program = $s->fetch_assoc( )) {     print $program['title'] . "\n"; }  Sex and the City   The Sopranos  

MySQL starts off by parsing the SQL statement. When it reaches the second SELECT , it places what it already has on hold and executes the second query:

 mysql> SELECT channel_id FROM programs WHERE title = 'The Sopranos' LIMIT 1;  +------------+   channel_id   +------------+   201   +------------+   1 row in set (0.00 sec)  

That query returns a number (or NULL , if there are no matches). For the sake of example, assume it's 201 . This then translates the original query into:

 mysql> SELECT title FROM programs WHERE channel_id = 201;  +-----------------+   title   +-----------------+   Sex and the City   The Sopranos   +-----------------+   2 rows in set (0.00 sec)  

MySQL can now execute the query and return your titles.

Notice how the subselect specifically adds a LIMIT clause to restrict the number of rows to 1 . Since the results are used in the context of channel_id = , returning more than one number leads to a parser error because it's like doing channel_id = 201, 202 .

When you expect your query will return multiple rows, switch from an equals sign ( = ) to IN( ) :

 mysql> SELECT title FROM programs WHERE channel_id IN(          SELECT channel_id FROM programs WHERE title = 'The Sopranos');  +-----------------+   title   +-----------------+   Sex and the City   The Sopranos   +-----------------+   2 rows in set (0.02 sec)  

This allows you to find programs located in any of the channels returned by the subselect. Be careful, though ”duplicate items are eliminated when you use IN( ) . Sometimes this is good, but not always.

For instance, say you're implementing the backend of a TiVo or another DVR. You want to track all the shows that have been watched so you can suggest related programs the person might also be interested in viewing. This table, viewings , contains a list of program_id s. Each number is a key in the programs table from before and uniquely represents a program.

Here's one potential dataset:

 mysql> SELECT * FROM viewings;  +------------+   program_id   +------------+   1   2   1   3   4   +------------+   5 rows in set (0.01 sec)  

There are five shows, but program 1 was watched twice. Here's what happens if you look up the titles using IN( ) :

 mysql> SELECT title FROM programs WHERE id IN         (SELECT program_id FROM viewings);  +-----------------+   title   +-----------------+   Oprah   Sex and the City   The Sopranos   Frontline   +-----------------+   4 rows in set (0.01 sec)  

There are only four records because the duplicate program ID of 1 isn't counted twice. To find out which show was so good as to merit a second viewing, you need to eliminate the subselect and return to a join:

 mysql> SELECT title FROM programs INNER JOIN viewings         ON viewings.program_id = programs.id;  +-----------------+   title   +-----------------+   Oprah   Frontline   Oprah   Sex and the City   The Sopranos   +-----------------+   5 rows in set (0.01 sec)  

Ah! Now you can see that someone really enjoys Oprah , so you're more likely to suggest related shows.

3.6.3 When Subselects Are Necessary

The previous examples showed different ways to rewrite your query. While using a subselect often made the query easier to write and understand, it was never necessary. There was always an alternative syntax that used a JOIN to eliminate the subselect, yet retained the same results.

That's not always the case. There are lots of examples where you just can't solve the problem in a single query without using a subselect. This normally occurs when you're using an aggregate function, such as COUNT( ) .

Returning to the last problem, you've already seen how repeated entries in a table can cause problems when using IN( ) , but they can give you trouble in other ways, too. For instance, you want to find the title s of the shows you've watched more than The Sopranos . Once again, the viewings table looks like this:

 mysql> SELECT * FROM viewings;  +------------+   program_id   +------------+   1   2   1   3   4   +------------+   5 rows in set (0.01 sec)  

To find the total number of times you've watched a show, you need to do a join against the programs table and count the number of matches:

 mysql> SELECT COUNT(*) FROM viewings LEFT JOIN programs ON(program_id = programs.id)  WHERE programs.title = 'The Sopranos';  +---+   c   +---+   1   +---+   1 row in set (0.00 sec)  

Place this result as a subselect inside the master query:

 mysql> SELECT title, COUNT(*) AS c           FROM viewings LEFT JOIN programs             ON (program_id = programs.id)       GROUP BY title       HAVING c > (  SELECT COUNT(*)                       FROM viewings                   LEFT JOIN programs                          ON (program_id = programs.id)                       WHERE title = 'The Sopranos');  +-------+---+   title  c   +-------+---+   Oprah  2   +-------+---+   1 row in set (0.00 sec)  

Once again, Oprah is the most popular show.

This question can't be rewritten to avoid a subselect, because the condition inside the subselect not only uses an aggregate function, such as COUNT( ) in this example, but also places restrictions upon the query using a WHERE clause. Since you only want the WHERE title = 'The Sopranos ' condition to apply to the subselect's COUNT(*) , it can't mix with the SELECT title, COUNT(*) in the rest of the query.

For more on aggregate functions and HAVING clauses, see the discussion in Section 4.6.

 <  Day Day Up  >  


Upgrading to PHP 5
Upgrading to PHP 5
ISBN: 0596006365
EAN: 2147483647
Year: 2004
Pages: 144

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net