Deleting Related Rows in Multiple Tables

Table of contents:

Using Foreign Keys to Enforce Referential Integrity

12.21.1 Problem

You want to delete related records from multiple tables. This is common, for example, when you have tables that are related in master-detail or parent-child fashion; deleting a parent record typically requires all the associated child records to be deleted as well.

12.21.2 Solution

You have several options. MySQL 4.0 supports cascaded delete with a multiple-table DELETE syntax; you can replace the table with new versions that contain only the records not to be deleted; you can write a program to construct appropriate DELETE statements for each table, or you may be able to use mysql to do so.

12.21.3 Discussion

Applications that use related tables need to operate on both tables at once for many operations. Suppose you use MySQL to record information about the contents of software distributions that you maintain. The master (or parent) table lists each distribution's name, version number, and release date. The detail (or child) table lists information about the files in the distributions, thus serving as the manifest for each distribution's contents. To allow the parent and child records to be associated, each parent record has a unique ID number, and that number is stored in the child records. The tables might be defined something like this:

CREATE TABLE swdist_head
(
 dist_id INT UNSIGNED NOT NULL AUTO_INCREMENT, # distribution ID
 name VARCHAR(40), # distribution name
 ver_num NUMERIC(5,2), # version number
 rel_date DATE NOT NULL, # release date
 PRIMARY KEY (dist_id)
);
CREATE TABLE swdist_item
(
 dist_id INT UNSIGNED NOT NULL, # parent distribution ID
 dist_file VARCHAR(255) NOT NULL # name of file in distribution
);

For the examples here, assume the tables contain the following records:

mysql> SELECT * FROM swdist_head ORDER BY name, ver_num;
+---------+------------+---------+------------+
| dist_id | name | ver_num | rel_date |
+---------+------------+---------+------------+
| 1 | DB Gadgets | 1.59 | 1996-03-25 |
| 3 | DB Gadgets | 1.60 | 1998-12-26 |
| 4 | DB Gadgets | 1.61 | 1998-12-28 |
| 2 | NetGizmo | 3.02 | 1998-11-10 |
| 5 | NetGizmo | 4.00 | 2001-08-04 |
+---------+------------+---------+------------+
mysql> SELECT * FROM swdist_item ORDER BY dist_id, dist_file;
+---------+----------------+
| dist_id | dist_file |
+---------+----------------+
| 1 | db-gadgets.sh |
| 1 | README |
| 2 | NetGizmo.exe |
| 2 | README.txt |
| 3 | db-gadgets.sh |
| 3 | README |
| 3 | README.linux |
| 4 | db-gadgets.sh |
| 4 | README |
| 4 | README.linux |
| 4 | README.solaris |
| 5 | NetGizmo.exe |
| 5 | README.txt |
+---------+----------------+

The tables describe the distributions for three versions of DB Gadgets and two versions of NetGizmo. But the tables are difficult to make sense of individually, so to display information for a given distribution, you'd use a join to select rows from both tables. For example, the following query shows the information stored for DB Gadgets 1.60:

mysql> SELECT swdist_head.dist_id, swdist_head.name,
 -> swdist_head.ver_num, swdist_head.rel_date, swdist_item.dist_file
 -> FROM swdist_head, swdist_item
 -> WHERE swdist_head.name = 'DB Gadgets' AND swdist_head.ver_num = 1.60
 -> AND swdist_head.dist_id = swdist_item.dist_id;
+---------+------------+---------+------------+---------------+
| dist_id | name | ver_num | rel_date | dist_file |
+---------+------------+---------+------------+---------------+
| 3 | DB Gadgets | 1.60 | 1998-12-26 | README |
| 3 | DB Gadgets | 1.60 | 1998-12-26 | README.linux |
| 3 | DB Gadgets | 1.60 | 1998-12-26 | db-gadgets.sh |
+---------+------------+---------+------------+---------------+

Similarly, to delete a distribution, you'd need to access both tables. DB Gadgets 1.60 has an ID of 3, so one way to get rid of it would be to issue DELETE statements for each of the tables manually:

mysql> DELETE FROM swdist_head WHERE dist_id = 3;
mysql> DELETE FROM swdist_item WHERE dist_id = 3;

That's quick and easy, but problems can occur if you forget to issue DELETE statements for both tables (which is easier to do than you might think). In that case, your tables become inconsistent, with parent records that have no children, or children that are referenced by no parent. Also, manual deletion doesn't work well in situations where you have a large number of distributions to remove, or when you don't know in advance which ones to delete. Suppose you decide to purge all the old records, keeping only those for the most recent version of each distribution. (For example, the tables contain information for DB Gadgets distributions 1.59, 1.60, and 1.61, so you'd remove records for Versions 1.59 and 1.60.) For this kind of operation, you'd likely determine which distributions to remove based on some query that figures out the IDs of those that are not the most recent. But then what do you do? The query might produce many IDs; you probably wouldn't want to delete each distribution manually. And you don't have to. There are several options for deleting records from multiple tables:

Use the multiple-table DELETE syntax that is available as of MySQL 4.0.0. This way you can write a query that takes care of identifying and removing records from both tables at once. You need not remember to issue multiple DELETE statements each time you remove records from related tables.
Approach the problem in reverse: select the records that are not to be deleted into new tables, then use those tables to replace the original ones. The effect is the same as deleting the unwanted records.
Use a program that determines the IDs of the distributions to be removed and generates the appropriate DELETE statements for you. The program might be one you write yourself, or an existing program such as mysql.

The remainder of this section examines each of these options in turn, showing how to use them to solve the problem of deleting old distributions. Because each example removes records from the swdist_head and swdist_item tables, you'll need to create and populate them anew before trying each method, so that you begin at the same starting point each time. You can do this using the swdist_create.sql script in the joins directory of the recipes distribution. Scripts that demonstrate each multiple-table delete method shown in the examples may be found in that directory as well.

Using Foreign Keys to Enforce Referential Integrity

One feature a database may offer for helping you maintain consistency between tables is the ability to define foreign key relationships. This means you can specify explicitly in the table definition that a primary key in a parent table (such as the dist_id column of the swdist_head table) is a parent to a key in another table (the dist_id column in the swdist_item table). By defining the ID column in the child table as a foreign key to the ID column in the parent, the database can enforce certain constraints against illegal operations. For example, it can prevent you from creating a child record with an ID that is not present in the parent, or from deleting parent records without also deleting the corresponding child records first. A foreign key implementation may also offer cascaded delete: if you delete a parent record, the database engine cascades the effect of the delete to any child tables and automatically deletes the child records for you. The InnoDB table type in MySQL offers support for foreign keys as of Version 3.23.44, and for cascaded delete as of 3.23.50. In addition, there are plans to implement foreign key support for all the table types in MySQL 4.1.

12.21.4 Performing a Cascaded Delete with a Multiple-Table DELETE Statement

As of MySQL 4.0.0, DELETE supports a syntax that allows you to identify records to be removed from multiple tables and clobber them all with a single statement. To use this for deleting software distributions from the swdist_head and swdist_item tables, determine the IDs of the relevent distributions and then apply the list to those tables.

First, determine which version of each distribution is the most recent and select the names and version numbers into a separate table. The following query selects each distribution name and the highest version number for each one:

mysql> CREATE TABLE tmp
 -> SELECT name, MAX(ver_num) AS newest
 -> FROM swdist_head
 -> GROUP BY name;

The resulting table looks like this:

mysql> SELECT * FROM tmp;
+------------+--------+
| name | newest |
+------------+--------+
| DB Gadgets | 1.61 |
| NetGizmo | 4.00 |
+------------+--------+

Next, determine the ID numbers of the distributions that are older than those listed in the tmp table:

mysql> CREATE TABLE tmp2
 -> SELECT swdist_head.dist_id, swdist_head.name, swdist_head.ver_num
 -> FROM swdist_head, tmp
 -> WHERE swdist_head.name = tmp.name AND swdist_head.ver_num < tmp.newest;

Note that you actually need select only the dist_id column into tmp2. The example selects the name and version number as well so that you can look at tmp2 and see more easily that the IDs it chooses are indeed those for the older distributions that are to be deleted:

mysql> SELECT * FROM tmp2;
+---------+------------+---------+
| dist_id | name | ver_num |
+---------+------------+---------+
| 1 | DB Gadgets | 1.59 |
| 3 | DB Gadgets | 1.60 |
| 2 | NetGizmo | 3.02 |
+---------+------------+---------+

The table does not contain the IDs for DB Gadgets 1.61 or NetGizmo 4.00, which are the most recent distributions.

Now apply the ID list in tmp2 to the distribution tables using a multiple-table DELETE. The general form of this statement is:

DELETE tbl_list1 FROM tbl_list2 WHERE conditions;

tbl_list1 names the tables from which to delete records. tbl_list2 names the tables used in the WHERE clause, which specifies the conditions that identify the records to delete. Each table list can name one or more tables, separated by commas. For the situation at hand, the tables to delete from are swdist_head and swdist_item. The tables used to identify the deleted records are those tables and the tmp2 table:

mysql> DELETE swdist_head, swdist_item
 -> FROM tmp2, swdist_head, swdist_item
 -> WHERE tmp2.dist_id = swdist_head.dist_id
 -> AND tmp2.dist_id = swdist_item.dist_id;

The resulting tables look like this:

mysql> SELECT * FROM swdist_head;
+---------+------------+---------+------------+
| dist_id | name | ver_num | rel_date |
+---------+------------+---------+------------+
| 4 | DB Gadgets | 1.61 | 1998-12-28 |
| 5 | NetGizmo | 4.00 | 2001-08-04 |
+---------+------------+---------+------------+
mysql> SELECT * FROM swdist_item;
+---------+----------------+
| dist_id | dist_file |
+---------+----------------+
| 4 | README |
| 4 | README.linux |
| 4 | README.solaris |
| 4 | db-gadgets.sh |
| 5 | README.txt |
| 5 | NetGizmo.exe |
+---------+----------------+

For the tables that we're using, the DELETE statement just shown works as expected. But be aware that it will fail for tables containing parent records that should be deleted but for which there are no corresponding child records. The WHERE clause will find no match for the parent record in the client table, and thus not select the parent record for deletion. To make sure that the query selects and deletes the parent record even in the absence of matching child records, use a LEFT JOIN:

mysql> DELETE swdist_head, swdist_item
 -> FROM tmp2 LEFT JOIN swdist_head ON tmp2.dist_id = swdist_head.dist_id
 -> LEFT JOIN swdist_item ON swdist_head.dist_id = swdist_item.dist_id;

LEFT JOIN is discussed in Recipe 12.6.

12.21.5 Performing a Multiple-Table Delete Using Table Replacement

Another way to delete related rows from multiple tables is to select only the records that should not be deleted into new tables, then replace the original tables with the new ones. This is especially useful when you want to delete more records than you want to keep.

Begin by creating two tables tmp_head and tmp_item that have the same structure as the swdist_head and swdist_item tables:

CREATE TABLE tmp_head
(
 dist_id INT UNSIGNED NOT NULL AUTO_INCREMENT, # distribution ID
 name VARCHAR(40), # distribution name
 ver_num NUMERIC(5,2), # version number
 rel_date DATE NOT NULL, # release date
 PRIMARY KEY (dist_id)
);
CREATE TABLE tmp_item
(
 dist_id INT UNSIGNED NOT NULL, # parent distribution ID
 dist_file VARCHAR(255) NOT NULL # name of file in distribution
);

Then determine the IDs of the distributions you want to keep (that is, the most recent version of each distribution). The IDs are found as follows, using queries similar to those just described in the multiple-table delete section:

mysql> CREATE TABLE tmp
 -> SELECT name, MAX(ver_num) AS newest
 -> FROM swdist_head
 -> GROUP BY name;
mysql> CREATE TABLE tmp2
 -> SELECT swdist_head.dist_id
 -> FROM swdist_head, tmp
 -> WHERE swdist_head.name = tmp.name AND swdist_head.ver_num = tmp.newest;

Next, select into the new tables the records that should be retained:

mysql> INSERT INTO tmp_head
 -> SELECT swdist_head.*
 -> FROM swdist_head, tmp2
 -> WHERE swdist_head.dist_id = tmp2.dist_id;
mysql> INSERT INTO tmp_item
 -> SELECT swdist_item.*
 -> FROM swdist_item, tmp2
 -> WHERE swdist_item.dist_id = tmp2.dist_id;

Finally, replace the original tables with the new ones:

mysql> DROP TABLE swdist_head;
mysql> ALTER TABLE tmp_head RENAME TO swdist_head;
mysql> DROP TABLE swdist_item;
mysql> ALTER TABLE tmp_item RENAME TO swdist_item;

12.21.6 Performing a Multiple-Table Delete by Writing a Program

The preceding two methods for deleting related rows from multiple tables are SQL-only techniques. Another approach is to write a program that generates the DELETE statements for you. The program should determine the key values (the distribution IDs) for the records to delete, then process the keys to turn them into appropriate DELETE statements. Identifying the IDs can be done the same way as shown for the previous methods, but you have some latitude in how you want to use them to delete records:

Handle each ID individually. Construct DELETE statements that remove records from the tables one ID at a time.
Handle the IDs as a group. Construct an IN( ) clause that names all the IDs, and use it with each table to delete all the matching IDs at once.
If the ID list is huge, break it into smaller groups to construct shorter IN( ) clauses.
You can also solve the problem by reversing the perspective. Select the IDs for the distributions you want to retain and use them to construct a NOT IN( ) clause that deletes all the other distributions. This will usually be less efficient, because MySQL will not use an index for NOT IN( ) operations.

I'll show how to implement each method using Perl.

For each of the first three methods, begin by generating a list of the distribution IDs for the records to be deleted:

# Identify the newest version for each distribution name

$dbh->do ("CREATE TABLE tmp
 SELECT name, MAX(ver_num) AS newest
 FROM swdist_head
 GROUP BY name");

# Identify the IDs for versions that are older than those.

my $ref = $dbh->selectcol_arrayref (
 "SELECT swdist_head.dist_id
 FROM swdist_head, tmp
 WHERE swdist_head.name = tmp.name
 AND swdist_head.ver_num < tmp.newest");

# selectcol_arrayref( ) returns a reference to a list. Convert the reference
# to a list, which will be empty if $ref is undef or points to an empty list.

my @val = ($ref ? @{$ref} : ( ));

At this point, @val contains the list of IDs for the records to remove. To process them individually, run the following loop:

# Use the ID list to delete records, one ID at a time

foreach my $val (@val)
{
 $dbh->do ("DELETE FROM swdist_head WHERE dist_id = ?", undef, $val);
 $dbh->do ("DELETE FROM swdist_item WHERE dist_id = ?", undef, $val);
}

The loop will generate statements that look like this:

DELETE FROM swdist_head WHERE dist_id = '1'
DELETE FROM swdist_item WHERE dist_id = '1'
DELETE FROM swdist_head WHERE dist_id = '3'
DELETE FROM swdist_item WHERE dist_id = '3'
DELETE FROM swdist_head WHERE dist_id = '2'
DELETE FROM swdist_item WHERE dist_id = '2'

A drawback of this approach is that for large tables, the ID list may be quite large and you'll generate lots of DELETE statements. To be more efficient, combine the IDs into a single IN( ) clause that names them all at once. Generate the ID list the same way as for the first method, then process the list like this:[4]

[4] In Perl, you can't bind an array to a placeholder, but you can construct the query string to contain the proper number of ? characters (see Recipe 2.7). Then pass the array to be bound to the statement, and each element will be bound to the corresponding placeholder.

# Use the ID list to delete records for all IDs at once. If the list
# is empty, don't bother; there's nothing to delete.

if (@val)
{
 # generate list of comma-separated "?" placeholders, one per value
 my $where = "WHERE dist_id IN (" . join (",", ("?") x @val) . ")";
 $dbh->do ("DELETE FROM swdist_head $where", undef, @val);
 $dbh->do ("DELETE FROM swdist_item $where", undef, @val);
}

This method generates only one DELETE statement per table:

DELETE FROM swdist_head WHERE dist_id IN ('1','3','2')
DELETE FROM swdist_item WHERE dist_id IN ('1','3','2')

If the list of IDs is extremely large, you may be in danger of producing DELETE statements that exceed the maximum query length (a megabyte by default). In this case, you can break the ID list into smaller groups and use each one to construct a shorter IN( ) clause:

# Use the ID list to delete records, using parts of the list at a time.

my $grp_size = 1000; # number of IDs to delete at once

for (my $i = 0; $i < @val; $i += $grp_size)
{
 my $j = (@val < $i + $grp_size ? @val : $i + $grp_size);
 my @group = @val[$i .. $j-1];
 # generate list of comma-separated "?" placeholders, one per value
 my $where = "WHERE dist_id IN (" . join (",", ("?") x @group) . ")";
 $dbh->do ("DELETE FROM swdist_head $where", undef, @group);
 $dbh->do ("DELETE FROM swdist_item $where", undef, @group);
}

Each of the preceding programming methods finds the IDs of the records to remove and then deletes them. You can also achieve the same objective using reverse logic: select the IDs for the records you want to keep, then delete everything else. This approach can be useful if you expect to retain fewer records than you'll delete. To implement it, determine the newest version for each distribution and find the associated IDs. Then use the ID list to construct a NOT IN( ) clause:

# Identify the newest version for each distribution name

$dbh->do ("CREATE TABLE tmp
 SELECT name, MAX(ver_num) AS newest
 FROM swdist_head
 GROUP BY name");

# Identify the IDs for those versions.

my $ref = $dbh->selectcol_arrayref (
 "SELECT swdist_head.dist_id
 FROM swdist_head, tmp
 WHERE swdist_head.name = tmp.name
 AND swdist_head.ver_num = tmp.newest");

# selectcol_arrayref( ) returns a reference to a list. Convert the reference
# to a list, which will be empty if $ref is undef or points to an empty list.

my @val = ($ref ? @{$ref} : ( ));

# Use the ID list to delete records for all *other* IDs at once.
# The WHERE clause is empty if the list is empty (in that case,
# no records are to be kept, so they all can be deleted).

my $where = "";
if (@val)
{
 # generate list of comma-separated "?" placeholders, one per value
 $where = "WHERE dist_id NOT IN (" . join (",", ("?") x @val) . ")";
}
$dbh->do ("DELETE FROM swdist_head $where", undef, @val);
$dbh->do ("DELETE FROM swdist_item $where", undef, @val);

Note that with this reverse-logic approach, you must use the entire ID list in a single NOT IN( ) clause. If you try breaking the list into smaller groups and using NOT IN( ) with each of those, you'll empty your tables completely when you don't intend to.

12.21.7 Performing a Multiple-Table Delete Using mysql

If the keys that indicate which records to delete do not include quotes or other special characters, you can generate DELETE statements using mysql. For the software distribution tables, the keys (dist_id values) are integers, so they're susceptible to this approach. Generate the ID list using the same queries as those described in the multiple-table DELETE section, then use the list to create the DELETE statements:

CREATE TABLE tmp
SELECT name, MAX(ver_num) AS newest
FROM swdist_head
GROUP BY name;

CREATE TABLE tmp2
SELECT swdist_head.dist_id
FROM swdist_head, tmp
WHERE swdist_head.name = tmp.name AND swdist_head.ver_num < tmp.newest;

SELECT CONCAT('DELETE FROM swdist_head WHERE dist_id=',dist_id,';') FROM tmp2;
SELECT CONCAT('DELETE FROM swdist_item WHERE dist_id=',dist_id,';') FROM tmp2;

If you have those statements in a file swdist_mysql_delete.sql, execute the file as follows to produce the set of DELETE statements:

% mysql -N cookbook < swdist_mysql_delete.sql > tmp

The file tmp will look like this:

DELETE FROM swdist_head WHERE dist_id=1;
DELETE FROM swdist_head WHERE dist_id=3;
DELETE FROM swdist_head WHERE dist_id=2;
DELETE FROM swdist_item WHERE dist_id=1;
DELETE FROM swdist_item WHERE dist_id=3;
DELETE FROM swdist_item WHERE dist_id=2;

Then execute the contents of tmp as follows:

% mysql cookbook < tmp

Using the mysql Client Program

Writing MySQL-Based Programs

Record Selection Techniques

Working with Strings

Working with Dates and Times

Sorting Query Results

Generating Summaries

Modifying Tables with ALTER TABLE

Obtaining and Using Metadata

Importing and Exporting Data

Generating and Using Sequences

Using Multiple Tables

Statistical Techniques

Handling Duplicates

Performing Transactions

Introduction to MySQL on the Web

Incorporating Query Resultsinto Web Pages

Processing Web Input with MySQL

Using MySQL-Based Web Session Management

Appendix A. Obtaining MySQL Software

Appendix B. JSP and Tomcat Primer

Appendix C. References