Recipe 14.1. Preventing Duplicates from Occurring in a Table

Problem

You want to prevent a table from ever containing duplicates.

Solution

Use a PRIMARY KEY or a UNIQUE index.

Discussion

To make sure that rows in a table are unique, some column or combination of columns must be required to contain unique values in each row. When this requirement is satisfied, you can refer to any row in the table unambiguously by using its unique identifier. To make sure a table has this characteristic, include a PRIMARY KEY or UNIQUE index in the table structure when you create the table. The following table contains no such index, so it would allow duplicate rows:

CREATE TABLE person
(
  last_name   CHAR(20),
  first_name  CHAR(20),
  address     CHAR(40)
);
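As a quick check of that claim, the following sketch inserts the same row twice into a freshly created copy of the unindexed table. The name and address values are made up for illustration. Both statements should succeed, leaving two identical rows:

```sql
-- Sample values are hypothetical; any repeated row behaves the same way.
INSERT INTO person (last_name, first_name, address)
  VALUES ('Smith', 'John', '123 Main St');
INSERT INTO person (last_name, first_name, address)
  VALUES ('Smith', 'John', '123 Main St');

-- With no unique index, nothing rejects the second insert,
-- so this query counts both rows.
SELECT COUNT(*) FROM person
  WHERE last_name = 'Smith' AND first_name = 'John';
```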

To prevent multiple rows with the same first and last name values from being created in this table, add a PRIMARY KEY to its definition. When you do this, the indexed columns must be NOT NULL, because a PRIMARY KEY does not allow NULL values:

CREATE TABLE person
(
  last_name   CHAR(20) NOT NULL,
  first_name  CHAR(20) NOT NULL,
  address     CHAR(40),
  PRIMARY KEY (last_name, first_name)
);

When a table has a unique index, inserting a row that duplicates an existing row in the column or columns that define the index normally causes an error. Recipe 14.2 discusses how to handle such errors or modify MySQL's duplicate-handling behavior.
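To illustrate the default behavior, here is a minimal sketch that assumes the person table was created with PRIMARY KEY (last_name, first_name) and is initially empty. The sample values are hypothetical:

```sql
-- Assumes the person table defined with PRIMARY KEY (last_name, first_name).
INSERT INTO person (last_name, first_name, address)
  VALUES ('Smith', 'John', '123 Main St');

-- Same name, different address: rejected with a duplicate-key error
-- (MySQL error code 1062) because the key columns match an existing row.
INSERT INTO person (last_name, first_name, address)
  VALUES ('Smith', 'John', '456 Oak Ave');

-- A different first name produces a different key, so this succeeds.
INSERT INTO person (last_name, first_name, address)
  VALUES ('Smith', 'Jane', '456 Oak Ave');
```

Only the indexed columns take part in the uniqueness check; the address column can repeat freely.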

Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table. The two types of indexes are similar, with the exception that a UNIQUE index can be created on columns that allow NULL values. For the person table, it's likely that you'd require both the first and last names to be filled in. If so, you still declare the columns as NOT NULL, and the following table definition is effectively equivalent to the preceding one:

CREATE TABLE person
(
  last_name   CHAR(20) NOT NULL,
  first_name  CHAR(20) NOT NULL,
  address     CHAR(40),
  UNIQUE (last_name, first_name)
);

If a UNIQUE index does happen to allow NULL values, NULL is special because it is the one value that can occur multiple times. The rationale for this is that it is not possible to know whether one unknown value is the same as another, so multiple unknown values are allowed. (An exception to this is that BDB tables allow at most one NULL value in a column that has a UNIQUE index.)
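The NULL behavior can be seen with a small sketch. The contact table and its values are hypothetical, introduced only for illustration, and the example assumes a storage engine other than BDB:

```sql
-- Hypothetical table: email may be NULL but must otherwise be unique.
CREATE TABLE contact
(
  name   CHAR(40) NOT NULL,
  email  CHAR(40),
  UNIQUE (email)
);

-- Both rows are accepted: NULL values are not treated as duplicates
-- of one another.
INSERT INTO contact (name, email) VALUES ('Alice', NULL);
INSERT INTO contact (name, email) VALUES ('Bob', NULL);

-- A repeated non-NULL value is still rejected: the second statement
-- fails with a duplicate-key error.
INSERT INTO contact (name, email) VALUES ('Carol', 'carol@example.com');
INSERT INTO contact (name, email) VALUES ('Dave', 'carol@example.com');
```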

Of course, you might want the person table to reflect the real world, in which people sometimes share the same name. In this case, you cannot set up a unique index based on the name columns, because duplicate names must be allowed. Instead, each person must be assigned some sort of unique identifier, which becomes the value that distinguishes one row from another. In MySQL, it's common to accomplish this by using an AUTO_INCREMENT column:

CREATE TABLE person
(
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
  last_name   CHAR(20),
  first_name  CHAR(20),
  address     CHAR(40),
  PRIMARY KEY (id)
);

In this case, when you create a row with an id value of NULL, MySQL assigns that column a unique ID automatically. Another possibility is to assign identifiers externally and use those IDs as unique keys. For example, citizens in a given country might have unique taxpayer ID numbers. If so, those numbers can serve as the basis for a unique index:

CREATE TABLE person
(
  tax_id      INT UNSIGNED NOT NULL,
  last_name   CHAR(20),
  first_name  CHAR(20),
  address     CHAR(40),
  PRIMARY KEY (tax_id)
);
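To see the AUTO_INCREMENT approach in action, the following sketch assumes the earlier version of person that has the AUTO_INCREMENT id column, not the tax_id version. The sample values are hypothetical. Two people with the same name are stored as separate rows because MySQL generates a different id for each:

```sql
-- Assumes the person table defined with the AUTO_INCREMENT id column.
-- Inserting NULL into id asks MySQL to generate the next sequence value.
INSERT INTO person (id, last_name, first_name, address)
  VALUES (NULL, 'Smith', 'John', '123 Main St');
INSERT INTO person (id, last_name, first_name, address)
  VALUES (NULL, 'Smith', 'John', '456 Oak Ave');

-- The two rows share a name but have distinct id values.
SELECT id, last_name, first_name, address FROM person;

-- LAST_INSERT_ID() returns the id generated by the most recent
-- insert on this connection.
SELECT LAST_INSERT_ID();
```

With the tax_id version of the table, the application supplies the identifier itself, and a second row with the same tax_id is rejected just as any other duplicate key would be.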
