A PRIMARY KEY constraint is used to enforce a rule that a column may contain only unique, non-NULL data. This is to ensure that each row of a table can be uniquely identified. Too frequently, you will encounter the wrong sort of primary-key specification:
CREATE TABLE Table1 ( column1 ... NOT NULL PRIMARY KEY, ...)
No gross harm results, but that's not the usual recommendation. To preserve orthogonality, this definition is often suggested instead:
CREATE TABLE Table1 ( column1 ... NOT NULL, CONSTRAINT Constraint1 PRIMARY KEY (column1), ...)
We'll go a little further and askIs even that specification wrong? After all, primary-key columns can't contain NULL by definition, so specifying NOT NULL with standard SQL is a redundancy. Again, we couldn't find any evidence that gross harm results, but we'd still say that the only fully correct specification for a primary key is:
CREATE TABLE Table1 ( column1 ... , CONSTRAINT Constraint1 PRIMARY KEY (column1), ...)
Unfortunately, five of the Big EightIBM, Ingres, InterBase, MySQL, and Sybasewon't allow it (see Table 10-1).
In the SQL:1999 SQL/Foundation document, all tables are defined to have a primary key via functional dependencies, but this area is a little-known and complicated one that no DBMS has implemented. So everyone recommends that an explicit PRIMARY KEY definition should appear for every table defined. Everyone is rightbut when you know that there will never be duplicates or foreign keys (e.g., with a one-row temporary table), making a primary key is unnecessary and costly.
In a reasonable world, you would be able to add the primary key and any matching foreign keys after the table exists. Adding the foreign key after table creation would solve the cyclic references problem. For example, the two table definitions shown in Listing 10-1 are illegal outside a CREATE SCHEMA statement. The elegant solution to the cyclic references problem is shown in Listing 10-2.
Listing 10-1 Illegal cyclic references table definitions
CREATE TABLE Table1 ( column1 INTEGER PRIMARY KEY, column2 INTEGER FOREIGN KEY REFERENCES Table2) CREATE TABLE Table2 ( column2 INTEGER PRIMARY KEY, column1 INTEGER FOREIGN KEY REFERENCES Table1)
Listing 10-2 Solving the cyclic references problem
CREATE TABLE Table1 ( column1 INTEGER, column2 INTEGER) CREATE TABLE Table2 ( column2 INTEGER, column1 INTEGER) ALTER TABLE Table1 ADD CONSTRAINT Constraint1 PRIMARY KEY (column1) ALTER TABLE Table2 ADD CONSTRAINT Constraint2 PRIMARY KEY (column2) ALTER TABLE Table1 ADD CONSTRAINT Constraint3 FOREIGN KEY (column2) REFERENCES Table2 ALTER TABLE Table2 ADD CONSTRAINT Constraint4 FOREIGN KEY (column1) REFERENCES Table1
MySQL doesn't support FOREIGN KEY constraints. Sybase requires that you also have an explicit NOT NULL constraint for the PRIMARY KEY.
As well as being legal, the CREATE-then-ALTER formulation shown in Listing 10-2 represents the facts better. Columns are part of a table so they should be defined in the CREATE TABLE statement. Constraints are not part of a table so they should be defined separately. However, craving your indulgence , we admit that CREATE-then-ALTER is pie in the sky/soapbox talkthere is a practical difficulty in using the idea.
The practical difficulty is that if your DBMS supports clustered indexes, then the words "PRIMARY KEY" will cause the DBMS engine to think: "Aha, a primary key. How useful. I'll make this table a clustered index, and the clustering key will be the columns of the primary key." These are reasonable assumptions. The trouble is, the DBMS must know at table-creation time whether or not there is a primary key, because that decides the type of file structure the DBMS will use. Therefore it is too late to add or drop a primary-key specification after table creation.
You can avoid this difficulty by letting the cluster key be a nonprimary key or by defining a UNIQUE key later to serve as the primary key (it's a little-known fact that foreign keys can refer to unique keys other than the primary key). We're ignoring such work arounds here because they have no popularity.
The conclusion we've come to is that you should decide the primary key in advance. While deciding, you should take some performance factors into account. Alas, these factors lead to a dilemma.
First of all is the fact that the primary-key columns will automatically be index columnsall DBMSs perform an implicit CREATE INDEX when they see a PRIMARY KEY constraint. (Recall that the same is not true of foreign keys; see Table 9-1 in Chapter 9, "Indexes.") So, in keeping with our earlier discussion of what's the best sort of index key, the best sort of primary key consists of only one column, and that column should be defined as an INTEGER. It must not be a FLOATtwo FLOAT values can appear to be distinct when they are not, and the uniqueness checking would fail. Therefore:
A simple one- user shop should use ascending integers for a primary key.
Second of all is the fact that a serial sequence can cause hot spots. Hot spots are pages in either the index or data file that every job wants to access at the same time. Consider: If Sam and Mary are both adding rows to the same table, and Sam's inserted row is #5550001, then Mary's row will be #5550002because that's the next in sequence and index keys are in sequence. So it's a near certainty that both Sam and Mary will want to lock the same index page. There are ways around the hot spot problem, such as reverse-key indexing and range locking; see Chapter 9, "Indexes," and Chapter 15, "Locks." But the simpler solution is to avoid hot spots entirely. Therefore:
A complex multiuser shop should use an out-of-order series of characters for a primary key. This key can be derived from (a) the time of day to the nearest millisecond or (b) the user's session identifier. Such keys are often defined as CHAR(12) or longer.
So far, our advice has assumed that you will be able to assign arbitrary values to your primary keys. Sadly, the easiest way to make arbitrary values unique is to use a serial sequence.
Here are other rules of thumb for defining primary keys:
But these rules are mere conventions. What counts is that the primary-key column should be easy to form and assuredly unique. It should also, preferably, be a value that is very rarely changed.
The Bottom Line: PRIMARY KEY Constraints
A primary key is a unique, non-NULL column, or group of columns, that uniquely identifies a row of a table.
If you're programming for a simple one-user shop, use ascending integers for primary-key columns.
If you're programming for a complex multiuser shop, avoid hot spotsuse an out-of-order series of characters derived from the time of day to the nearest millisecond or the user's session identifier for primary-key columns.
When defining a primary key, remember the most important criterion: the column chosen should be easy to form and assuredly unique.