FOREIGN KEY | SQL Performance Tuning

A FOREIGN KEY constraint is used to enforce a rule that a column may contain only NULL or data that is also contained in a matching, primary key, column. ^[4] A foreign key should exactly match its referenced primary key, both in data type and in column size . Some DBMSs (for example, Informix, Microsoft, and Sybase) will not even let you define a column as:

^[4] We use "primary key" throughout this section for the sake of simplicity. The SQL Standard actually specifies that a foreign key must match a unique key. In practice, of course, the match is normally made to the type of unique key known as a primary key.

 CREATE TABLE Table2 (    column1 INTEGER REFERENCES Table1,    ...)

if Table1 's definition is:

 CREATE TABLE Table1 (    column1 SMALLINT PRIMARY KEY,    ...)

This restriction makes sense when you consider that a data type is itself a range constraint. You can't possibly have a value of 1,000,000 in the foreign-key column if the primary-key values are SMALLINTs, and therefore must be less than 32,768.

This leads us to the observation that any constraint (except a NOT NULL or UNIQUE constraint) that applies to the primary key must apply to the foreign key as well. Therefore the following pair of CREATE TABLE statements contain an unnecessary CHECK constraint in Table2 's definition:

 CREATE TABLE Table1 (    column1 SMALLINT,    CHECK (column1 > 0),    PRIMARY KEY (column1)) CREATE TABLE Table2 (    column1 SMALLINT,    CHECK (column1 > 0),    FOREIGN KEY (column1) REFERENCES Table1)

In our tests, we found that some DBMSs will actually process the CHECK constraint when inserting or updating Table2 , even though logically they don't have tothey're just wasting time. That is, these two data-change statements should take the same amount of time whether Table2 's definition includes the CHECK clause or omits the CHECK clause:

 INSERT INTO Table2 ... UPDATE Table2 SET column1 ... GAIN: 2/7 if the redundant CHECK on Table2 is omitted

But with two of the DBMSs we tested , the data changes went faster when Table2 's definition did not include the redundant CHECK constraint.

Not only should foreign-key columns match primary-key columns with respect to data type and size, they should match in names too. One of the features of a foreign-key table should be that it's easy to join with a primary-key table. In fact, that's the original meaning of natural join: a join over foreign- and primary-key columns. Unfortunately, the meaning of NATURAL JOIN in current SQL is slightly different. In the SQL Standard, NATURAL JOIN merely means a join over columns with the same name . To make these definitions compatible, make sure that any column that is not part of the foreign key does not have the same name as some primary-key column. For example this pair of table definitions contains a bad column name:

 CREATE TABLE Chains (    chain_id INTEGER,    city CHARACTER(20),    PRIMARY KEY (chain_id)) CREATE TABLE Stores (    store_id INTEGER,    chain_id INTEGER,    city CHARACTER(20),    PRIMARY KEY (store_id),    FOREIGN KEY (chain_id) REFERENCES Chains)

The mistake is that both tables contain a column named city . Therefore a search involving

 ... Chains NATURAL JOIN Stores ...

won't work naturally because the DBMS will try to join over two columns ( chain_id , city ) when there's only one reasonable joining column ( chain_id ). If it's too late to fix all the names in the database, then the recourse is to avoid any use of SQL's NATURAL JOINON or USING clauses are a must. Another good check is to ask if your application ever does a join over something other than primary-key/foreign-key columns. If so, there's probably something wrong with either the database design or the application.

Here's one more check you should makeDoes the same column appear in more than one foreign key? If so, it is a "servant of two masters," and the DBMS will have a hard time prejoining with a join index. Some DBMSs (but none of the Big Eight) will simply disallow constructions like:

 CREATE TABLE Table3 (    column1 INTEGER,    FOREIGN KEY (column1) REFERENCES Table1,    FOREIGN KEY (column1) REFERENCES Table2)

The Bottom Line: FOREIGN KEY Constraints

A foreign key is a column, or group of columns, that may contain only those values found in a similar set of unique (usually primary key) columns belonging to (usually) another table. The rule enforces data integrity: The rationale is that you can't have an order if there is no customer, you can't have an employee working in department "A" if there is no such department, you can't have a factory that makes widgets if you don't have widgets as a product, and so on.

Define all foreign-key columns to match their primary-key column exactly in data type, column size, and column name.

Save time by eliminating redundant CHECK constraints on foreign-key columns where the same constraint already exists on the matching primary-key column.

Avoid the use of NATURAL JOIN when two tables have columns with the same name that are not linked with a primary-key/foreign-key relationship.

Don't use the same column for multiple FOREIGN KEY constraints or your DBMS will have a hard time prejoining with a join index.