Before we go any further, it's important that you understand the purpose of the primary and foreign key fields. Like so many issues in relational database theory, you'll find developers disagree on the subject of primary keys. We'd like to tell you that this section is the definitive resource on the purpose of keys and how to choose them, but that wouldn't be true. However, you'll find that most developers agree with the guidelines we're about to share with you. We just can't guarantee that you'll never end up arguing with a colleague or a boss over primary keys. This section will give you helpful ammunition should the need arise.
A primary key field has one purpose: to uniquely identify each record in the table. Consequently, no normalized table can contain a duplicate record. Now, the data in every field other than the primary key field can be duplicated, but the primary key value must be unique-as long as that one field is unique, no record can be completely duplicated. We're splitting hairs, but it's necessary in order to grasp the nature of the primary key.
The relational model doesn't allow duplicate records, even though many RDBMSs do. It's your job to enforce methods of preventing duplicates.
When speaking of duplicate records, we aren't referring to multiple entries of the same value in one field. We generally mean one of two things: either every field in the record is the same as at least one other record or every value in the primary key exactly matches that of another record. For instance, let's suppose your primary key is based on two fields: LastName and FirstName. You can have two or more LastName entries of Jones or two or more FirstName entries of Janice, but you can have only one Janice Jones.
We purposely kept this example simple to help explain what a duplicate record is. In reality, you wouldn't create a primary key based on just the first and last name fields because you can't guarantee the uniqueness of those fields.
Let's consider our example book database. Each record represents one book. The primary key value must uniquely identify each book. Sometimes, you can depend on naturally occurring data. For instance, a book's ISBN should be unique. Therefore, we can use the ISBN as the table's primary key field. The book's title might be duplicated in another record, and surely the book's author might appear in other records. But the ISBN (in theory, at least) should never be duplicated. Apply the following guidelines when considering key candidates:
A primary key value can't be Null.
A Null value generally refers to an empty field, but it really means much more. In this context, a Null value doesn't necessarily mean there is no value, but rather that the value is unknown. The value might exist, but at the current time, we don't know what it is, or the value might not exist-the value is unknown. For the sake of this discussion, it's enough to understand that you must enter a value for a primary key field when you create the record-you can't leave a primary key field blank. In fact, many RDBMSs won't accept a record until you supply a primary key value.
The primary key value must exist when the record is created and stored.
The primary key must be stable. That means you can't change the primary key structure or its values.
The primary key must be as compact as possible and contain the fewest possible attributes.
(A primary key can consist of more than one field.)
You can't change the primary key field. That includes adding fields to and deleting fields from the original key.
An RDBMS will allow you to break some of these rules, but don't. It's your job as the developer to enforce relational rules. The system's job is to be as flexible as possible.
Note A key candidate is any field or combination of fields that uniquely identifies a record. Not all key candidates make good primary keys.
What's the most foolproof way to create a primary key that conforms to the guidelines? Natural keys (the actual data) don't necessarily make good primary keys. Many developers instead rely strictly on surrogate keys, which contain system-generated, meaningless values. Many RDBMSs provide datatypes, such as AutoNumber in Access and Identify in SQL Server, that automatically generate sequential values for you. That's what we recommend, but it's a subject of heated debate. Developers have based primary keys on naturally occurring data since the inception of the relational model, and many of them maintain that natural keys are the best keys.
The biggest advantage advocates claim for natural keys is one of convenience-natural data is easily recognizable. Even though it isn't a primary key requirement, these developers want a primary key value to somehow identify the other values in the record. They want the user to be able to recognize record data by the record's primary key.
The rebuttal to this is that they've got the rule backward: the non-key fields must describe the primary key, not vice versa. The primary key uniquely identifies the record so that the RDBMS can relate that record to data in other tables; it isn't a convenience of happenstance for the user. And as you'll see later, in a well-designed database, users have no need to see the primary key value.
More important is how well the key values implement the basic requirements:
A primary key uniquely identifies the record.A natural key is subject to input errors. With a surrogate key, a system-generated value is always unique and eliminates input errors.
A primary key value can't be null and must exist when the record is created and stored. It's true that a record with a natural key can't be entered and stored until the data on which the primary key is based is known. Consequently, if the primary key value is unknown, you can't create the record. But a surrogate value is generated by the system when the record is created.
You can't change a primary key value. Natural keys implement this poorly. Databases change as the business they support grows and changes. Information is added and deleted. No matter how stable you think a field is, it is subject to change if it contains natural data. But there's no reason to change a meaningless surrogate key value. In fact, many systems won't let you change an automatically generated value.
You can't change the primary key field. Natural keys have the same disadvantage with database structure as with actual values; databases change as businesses change, and information is added and deleted. But neither a change in table structure or business rules affects the status of a surrogate key.
The primary key must be as compact as possible and contain the fewest possible attributes. A natural key can have one field or consist of potentially all the fields in the table. By contrast, a surrogate key always contains one field.
It's hard to imagine that such a seemingly small issue can make or break a database, but it can. The most competent developer isn't a fortune-teller. You can only anticipate so much growth and change. In addition, no matter how well you safeguard your database, you can't fully protect your data from human errors.
We're not telling you that natural keys will doom your database. We're telling you that surrogate keys are more adaptable and remove the risk of human error.
Foreign keys are much easier to understand-there are no decisions to make. A foreign key is simply another table's primary key. A table can and often will contain both a primary and a foreign key. You need to remember only two rules:
A foreign key value can be duplicated.
A foreign key can be Null, but usually isn't without good reason.
You might be wondering how a foreign key can remain Null if a foreign key value is simply another table's primary key value and a primary key value can't be Null. You simply leave the foreign key field blank when you enter the related record. Consequently, a query doesn't relate the record to its primary key value, but the record is still permissible. The reality is, few databases will need this much flexibility, and you should use extreme caution when applying this rule to a database. However, the fact remains that the relational database model does allow Null values in the foreign key field- but you might never put it to use.
When finding related data, the system matches foreign and primary key values and then combines the data from any matching records to create a pseudo (or temporary) record. The original tables aren't modified; the data are simply displayed together.
Let's return to our book database example. Suppose you want to enter information for a new book that you've just acquired. Along with the book information (title, ISBN, and so on), you'll also identify the book's author. Let's further assume that the author is already listed in a table of authors, which lists each author's contact information. In addition, each author has a primary key value. Instead of entering the author's name in the book table, you enter the author's primary key value from the author's table. In this context, that value becomes a foreign key value in the book table.
You might be wondering just how the user finds a particular author's primary key value. Or, what do you do if the author doesn't exist yet? In a well-designed database, the book-entry form will include a list of available authors from which the user can choose. Then, when the system stores the record, the system also stores the appropriate primary key value, based on the user's choice. There's no reason for the user to ever view a primary key value or to even know that primary key values exist. The form also includes the means for entering new information for authors not yet listed and properly leads the user to enter that information before accepting and storing the new book record.
At this point, the books table has two keys, a primary key value and a foreign key field, that relate the book to the appropriate author in the authors table. To retrieve author information for the new book, query both tables. (Of course, the user probably doesn't know they are constructing a query at all, because your well-designed database has a clear, simple input form.) When the system encounters the author's primary key value in the books table, it tries to match that value in the author's primary key field. When it finds a match, the system combines the data from both records.