Keys are a kind of column. The most significant is the primary key. The primary key is the cornerstone of relational databases. Basically, the primary key is a column or set of columns that are unique in every row. (In the worst case, the primary key might consist of every column in the table.) The primary key allows us to positively identify a row. This is generally useful and absolutely critical when we wish to have tables of related information (a relational database).
For instance, suppose that our Quiz game database includes a table of questions and a table of players. We want one column of the Questions table to allow us to determine which player is the author of a question. We have a couple of options. We could use the player's e-mail address, but multiple players, say from the same family, may share an e-mail address. We could use the player's gamename, but we want to allow players to change gamename s and don't want to update our entire database whenever that happens (although this process can be automated, as we will discover later).
Suppose we assign the player a unique identifying number (UIN) that remains constant. (It could be generated, as we have seen, with AUTO-INCREMENT.) We can name this the "PlaID" (player identifier) and declare it the primary key of the Players table, since it is the way other tables will refer to rows within the table.
The gamename can be considered an alternate key. Like the primary key, it must have the property of being unique. In fact, merely identifying a column as UNIQUE allows the SQL engine to consider it an alternate key, and use it to reinforce and speed up indicing.
We've already seen a foreign key, although we may not have realized it. A foreign key is the column in a table that contains another table's primary key. So the column in the Question table that contains the PlaID number of the author is a foreign key. Declaring it as such gives us a few advantages. First, it will prevent us from entering a value in the column that is not a valid PlaID (a currently active ID in the Player table). We can't have authors that don't exist. Second, it allows us to dictate what will happen if there is a change to the row and column referenced by the foreign key. If the player has never written a question, of course, there is no problem. But if the player's PlaID appears in the Questions table, SQL will be careful about modifying or deleting it. (There are separate rules for each change.) In case of deletion, for example, the programmer will choose one of the following:
A programmer may also choose from the same options for modification. "Cascade" changes the foreign key to match the new primary key value.
If we want to combine information from two (or more) tables, we could get the information from both and piece it together ourselves . However, there's a better way, called a join. We select which columns we will display from all the columns in two or more tables.
For instance, we expect a query to the Questions table to display the Author column in the form of the PlaID, a foreign key in the Questions table:
The PlaID is the primary key in the Players table:
We can use join to replace the PlaID with actual player information:
It is important to specify that you want to display only rows where the foreign key equals the primary key; otherwise , you end up with all possible permutations of the data. This can grow rapidly if you are joining several tables:
Note that the result is evil! It occurs when we forget to specify that the join should take place only where the Author column of the Questions table and the PlaID column of the Player table data are equal.
SQL performs several types of join, and they can be most easily distinguished with set theory. The two tables share entries in the joined columns. Typically this shared entry is a primary key in one table that is also the foreign key in another ”the PlaID in the example.
We have been interested in seeing information only about the players who have submitted questions. That is, we are interested in players whose PlaID is in the Questions table and the Players table. This type of join is called an inner join. In particular, we are using a particular inner join operation that is so common that SQL identifies it as a special case. The natural join is an inner join that removes all the redundant columns. It is easy to see that this is a fundamental type of join.
Note that our table goes further. It also eliminates some nonredundant information, such as the city where each player is from. The method of eliminating columns is counterintuitive. Actually, in RDBMSs, we specify which columns we want; thus, if we want all the columns except the one showing where the players are from, we must list every column by name.
By creating a virtual copy of a table, we can have a table perform inner join with itself, a self-join . To be meaningful, of course, there must be a column that uses the table's own primary key as a foreign key. An example might be the database for a sleazy multilevel marketing scheme, where every seller has a distributor who also is a seller with a distributor, and so on.
Here we select the SellerID (primary key), Name, Territory, and DistributorID (self-referential foreign key) for the database of salespeople:
Here are the same columns after a self-join has been performed on the table and we select those sellers who work in New Jersey. We choose not to display the DistributorID column and instead add the Name column of the person whose SellerID equals the DistributorID. Note that we rename the Name column Distributor for the distributor's name, both to avoid two identically named columns and to make the table easier to understand.
An outer join is similar to an inner join. The difference is that an outer join displays rows that have no corresponding values in the other table. For this reason, outer joins, unlike inner joins, aren't commutative. Two types of outer join exist, left outer joins and right outer joins (see Figure 11.1). The term left defines which of the two tables being joined has all its rows pres ent, the table to the left of the operator. For instance, if the players table were left outer joined with the questions table, we would see output like this:
Figure 11.1. Summary of Joins
If there were any questions that had no author (suppose we had anonymous questions with an author code of 0000), they would not be displayed in a left outer join. If a right outer join were to take place, such anonymous questions would appear with a NULL author, but Garry's name would not appear at all.
R ELATIONAL D ATABASE IN P RACTICE
We have a collection of questions, each of which has a set of answer choices.
How can we store them?
We could just include four answerchoice columns in the questions table. We'd be wasting some space when we only have three answer choices, but maybe space is cheap. It's easy enough to set an unused column to null. But what happens when we want to give a question five answers? It is easy to see that we might waste resources, limit our application, or both. But there's a better way.
Suppose we have a table with just answers. It might need only three columns, one for the answer text, one as a foreign key with the questionID number, and one indicating whether the answer is correct. Of course we won't have two identical answers to the same question, so the primary key of the answers table can be a combination of the two columns that contain the answer text and the question ID. Then when we get a question, we just look through the answer table and select all the answers whose associated ID match that of our question.
All these functions can be performed by the application, but the DBMS can execute them with great speed and simplicity by using the syntax we will learn in the next chapter.