Relational Database Principles | Hands-On Microsoft Access: A Practical Guide to Improving Your Access Skills

When you begin creating relational databases for whatever mission, some of the procedures will seem counterintuitive and needlessly complex. Indeed, at times the whole enterprise might seem dubious, at best.

So before going any further, let's flesh out a few broad principles of relational database design. This will give you a better idea of the playing field you'll be working in, and perhaps make some of the methods you'll be using seem less perplexing.

An Indifference to Where Values Are Stored

Let's say you're a fan of the Boston Red Sox. You turn to your hometown newspaper to find out how many games they are in or out of first place. You know that the sports news is always in, say, section D, and Major League Baseball standings are always printed about halfway into the section. You open Section D to the middle and find the baseball stats. You find first the American League standings and then those for its Eastern Division. Your eye moves from left to right as you find the name of the team, its won/lost record and percentage, and the number of games it's in or out of first place. You're delighted to discover that the Sox are leading their division with a .663 percentage, 12 games ahead of the Yankees.

The way you find information in a newspaper is through location. The newspaper positions information consistently in a certain ordersports news, baseball news, American League standings, Eastern Division standings, team name, won/lost percentage, games in/out of first placeso you can easily find what you need.

A relational database is also consistent and logical, but it is organized quite differently. In a newspaper, the first section (or, at least, the first page) always contains headline news, not sports news (unless, of course, sports news is headline news). In a relational database, there is no first tableor second, or third, or fourth. No table comes before or after any other table. In other words, tables are not placed in any particular sequence.

Moreover, within a table, it doesn't matter which column comes before or after any other column. For example, a customer address can come before or after the customer name. Nor does it matter which row comes before or after any other row. In a field of last names, the G's can come before the A's, and the C's can come after both. The physical storage of the data does not depend on location.

Of course, when you view your data, you might want to make sure that the last names are alphabetized. Of course, a Red Sox fan will want to see his club listed first if it has the best won/loss percentage. But these are matters of user preference, not database design. Access makes it easy to move columns left and right, and to sort records in any field in either ascending or descending order. Regardless of how you move data, the tables themselves are in no particular hierarchy, nor is any column or row in a specific position.

Thus, be careful how you use phrases such as "the fifth row of the table" or "the table's third column." They might describe a row or column's position in the current display, but there really is no fifth row or third column, as such.

One Field, One Kind of Data

A single field has only one data type. In other words, a column to which you have assigned the Number data type will hold only numbers, a column that has a Date/Time data type will hold only dates/times, and so on.

This rule seems to be simple common sense and hardly worth mentioning. Yet in actual practice, there is an inclination to break it. For example, at times it might seem to make good sense to enter the value between 5/18/2005 and 5/19/2005 in a Date/Time field, or 64 to 69 in a Number field. But a Date/Time field cannot contain words such as between and and. And Number fields can hold only numbers; they cannot describe a range. Access is very strict about allowing values of the wrong data type into columns.

Only One Value in One Row of One Field

In each column of each row, there should be only one value (where allowed, a null value is considered one value). This rule sharply contradicts what you saw in the one-table-fits-all design, where there was a strong inclination to put two, three, or even more values in a single column of a single row.

Much of database design involves ensuring that there is only one value in each field of each row. Staying true to this principle will prompt much head-scratching and perhaps a headache or two as well. But it's a cardinal rule of good database design. You'll be reminded constantly of this principle as you design the Classic TV database.

Break Data into Its Smallest Components

A difficulty in implementing the "one value in each field of each row" rule is defining just what a value is.

For dates and numbers, that usually isn't a problem because they are discrete entities (such as 5/10/2004 and 3,468.7). But what about text? Is an address, such as 1600 Pennsylvania Avenue, Washington, DC 20500, one value, two values, or more?

The answer is that you want to break data into its smallest components. For example, an address usually includes separate fields for the city, state, and ZIP code. You'll see many examples of this principle at work as you move through the database design process.

Q&A

Q1:
Hold on a second. I'm not sure what you mean by "smallest component." How small is small?
A1:
I should say the "smallest meaningful component for the purposes of the database." I don't want you to pulverize data. For example, a city such as New York could theoretically be divided into two values, New and York, or even seven values, for each of its seven letters. But you wouldn't want to divide the text string New York into two values, let alone seven, because in the context of city names, New and York are not meaningful as separate entities.

To make an attempt at a contrasting case, I can (with some effort) imagine a table in a database for grammarians that stores data about U.S. cities whose names are of Spanish origin. Values such as El Segundo, Las Vegas, Los Angeles, and Los Alamos could be meaningfully divided into two fields, one for the article (las, los, el) and the other for the noun that follows.

Keep Redundant Data to a Minimum

Redundant data is like cholesterol: There's "bad" redundant data and "good" redundant data. And like cholesterol, the idea that redundant data can be either good or bad for you is difficult to digest (bad pun fully intended).

As I've already argued, repeating the same data in more than one field can threaten the consistency and integrity of your data. This "bad" redundant data arises from poor database design that forces you to enter the same values in two or more fields unnecessarily.

At the same time, there are duplicate fields with "matching data" that are required for creating a relational database. They enable you to establish relationships, which are essential for bringing together the data in the various tables of your database.

Unless otherwise indicated, when I discuss redundant data, I'm talking about the bad kind.

What Does Data Integrity Mean?

Often when discussing databases you will hear the term data integrity. That undoubtedly sounds like a very good thing, but what actually does it mean?

On one level, it means that the data is consistent, accurate, and correct. But there are actually four specific types of data integrity:

Entity integrity means that each row in a table is uniquely identified by a primary key that is never a null value. The table thus has no duplicate records. This concept is discussed in this chapter.
Domain integrity ensures the validity and consistency of data by defining the range of values that are permissible within a specific field. It includes restrictions imposed by the data type, the format, and the field size. This concept is discussed in Chapter 5, "Building Tables."
Referential integrity keeps the relationships between tables synchronized. It is usually enforced by including the primary key of one table as the foreign key of another table. Enforcing referential integrity guarantees that records in the two tables remain in sync whenever data is entered, edited, or deleted. This concept is discussed in Chapter 4.
User-defined integrity encompasses business rules outside the scope of other integrity categories. These are rules imposed by the nature of the organization and the way it conducts business. A typical business rule might be that you sell only to customers whose credit rating meets certain standards. This concept is discussed in Chapters 5, "Building Tables," and 6, "Entering, Editing, and Displaying Data."