When you begin creating relational databases for whatever mission, some of the procedures will seem counterintuitive and needlessly complex. Indeed, at times the whole enterprise might seem dubious, at best. So before going any further, let's flesh out a few broad principles of relational database design. This will give you a better idea of the playing field you'll be working in, and perhaps make some of the methods you'll be using seem less perplexing. An Indifference to Where Values Are StoredLet's say you're a fan of the Boston Red Sox. You turn to your hometown newspaper to find out how many games they are in or out of first place. You know that the sports news is always in, say, section D, and Major League Baseball standings are always printed about halfway into the section. You open Section D to the middle and find the baseball stats. You find first the American League standings and then those for its Eastern Division. Your eye moves from left to right as you find the name of the team, its won/lost record and percentage, and the number of games it's in or out of first place. You're delighted to discover that the Sox are leading their division with a .663 percentage, 12 games ahead of the Yankees. The way you find information in a newspaper is through location. The newspaper positions information consistently in a certain ordersports news, baseball news, American League standings, Eastern Division standings, team name, won/lost percentage, games in/out of first placeso you can easily find what you need. A relational database is also consistent and logical, but it is organized quite differently. In a newspaper, the first section (or, at least, the first page) always contains headline news, not sports news (unless, of course, sports news is headline news). In a relational database, there is no first tableor second, or third, or fourth. No table comes before or after any other table. In other words, tables are not placed in any particular sequence. Moreover, within a table, it doesn't matter which column comes before or after any other column. For example, a customer address can come before or after the customer name. Nor does it matter which row comes before or after any other row. In a field of last names, the G's can come before the A's, and the C's can come after both. The physical storage of the data does not depend on location. Of course, when you view your data, you might want to make sure that the last names are alphabetized. Of course, a Red Sox fan will want to see his club listed first if it has the best won/loss percentage. But these are matters of user preference, not database design. Access makes it easy to move columns left and right, and to sort records in any field in either ascending or descending order. Regardless of how you move data, the tables themselves are in no particular hierarchy, nor is any column or row in a specific position. Thus, be careful how you use phrases such as "the fifth row of the table" or "the table's third column." They might describe a row or column's position in the current display, but there really is no fifth row or third column, as such. One Field, One Kind of DataA single field has only one data type. In other words, a column to which you have assigned the Number data type will hold only numbers, a column that has a Date/Time data type will hold only dates/times, and so on. This rule seems to be simple common sense and hardly worth mentioning. Yet in actual practice, there is an inclination to break it. For example, at times it might seem to make good sense to enter the value between 5/18/2005 and 5/19/2005 in a Date/Time field, or 64 to 69 in a Number field. But a Date/Time field cannot contain words such as between and and. And Number fields can hold only numbers; they cannot describe a range. Access is very strict about allowing values of the wrong data type into columns. Only One Value in One Row of One FieldIn each column of each row, there should be only one value (where allowed, a null value is considered one value). This rule sharply contradicts what you saw in the one-table-fits-all design, where there was a strong inclination to put two, three, or even more values in a single column of a single row. Much of database design involves ensuring that there is only one value in each field of each row. Staying true to this principle will prompt much head-scratching and perhaps a headache or two as well. But it's a cardinal rule of good database design. You'll be reminded constantly of this principle as you design the Classic TV database. Break Data into Its Smallest ComponentsA difficulty in implementing the "one value in each field of each row" rule is defining just what a value is. For dates and numbers, that usually isn't a problem because they are discrete entities (such as 5/10/2004 and 3,468.7). But what about text? Is an address, such as 1600 Pennsylvania Avenue, Washington, DC 20500, one value, two values, or more? The answer is that you want to break data into its smallest components. For example, an address usually includes separate fields for the city, state, and ZIP code. You'll see many examples of this principle at work as you move through the database design process.
Keep Redundant Data to a MinimumRedundant data is like cholesterol: There's "bad" redundant data and "good" redundant data. And like cholesterol, the idea that redundant data can be either good or bad for you is difficult to digest (bad pun fully intended). As I've already argued, repeating the same data in more than one field can threaten the consistency and integrity of your data. This "bad" redundant data arises from poor database design that forces you to enter the same values in two or more fields unnecessarily. At the same time, there are duplicate fields with "matching data" that are required for creating a relational database. They enable you to establish relationships, which are essential for bringing together the data in the various tables of your database. Unless otherwise indicated, when I discuss redundant data, I'm talking about the bad kind.
|