Section 2.5. Six Principles of Database Design | Access 2007 for Starters[c] The Missing Manual

2.5. Six Principles of Database Design

With great power comes great responsibility. As a database designer, it's up to you to craft a set of properly structured tables. If you get it right, you'll save yourself a lot of work in the future. Well-designed databases are easy to enhance, simpler to work with, and lead to far fewer mind-bending problems when you need to extract information.

Sadly, there's no recipe for a perfect database. Instead, a number of recommendations can guide you on the way. In this section, you'll learn about a few of the most important.

UP TO SPEED
Why It's Important to Be Unique

You won't completely understand why it's so important for each record to have a unique ID number until you work with the more advanced examples in later chapters. However, one of the reasons is that other programs that use your database need to identify a record unambiguously .

To understand why there's a problem, imagine that you've built a program for editing the Dolls table. This program starts by retrieving a list of all your table's bobbleheads. It displays this list to the person using the program, and lets her make changes. Here's the catchif a change is made, the program needs to be able to apply the change to the corresponding record in the database. And in order to apply the change, it needs some unique piece of information that it can use to locate the record. If you've followed the best design practices described above, the unique "locator" is the bobblehead's ID.

Tip: Building a good database is an art that takes practice. For best results, read these guidelines, and then try building your own test databases.

2.5.1. 1. Choose Good Field Names

Access doesn't impose many rules on what field names you can use. It lets you use 64 characters of your choice. However, field names are important. You'll be referring to the same names again and again as you build forms, create reports , and even write code. So it's important to choose a good name from the outset.

Here are some tips:

Keep it short and simple . The field name should be as short as possible. Long names are tiring to type, more prone to error, and can be harder to cram into forms and reports.
CapitalizeLikeThis . It's not a set-in-stone rule, but most Access fans capitalize the first letter of every word (known as CamelCase), and then cram them all together to make a field name. Examples include UnitsInStock and DateOfExpiration.
Avoid spaces . Spaces are allowed in Access field names, but they can cause problems. In SQL (the database language you'll use to search for data), spaces aren't kosher. That means you'll be forced to use square brackets when referring to field name that includes spaces (like [Number Of Guests]), which gets annoying fast. If you really must have spaces, then consider using underscores instead.
Be consistent . You have the choice between the field names Product_Price and ProductPrice. Either approach is perfectly reasonable. However, it's not a good idea to mingle the two approaches in the same databasedoing so's a recipe for certain confusion. Similarly, if you have more than one table with the same sort of information (for example, a FirstName field in an Employees table and a Customers table), use the same field name.
Don't repeat the table name . If you have a Country field in a Customers table, it's fairly obvious that you're talking about the Country where the customer lives. The field name CustomerCountry would be overkill.
Don't use the field name Name . Besides being a tongue-twister, Name is an Access keyword. Instead, use ProductName, CategoryName, ClassName, and so on. (This is one case where it's OK to violate the previous rule and incorporate the table name in the field name.)

You should also give careful thought to naming your tables. Once again, consistency is king. For example, database nerds spend hours arguing about whether or not to pluralize table names (like Customers instead of Customer). Either way's fine, but try to keep all your tables in line.

2.5.2. 2. Break Down Your Information

Be careful that you don't include too much information in a single field. You want to have each field store a single piece of information. Rather than have a single Name field in a table of contacts, it makes more sense to have a FirstName and a LastName field.

There are many reasons for breaking down information into separate fields. First of all, it stops some types of errors. With a Name field, the name could be entered in several different ways (like "Last, First" or "First Last"). Splitting the name avoids these issues, which can create headaches when you try to use the data in some sort of automated task (like a mail merge). But more importantly, you can more easily work with data that's broken down into small pieces. Once the Name field's split into FirstName and LastName, you can perform sorts or searches on just one of these two pieces of information, which you couldn't otherwise do. Similarly, you should split address information into columns like Street, City, State, and Countrythat way, you can far more easily find out who lives in Nantucket.

Figure 2-21 (top) shows an example of proper separation. Figure 2-21 (bottom) shows a dangerous mistakean attempt to store more than one piece of information in a single field.

Figure 2-21. This example shows the right way to subdivide information in the Contacts table (top), and the wrong way (bottom).

2.5.3. 3. Include All the Details in One Place

Often, you'll use the same table in many different tasks . You may use the Dolls table to check for duplicates (and avoid purchasing the same bobblehead twice), to identify the oldest parts of your collection, and to determine the total amount of money you've spent in a given year (for tax purposes). Each of these tasks needs a slightly different combination of information. When you're calculating the total money spent, you aren't interested in the Character field that identifies the doll. When checking for a duplicate, you don't need the DateAcquired or PurchasePrice information.

Even though you don't always need all these fields, it's fairly obvious that it makes sense to put them all in the same table. However, when you create more detailed tables, you may not be as certain. It's not difficult to imagine a version of the Dolls table that has 30 or 40 fields of information. You may use some of these fields only occasionally. However, you should still include them all in the same table. All you'll see in this book, you can easily filter out the information you don't need from the datasheet, as well as in your forms and printed reports.

2.5.4. 4. Avoid Duplicating Information

As you start to fill a table with fields, it's sometimes tempting to include information that doesn't really belong. This inclusion causes no end of headaches, and it's a surprisingly easy trap to fall into. Figure 2-22 shows this problem in action with a table that tries to do too much.

Figure 2-22. This table lists the available pets at an exotic animal breeder's. It also lists some helpful information about the life expectancy, temperament, and meal requirements of each type of animal. Initially, this design seems fairly reasonable. However, a problem appears when you have several of the same type of animals (in this case, three elephants). Now the elephant-specific details are repeated three separate times.

Duplicate data like that shown in Figure 2-22 is inefficient. You can easily imagine a table with hundreds of similar records, needlessly wasting space repeating the same values over and over again. However, this concern's minor compared to the effort of updating that information, and the possibility of inconsistency. What happens if you want to update the life expectancy information for every elephant based on new studies? Based on the current design of the table, you need to change each record that has the same information. Even worse , it's all too easy to change some records but leave others untouched. The overall result's inconsistent datainformation in more than one spot that doesn't agreewhich makes it impossible to figure out the correct information.

This problem occurs because the information in the Pets table doesn't all belong. To understand why, you need to delve a little deeper into database analysis.

As a rule, every table in a database stores a single thing . In the Pets table, that thing is pets. Every field in a table is a piece of information about that thing.

In the Pets table, fields like Name, Animal, and Weight all make sense. They describe the pet in question. But the LifeSpan, Temperament, and Diet fields aren't quite right. They don't describe the individual pet. Instead, they're just standards for that species. In other words, these fields aren't based on the pet (as they should be)they're based on the animal type . The only way to solve this problem is to create two tables: Pets and AnimalTypes (Figure 2-23).

Figure 2-23. Now the animal-specific information is maintained in one place, with no duplicates. It takes a little more work to get all the pet information you needfor example, to find out the life expectancy for Beatrice, you need to check out the Elephant record in the AnimalTypes tablebut the overall design's more logical.

It takes experience to spot fields that don't belong. And in some cases, breaking a table down into more and more sub-tables isn't worth the trouble. You could theoretically separate the address information (contained in fields like Street, City, Country, and PostalCode) from a Customers table, and then place it into a separate Addresses table. However, it's relatively uncommon for two customers to share the same address, so this extra work isn't likely to pay off. You'll consider how to define formal relationships between tables like Pets and AnimalTypes in Chapter 5.

Tip: Many database gurus find the best way to plan a database is to use index cards. To do this, start by writing down all the various types of information you need in your database. Then, set aside an index card for each table you expect to use. Finally, take the fields on the scrap paper, and write them down on the appropriate index cards, one at a time, until everything's set into neat, related groups.

2.5.5. 5. Avoid Redundant Information

Another type of data that just doesn't belong is redundant informationinformation that's already available elsewhere in the database, or even in the same table, sometimes in a slightly different form. As with duplicated data, this redundancy can cause inconsistencies.

Calculated data's the most common type of redundant information. An AverageOrderCost field in a Customers table is an example. The problem here is that you can determine the price of an average order by searching through all the records in the Orders table for that customer, and averaging them. By adding an AverageOrderCost field, you introduce the possibility that this field may be incorrect (it may not match the actual order records). You also complicate life, because every time a customer places an order, you need to recalculate the average, and then update the customer record.

Note: Database gods do sometimes use calculated data as a performance-improving technique. However, this type of optimization's very rare in Access databases. It's more common in industrial-strength server-side databases that power large companies and Web sites.

Here are some more examples of redundant information:

An Age and a DateOfBirth field (in a People table) . Usually, you'll want to include just a DateOfBirth field. If you have both, then the Age field contains redundant information. But if you have only the Age field, you're in troubleunless you're ready to keep track of birthdays and update each record carefully , your information will soon be incorrect.
A DiscountPrice field (in a Products table) . You should be able to calculate the discount price as needed based on a percentage. In a typical business, markups and markdowns change frequently. If you calculate 10 percent discounts and store the revised prices in your database, then you'll have a lot of work to do when the discount drops to nine percent.

2.5.6. 6. Include an ID Field

As you learned earlier (Section 2.4), Access automatically creates an ID field when you create a table in Datasheet view and sets it to be the primary key for the table. But even now that you've graduated to Design view, you should still add an ID field to all your tables. Make sure it uses the AutoNumber data type so Access fills in the numbers automatically, and set it to be the primary key.

In some cases, your table may include a unique field that you can use as a primary key. Resist the temptation . You'll always buy yourself more flexibility by adding an ID field. You never need to change an ID field. Other information, even names and social insurance numbers, may change. And if you're using table relationships, Access copies the primary key into other tables. If a primary key changes, you'll need to track down the value in several different places.

Note: It's a good idea to get into the habit of using ID fields in all your tables. In Chapter 5, you'll see the benefits when you start creating table relationships.