Normalization


Up to this point in the chapter, we have been normalizing tables—though without using that term. Database normalization can be (and often is) discussed in a very complex and opaque manner, bristling with technical terms, but this isn’t necessary. When designing Access databases, normalization boils down to eliminating duplication of data in different tables, and using key fields to link tables so you can get the data you need from other tables through the links. There are five levels of database normalization (first normal form through the fifth normal form); generally, only the first three are used in Access databases. I define the five normal forms below, first in technospeak, then in regular English.

First Normal Form: Eliminate Repeating Groups

This means that you shouldn’t have multiple fields for the same type of information in a table, such as multiple phone numbers or addresses for a customer. In some cases (generally only in tables imported from flat-file databases) the repeating data may be in a single field, such as a list of graduate degrees for faculty members, separated by commas. The problem with putting separate bits of information into a single field is obvious: If you wanted to create a query to view all the faculty with Ph.Ds, this would be a difficult task, requiring the creation of complex expressions to extract the different degrees for each faculty member, and you would probably not get all the records you need, because of differences in punctuation when the data was entered.

Several of the table templates in the Table Wizard violate the first normal form, such as the Contacts table with its multiple phone numbers. Instead of keeping multiple phone fields in tblVendors and tblCustomers, I created separate tables to hold phone numbers and email addresses for vendors and customers: tblCustomerPhones, tblCustomerEmails, tblVendorPhones, and tblVendorEmails. Breaking out this information into separate tables serves two practical purposes: It guarantees that you will always be able to enter another phone number or email address for a client (if you have just phone and fax fields, how are you going to enter the customer’s cell phone number?), and it also makes it easier to use the information elsewhere in the database—you can pick up the phone numbers belonging to a customer by linking to tblCustomerPhones by CustomerID, instead of having to reference each of a set of named phone fields separately.

Repeating data in a single field (as in the faculty degrees example mentioned previously) should also be broken out into a separate table, both for accuracy of data entry (users should select degrees from a lookup table, rather than typing them into a field), and to allow entry of as many degrees as are needed for a faculty member.

Second Normal Form: Eliminate Redundant Data

There are two ways that redundant data can get into a database: One is by entering the same data into different records of a table. This could happen if you use the Table Wizard’s Orders table template, with its address fields, and enter several orders from the same customer. If you enter that customer’s shipping address into three different records, that is duplicate data. I avoided this situation by breaking out shipping address data into its own table, tblShippingAddresses, and placing a ShipAddressID field in tblOrders. This field links to the key field of the same name in tblShippingAddresses, which avoids the need to enter the same data into many records, and also guarantees that if there is a change in the shipping address, it needs to be made only once, not in every order record using that address.

You can also have redundant data when the same information is entered into two different tables. For example, if you have a Customers table and an Orders table, you should not put customer billing address fields in both tables. Either place the billing address fields in a separate table, linked one-to-one with tblCustomers by CustomerID, or place them in the Customers table and remove them from the Orders table. Shipping address fields should also not be duplicated in two tables; in this case, they should definitely be moved to a separate table (as I do in the sample database), because there can be multiple shipping addresses per customer.

In some cases, for recordkeeping purposes, it might be desirable to keep a record of the shipping address used at the time an order is shipped—even if that address changes later on. In that case, the shipping address fields could be retained in tblOrders, along with the ShipAddressID field, and when a shipping address is selected on a form, data from the selected shipping address could be pulled from tblShippingAddresses and written to the shipping address fields in tblOrders. This method eliminates the need to type the shipping address into every record, but preserves the shipping address data for each order even if the customer’s shipping address is changed later on.

Third Normal Form: Eliminate Columns Not Dependent on Key

This means that any fields that don’t belong to the record should be moved into a separate table. For example, the initial version of tblMailingList, made from the Table Wizard Mailing List table template, contains both information about the person receiving the mailings (name information, title, and so forth) and information about the company (company name and address). Because the company information doesn’t belong to the person, I created a separate tblMailingListCompanies table linked by a CompanyID field to store mailing list company data. However, I left the address fields in tblMailingList, so they could be used for personal addresses for persons on the mailing list who are using their own addresses rather than company addresses.

Fourth Normal Form: Isolate Independent Multiple Relationships

In a database with many-to-many relationships, don’t add irrelevant fields to the linking table that connects the two “many” tables. In a student records database, for example, using a many-to-many relationship between Students and Classes, with a linking table tblStudentClasses, you might have a Semester and Year field, indicating that a particular student took a particular class in a specific semester and year. That would be appropriate, but if you were to add a phone number field to the linking table, that would violate the fourth normal form because that field doesn’t belong to the combined student-class record, but to the student record, so it should be placed in the Student table.

In the Toy Workshop sample database, there is a many-to-many relationship between Toys and Materials. For example, a toy can use multiple materials, and a material can be used for multiple toys. tblToyMaterials is the linking table for this many-to-many relationship. As is typical of such tables, it contains only the two foreign key fields that link to the key fields in the two “many” tables. If you add any extra fields other than the two key fields to such a table, they should be related to the combination of the two linked records, to avoid violation of the fourth normal form.

It is unlikely that you will have to worry about violating the fourth normal form because (unlike the first through third normal forms) it isn’t likely that you’ll be inclined to set up tables that violate it, or even have to rework tables that violate this form.

Fifth Normal Form: Isolate Semantically Related Multiple Relationships

Violation of this normal form requires a complex and unlikely scenario, and frankly there is a minimal chance that you will ever have to worry about it. In some circumstances, this form requires the separation of even related fields into a separate table. For example, in a many-to-many Students-Classes relationship, although semester and year information could appropriately be added to the linking tblStudentsClasses table, in some cases it would be preferable to maintain that information in a separate table, with information linking classes to specific Semester-Year combinations.




Expert One-on-One(c) Microsoft Access Application Development
Expert One-on-One Microsoft Access Application Development
ISBN: 0764559044
EAN: 2147483647
Year: 2006
Pages: 124
Authors: Helen Feddema

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net