What Is Normalization?


Publisher William F. Buckley Jr. told a wonderful story about a waiter who had a complaint about Buckley's journal, National Review. "I love reading the articles," said the waiter. "But why do you have to use so many big words?"

A year later, Buckley met the same waiter, who thanked Buckley for taking his advice and dumbing down the vocabulary. Of course, Buckley had done no such thing. The waiter had learned the "big words" that had given him such trouble, so they were longer "big."

The database term normalization reminds me of that story. The word itself sounds peculiarly technical and opaque. The database theory that it defines is difficult and, for many, impenetrable. (This is by no means a put-down: Database theory is absolutely essential and fundamental to what database designers do; it is just hard to understand.)

But if you look beyond the technical language in which the theory is expressed, you'll find a body of principle that makes much sense. In fact, if you've read and understood this chapter, I think you'll be able to understand the most important rules of normalization. This section will actually serve as a review of what you already know.

An exposure to formal database theory and normalization won't necessarily improve your database design skills. But you will come across some of its terms and rules, and you shouldn't feel intimidated by them (well, no more than necessary). Hopefully this short discussion will put you in the mind-set of Mr. Buckley's waiter: Normalization just won't seem like a big word anymore.

As you read through the chapter, you probably noted that much of your work involved excluding duplicate values. The elimination of repeating data is the overall purpose of creating a normal form. The process of creating a database involves satisfying various levels of the normal form. You start at the first normal form, make sure your database satisfies its requirements, proceed to the second normal form, satisfy its requirements, and so on. As you satisfy each requirement, you eliminate redundancy and set up your data so that it can be processed more efficiently and so that it also is less likely to become inconsistent.

As I've indicated, each level of the normal form is stated in dense theoretical language. I try to describe them in simple language. Although there are several levels of the normal form, the first three are generally considered the most important.

First Normal Form

To meet the requirements of the first normal form, you move repeating and multivalue fields to another table. You've done that throughout the chapter. For example, consider a Books field (or fields) in an Authors table; the field(s) contain the authors' major works. Each author has written several books.

There are two commonsense approaches to resolving this issue. Table 3.35 shows the "repeated field" approach: You create as many additional columns as you need, one for each book. (The author names are not part of the table and are included for information purposes only.) Table 3.36 shows the "multivalue" approach: You stuff all the values (books) into one field.

Table 3.35. The "Repeated Fields" Approach to Storing Values

AuthorID

Book 1

Book 2

Book 3

<<other fields>>

1 (Dostoevsky)

The Brothers Karamazov

The Gambler

Crime and Punishment

 

2 (Turgenev)

Fathers and Sons

   

3 (Tolstoy)

War and Peace

Anna Karenina

  


Table 3.36. The "Multivalue" Approach to Storing Values

AuthorID

Books

<<other fields>>

1 (Dostoevsky)

The Brothers Karamazov, The Gambler, Crime and Punishment

 

2 (Turgenev)

Fathers and Sons

 

3 (Tolstoy)

War and Peace, Anna Karenina

 


Neither approach is used in relational databases. As you've seen, a solution is to create an entirely new table (see Table 3.37). You copy the primary key from the original table and include it as a foreign key. Each book is now a separate value. Each record comprises the value of the primary key of the original table and the book name.

Table 3.37. A Resolved Multivalue Field

AuthorID

Book

1 (Dostoevsky)

The Brothers Karamazov

1 (Dostoevsky)

The Gambler

1 (Dostoevsky)

Crime and Punishment

2 (Turgenev)

Fathers and Sons

3 (Tolstoy)

War and Peace

3 (Tolstoy)

Anna Karenina


This is just what you did much earlier in the chapter when we removed TV characters from the Programs table and placed them in a separate table. You placed the primary key of the Programs table, along with the character names, in an entirely new table. This new table is the child table, and the original table is the parent table.

Your work is by no means finished: You still have to decide on a primary key for the new table, identify what other fields will be included (if any), and determine how the table fits into the web of relationships that bind the database together. But you've made an important start.

Second Normal Form

To comply with the second normal form, you need to remove fields that do not depend on the entire primary key. In other words, each column should be kept with the table it describes.

Table 3.38 is similar to the Roles table you've seen, but with one addition and one assumption. The addition is that I've added a Network field. The assumption is that there is no separate Networks table. (Again, solely for illustrative purposes, I've included the program and actor names for each ID in parentheses.)

Table 3.38. The Values in the Network Field Are Repeated Unnecessarily Because the Network Field Depends on Only One Component of the Composite Primary Key

Roles with Network Field

ProgramID (CPK/FK)

ActorID (CPK/FK)

Network

RoleFirstName

<<otherfields>>

14 (Bewitched)

20 (Elizabeth Montgomery)

ABC

Samantha

 

14 (Bewitched)

22 (Dick York)

ABC

Darrin

 

18 (M*A*S*H*)

27 (Alan Alda)

CBS

Hawkeye

 

18 (M*A*S*H*)

28 (McLean

CBS

Henry

 
 

Stevenson)

   

18 (M*A*S*H*)

30 (Gary

CBS

Radar

 
 

Burghoff)

   


Recall that the table has a composite primary keythat is, the primary key comprises both the ProgramID and the ActorID. Take a look at the values in the Network column. The values are repeated for the same program. For example, there is one value of ABC for each value of 14 (Bewitched), so the table tells you twice that ABC is the network of Bewitched.

The reason for the redundant data is that the Network field depends on only one part of the primary key of the Roles table: ProgramID. The network has nothing to do with the other component of the primary key: ActorID. Thus, the Network field is in the wrong table; it belongs in the Programs table (see Table 3.39) because only the program identifies the network, not the actor.

Table 3.39. The Network Field Belongs in the Programs Table Because Only the Program Describes the Network

ProgramID

Program

Network

<other fields>

14

Bewitched

ABC

 

15

The Odd Couple

ABC

 

16

The Twilight Zone

CBS

 

17

Northern Exposure

CBS

 

18

M*A*S*H

CBS

 


Third Normal Form

The third normal form should be mostly familiar to you. To meet its requirements, you need to remove any field that depends on other, nonkey fields.

You tackled this problem in the Classic TV database when you found a calculated fieldspecifically, Years on Air. The number in that field can be derived by subtracting the start date from the end date. In this case, the value in the calculated field depends on the values in one or more fields in the table. Thus, the field is unnecessary.

In Table 3.40, the Total Discount field is calculated by multiplying the discount by the unit price. This field should be eliminated because it totally depends on the Unit Price and Discount fields for its own value.

Table 3.40. The Total Discount Field Is a Calculated Field That Violates the Third Normal Form

Order Details with Calculated Field

OrderID

ProductID

UnitPrice

Discount

Total Discount

11600

Clam Chowder

$10.00

5%

$0.50

11600

Dried Apples

$15.00

10%

$1.50

11601

Tartar Sauce

$12.00

15%

$1.80

11601

Mozzarella

$12.00

5%

$ .60

11601

Ravioli Angelo

$15.00

5%

$ .75





Hands-On Microsoft Access(c) A Practical Guide to Improving Your Access Skills
Hands-On Microsoft Access: A Practical Guide to Improving Your Access Skills
ISBN: 0321245458
EAN: 2147483647
Year: 2005
Pages: 169
Authors: Bob Schneider

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net