Normalization

Once the relationships between the database entities have been established, it is time for your database to undergo the normalization process.

Normalizing means disassembling your large database tables into a number of smaller ones to reduce redundancy. This, of course, means revising some relationships between the tables, even discovering new entities. The normalization steps are called forms, and database theoreticians have so far defined five Normal Forms; this is as far as you can possibly take your database on the way to normalization.

As you may see, designing a database is an iterational process — the same steps get repeated over and over again to the highest level until you're quite satisfied with the results. The same goes for the normalization phase. There is no such thing as a completely normalized database: for one thing, any database that goes beyond third Normal Form incurs a significant performance hit (as it struggles with all the additional joins needed to accommodate additional tables); for another, it could be difficult to detect any clues for further separation of data into the fourth and fifth Normal Forms. Use your best judgment as to where to stop while normalizing your database. Third Normal Form is adequate for most uses.

Note 

A Normal Form is a set of rules applied to a table to ensure that there is no redundant information (not only duplicate information but also information that could be derived on demand). For example, if you store an employee's age in addition to the employee's date of birth, you would be stuck with updating the employee age column every year.

First Normal Form

The first Normal Form deals with repeating groups. Let's normalize the table CUSTOMER shown in Table C-1.We start by recording all the information we feel would be relevant: customer name, address, orders they've placed, and so on.

Now, while orders definitely belong to the customer, putting them into the CUSTOMER table would be an enormous waste of storage space, and it breaks relational database principles. Customer information would have to be replicated for each new order, and the order's information (such as description, suppliers) would be repeated for each customer that placed a similar order. The logical (as well as relational) solution would be to remove orders from the CUSTOMER table and create an ORDER table. Each customer would get a unique ID, which would be the primary key for the table CUSTOMER, and all information about orders would be moved into the ORDER table. To tie customers with their orders, the ORDER table would have to have a column with CUSTOMER_ID as a foreign key, as shown in Table C-2 and Table C-3.

Table C-2: Table CUSTOMER after First Normal Form

CUST_ID

CUST_NAME

SALES_ REGION

SALES_ REGION_ID

ADDRESS

CREDIT_ RATING

1

Lone Dove Ent.

North Pacific

3

1234 Elm Street, Salem, OR 95679

Excellent

2

ACME Corp

Midwest

4

567 Pine Street, Chicago, IL 07891

Average

Table C-3: New ORDER Table

CUST_ID

ORDER_ID

PRODUCT_ID

DESCRIPTION

1

NA1234-89

P01245

Paper Pulp

2

GA4358-92

SD3457

Sawdust

1

HA5432-02

WC7863

Wood Chips

While removing redundancy for customer information, this solution does nothing to alleviate the problem of redundancy for order information. The order is unique, and the tables CUSTOMER and ORDER are in a one-to-many relationship, where for each customer there may be one or more orders, but for each order there can be one and only one customer. Here, the order is for some products, and keeping product alongside with orders results in repeating product information, as new orders for the same product are entered.

The same process would be repeated for every other irrelevant group: customer's address, bank accounts, credit rating, and so on.

Second Normal Form

The second Normal Form eliminates redundant data; it establishes that there can be no nonkey attributes (fields) that depend on a portion (or the whole) of the primary key. The ORDER table needs to be normalized further: a composite primary key made up of ORDER_ID and PRODUCT_ID is a possible answer. But the product description depends on PRODUCT_ID only, not on the ORDER_ID. This works fine if each order consists of only one product; what if a customer places an order for multiple products to save on shipping? Or the product description changes — and now you have to update each and every order that contains this product? This calls for introducing the PRODUCT table. The PRODUCT_ID remains in the ORDER table but becomes a foreign key.

Table CUSTOMER for again normalized to the first Form with the column CREDIT_RATING removed to CREDIT table (not shown). See Tables C-4, C-5, and C-6.

Table C-4: Table CUSTOMER after Second Normal Form

CUST_ID

CUST_NAME

SALES_REGION

SALES_REGION_ID

ADDRESS

1

Lone Dove Ent.

North Pacific

3

1234 Elm Street, Salem, OR 95679

2

ACME Corp.

Midwest

4

567 Pine Street, Chicago, IL 07891

Table C-5: New ORDER Table

CUST_ID

ORDER_ID

PRODUCT_ID

1

NA1234-89

P01245

2

GA4358-92

SD3457

1

HA5432-02

WC7863

Table C-6: New PRODUCT Table

PRODUCT_ID

QUANTITY

DESCRIPTION

P01245

20000

Paper Pulp

SD3457

400000

Sawdust

WC7863

70000

Wood Chips

Third Normal Form

The third Normal Form declares that there should not be attributes (fields) that depend on other nonkey attributes. That means that only relevant information describing an entity's primary key has a place in your table. The table CUSTOMER contains SALES_REGION_ID and SALES_REGION_NAME, which do not describe a customer. This data must be moved into a separate table called, say, REGIONS to achieve the third Normal Form as shown in Table C-7 and Table C-8.

Table C-7: Table CUSTOMER After First Normal Form

CUST_ID

CUST_NAME

SALES_REGION_ NAME

SALES_ REGION_ID

ADDRESS

1

Lone Dove Ent.

North Pacific

3

1234 Elm Street, Salem, OR 95679

2

ACME Corp.

Midwest

4

567 Pine street, Chicago, IL 07891

Table C-8: New REGION Table

SALES_REGION_ID

SALES_REGION_NAME

3

North Pacific

4

Midwest

5

North Atlantic

Note 

The fourth Normal Form states that independent multiple relationships should be isolated. It applies only to designs that include one-to-many and many-to-many relationships. The theory states that no table may contain two or more one-to-many and many-to-many relationships that are not directly related.

The fifth Normal Form requires isolation of semantically related multiple relationships. There might be reasons for separating even logically related attributes into separate tables, but you would pay dearly in performance if you decide to go there.

Things are moving along quite nicely, but now you have a problem: your customers opened more than one location and now one order could be related to more than one customer and shipped to more than one destination; at the same time one customer still could place more than one order. To resolve many-to-many relationships, an intermediary table is introduced. As an example, you may look at the ORDER_SHIPMENT table in the ACME database (see Appendix B). It links table ORDER and table SHIPMENT, since for one shipment there could be more than one order, and each order could be sent out in more than one shipment. It consists of two columns, which represent foreign keys — one (ORDSHIP_ORDHDR_ID_FN) from the table ORDER_HEADER, and another (ORDSHIP_SHIPMENT_ID_FN) from the table SHIPMENT.




SQL Bible
Microsoft SQL Server 2008 Bible
ISBN: 0470257040
EAN: 2147483647
Year: 2005
Pages: 208

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net