Further Normalization

The first three normal forms were included in Codd's original formulation of relational theory, and in the vast majority of cases they're all you'll need to worry about. Just remember the jingle I learned in grad school: "The key, the whole key, and nothing but the key, so help me Codd."

The further normal forms—Boyce/Codd, fourth, and fifth—have been developed to handle special cases, most of which are rare.

Boyce/Codd Normal Form

Boyce/Codd normal form, which is considered a variation of third normal form, handles the special case of relations with multiple candidate keys. In fact, for Boyce/Codd normal form to apply, the following conditions must hold true:

  • The relation must have two or more candidate keys.
  • At least two of the candidate keys must be composite.
  • The candidate keys must have overlapping attributes.

The easiest way to understand Boyce/Codd normal form is to use functional dependencies. Boyce/Codd normal form states, essentially, that there must be no functional dependencies between candidate keys. Take, for example, the relation shown in Figure 2-21. The relation is in third normal form (assuming supplier names are unique), but it still contains significant redundancy.

click to view at full size.

Figure 2-21. This relation is in third normal form but not in Boyce/Codd normal form.

The two candidate keys in this case are {SupplierID, ProductID} and {SupplierName, ProductID}, and the functional dependency diagram is shown in Figure 2-22.

Figure 2-22. This is the functional dependency diagram of the relation in Figure 2-21.

As you can see, there is a functional dependency {SupplierID} {SupplierName}, which is in violation of Boyce/Codd normal form. A correct model is shown in Figure 2-23.

Figure 2-23. This model is a fully normalized version of Figure 2-21.

Violations of Boyce/Codd normal form are easy to avoid if you pay attention to what the relation is logically about. If Figure 2-21 is about Products, the supplier information shouldn't be there (and vice versa, of course).

Fourth Normal Form

Fourth normal form provides a theoretical foundation for a principle that is intuitively obvious: independent repeating groups should not be combined in a single relation. By way of example, let's assume that the own-brand products sold by Northwind Traders come in multiple package sizes, that they are sourced from multiple suppliers, and that all suppliers provide all pack sizes. A completely unnormalized version of the Products relation might look like Figure 2-24.

click to view at full size.

Figure 2-24. This is an unnormalized relation.

Now, the first step in normalizing this relation is to eliminate the nonscalar PackSize attribute, resulting in the relation shown in Figure 2-25.

click to view at full size.

Figure 2-25. This version of the relation shown in Figure 2-24 is in Boyce/Codd normal form.

Surprisingly, Figure 2-25 is in Boyce/Codd normal form, since it is "all key." But there are clearly redundancy problems, and maintaining data integrity could be a nightmare. The resolution to these problems lies in the concept of multivalued dependency pairs and fourth normal form.

A multivalued dependency pair is two mutually independent sets of attributes. In Figure 2-24, the multivalued dependency is {ProductName} {PackSize}| {SupplierName}, which is read "Product multidetermines PackSize and Supplier." Fourth normal form states, informally, that multivalued dependencies must be divided into separate relations, as shown in Figure 2-26. Formally, a relation is in fourth normal form if it is in Boyce/Codd normal form and in addition all the multivalued dependencies are also functional dependencies out of the candidate keys.

Figure 2-26. Relations containing multivalued dependencies should be decomposed.

The important thing to understand about fourth normal form is that it comes into play only if there are multiple values for the attributes. If each product in the example above had only a single pack size or a single supplier, fourth normal form would not apply. Similarly, if the two sets of attributes are not mutually independent, the relation is most likely in violation of second normal form.

Fifth Normal Form

Fifth normal form addresses the extremely rare case of join dependencies. A join dependency expresses the cyclical constraint "if Entity1 is linked to Entity2, and Entity2 is linked to Entity3, and Entity3 is linked back to Entity1, then all three entities must necessarily coexist in the same tuple."

To translate this into something resembling English, it would mean that if {Supplier} supplies {Product}, and {Customer} ordered {Product}, and {Supplier} supplied something to {Customer}, then {Supplier} supplied {Product} to {Customer}. Now, in the real world this is not a valid deduction. {Supplier} could have supplied anything to {Customer}, not necessarily {Product}. A join dependency exists only if there is an additional constraint that states that the deduction is valid.

It is not sufficient, in this situation, to use a single relation with the attributes {Supplier, Product, Customer} because of the resulting update problems. Given the relationship shown in Figure 2-27, for example, inserting the tuple {"Ma Maison", "Aniseed Syrup", "Berglunds snabbköp"} requires the insertion of a second tuple, {"Exotic Liquids", "Aniseed Syrup", "Berglunds snabbköp"}, since a new link, "Aniseed Syrup" to "Berglunds snabbköp" has been added to the model.

click to view at full size.

Figure 2-27. This relation is not in fifth normal form.

Decomposing the relation into three distinct relations (SupplierProduct, ProductCustomer, and SupplierCustomer) eliminates this problem but causes problems of its own; in re-creating the original relation, all three relations must be joined. Interim joins of only two relations will result in invalid information.

From a system designer's point of view, this is a terrifying situation, since there's no intrinsic method for enforcing the three-table join except through security restrictions. Further, if a user should create an interim result set, the results will seem perfectly reasonable, and it's unlikely that the user will be able to detect the error by inspection.

Fortunately, this cyclical join dependency is so rare that the complications can be safely ignored in most situations. Where they can't, your only resource is to build the database system in such a way as to ensure the integrity of the multirelation join.



Designing Relational Database Systems
Designing Relational Database Systems (Dv-Mps Designing)
ISBN: 073560634X
EAN: 2147483647
Year: 1999
Pages: 124

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net