5.1 Normalization


In relational database theory, five normal forms are defined. Normal forms represent a guideline for record design and are essential when building complex data structures. They can be used to prevent inconsistencies and anomalies in the data and will help you to build clear and efficient data structures. In this book normal forms will only be covered very briefly because normalization can be a complicated subject if you want to build data models that satisfy all demands of the fifth normal form.

5.1.1 The First Normal Form

In the first normal form, all records in the table must have the same number of columns and all columns have to be the same data type. Every name of a field has to be unique within a table. The relational theory does not allow records with a variable number of columns, and the name of a column has to be unambiguous. In addition, a table should be created for every group of related data. Every table must have one unique column, which will be the primary key. One cell must not contain more than one value.

All restrictions of the first normal form have nothing to do with database design, but these restrictions are important for definition purposes.

5.1.2 The Second Form

The second and the third normal form deal with the relations between fields used as keys and fields that are not keys.

The most important condition that has to be fulfilled is that separate tables for sets of values that apply to multiple records have to be created. These tables have to be related to a foreign key.

In addition to the demands of the second form, all demands of the first normal form have to be fulfilled as well.

5.1.3 The Third Normal Form and Beyond

A model in third normal form has to satisfy all demands of the first and second normal forms. In addition, fields that do not depend on a key have to be eliminated.

Normally, people do not build database models that satisfy more than the demands of the third normal form. In addition to the third normal form, three more normal forms are defined. The next normal forms in the list are the Boyce-Codd normal form, the fourth, and the fifth normal form.

5.1.4 An Example

In order to illustrate what you have just learned, here is a practical example of how normalization can be done.

Table 5.1 shows an unnormalized table.

Table 5.1. An Unnormalized Table
Student Advisor Adv-Room Class 1 Class 2 Class 3
999 Alan 214 101 341 951
2300 Carl 612 102 112 412

The problem with this table is that more than one column containing the same kind of data is in the table. This is extremely inflexible because in this model there is only space for three classes every class added to the scenario would need a separate column. To solve this problem, we use Codd's first normal form, as shown in Table 5.2.

Table 5.2. First Normal Form
Student Advisor Adv-Room Class
999 Alan 214 101
999 Alan 214 341
999 Alan 214 951
2300 Carl 612 102
2300 Carl 612 112

We have made one column out of three to avoid repeatable columns in first normal form. As you can see, the table is much longer now and a lot of redundancies are in the table because the names of the advisor and the IDs of the students are listed more than just once. This is extremely bad because a lot of space is wasted and UPDATE operations will take far too long. In addition, it is not useful to update a lot of records if only one name changes. The problem can be solved by using Codd's second normal form, as shown in Table 5.3.

Table 5.3. The Main Table
Student Advisor Adv-Room
999 Alan 214
2300 Carl 612
Students and Classes
Student Class  
999 101  
999 341  
999 951  
2300 102  
2300 122  
2300 412  

Two tables have been made out of one. This way the redundancies in the table have been reduced. Names are listed only once, so UPDATE operations can be performed more easily. In this example, the column Student in the second table is a foreign key of the first table.

As you can see, modifying a model so that it satisfies the demands of the second normal form is easy. In most cases second, or respectively third, normal form will be enough; otherwise, the number of tables will be far too high and performing simple selections will be more complicated. Having a lot of tables might help you to build more flexible data structures, but on the other hand, it will be more complicated for the database to find the best way through a query because the number of ways through a query will grow exponentially.



PHP and PostgreSQL. Advanced Web Programming2002
PHP and PostgreSQL. Advanced Web Programming2002
ISBN: N/A
EAN: N/A
Year: 2004
Pages: 201

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net