Understanding Normalization


Normalization is simply a set of rules that will ultimately make your life easier when you're acting as a database administrator. It's the art of organizing your database in such a way that your tables are related where appropriate and flexible for future growth.

The sets of rules used in normalization are called normal forms. If your database design follows the first set of rules, it's considered in the first normal form. If the first three sets of rules of normalization are followed, your database is said to be in the third normal form.

Throughout this chapter, you'll learn about each rule in the first, second, and third normal forms and, we hope, will follow them as you create your own applications. You'll be using a sample set of tables for a students-and-courses database and taking it to the third normal form.

Problems with the Flat Table

Before launching into the first normal form, you have to start with something that needs to be fixed. In the case of a database, it's the flat table. A flat table is like a spreadsheetit has many, many columns. There are no relationships between multiple tables; all the data you could possibly want is right there in that flat table. This scenario is inefficient and consumes more physical space on your hard drive than a normalized database.

In your students-and-courses database, assume that you have the following fields in your flat table:

  • StudentNameThe name of the student.

  • CourseID1The ID of the first course taken by the student.

  • CourseDescription1The description of the first course taken by the student.

  • CourseInstructor1The instructor of the first course taken by the student.

  • CourseID2The ID of the second course taken by the student.

  • CourseDescription2The description of the second course taken by the student.

  • CourseInstructor2The instructor of the second course taken by the student.

  • Repeat CourseID, CourseDescription, and CourseInstructor columns many more times to account for all the classes students can take during their academic career. Understanding Normalization

With what you've learned so far, you should be able to identify the first problem area: CourseID, CourseDescription, and CourseInstructor columns are repeated groups.

Eliminating redundancy is the first step in normalization, so next you'll take this flat table to first normal form. If your table remained in its flat format, you could have a lot of unclaimed space and a lot of space being used unnecessarilynot an efficient table design.

First Normal Form

The rules for the first normal form are as follows:

  • Eliminate repeating information.

  • Create separate tables for related data.

If you think about the flat table design with many repeated sets of fields for the students-and-courses database, you can identify two distinct topics: students and courses. Taking your students-and-courses database to the first normal form would mean that you create two tables: one for students and one for courses, as shown in Figure 15.9.

Figure 15.9. Breaking the flat table into two tables.


Your two tables now represent a one-to-many relationship of one student to many courses. Students can take as many courses as they want and are not limited to the number of CourseID/CourseDescription/CourseInstructor groupings that existed in the flat table.

The next step is to put the tables into second normal form.

Second Normal Form

The rule for the second normal form is as follows:

  • No non-key attributes depend on a portion of the primary key.

In plain English, this means that if fields in your table are not entirely related to a primary key, you have more work to do. In the students-and-courses example, you need to break out the courses into their own table and modify the students_courses table.

CourseID, CourseDescription, and CourseInstructor can become a table called courses with a primary key of CourseID. The students_courses table should then just contain two fields: StudentID and CourseID. You can see this new design in Figure 15.10.

Figure 15.10. Taking your tables to second normal form.


This structure should look familiar to you as a many-to-many relationship using an intermediary mapping table. The third normal form is the last form we'll look at, and you'll find it's just as simple to understand as the first two.

Third Normal Form

The rule for the third normal form is as follows:

  • No attributes depend on other non-key attributes.

This rule simply means that you need to look at your tables and see whether you have more fields that can be broken down further and that aren't dependent on a key. Think about removing repeated data and you'll find your answerinstructors. Inevitably, an instructor will teach more than one class. However, CourseInstructor is not a key of any sort. So, if you break out this information and create a separate table purely for the sake of efficiency and maintenance (as shown in Figure 15.11), that's the third normal form.

Figure 15.11. Taking your tables to third normal form.


Third normal form is usually adequate for removing redundancy and allowing for flexibility and growth. The next section will give you some pointers for the thought process involved in database design and where it fits in the overall design process of your application.




Sams Teach Yourself PHP, MySQL And Apache All in One
Sams Teach Yourself PHP, MySQL and Apache All in One (3rd Edition)
ISBN: 0672328739
EAN: 2147483647
Year: 2004
Pages: 327

Similar book on Amazon

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net