Understanding Database Normalization

Database normalization is essentially a set of rules that allows you to organize your database in such a way that your tables are related, where appropriate, and flexible for future growth and relationships. The sets of rules used in normalization are called normal forms. If your database design follows the first set of rules, it's considered to be in the first normal form. If the first three sets of rules of normalization are followed, your database is said to be in the third normal form. We'll go through the normal forms using the concept of students and courses within a school.

Looking at a Flat Table

Before looking at the first normal form, let's start with something that needs to be normalized. In the case of a database, it's the flat table. A flat table is like a spreadsheet with many columns for data. There are no relationships between multiple tables, as all of the data you could possibly want is right there in that single flat table. The flat table is not the most efficient design and will consume more physical space on your hard drive than a set of normalized database tables.

Suppose you have a table that holds student and course information for a school. You might have the fields shown in Table 3.1 in your flat table.

Table 3.1: Students and Courses Table

Field Name

Description

StudentName

Name of the student

CourseID1

ID of the first course taken by the student

CourseDescription1

Description of the first course taken by the student

CourseIntructor1

Instructor of the first course taken by the student

CourseID2

ID of the second course taken by the student

CourseDescription2

Description of the second course taken by the student

CourseIntructor2

Instructor of the second course taken by the student

You might then repeat the CourseID, CourseDescription, and CourseInstructor columns many more times to account for all the classes a student can take during his or her academic career. While redundant, this is the method used when creating a single flat table to store information. Eliminating this redundancy is the first step in normalization, so next you'll take this flat table to the first normal form. If your table remained in its flat format, you could have a lot of unclaimed space and a lot of space being used unnecessarily-not an efficient table design!

The First Normal Form

The main rules for the first normal form are

  • Eliminate repeating information

  • Create separate tables for related data

If you look at the flat table design, with its many repeated sets of fields for students and courses, you can identify students and courses as its two distinct topics. Taking your student and courses flat table to the first normal form would mean that you create two tables: one for students (call it students) and one for students plus courses (call it students_courses). You can see the new table designs in Tables 3.2 and 3.3.

Table 3.2: The students Table

Field Name

Description

StudentID

A unique ID for the student. This new field is now a primary key.

StudentName

Name of the student

Table 3.3: The students_courses Table

Field Name

Description

StudentID

Unique ID of the student, matching an entry in the students table.

CourseID

ID of the course being taken by the student

CourseDescription

Description of the course taken by the student

CourseIntructor

Instructor of the course taken by the student

Your two tables now represent a one-to-many relationship of one student to many courses. Students can take as many courses as they wish and are not limited to the number of CourseID/CourseDescription/CourseInstructor groupings that existed in the flat table.

You still have some work to do; the next step is to put the tables into second normal form.

The Second Normal Form

The basic rule for the second normal form is

  • No non-key attributes depend on a portion of the primary key

In plain English, this means that if fields in your table are not entirely related to a primary key, then you have more work to do. In the students and courses example we're using, this means breaking out the courses into their own table so that the original flat table is now just a table full of students.

CourseID, CourseDesc, and CourseInstructor can become a table called courses with a primary key of CourseID. The students_courses table should then just contain two fields: StudentID and CourseID. You can see the new table designs in Tables 3.4 and 3.5.

Table 3.4: The courses Table

Field Name

Description

CourseID

Unique ID of a course

CourseDescription

Description of the course

CourseIntructor

Instructor of the course

Table 3.5: The New students_courses Table

Field Name

Description

StudentID

Unique ID of the student, matching an entry in the students table.

CourseID

Unique ID of the course being taken, matching an entry in the courses table.

Believe it or not, you can go even further with this example, to the third normal form.

The Third Normal Form

The rule for the third normal form is

  • No attributes depend on other non-key attributes

This rule simply means that you need to look at your tables and see if more fields exist that can be broken down further and that aren't dependent on a key. Think about removing repeated data and you'll find your answer-instructors. Inevitably, an instructor will teach more than one class. However, the CourseInstructor field in the courses table is not a key of any sort. So if you break out this information and create a separate table purely for the sake of efficiency and maintenance, that's the third normal form. Take a look at the new courses table and the instructors table in Tables 3.6 and 3.7.

Table 3.6: The courses Table

Field Name

Description

CourseID

Unique ID of a course

CourseDescription

Description of the course

CourseIntructorID

ID of the instructor, matching an entry in the instructors table

Table 3.7: The Instructors Table

Field Name

Description

InstructorID

Unique ID of an instructor

InstructorName

Name of the instructor

InstructorNotes

Any notes regarding the instructor

The third normal form is usually adequate for removing redundancy and allowing for flexibility and growth while remaining efficient. That, after all, is the goal of database normalization!

With all that theory behind you, it's time to learn some of the language of relational databases: SQL, or Structured Query Language.



PHP Essentials
PHP Essentials, 2nd Edition
ISBN: 1931841349
EAN: 2147483647
Year: 2002
Pages: 74

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net