Database normalization is essentially a set of rules that allows you to organize your database in such a way that your tables are related, where appropriate, and flexible for future growth and relationships. The sets of rules used in normalization are called normal forms. If your database design follows the first set of rules, it's considered to be in the first normal form. If the first three sets of rules of normalization are followed, your database is said to be in the third normal form. We'll go through the normal forms using the concept of students and courses within a school.
Before looking at the first normal form, let's start with something that needs to be normalized. In the case of a database, it's the flat table. A flat table is like a spreadsheet with many columns for data. There are no relationships between multiple tables, as all of the data you could possibly want is right there in that single flat table. The flat table is not the most efficient design and will consume more physical space on your hard drive than a set of normalized database tables.
Suppose you have a table that holds student and course information for a school. You might have the fields shown in Table 3.1 in your flat table.
Field Name | Description |
---|---|
StudentName | Name of the student |
CourseID1 | ID of the first course taken by the student |
CourseDescription1 | Description of the first course taken by the student |
CourseIntructor1 | Instructor of the first course taken by the student |
CourseID2 | ID of the second course taken by the student |
CourseDescription2 | Description of the second course taken by the student |
CourseIntructor2 | Instructor of the second course taken by the student |
You might then repeat the CourseID, CourseDescription, and CourseInstructor columns many more times to account for all the classes a student can take during his or her academic career. While redundant, this is the method used when creating a single flat table to store information. Eliminating this redundancy is the first step in normalization, so next you'll take this flat table to the first normal form. If your table remained in its flat format, you could have a lot of unclaimed space and a lot of space being used unnecessarily-not an efficient table design!
The main rules for the first normal form are
Eliminate repeating information
Create separate tables for related data
If you look at the flat table design, with its many repeated sets of fields for students and courses, you can identify students and courses as its two distinct topics. Taking your student and courses flat table to the first normal form would mean that you create two tables: one for students (call it students) and one for students plus courses (call it students_courses). You can see the new table designs in Tables 3.2 and 3.3.
Field Name | Description |
---|---|
StudentID | A unique ID for the student. This new field is now a primary key. |
StudentName | Name of the student |
Field Name | Description |
---|---|
StudentID | Unique ID of the student, matching an entry in the students table. |
CourseID | ID of the course being taken by the student |
CourseDescription | Description of the course taken by the student |
CourseIntructor | Instructor of the course taken by the student |
Your two tables now represent a one-to-many relationship of one student to many courses. Students can take as many courses as they wish and are not limited to the number of CourseID/CourseDescription/CourseInstructor groupings that existed in the flat table.
You still have some work to do; the next step is to put the tables into second normal form.
The basic rule for the second normal form is
No non-key attributes depend on a portion of the primary key
In plain English, this means that if fields in your table are not entirely related to a primary key, then you have more work to do. In the students and courses example we're using, this means breaking out the courses into their own table so that the original flat table is now just a table full of students.
CourseID, CourseDesc, and CourseInstructor can become a table called courses with a primary key of CourseID. The students_courses table should then just contain two fields: StudentID and CourseID. You can see the new table designs in Tables 3.4 and 3.5.
Field Name | Description |
---|---|
CourseID | Unique ID of a course |
CourseDescription | Description of the course |
CourseIntructor | Instructor of the course |
Field Name | Description |
---|---|
StudentID | Unique ID of the student, matching an entry in the students table. |
CourseID | Unique ID of the course being taken, matching an entry in the courses table. |
Believe it or not, you can go even further with this example, to the third normal form.
The rule for the third normal form is
No attributes depend on other non-key attributes
This rule simply means that you need to look at your tables and see if more fields exist that can be broken down further and that aren't dependent on a key. Think about removing repeated data and you'll find your answer-instructors. Inevitably, an instructor will teach more than one class. However, the CourseInstructor field in the courses table is not a key of any sort. So if you break out this information and create a separate table purely for the sake of efficiency and maintenance, that's the third normal form. Take a look at the new courses table and the instructors table in Tables 3.6 and 3.7.
Field Name | Description |
---|---|
CourseID | Unique ID of a course |
CourseDescription | Description of the course |
CourseIntructorID | ID of the instructor, matching an entry in the instructors table |
Field Name | Description |
---|---|
InstructorID | Unique ID of an instructor |
InstructorName | Name of the instructor |
InstructorNotes | Any notes regarding the instructor |
The third normal form is usually adequate for removing redundancy and allowing for flexibility and growth while remaining efficient. That, after all, is the goal of database normalization!
With all that theory behind you, it's time to learn some of the language of relational databases: SQL, or Structured Query Language.