Eliminating redundant data may be the need that gave rise to the relational model, but normalization is the process by which we put that purpose into action. Normalization is the process of dividing data into two or more related tables. We accomplish this task by carefully reviewing tables for groups of repeated data or inappropriate dependencies and then moving each group to a new table.
A number of terms may be new to you, so let's review those before we tackle the actual subject of relational database theory:
Database A collection of persistent data.
Relational database A collection of persistent data that's stored in multiple, but related, tables.
Entity A conceptual collection of one type of data.
Entity class The complete collection of a single type of data.
Entity set The known collection of a single type of data-the data of any given entity class that is stored in your database.
Table A collection of related data that's stored in rows and columns, also known as a relation.
Field The smallest unit of data in the database, commonly referred to as a field or a column. It defines a category of information within an entity.
Row A combination of fields that together completes one entity.
Record The data in one row-consisting of related data for one entity.
Scheme The set of field names for a table.
Primary key A value that uniquely identifies each record.
Foreign key A field that refers to a related table's primary key field.
Relationship An association between two tables based on a unique value known as the primary key.
Some of the terms listed here won't appear early in our discussion. However, they are terms you may see later, so you might as well learn them now. Specifically, let's look at the three rather cryptic "entity" terms as they refer to our database that stores information on books. Within that database, we have several tables, one of which is Books.
Following this scenario, an entity is all the information about any one book-its title, ISBN, category, and so on. The entity set is the information stored in our database on books, which might be 100 books or 100,000 books. In contrast, the entity class encompasses all possible books.
We're not going to try to teach you an entire semester's worth of relational algebra and its supportive terminology. However, introducing you to these terms does set the stage for what you're about to learn.
Almost always, you can reduce repeated entries to one record in a new table and then relate this record to each corresponding record in the original table via the primary and foreign key values. Generally, each table contains a primary key field that uniquely identifies each record in that table. Related tables use the primary key value as a foreign key to associate their data with the primary key value (or record). For example, suppose our Books database contains an authors table as well as a books table. Each book's unique identification number can be the primary key in the books table and a foreign key in the authors table. Both the primary and foreign key values are the same; however, their functions are different. The primary key value identifies a record, and the foreign key identifies an association between two records.
To normalize your data, you apply a set of rules (or constraints) known as normal forms. There are seven forms, but we'll review just the first four, since that level of normalization is usually adequate for most of the available Relational Database Management Systems (RDBMSs).
First Normal Form (1NF) The compliant table contains no multivalued items or repeating groups. In addition each field must be atomic-that is, each field must contain the smallest data unit possible. Repeating and multivalued data should be moved to another table. The 1NF-compliant table must have a primary key. (The next section provides guidelines for choosing a primary key.)
Second Normal Form (2NF) The table must be normalized to 1NF. Also, all fields in the table must refer to (or describe) the primary key value. In other words, each field must be fully dependent on the primary key value. (Don't worry if this concept of dependency seems dauntingly abstract; you'll see what it means in practice as we design a sample database later in this chapter.) When the primary key is based on more than one field, all non-key values must depend on the complex primary key, not just on one value (or field) within the key. Any non-key value that doesn't support the primary key should be moved to another table.
Third Normal Form (3NF) The table should conform to 1NF and 2NF. All fields must be mutually independent. Any value that describes a non-key field must be moved to another table.
Boyce-Codd Normal Form (BCNF) There must be no possibility whatsoever of non-key dependent fields. This subrule of 3NF catches values that might fall through 3NF cracks. Any field that shows a dependence on any non-key value must be moved to another table.
Developers disagree about the level of normalization needed to create the most efficient and easily maintained design. Many require that all tables be fully normalized to BCNF, while others are satisfied with 3NF, especially if BCNF begins to degrade performance. Some assert that current system speeds are fast enough to negate the performance issue in almost all instances and that consequently, performance isn't a valid excuse for not normalizing through BCNF.