A good database design is fundamental to the success of any application. Logical database design for relational databases follows a set of rules called rules of normalization. As a result of normalization, you create a data model that is usually, but not necessarily , translated into a physical data model. A logical database design does not depend on the relational database you intend to use. The same data model can be applied to Oracle, Sybase, SQL Server, or any other relational database. On the other hand, a physical data model makes extensive use of the features of the underlying database engine to yield optimal performance for the application. Physical models are much less portable.
Any database designer must address two fundamental issues:
Normalization is a technique used by relational databases to organize data across many tables so that related data is kept together based on certain guidelines. Normalization results in controlled redundancy of data; therefore, it provides a good balance between disk space usage and performance. Normalization helps people understand the relationships between data, and enforces rules that ensure the data is meaningful.
Five normalization forms exist. If you follow the rules for the first rule of normalization, your database can be described as "in first normal form." This is represented by the symbol 1NF for first normal form, 2NF for second normal form, and so on.
Each rule of normalization depends on the previous rule for successful implementation, so to be in second normal form (2NF), your database must also follow the rules for first normal form.
A typical relational database used in a business environment falls somewhere between second and third normal form. It is rare to progress past the third normal form because fourth and fifth normal form are more academic than practical in real-world environments.
Following is a brief description of the first three rules of normalization.
First Normal Form
The first rule of normalization requires removing repeating data values and specifies that no two rows can be identical in a table. This means that each table must have a logical primary key that uniquely identifies a row in the table.
Consider a table that has four columns, PublisherName, Title1, Title2, and Title3 for storing up to three titles for each publisher. This table is not in first normal form due to the repeating Title columns . The main problem with this design is it limits the number of titles associated with a publisher to three.
Removing the repeating columns so there is just a PublisherName column and a single Title column puts the table in first normal form. A separate data row is stored in the table for each title published by each publisher. The combination of PublisherName and Title becomes the primary key that uniquely identifies each row and prevents duplicates.
Second Normal Form
A table is considered to be in second normal form if it conforms to the first normal form and all nonkey attributes of the table are fully dependent on the entire primary key. If the primary key consists of multiple columns, then nonkey columns should depend on the entire key and not just on a part of the key. A table with a single column as the primary key is automatically in second normal form.
Assume you need to add the Publisher address to the database. Adding it to the table with the PublisherName and Title column would violate second normal form. The primary key consists of both PublisherName and Title, but the PublisherAddress attribute is an attribute of the Publisher only. It does not depend on the entire primary key.
To put the database in second normal form requires adding an additional table for storing publisher information. One table consists of the PublisherName column and PublisherAddress. The second table contains the PublisherName and Title columns. To retrieve the PublisherName, Title, and PublisherAddress information in a single result would require a join between the two tables on the PublisherName column.
Third Normal Form
A table is considered to be in third normal form if it already conforms to the first two normal forms and none of the nonkey columns are dependent on any other nonkey columns. All such attributes should be removed from the table.
Following is an example that comes up often during database architecture. Suppose that an employee table has four columns: EmployeeID (the primary key), salary, bonus, and total_salary, where total_salary = salary + bonus. Existence of the total_salary column in the table violates the third normal form because a nonkey column (total_salary) is dependent on two other nonkey columns (salary and bonus). Therefore, to conform to the third rule of normalization, you must remove the total_salary column from the employee table.
Benefits of Normalization
The following are the major advantages of normalization:
Drawbacks of Normalization
One result of normalization is that data is stored in multiple tables. To retrieve or modify information, you usually have to establish joins across multiple tables. Joins are expensive from an I/O standpoint. Multitable joins can have an adverse impact on the performance of the system. The following sections discuss some of the denormalization techniques that you can use to improve the performance of the system.