Normalizing Tables

The process of normalization is the division of entities in an attempt to achieve the most efficient use of data storage. At times, denormalization is planned redundancy that is subsequently performed to improve response time and use of those same resources. Designing the logical structure of a database is an attempt to provide a degree of normalization, combined with aspects of denormalization to produce optimum storage efficiency yet still ensure acceptable levels of performance and resource utilization.

There are three basic rules, known as normal forms (discussed in the following sections), to normalizing a database structure. Using normal forms gives you these basic advantages:

  • No data redundancy contributing to data integrity

  • Index columns for faster sorting and searching

  • Smaller entities that reduce table locking and data contention

  • Query optimization

First Normal Form

The first normal form (1NF) states that an entity attribute shouldn't have more than one definable piece of data or repeating groups. Therefore, a full name should never be used. For example, a field called Customer Name could be divided into First Name and Last Name and would, therefore, break the first normal form rule. The first name is a piece of data that is independent from the last name, so it should be a separate attribute.

Second Normal Form

The second normal form (2NF) states that a non-key attribute of an entity must depend on the entire primary key, not just a portion of the key. For example, if the primary key of an Orders entity contained two fields, Customer ID and Product ID, the attribute field Product Description wouldn't belong because it has no connection to the customer IDonly to the product ID. Including a Product Description field would break the second normal form rule.

Third Normal Form

The third normal form (3NF) states that a non-key field must not depend on another non-key field. The most obvious example of this rule is in address information. The ZIP Code is dependent on the address area; the city is dependent on the address itself. A large corporation or government agency might choose to have ZIP Code information stored in a separate table, not within the base data to a perfectly normalized form. Although in most table designs this situation is denormalized, in a pure normal form a separate entity would be used to provide additional address information, such as city and ZIP Code based on the address.

Fourth and Fifth Normal Forms

Two other normal forms do exist but aren't commonly implemented. It's entirely possible that by adhering to a third normal form, you might actually accomplish the fourth and fifth forms. Fourth normal form dictates that a third normal form have no multi-valued dependencies. In other words, every value of an attribute must appear in at least one row with every other value of the other attribute. Fifth normal form is intended to eliminate joint dependency constraints. This is a theoretical consideration thought to have no practical value. If you disregard these forms, the design of the database might be less than perfect, but it should have no loss of functionality.

Denormalization

After you have a logical design completely normalized, rarely do you keep it in that state as you proceed to the physical design of the actual database. Although normalization gives you a great deal of storage efficiency and might result in increased performance in some situations, there are a number of drawbacks to a completely normalized database. You should consider the tradeoffs between storage efficiency, performance, and maintainability in your final design.

If you go too far with the normalization process, you might actually reverse the effect you're trying to achieve. Although normalization reduces data redundancy, results in smaller tables with fewer rows, and provides a logical and consistent form, it also requires joins for the implementation and does not allow for summary, duplicate, or other data that a user might expect to find in a single table. Normalizing a database design too far can decrease performance and make it difficult to alter the underlying table structure and might make it harder to work with the data.

Denormalization can occur at any number of levels. At the absolute extreme, a database schema under SQL 7 can be completely duplicated to a number of servers across the network by implementing SQL replication. This replication could be warranted if you need to distribute data access across slow network links or to multiple remote locations. Many advantages are gained through database replication because the data is more easily available at the locations where it will be used. The drawback is increased maintenance of a number of servers. Also, if database replication isn't configured properly, it could monopolize a wide area network (WAN). In addition, if there are network problems or a poor setup, the data might not be synchronized to a level that keeps it up to date. Data can be maintained as an exact duplication against a number of servers, but this process would require a high-speed network and the configuration of a two-phase commit.

Another, simpler example of planned denormalization is maintaining complete address information for customers, suppliers, employees, and so on in the tables with the rest of their general information. This setup is what most users expect, and maintaining a separate address table is difficult. There are no defined rules for denormalization, but some definite guidelines will help you understand what level might be appropriate in a given situation.

Data warehousing schemas often use a denormalized approach referred to as a star or snowflake schema. This schema structure takes advantage of typical decision support queries by using one central "fact" table for the subject area and many dimension tables containing denormalized descriptions of the facts.

Additionally, if a join requires implementing more than three tables, denormalization should be considered. When the number of columns in a table can grow very large, a denormalized structure splits the table into more easily handled portions and uses a one-to-one relationship to connect the information.

The completed structure will have to be modified over time as the live use of the database warrants. Never consider a database design to be perfect or complete. It often takes several years over the lifetime of actual use to determine the best levels of normalization and denormalization to use.



Analyzing Requirements and Defining. Net Solution Architectures (Exam 70-300)
MCSD Self-Paced Training Kit: Analyzing Requirements and Defining Microsoft .NET Solution Architectures, Exam 70-300: Analyzing Requirements and ... Exam 70-300 (Pro-Certification)
ISBN: 0735618941
EAN: 2147483647
Year: 2006
Pages: 175

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net