Understanding Denormalization | Mastering MySQL 4

Denormalization is the process of reversing the transformations made during normalization for performance reasons. It's a topic that stirs controversy among database experts; there are those who claim the costs are too high and never denormalize, and there are those who tout its benefits and routinely denormalize.

For proponents of normalization, the thinking is as follows: Normalization creates more tables as you proceed toward the higher normal forms, but more tables mean there are more joins to be made when data is retrieved, which in turn slows down your queries. For that reason, to improve the speed of certain queries, you can override the advantages to data integrity and return the data structure to a lower normal form.

I suggest a practical approach, taking into account the limitations of SQL, and MySQL in particular, but being cautious not to needlessly denormalize. The following tips will help you in your decision:

If your performance with a normalized structure is acceptable, you should not denormalize.
If your performance is unacceptable, make sure denormalizing will cause it to become acceptable. Also investigate alternatives, such as better hardware, which may avoid the need to denormalize. It's hard to undo structural changes later.
Be sure you are willing to trade the decreased data integrity for the increase in performance.
Consider possible future scenarios, where applications may place different requirements on the data. Denormalizing to enhance performance of a specific application makes your data structure dependent on that application, when in an ideal situation it will be application independent.

Table 8.36 introduces a common structure where it may not be in your best interests to denormalize. Can you tell which normal form the table is in?

Table 8.36: Customer Table
Customer Table
ID First name Surname Address line 1 Address line 2 Town ZIP code

Table 8.36 must be in 1st normal form because it has a primary key and there are no repeating groups. It must be in 2nd normal form because there's only one key, so there cannot be any partial dependencies. And 3rd normal form? Are there any transitive dependencies? It looks like it. ZIP code is probably determined by the town attribute. To make it into 3rd normal form, you should take out ZIP code, putting it in a separate table with town as the key. In most cases, I would suggest not doing this though. Although this table is not really in 3rd normal form, separating this table is not worth the trouble. The more tables you have, the more joins you need to do, which slows the system down. The reason you normalize at all is to reduce the size of tables by removing redundant data (which can often speed up the system). But you also need to look at how your tables are used. Town and ZIP code would almost always be returned together, as part of the address. In most cases, the small amount of space you save by removing the duplicate town/ZIP code combinations would not offset the slowing down of the system because of the extra joins. In some situations, this may be useful, perhaps where you need to sort addresses according to ZIP codes or towns for thousands of customers, and the distribution of the data means that a query to the new, smaller table can return the results substantially quicker. In the end, experienced database designers can go beyond rigidly following the steps, as they understand how the data will be used. And that is something only experience can teach you. Normalization is just a helpful set of steps that most often produces an efficient table structure and not a rule for database design.

Tip

I've seen some scary database designs out there, almost always because of not normalizing rather than too much normalization. So if you're unsure, normalize!