When Not to Normalize | MySQL Database Design and Tuning

< Day Day Up >

Although relational database design theory is not part of this book's scope, it's worth taking a few moments to explore a particular scenario in which you might be justified in disobeying the normally inviolate three-normal-form rules.

Typically, it's unwise to store calculated information (for example, averages, sums, minimums, maximums) in your tables. Instead, users will receive more accurate information (and you will be in compliance with solid database design theory) if you compute the requested results at runtime, when the query is executed. However, if you are faced with a very large set of data that doesn't change very often, you might want to consider performing your calculations in bulk during off-hours, and then storing the results.

For example, let's look at how Mad-Hatter Club members interact with one MySQL-based solution. Customers might elect to either call a toll-free phone number or log in to High-Hat's website to check their mileage balance. However, this can cause enormous processing backlogs given the cost of traversing the billions of detailed transactions stored in the database multiplied by the thousands of queries being run at the same time. In cases like this, it is prudent to consider an alternative method to providing customers with this information.

You could use the MERGE table option to partition and compress data, but there is a simpler solution: Instead of wading through mounds of data in real time, why not add a column to the customer master table and store the mileage total in that convenient location? You could then run a daily batch operation during off-hours to update the totals.

Another trap that overzealous database designers fall into is the practice of creating lookup tables for columns that have fewer than five different values stored in small columns. This commonly happens for columns that store information from small potential pools, such as eye color, gender, seat preference (for example, window or aisle), and so on.

For example, suppose that the reservations table tracks the customer's preference for seating, as follows:

 CREATE TABLE reservations (     reservation_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, ...     seat_pref_id INT NOT NULL, ... );

The next table holds a grand total of two rows, one for 'Window' and one for 'Aisle':

 CREATE TABLE seat_pref_lookup (     seat_pref_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,     seat_pref_description CHAR(40) );

To get a meaningful seating preference value when viewing data from the reservations table, you must join the two tables together. These kinds of joins can add up and add unnecessary complexity to your database. For situations like this, it is better to simply store the actual seating preference value in the main reservations table, even though this is technically a violation of good database design theory.

To enforce the data integrity of the reservations table, you could then use the ENUM restriction when creating the table:

 CREATE TABLE reservations (     reservation_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, ...     seat_pref_description ENUM ('Window', 'Aisle') ... );

Chapter 5 reviews ENUM along with all of MySQL's other constraints.

< Day Day Up >