Back to the example. At this point, you have determined that a film database will make your job easier and might even help preserve your sanity. You create a table with columns for movie title, tag line, release date, and the rest of the required data. You enter your movie list into the table, one row at a time, and are careful to put the correct data in each column.
Next, you instruct the database application to sort the list by movie title. The list is sorted in a second or less, and you print it out. Impressed, you try additional sortsby rating and by budgeted amount. The results of these sorts are shown in Figures 5.5, 5.6, and 5.7.
Figure 5.5. Data entered once can be sorted any way you want.
Figure 5.6. Data sorted by rating.
Figure 5.7. Data sorted by budgeted amount.
You now have two or more lists, but you had to enter the information only once; because you were careful to break the records into multiple columns, you can sort or search the list in any way necessary. You just need to reprint the lists whenever your records are added, edited, or deleted. And the new or changed data is automatically sorted for you.
A Database Primer
You have just seen a practical use for a database. The movie list is a simple database that involves a single table and a small set of columns. Most well-designed database applications require many tables and ways to link them. You'll revisit the movie list when we discuss relational databases.
Your first table was a hit. You have been able to accommodate any list request, sorted any way anyone could need. But just as you are beginning to wonder what you're going to do with all your newfound spare time, your boss informs you that he'll need reports sorted by the director name.
"No problem," you say. You open your database application and modify your table. You add two new columns, one for the director's first name and one for the last name. Now, every movie record can contain the name of the director, and you even create a report of all movies including director information. Once again, you and your database have saved the day, and all is wellor so you think.
Just when things are looking good, you get a memo asking you to include movie expenses in your database so as to be able to run reports containing this information.
You think for a few moments and come up with two solutions to this new problem. The first solution is simply to add lots more columns to the table, three for each expenses item (date, description, and amount).
But you realize this isn't a long-term solution at all. How many expenses should you allow space for? Every movie can, and likely will, have a different set of expenses, and you have no way of knowing how many you should accommodate for. Inevitably, whatever number you pick won't be enough at some point. In addition, adding all these extra columns, which won't be used by most records, is a tremendous waste of disk space. Furthermore, data manipulation becomes extremely complicated if data is stored in more than one column. If you need to search for specific expenses, you'd have to search multiple columns. This situation greatly increases the chance of incorrect results. It also makes sorting data impossible because databases sort data one column at a time, and you have data that must be sorted together spread over multiple columns.
An important rule in database design is that if columns are seldom used by most rows, they probably don't belong in the table.
Your second solution is to create additional rows in the table, one for each expense for each movie. With this solution, you can add as many expenses as necessary without creating extra columns.
This solution, though, isn't workable. Although it does indeed solve the problem of handling more than a predetermined number of expenses, doing so introduces a far greater problem. Adding additional rows requires repeating the basic movie informationthings such as title and tag lineover and over, for each new row.
Not only does reentering this information waste storage space, it also greatly increases the likelihood of your being faced with conflicting data. If a movie title changes, for example, you must be sure to change every row that contains that movie's data. Failing to update all rows would result in queries and searches returning conflicting results. If you do a search for a movie and find two rows, each of which has different ratings, how would you know which is correct?
This problem probably isn't too serious if the conflicting data is the spelling of a name, but imagine that the data is customer-billing information. If you reenter a customer's address with each order and then the customer moves, you could end up shipping orders to an incorrect address.
You should avoid maintaining multiple live copies of the same data whenever possible.
Another important rule in database design is that data should never be repeated unnecessarily. As you multiply the number of copies you have of the same data, the chance of data-entry errors also multiplies.
One point worth mentioning here is that the "never duplicate data" rule does not apply to backups of your data. Backing up data is incredibly important, and you can never have too many backup plans. The rule of never duplicating data applies only to live datadata to be used in a production environment on an ongoing basis.
And while you are thinking about it, you realize that even your earlier solution for including director names is dangerous. After all, what if a movie has two directors? You've allocated room for only one name.