When we left off in Chapter 4, we had defined eight tables (soon to be ten!) that we would need for our waiver exam database. Now our first step is to make sure we have all the items we will need for each table. Remember, these items are the columns of the tables; although we can add columns later, modify columns, and actually drop columns, it is really bad practice to have to make changes. Let's start with the STUDENTS table. Chapter 4 defined just enough data elements so that we could do the data normalization necessary. However, suppose that you show your table design to some of the instructors, as a good database designer would, and their suggestion is to add a couple more items. The instructors want to know what department each student is in or, at a university level, what the student's major is. They believe that they'll be able to get some interesting demographic information once they have an operational database. This is the kind of thing researchers and marketing types are always looking for, and your proposed database is a natural. The instructors also mention that the TEST_HISTORY table should include the place where the test was taken, primarily for the sake of completeness. As if this were not enough, the instructors tell you that they need to store the passing grade for each test. "Ah," you think, "simple." But then they say that sometimes the questions will have different weights. "Enough!" you think, as you take out another pencil to replace the one you just snapped. However, they show you the formula to use, and it becomes clear that all you have to do is add another column to each question on a particular test that will store the weight (with a default value of 1, since most exams are not weighted). As you're getting up to leave, another instructor stops you and asks the simple question: "When your system is done, can I get a list of all the questions by topic? Can I get all the ones for software or hardware or networking on a report? You know, things keep changing in the industry, and I want to make sure that what we ask is accurate and not outdated ." You go back to your office thinking that you now have a couple of important changes to make: You need to be able to give them the report by type of question, and you have to store information on place, passing grade, and weight. The last three are easy, so you sit at your terminal and make some changes, and the tables now look like this:
You have to add the Passing Grade and Weight columns to two tables: TEST_ID and TEST_QUESTIONS_LINK. Fine; that's easy:
What about the type of question? What did the instructor mean when she asked for "a list of all the questions by topic"? Let's see; she wanted a report by type of question ”that is, all the questions on a certain topic. So we need a way to mark each question with its "type" as well as to get a description of that type. Think back to normalization in Chapter 4. We automatically want to put Type and the Type_Desc with each question, right? Well, let's just add two columns: Question_Type and Type_Desc. But wait a minute! Think about the nightmare that will be created when you want to change the description. You will have to go into every row in the QUESTIONS table and make the change! Instead, we do two things: (1) create a small table that has two columns ”Question_Type and Question_Type_Desc ”and then (2) add the Question_Type column to the QUESTIONS table. Anytime you need to report on the types of questions, you will just read the QUESTIONS table and link to the QUESTION_TYPE_DESC table. Because no other changes are necessary to handle question type, grade, location, and weight, all other tables stay as they are:
At the beginning of this chapter I told you that our eight tables would soon be ten. So now that we've added table nine, the QUESTION_TYPE_DESC table, what is the tenth one? If any of you caught the problem, congratulations! It is a simple oversight, and I purposely waited until now to correct it. My thought was that it would make more sense to you at this point, where we've already covered normalization and preliminary table column definitions. What was missed in Chapter 4 was the fact that one person can author more than one test . See the break in normalization? We do not want to repeat the person's name over and over. Instead, we need a small AUTHORS table with just two entries ”an identifier and the author's name :
The significance is that this will become one of the base tables that another table is dependent on. Note By leaving out the AUTHORS table until now, I hope to have taught you to be very aware that table design and data normalization are difficult. In almost every case, these processes mean going over and over the tables and matching them against the real application. And even after you start building your tables, you may suddenly realize that something is missing ”be it a column, table, or link. Now is the time to make the change! |