Let's say you're one of those people whose favorite TV channels are TV Land and Nick at Nite. You comb the Internet for histories of shows such as Gunsmoke and The Twilight Zone, you read biographies of TV actors such as Lucille Ball and Desi Arnaz, and you're curious about when programs such as Bewitched and Happy Days aired. You decide you want to store all this data somehow so you can retrieve and manipulate it quickly and efficiently.
Your first step is to prepare a mission statement, which describes the purpose of the database. You could well be wary of mission statements. In the business world, they are often forgotten and ignored in the tumult of running a day-to-day business or meeting a quarterly earnings target. That's true even when the statements are carefully and thoughtfully prepared.
But for databases, mission statements are useful, even mandatory. They help you determine database objectives, and they provide a framework for data collection. For most databases, the more taxing problem is not a dearth of data, but a glut. The mission statement puts constraints on the data to be collected. More important, it provides a central focus and direction for your work.
Examples of Mission Statements
Mission statements obviously vary by the type of organization. For a law firm, a typical mission statement might be "The mission of the Walker and Goodman database is to maintain the data we use to provide legal services for our clients." For an international charitable organization, it might be "The purpose of the Good Hands database is to safeguard the data we use to alleviate poverty and illness in underdeveloped nations." For a paint business, it might be "The aim of the Rainbow Paint Corporation is to maintain data that helps us provide fine paint products to our customers." Although they're seemingly mundane, these mission statements make it plain that they do not comprehend extraneous objectives, such as tracking the investment portfolios of senior executives (no sly dig intended).
Even a home-grown, nonbusiness database such as one on classic TV shows can benefit from a mission statement. The topic offers many possibilities and can veer in many directions. Is the purpose of the database to record your daily viewing habits and critical remarks? Will it catalogue the DVDs you record or rent? Do you want it to tell you who you were with when you saw Rachel have her baby on Friends, or on which TV set you saw the last episode of M*A*S*H*?
Your Classic TV database could, instead, focus on the programs themselves. What is the program's genre? In other words, is it a rural comedy such as Green Acres, a western such as Maverick, or a variety show such as Ed Sullivan? When did the show air and on what station? Who were the lead actors? What characters did they play?
Writing the Mission Statement
The mission statement should not be used to describe the types of information you want to collect, the tasks you want to accomplish, or the reports you want to create. Instead, it should be a general statement of database purposeand no more than a sentence or two.
Suppose you decide that you're interested in collecting data about the shows themselves (such as who played the character of Rhoda Morgenstern on The Mary Tyler Moore Show), and you are uninterested in any personal experiences associated with them (for example, who you were dating when you saw the last episode of Northern Exposure). Some day you plan to write a best-selling book on the best programs in TV history. Here's a possible mission statement for this database:
After you create your mission statement, you must define mission objectives. These are the tasks that the database should accomplish. In a business organization, they would include statements such as "We need to track fixed asset purchases." In nonprofits, they might include "We need to provide donors with information on their contributions for their tax returns." For a police department, one might be "We need to track the location of all detainees in custody."
A mission objective should not be as specific as "We need to create a report each month that shows the breakdown of fixed assets by asset type, the accumulated depreciation of each item, the method of depreciation for each asset class, and any gains or losses recorded on asset sales." Instead, you want to reduce each objective to a single task and eliminate all unnecessary details.
To develop your mission objectives for the TV database, you consider your own interests as well as the ease with which you can acquire data on various shows. Let's assume that you develop the following mission objectives:
Now let's look at the role of mission objectives in deciding what data you need to assemble.
Two Approaches to Database Design
Two basic approaches to database design exist. To oversimplify, one line of attack is to use the mission objectives you've developed to determine which data to collect. The data can then be assigned to fields, and the fields can be organized into tables. (A little later, I'll suggest why you should first determine the fields and then decide which tables will hold them.) This is a top-down approach.
Under the alternative approach, you still prepare mission objectives. But first you determine the data that you want to collect and the fields that will hold the data. You organize the fields into tables and compare the tables and their contents against your mission objectives. You then modify the fields accordingly. This is a bottom-up approach.
Top-down might seem much more logical than bottom-up: How can you decide what data you need if you don't know why you're collecting it? But as a practical matter, you might find it more efficient to determine all the data that currently is being used and that might be needed in the future, and then make sure that this data satisfies all your objectives.
Neither the top-down nor the bottom-up approach is cast in stone. Many database professionals develop their own methods that combine aspects of both approaches.
One key element to consider is cost. Each piece of data collected has a price tag, even if it's very low, and the cost of data varies. You might decide that the added benefit of some data simply isn't worth the cost of collecting it. Similarly, easy availability and low cost might make it attractive to collect and use data that you might otherwise forgo.
In the case of the Classic TV database, let's use a top-down approach to make a preliminary list of the data you might want to include (see Table 2.1).
Begin to Define Columns
As you look over your data needs, the number and nature of the fields begins to take shape.
For example, consider each item in the first two mission objectives "I need to know the show's broadcast history" and "I need to know the main story line of each show." Here is an imaginary thought process you might go through:
Remember, this is just a preliminary list of possible columns. You'll continue to refine the fields as you create the database.
Create a Preliminary List of Columns
After you evaluate the columns required for each of the other three mission objectives, you come up with a preliminary list of columns(see Table 2.2) and column descriptions. You have checked to make sure that the data is available to you at minimal cost from your own knowledge, the Internet, library resources, and other available sources.
The One Big Table Solution
With your initial list of columns drawn up, you're now faced with the decision of how to organize them into one or more grids. Your first instinct is to put them all in one big table, commonly known as a flat file. That way, all your data will be in one place whenever you need it. After all, there doesn't seem to be that many columns.
But as you set about adding the first few rows of data to your table, you realize that the one-table-fits-all strategy isn't going to work.
Most of the programs in your table were likely aired on one of four networks: CBS, NBC, ABC, or FOX. Each time you add a new program, you find you're entering the same network dataofficial name, popular name, and historyagain and again (see Table 2.3). You're wasting lots of time making the same keystrokes. Moreover, what will happen when you want to edit a single value in one of the network columns? You have many duplicate entries, so you'll have to find and edit each occurrence of the same value to make sure your data is current. That might be easy to do with a column that has values with few characters, such as the network's popular name. It's a lot harder to do for the Network Notes field, which might have many sentences and thousands of characters.
As you begin entering data for actors and their roles, you also find some rows require many more columns than others. Take The Odd Couple, a show about two middle-aged guys, one a slob and the other a fussbudget, living in a New York apartment. It has two lead actors, Jack Klugman as Oscar Madison, the sportswriter; and Tony Randall as Felix Unger, the photographer. Because you want to include columns for the actor's name, gender, and biography, as well as his character name and occupation, eight columns are required (4 columns x 2 actors) for all the actor and character values.
Eight columns is not an overwhelming number. But suppose you decide to include the program Friends as well. That means you'll have to enter data for six lead actors. Four columns are needed for each actor and character, so you'll need 24 columns for all the data. In the list of fields in Table 2.2, you created columns for just three actors, at most. Where can you find room for three other actors?
For some shows, you might pick the top actors and leave it at that. But what true Friends fan could possibly decide to leave out even one of the six-member cast? And the problem could get much worse. What are you going to do if you want to enter a record for Eight Is Enough? What if they decide to make a TV series of the book Cheaper By the Dozen?
Let's try an alternative. Table 2.4 uses two rows to record the values for shows with several lead actors, such as Friends. This attempt has the advantage of tables with fewer columns. But this solution won't work, either; you're compounding the problem of duplicate data. Because you have two rows for one program, you'll have to enter the numerous other fields (Location, Network Official Name, Network Name, Network Notes) twice.
Table 2.5 shows another option. You put two actors in the first two columns and then stuff the remaining four into the third Actors column. This scheme is perhaps the worse. First, it misrepresents the data because each actor in Friends is (in our opinion) equally important. Second, as you'll see a little later, cramming several values of the same type in a single column is a bad idea because it makes it difficult to retrieve, sort, edit, and delete individual values.