The Discovery ProcessFinding the Data | Absolute Beginners Guide to Microsoft Office Access 2003

At this stage of the process, it isn't important to know all that much about the database program you'll be using, which in this case is Access. Concentrate on the data you'll be storing. There's simply no substitute for knowing your data. Fortunately, when you build a database for yourself, you probably have a pretty good idea of the things you want to keep track of.

Creating a Paper Trail

Get ready to learn the job your database will be expected to perform, step by step. When you're done, you should be on very intimate terms with the data and everything you do with that data.

You can't build a database without knowing what you want to store in the database. That might seem obvious, but if you're used to just jotting down things on scraps of paper or tossing them all in an Excel spreadsheet, you might not have thought about keeping just one type of information separate. To build the gardening database, we need to define just what it is about gardening we want to keep track of.

In preparation for creating the gardening database we'll design in this chapter, you could gather your favorite gardening books, magazines, and seed catalogs and have them ready for inspiration and actual data. Or perhaps you already have a list of catalogs, such as the one in Figure 4.1. Now, get ready to design!

Figure 4.1. Not a databasebut it's a start!

graphics/04fig01.jpg

The First Data List

Whether you're working from printed forms, personal notes, or gardening books, this is the time to list the actual data the database will store and process. Don't worry about the order, and don't try to categorize the itemssimply create one long list. Let's do that now with our gardening database:

Common name
Latin name
Decorative , edible, or medicinal
General notesproblems, future plans, and so on
Picture of the plant
Names and addresses of favorite seed catalogs

Don't limit the list or your expectations. It doesn't matter how short or long the list is. The point is to just get started. Maybe you already have a paper gardening journal, such as the one in Figure 4.2, or a list of plants to start from. Great! That will make the process much easier because so much of the discovery work is already done. However, don't make the mistake of just listing the things you're already tracking in written formyou could omit important data. This is your chance to take a fresh look at the data and decide what's really important. For example, digital camera pictures, such as the one in Figure 4.3, can also be in your database.

Figure 4.2. Your existing gardening journal can serve as a source of inspiration for a database.

graphics/04fig02.gif

Figure 4.3. Pictures can also be stored in an Access database.

graphics/04fig03.jpg

Identifying Tables

Resist the urge to throw all your data into one big table and hope for the best. That isn't the way a relational database works. A table is a collection of related data stored in rows and columns . In a relational database, the data is stored in multiple, but related, tables.

Each table stores data for an entity , a conceptual collection of one type of data. In other words, for each entity type, you'll need one table. (It's actually a bit more complicated, but for now, that description satisfies our purposes.)

We said that a table consists of columns and rows. In database terms, the column houses a field , the smallest unit of data in the database. Each field in the table defines the table's entity. Together, the fields create a record, and one record stores the data for one entity.

That's all clear as mud, right? Don't worry, it'll make more sense as we continue. Let's apply all these new terms to our gardening database by identifying the entities from our earlier list.

Tip

Don't assume that your tables will mirror your existing paper records exactly. You'll see as we proceed that the information in Figure 4.2 gets split up among several tables.

Initially, we seem to have two entitiesindividual plant information and catalog data. That means we'll need at least two representative lists:

Plants	Catalogs
Common Name	Name
Latin Name	Address
Type	Specialty
Notes
Picture

Remember, we're not actually working with Access yet. We're just making lists of potential tables and fields. You'll learn how to actually create an Access table in Chapter 5, "Building Your First Tables."

Congratulations! You just took your first step toward organizing your data. An entire field of study is devoted to figuring out how to split data into tables that make senseit's called normalization . But you don't have to worry about that. You've taken the first step based on common sense, and as we continue, you'll see that the rest of database design is largely common sense as well.

Rather than deal with technical definitions for database design, here are the common-sense rules you can use when designing your own databases:

Rule One Split up your data into the smallest pieces that make sense.
Rule Two Don't try to store two things in one place.
Rule Three Make sure you can tell things apart.

Caution

Don't get too attached to your original design. As you think more deeply about your data, you should be prepared to rearrange it for the best results.

Applying Rule One

Rule one says that each unit of data (field) should be as small as possible. Right now, we've followed that rule in designing our potential table of plants. We can't say the same for the catalog table, however. Storing the entire address in one field is a bad idea and breaks the first rule.

As you know, an address has several components : a street address, a city, a state, a ZIP code or regional code, and sometimes even the country. That means the address field could contain up to five (or even more) pieces of data.

You're probably wondering why you should bother to split this up. It's very difficult to work with multiple pieces of data stored in one field. For instance, let's suppose you want information on a catalog, but you can't remember the catalog's name. All you can remember is that the company is based in Iowa. If Iowa is stuck in the middle of a field between Sioux City and USA, you're going to have a hard time finding that record. On the other hand, if you store Iowa in its own field, all you have to do is search that field for Iowa entries. To that end, let's rethink the catalog table:

Catalogs
Name
Street Address
City
State
ZIP
Country
Specialty

Applying Rule Two

Rule two says you can't store more than one piece of data in a field. The type field identifies the plant as decorative, edible, or medicinal. But what do you do when an overlap exists? Some plants could be both decorative and medicinal, for example, but you can't enter more than one type in the same field. To resolve the conflict, you can create a new table to hold the type information. After doing so, we have two tables (not including the catalogs table):

Plants	Types
Common Name	Type
Latin Name
Notes
Picture

Later, you'll see how you can link these two tables together to tell which type applies to which plant. Right now, you should still concentrate on splitting things up.

Applying Rule Three

You'll be storing information on corn, beets, petunias, and who knows what else. How do you tell one plant from another? The answer is that each plant has some unique information about it. In database terms, this unique information is referred to as a primary key . The third rule says that each table must contain a primary key. Simply stated, a primary key field contains a value that uniquely identifies each record and can't be null (blank or otherwise unknown). The following are the two ways to choose a table's primary key:

You can rely on the stored information to uniquely identify each record One or more fields in the data might already be unique for each record. Developers refer to this type of primary key as a natural key .
You can insert an AutoNumber data type field When using this data type, Access automatically enters a consecutive value when you insert a record. For instance, the first record's primary key value would be 1, the second record's value would be 2, the third record's value would be 3, and so on. You might think of this as an artificial primary key , in contrast to a natural key.

Figure 4.4 shows the difference between the two types of keys schematically.

Figure 4.4. Natural and artificial keys.

graphics/04fig04.gif

Note

As you become more familiar with Access, you'll find that developers disagree on whether to use artificial or natural primary keys. We'll use both, depending on which works better for a particular table.

At this point, we really need to define just what we mean when we say a primary key uniquely identifies each record: That means the primary key field can't contain duplicate entries. Suppose, for example, that you chose the type of plant as the primary key for the plants table. What happens when you get two plants with the same type? If you'd made type the primary key, you couldn't put them both in the same table. Clearly, that's not a good choice of primary key.

The goal of choosing a primary key is to find data that's different for every record that you might ever store in the table. You have three basic choices:

Use an autonumber as an artificial key.
Choose a single field from the data.
Choose more than one field from the data.

That third choice might require a bit more explanation. Suppose you were truly determined that type should be the primary key of the plant table. How could you make this unique for every record? Well, you might make the primary key consist of the type and something elsesay, the type and the common name of the plant. Perhaps that combination is unique, even though types and common names individually could be repeated. If you need to, you can select any number of fields together to make up the primary key.

Fortunately, you don't need to get that fancy to find a good primary key for the plant table. The plant table's most likely key candidate is the Latin name field because each Latin name should be unique. What about the catalog table? It's unlikely that we'll ever run into two or more catalogs with the same name, so we'll use the name of the catalog as our primary key for that catalog table.

The type table has just one field. When you run into a table with only one field, you almost always have what's known as a lookup table: a table that is used to look up values for data linked to another table. In this case, the type table is a lookup table used by the plant table. You could leave the table as is (making the single field the primary key) or add an AutoNumber field. Let's do the latter for the experience, and we'll make the AutoNumber field the primary key. Our current table lists are as follows , with each table containing several fields (we've marked the primary key with the letters PK):

Plants Common Name, Latin Name (PK), Notes, Picture
Catalogs Name (PK), Street Address, City, State, ZIP, Country, Specialty
Types Type (PK), Description

At this point, you might be wondering how the plant table knows which type goes with each plant or how you know which catalog has your favorite carrot seeds . That's where relationships come into play. A relationship is simply an association between two tables. You might express a relationship in words by saying, "Each catalog has information about many plants," or "Every plant has a type." (You'll learn more about relationships in Chapter 6, "Tapping the Power of Relationships.")

Related tables require two keysthe primary key and what's known as a foreign key. A foreign key is simply another table's primary key inserted in a related table. Figure 4.5 shows this schematically.

Figure 4.5. Primary and foreign keys.

graphics/04fig05.gif

We can best illustrate the way related tables work by adding the appropriate foreign keys to our gardening tables.

If you recall, we moved the type field from the plant table to a new table. Now, we need to relate both the plant table and the type table. To do so, we must include the primary key field from the type table in the plant table as a foreign key:

Plants Common Name, Latin Name (PK), Notes, Picture, TypeID (FK)
Types Type (PK), Description

Now, let's do the same for the plant and catalog tables, so we can quickly view the catalogs that offer our favorite seeds. In this case, we should add the primary key from the catalog tablethe name fieldto the plant table as a foreign key:

Plants Common Name, Latin Name (PK), Notes, Picture, TypeID (FK), Catalog Name (FK)
Catalogs Name (PK), Street Address, City, State, ZIP, Country, Specialty

Did you notice that both foreign key fields don't use their corresponding primary key fields' names? That's all rightthe field names don't have to match. You can call the field "Name" in the Catalogs table and "CatalogName" in the Plants table; Access will know how to keep them linked.

At this point, our final table lists are

Plants Common Name, Latin Name (PK), Notes, Picture, TypeID (FK), Catalog Name (FK)
Types Type (PK), Description
Catalogs Name (PK), Street Address, City, State, ZIP, Country, Specialty

Tip

If you need help dividing the data, you're not alone. This task can be perplexing to even the experts. That's why Access includes the Table Analyzera utility that helps you correctly divide your data. We'll review this tool in Chapter 6.