Defining Tables and Columns | MCSD Self-Paced Training Kit: Analyzing Requirements and Defining Microsoft .NET Solution Architectures, Exam 70-300: Analyzing Requirements and ... Exam 70-300 (Pro-Certification)

Entities in a logical data model correlate to tables in the physical data structure or database and are the main objects in an ER model. They act as a container for the attributes and participate in dependency relationships. Entities represent the things, places, people, and concepts involved in a real-world situation. An easy way to identify candidates for entities is to look for the nouns in a case scenario. For example, look at this excerpt from a medical case study: "The pharmacy is available to all doctors, and medicine can be bought by any of the patients." From this excerpt, you can determine entity candidates by looking at the nouns: Pharmacy, Doctors, Medicine, and Patients.

Four steps need to be undertaken when determining entities. Observe these steps carefully because they are related to this part of the Microsoft exam objective.

Isolate key nouns in the case study (Patient, Doctor, Prescription, HMO, Drug, Pharmacist, Store, and so forth).
Identify which nouns are needed as entities.
Draw the entities.
Write a brief description for each.

Now begin the model for Billington Pharmaceuticals. Entities can easily be recognized for elements that pertain to people in this case study: Patient, Doctor, HMO, Pharmacist, and Help Desk Operator. The case study stated that the scope doesn't need to include employees (Pharmacist and Help Desk Operator), so that leaves only three "personages."

From there, look at the physical things that can be pulled from the case: Prescriptions, Drugs, and Products. Other not-so-physical elements are represented by activity nouns in the Calls and Orders elements. There is one other associative entity that might be more difficult to discern. Often the associative, connecting entities are the most difficult to see initially. The Orders entity will require an Order Items entity to connect it to the Products entity. Many products can be ordered, and each entry in the Order Items entity will point to one Product entity. There will be a couple more entities in the finished product, but they are discussed later in the sections on relationships ("Defining Relationships") and normalization ("Normalizing Tables").

Now is when the model starts to take shape and the project documentation is even more important. While you have an entity list, supply a simple description for each entity. You might want to include some of the process and relationship information, but that will come as we polish and complete the model. Any drawing tool will do to help lay out the initial model. You will find, however, that the more full-featured the tool, the better the finished documentation. For example, Microsoft Visio was used to create Figure 8.2.

Figure 8.2. Using Visio to create a beginning diagram of entities.

Designing a firm and efficient structure at this stage does not require you to know how the project will be implemented; rather, it depends on how well you can comprehend a scenario and its elements. The next section takes a look at composing entities and defining their attributes.

Defining Attributes/Columns

After recognizing each entity, we should identify a complete list of attributes, usually referred to as fields or columns in the physical model. Attributes are characteristics or properties defined within a single entity and correspond to real-world properties. Initially, each table should contain only the elements specific to that table and a key element to ensure that each record in the table can be uniquely identified.

A major focus of the exam will be placing attributes into the most appropriate entities. Practice in designing entities and selecting attributes for a variety of projects will help you prepare for this facet of the exam.

Attribute Makeup

Next, we should look at the attribute makeupthat is, exactly what makes a single attribute? Part of this process involves normalization and/or denormalization of data, which is discussed later in this chapter in "Normalizing Tables." To break attributes down, more commonly known as "decomposing" attributes, you take an attribute from its most basic form and divide it into its components. A good example is breaking down the Address attribute into the following attributes:

 Street  :  Contains the street address of the user. City    :  Where the user lives. State   :  The state the user lives in. PostalCode     :  The user's postal code.

As you can see in the preceding example, the general Address attribute could be broken down into smaller attributes. Decomposing an attribute aids in developing a normalized database structure and is also a function of usage; if a person's name is needed only in its full form, for example, you might decide not to break it down into separate attributes. Attribute decomposition also improves data integrity. When attributes are decomposed, different methods of ensuring data integrity can be applied to the decomposed segments instead of to the attribute as a whole. In most instances, addresses would be broken down, but if you don't need to handle each element of the address separately, you can keep them together. This is likely the case with our HMO entity. Because there is no need for the separate elements of the HMO address, it makes sense just to keep it together.

This chapter doesn't spend a lot of time on this concept, as very little of the exam is based on attribute definition. The information is provided here to more completely define the data modeling process.

A "Key" Attribute

In most cases, an attribute that acts as a key is identified for each entity. This attribute could be a customer number, a product number, or simply an attribute that serves no other purpose but to act as a key identifier. When an attribute is added solely for the sake of being an identifier, it is known as a surrogate key. Whether you use a true or surrogate key, it represents an important part in establishing relationships.

Entity key definition is an important part of designing the structure of an ER model. Two types of keys exist: primary and foreign. These keys are used in establishing relationships and provide an easier and more efficient way to retrieve data in the physical implementation. After all attributes have been identified, the design moves into a stage referred to as "normalization."

Defining Primary and Foreign Keys

A primary key is not necessarily required for all tables; however, providing one does offer a number of benefits and should be considered for every table. When defining a primary key, you should keep a couple of factors in mind. The primary key normally defines uniqueness in a table, in that every record of a table has its own unique primary key. The primary key is used in developing relationships and indexes.

The foreign key uses the primary key as a reference. For each foreign key entry, there must be a matching primary key in the referred table. This constraint, better known as referential integrity, prevents phantom entries and serves as a checking mechanism.

The primary key should be as small as possible. It's possible to create a compound primary key that uses multiple attributes or a key that contains a large number of bytes, but in the physical design, this increases the overhead and response time of data retrieval.

Primary Keys

Primary keys consist of sets of attributes whose values uniquely identify the rows in an entity. Primary keys give an ID to each row, meaning that rows can easily be located by this identifier.

Choosing a primary key for a table is relatively simple. A primary key should be a numeric value if possible, but this is not a hard-and-fast rule. In many cases, surrogate keys are actually a preferred mechanism, as the values never change and there is little reason to want a change. Sill, others dislike using surrogates because it adds a column to the data structure that has no particular correlation to the data. You could consider patients' social security numbers (SSNs) unique and use them as a primary key. However, because the case study has an international component and not all patients have an SSN, it would make more sense to use a surrogate key and add the SSN as a separate attribute.

Historically, Microsoft exams prefer surrogate keys over all other key types, as there is little, if ever, a reason to change the key value always maintaining the same unique identifier.

Primary and foreign keys are an important aspect of setting up relationships. Where a primary key sets up a unique value for a row in the entity, a foreign key is used as a mechanism for relating one entity's records to another entity.

Foreign Keys

A foreign key is used to tie or relate one entity to the primary key of another entity for the purpose of creating a dependency. An attribute or a combination of attributes can act as a foreign key. A foreign key doesn't have to be unique. In fact, foreign keys are often in a many-to-one relationship with a primary key in another entity. Foreign key values should be copies of the primary key values. No value in the foreign key except NULL should ever exist unless the same value exists in the primary key. A foreign key works with the primary key to enforce referential integrity and allow for cascading operations.