Domain Analysis | Designing Relational Database Systems (Dv-Mps Designing)

In Figure 9-6, the attribute listing takes the form Name:Domain. Many analysts ignore the existence of domains and specify the attributes directly in terms of their data types and constraints. So if you ignore this step of the process, you'll be in good company. You won't be correct, but it's unlikely that anyone will fault you for it.

The reason I perform domain analysis in my own work, and recommend that you do so as well, is that it saves work and provides additional information. As far as I'm concerned, anything that's easier and better is a good idea. Domain analysis also has the added advantage of being technically correct.

Let's take just one example: the CompanyName and IndividualName attributes in Figure 9-6 both specify that they derive their values from the domain of Name.

We can now define the Name domain as follows:

"A string of one or more words in proper case, with a maximum length of 75 characters. Only characters and the punctuation marks period (.) and comma (,) are allowed."

We have to define the domain only once, and we can reference it any number of times throughout the system. We could have defined these constraints for each applicable attribute, but why bother? Furthermore, because these attributes are defined on the same domain, we know that they can be logically compared. This wouldn't necessarily be clear if we had defined the attributes directly.

Finding the customer with the same value in the CompanyName field as in the IndividualName field might not be the most useful thing to do, but it is at least a logically meaningful comparison. The same could not be said of comparing company names to customer numbers that might, coincidentally, have the same structure and constraints.
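To make the "define once, reference many times" idea concrete, here is a minimal Python sketch. The Domain class, the comparable helper, and the CustomerNumber rule are all hypothetical names invented for illustration. I have also assumed that "proper case" means every word starts with a capital letter and that "characters" means the letters A to Z; the definition above does not say so explicitly.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Domain:
    """A named set of permitted values, defined once and reused."""
    name: str
    is_valid: Callable[[object], bool]


# Assumption: each word starts with a capital letter and contains only
# letters, periods, and commas; words are separated by single spaces.
_NAME_PATTERN = re.compile(r"[A-Z][A-Za-z.,]*(?: [A-Z][A-Za-z.,]*)*")


def _valid_name(value: object) -> bool:
    return (
        isinstance(value, str)
        and len(value) <= 75
        and _NAME_PATTERN.fullmatch(value) is not None
    )


NAME = Domain("Name", _valid_name)

# Hypothetical second domain, used only to contrast with Name.
CUSTOMER_NUMBER = Domain(
    "CustomerNumber", lambda v: isinstance(v, str) and v.isdigit()
)

# Each attribute points at a domain instead of repeating the rules.
ATTRIBUTE_DOMAINS = {
    "CompanyName": NAME,
    "IndividualName": NAME,
    "CustomerNumber": CUSTOMER_NUMBER,
}


def comparable(attr_a: str, attr_b: str) -> bool:
    """Two attributes can be logically compared when they share a domain."""
    return ATTRIBUTE_DOMAINS[attr_a] is ATTRIBUTE_DOMAINS[attr_b]
```

With these definitions, comparable("CompanyName", "IndividualName") is true and comparable("CompanyName", "CustomerNumber") is false, whatever the two validation rules happen to look like.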

The technical definition of a domain is "the set of values from which an attribute can draw its values." This is conceptually straightforward, but how does one define a domain? Essentially, you need to identify three things:

The data type of the domain

Any restrictions to the range of values accepted by the data type

Optionally, any formatting that pertains to the domain

Choosing a Data Type

The first step in defining a domain is to choose the core data type that will be used in the database schema to represent it. This is one instance where it's practical to break the rule about separating the database schema from the conceptual data model.

The data type serves as a shorthand description of a range of values. While "Integer" is not a domain unless you're modeling mathematics, values of the domain "Quantity" are almost certainly integers. I wouldn't recommend getting too involved with the specifics of database engine types, however. At this point, the choice of database engine is still subject to change.

The "data type" of a domain can also be another domain. You might have already defined a generic Date domain, which specifies, for example, that all dates in the system must be on or after 1 January 1900 and formatted using a four-digit year. It's perfectly acceptable to define the Event Date domain as "A Date after 23 October 1982 (the date on which trading commenced)."
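A domain built on another domain can be sketched as one check that calls another. The function and constant names below are my own, and the sketch covers only the range rule, not the four-digit-year formatting.

```python
from datetime import date

EARLIEST_DATE = date(1900, 1, 1)         # Date domain: on or after this day
TRADING_COMMENCED = date(1982, 10, 23)   # Event Date domain: strictly after this day


def is_valid_date(value: date) -> bool:
    """Generic Date domain."""
    return value >= EARLIEST_DATE


def is_valid_event_date(value: date) -> bool:
    """Event Date is first of all a Date, then narrowed further."""
    return is_valid_date(value) and value > TRADING_COMMENCED
```

If the Date domain later changes, the Event Date domain picks up the change without being edited.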

Restricting the Range of Values

Having identified the base data type for your domain, the next step is to specify the values within that data type's range that are valid for the domain. Sometimes the easiest way to do this is by specifying a rule: "Quantities must be positive whole numbers."
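A rule of this kind usually translates into a one-line check. The following is only a sketch, and the function name is hypothetical.

```python
def is_valid_quantity(value: object) -> bool:
    """Quantities must be positive whole numbers."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0
```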

Sometimes it's simpler to list the valid values for a domain: "Region must be one of: Northwest, Northeast, Central, Southern." In this instance, you will almost certainly want to include the domain as an entity in the data model. This is far easier than typing the values everywhere they're referenced, and also allows them to be easily changed after the system has been implemented.

The only possible exception to this rule is when the domain values are few in number and cannot possibly change. Say, for example, that you're modeling a questionnaire or an exam, and you have an Answer domain that consists of the values "True" and "False." There is no point in modeling these two options as an entity. There are no other possible values, and referencing a table during implementation will almost certainly be more trouble than typing in the rule directly.
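The contrast between the two cases can be sketched as follows. The region_rows set stands in for the rows of a hypothetical Region entity; in a real system they would be read from the database. The function names are likewise assumptions.

```python
# Stand-in for the rows of a Region lookup entity. Because the values are
# data, they can change without touching the validation code.
region_rows = {"Northwest", "Northeast", "Central", "Southern"}


def is_valid_region(value: str, regions: set[str]) -> bool:
    """Valid if the value currently exists in the lookup entity."""
    return value in regions


def is_valid_answer(value: str) -> bool:
    """Only two values exist and they cannot change, so the rule is inline."""
    return value in ("True", "False")
```

Adding a region is then a data change (region_rows.add("Eastern")), whereas the Answer rule never needs to change at all.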

You will also use an entity to model domains that must be defined using more than one attribute. The best example of this is the domain of State. If you must account for multiple countries, you cannot determine whether a given state value is valid without reference to the country specified.

If a customer is located in Australia, for example, "New South Wales" is a valid state, but "Alabama" is not. In this case, the domain look-up entity would consist of both the Country and State attributes. This example is not strictly a domain definition, and it's modeled using required relationships in the E/R model. It is, however, easy to think of this sort of situation as a kind of composite domain, and treat it as such.

After all, the point here is to simplify the task of identifying the constraints that pertain to the system, and bending the domain definition for domains that appear repeatedly in the data model saves time and reduces the chance of error.
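One way to picture such a composite domain is as a set of (country, state) pairs, with validity decided by the pair rather than by either value alone. The rows below are a small illustrative sample, not a complete list, and the names are hypothetical.

```python
# Hypothetical rows of a Country/State lookup entity (illustrative only).
VALID_STATES = {
    ("Australia", "New South Wales"),
    ("Australia", "Victoria"),
    ("United States", "Alabama"),
    ("United States", "Alaska"),
}


def is_valid_state(country: str, state: str) -> bool:
    """A state is valid only in combination with its country."""
    return (country, state) in VALID_STATES
```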

Your domain specification must also indicate whether nulls or zero-length strings, or both, are acceptable values for attributes defined on the domain. It's useful to explicitly declare this in your definition even if you're modeling the range using a system entity, in which case the nullability can be determined by the relationship between the two entities.
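Nulls and zero-length strings are separate decisions, and a sketch makes the difference visible. TextDomainRules and accepts_missing are hypothetical names, and the check deliberately covers only the two "missing value" cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextDomainRules:
    allows_null: bool    # may the attribute have no value at all?
    allows_empty: bool   # may the attribute hold a zero-length string?


def accepts_missing(value: Optional[str], rules: TextDomainRules) -> bool:
    """Check only the null and zero-length cases for a text domain."""
    if value is None:
        return rules.allows_null
    if value == "":
        return rules.allows_empty
    return True  # non-empty values are checked by the domain's other rules
```

A domain declared as TextDomainRules(allows_null=True, allows_empty=False) accepts an unknown value but rejects an empty string.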

Performing the domain analysis and identifying the list of attributes for any given entity are closely related, iterative processes. In actual practice, you'll probably find it most effective to define the domains at the same time that you're listing the attributes. If the domain of an attribute is already defined, you can simply list it. If not, you can define the domain while you have an example in front of you.

During this process, you might find that certain attributes have restrictions in addition to those defined for the domain. This is perfectly proper and not at all unusual. You might have defined an Event Date domain, for example, which represents the date on which any event can occur. This date is restricted to dates after the company began trading. In the Sales Order example, both the Order Date and the Shipping Date would be defined on the Event Date domain. The Shipping Date attribute, however, must also be after the Order Date. This is an entity-level constraint and should be listed as such in the entity description.
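The difference between a domain constraint and an entity-level constraint can be sketched like this. The SalesOrder class and its violations method are hypothetical; the trading date is the one used in the earlier Event Date definition.

```python
from dataclasses import dataclass
from datetime import date

TRADING_COMMENCED = date(1982, 10, 23)


def is_valid_event_date(value: date) -> bool:
    """Domain constraint: applies to any attribute defined on Event Date."""
    return value > TRADING_COMMENCED


@dataclass(frozen=True)
class SalesOrder:
    order_date: date
    shipping_date: date

    def violations(self) -> list[str]:
        problems = []
        # Domain-level checks, one per attribute.
        if not is_valid_event_date(self.order_date):
            problems.append("Order Date is outside the Event Date domain")
        if not is_valid_event_date(self.shipping_date):
            problems.append("Shipping Date is outside the Event Date domain")
        # Entity-level check: it needs two attributes of the same entity.
        if not self.shipping_date > self.order_date:
            problems.append("Shipping Date must be after Order Date")
        return problems
```

The last check cannot live in the domain definition, because the domain knows nothing about which other attributes an entity has.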

In defining domain constraints (and additional attribute constraints, for that matter), you should try to be as specific as possible without compromising usability. We'll discuss this in greater detail in Part 3, but at this point you should be aware that the more precisely you define a domain, the more assistance you can provide users. If you accidentally eliminate valid values, however, you will get in the users' way and can ultimately make the system unusable.

Defining the Format

It's not strictly necessary, but it's often a good idea to specify the appropriate format for a domain. If you specify once that all dates must be displayed as DD-MMM-YYYY, you need never do it again.
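Specifying the format once might look like the sketch below: a single function that every display of a date goes through. The function name is hypothetical, and I have assumed English three-letter month abbreviations in mixed case. The month names are spelled out so that the result does not depend on the machine's locale.

```python
from datetime import date

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_date(value: date) -> str:
    """Render a date as DD-MMM-YYYY, for example 23-Oct-1982."""
    return f"{value.day:02d}-{_MONTHS[value.month - 1]}-{value.year:04d}"
```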