Data Modeling Conventions

Team-Fly    

 
Requirements Analysis: From Business Views to Architecture
By David C. Hay
Table of Contents
Chapter 3.  Column One: Data

Data Modeling Conventions

Data modeling is a relatively new field, and standards have not yet fully been laid out. There are actually three levels of conventions to be defined in the data modeling arena:

  • The first is syntactic , about the symbols to be used. Barker's technique, the UML, and IDEF1X are all examples of syntactic conventions.

  • Positional conventions dictate how entity types are laid out. These guide the shape of the model. Often they are not followed, resulting in models that are chaotic , confusing, and very difficult to read.

  • Semantic conventions describe standard ways for representing common business situations. Semantic conventions are relatively new in the industry. They were first described in 1995 book by David Hay, Data Model Patterns: Conventions of Thought , and then in the 1997 book, Analysis Patterns , by Martin Fowler. Other books on the subject have been published since then.

These three sets of conventions are, in principle, completely independent of each other. Given any of the syntactic conventions described here, you can follow any of the available positional or semantic conventions. In practice, however, promoters of each syntactic convention typically also promote at least particular positional conventions.

SyntacticSymbols

Different sets of syntactic conventions are presented in Appendix B of this book. The appendix is organized in terms of the appropriate audience for each. The basic elements are the same across all the techniques, but each approach is concerned with different kinds of details, as was deemed appropriate for its particular audience.

All notations have symbols for entity types (or object classes), relationships (or associations), and attributes. The Barker notation has an additional symbol to represent the case where an entity type may be related to one or another different entity types, but not both. IDEF1X has special symbols to represent foreign key implementations . The UML has room for describing the behavior of entity types, and the ability to describe business rules between relationships.

PositionalThe Crow's Foot Rule

The Barker technique imposes an additional constraint upon the drawing, over and above determining what symbols to use to represent different things: Entity types on a drawing are to be arranged so that the crow's feet in the relationships point either to the top of the diagram or to its left. In addition, boxes are stretched so that all relationship lines are straight, without "elbows".

This has the effect of providing some sort of "shape" to the drawing. A drawing with a random assortment of boxes and lines going every which way is very difficult to read. Figure 3.24, for example, shows a data model organized randomly . What is the model about? Hard to say, really. People seem to be visiting someone. A block party?

Figure 3.24. A Random Model.

graphics/03fig24.gif

Following the positional convention has the effect of placing the reference entity types (those that don't depend on any others) in the lower right, and the intersect or transaction entity types in the upper left. This makes it much easier to see the basic elements in the model, separately from what is done to them. Figure 3.25 is the model from Figure 3.24 rearranged according to this rule. Here it is easier to see that the model is basically about STUDIES of PEOPLE . O BSERVATIONS of VARIABLES are then collected on PEOPLE during these STUDIES .

Figure 3.25. An Orderly Model.

graphics/03fig25.gif

There are those with the opinion that crow's feet should point down and to the right. While your author stands firmly against this heresy, the fact of the matter is that as long as you are consistent, you will achieve the same result.

SemanticData Model Patterns

If you follow the crow's foot rule, you will begin to notice certain things about your models. The aesthetics of the drawing suggest similarities. Figure 3.26, for example, shows a model of PURCHASE ORDER s, each of which is to a VENDOR and composed of one or more LINE ITEMS, each of which is for a PRODUCT TYPE . [8]

[8] The examples in this section are derived from examples in David C. Hay's Data Model Patterns: Conventions of Thought (New York: Dorset House Publishing, 1996).Used with permission of Dorset House Publishing Co., Inc.

Figure 3.26. Purchase Orders.

graphics/03fig26.gif

Figure 3.27, on the other hand, shows a model of sales orders. Sales orders, of course, are handled by a completely different department from purchase orders, so there would seem to be no relationship whatsoever between them. In this figure, a SALES ORDER must be from one CUSTOMER and composed of one or more LINE ITEMS, each of which is for a PRODUCT TYPE .

Figure 3.27. Sales Orders.

graphics/03fig27.gif

As you look at these two models, what do you see? First, the structures of SALES ORDER and PURCHASE ORDER are identical. Each must be composed of one or more LINE ITEMS, each of which is for a PRODUCT TYPE. The big difference is that a PURCHASE ORDER must be to a VENDOR , and a SALES ORDER must be from a CUSTOMER. But what are CUSTOMER and VENDOR? Each of these is fundamentally either a PERSON or an ORGANIZATION. Having said that, we can now recognize that our own enterprise is in fact an ORGANIZATION.

An ORGANIZATION or a PERSON is only a "customer" if it is a buyer in a (purchase or sales) ORDER. It is only a "vendor" if it is a seller in such an ORDER . [9] Thus, the real model is of an ORDER that has two relationships to PARTY: it must be from one PARTY and to another PARTY. Each party may be the buyer in one or more ORDERS, and each PARTY may be the seller in one or more ORDERS. This is shown in Figure 3.28.

[9] In this example, that is. In some companies you must be designated as a "vendor" before you can sell something. This would require creation of an entity type linking PARTY to PRODUCT TYPE to portray the role.

Figure 3.28. Orders.

graphics/03fig28.gif

The vendor's sales order is exactly the same order as the customer's purchase order. If we are the buyer, we call it a PURCHASE ORDER . If we are the seller, we call it a SALES ORDER . Any order always has both a buyer and a seller. Moreover, some organizations might well be both buyer and seller. The fact of the matter is that our organization is assuming one or the other of those roles.

Note that both PERSON AND ORGANIZATION share roles in an ORDER. In a large model, they will share other roles as well, so it is useful to define a super-type PARTY which is defined as either a person or organization of interest to the enterprise.

It turns out that this model for ORDER (which could also be called CONTRACT or AGREEMENT , or whatever) is common to any enterprise that does business with others. Once you recognize that this is the generic pattern for orders, you can now use it for almost any company. It may have to be elaborated on, but the underlying structure will be pretty much the same, whether the company is selling advertising for a cable television network, gasoline, pharmaceuticals , aluminum futures , or what have you.

In addition, there are other patterns available that apply to most commercial businesses. Many of these are laid out in Data Model Patterns , and a few will be presented below. These can be viewed from a high level, describing common business phenomena, or at a lower level, describing components of those higher-level models.

At the higher level, you have (among others):

  • People and organizations

  • Geography

  • Products, assets, or materials

  • Activities

  • Contracts

At the lower level, you have (among others):

  • Hierarchy/network

  • Type

These are described more fully in the following sections.

People and Organizations

As described above, a PARTY is a person or an organization of interest to the enterprise. In Figure 3.29, you can see that not only is a PARTY either a PERSON or an ORGANIZATION , but an ORGANIZATION , in turn , must be either a COMPANY , an INTERNAL ORGANIZATION (such as a department), a GOVERNMENT AGENCY , or an OTHER ORGANIZATION . Depending on your situation, you can define sub-types further, itemizing kinds of COMPANIES , for example.

Figure 3.29. Parties.

graphics/03fig29.gif

Note that the sub-type structure is fundamental . That is, any PARTY must be either a PERSON or an ORGANIZATION and not both. O RGANIZATION is then further subcategorized. There are in fact other kinds of classification that companies often wish to identify, but these are not so fundamental. These categories could be demographic categories, for example, such as "annual income". These can be accommodated by the entity types PARTY CATEGORY, PARTY CATEGORY SET , and PARTY CLASSIFICATION .

A PARTY CATEGORY is one of those classifications that someone is interested in, such as "Income greater than $50,000 per year". A PARTY CLASSIFICATION is the fact that a particular PARTY falls into that category, such as "Sam has an income of greater than $50,000 per year." Note that the PARTY CATEGORY, "Income greater than $50,000 per year", must be part of the PARTY CATEGORY SET, "Annual Income".

Note also that a PARTY CLASSIFICATION must also be by someone (a PARTY). This could be the Marketing Department, for example.

P ARTIES are related to each other. An INTERNAL ORGANIZATION may be part of another INTERNAL ORGANIZATION; a PERSON may be married to another PERSON; a PERSON may be a member of an OTHER ORGANIZATION , such as the Teamsters Union or The Data Administration Management Association. Each of these is an example of a PARTY RELATIONSHIP from one PARTY to another. Each PARTY RELATIONSHIP , in turn, is also an example of a PARTY RELATIONSHIP TYPE , such as "organizational structure", " marriage ", or "membership". This is also shown in Figure 3.29.

Note that making the entity types more generic makes any systems based on them more robust. For example, new categories of PARTY can be added without changing the data structure. This is at the cost, however, of losing representation of the business rules that lie behind the data. There might be rules that say, for example, that only PEOPLE can participate in the PARTY RELATIONSHIP of PARTY RELATIONSHIP TYPE "marriage", or that a DEPARTMENT in PARTY CATEGORY "sales office" can only report to a DEPARTMENT in PARTY CATEGORY "sales district ".

These rules must be documented separately from the model drawing.

Geography

It may seem that an appropriate attribute of party is "Address". But unfortunately , many parties have more than one address. These include "billing address", "delivery address", "home address", and so forth. To have multiple addresses as an attribute of party would thus violate First Normal Form.

What is required is a separate "address" entity type, shown in Figure 3.30 as SITE . A PARTY PLACEMENT is then defined as the fact that a PARTY is located at a particular SITE. Thus, not only can one PARTY be located in one or more SITES , but one SITE may be the location of one or more PARTIES.

Figure 3.30. Geography.

graphics/03fig30.gif

Note that by recognizing both PHYSICAL SITE and VIRTUAL SITE , the same entity type can be a place to store not only street addresses, but also telephone numbers , web addresses, and e-mail addresses.

A SITE is a particular virtual or real place with a purpose, such as a house, office, or website. A SITE must be an example of a SITE TYPE , such as "one-family home", "office building", "archeological dig", "warehouse", and so forth. A PARTY PLACEMENT the fact that a particular PARTY is located at a particular SITE must be an example of a PARTY PLACEMENT TYPE , such as "home address", "billing address", etc.

Each PHYSICAL SITE , in addition, must be located in at least one but possibly more GEOPOLITICAL AREAS. A GEOPOLITICAL AREA is a kind of GEOGRAPHIC AREA whose boundaries are defined by law or international treaty. A GEOGRAPHIC AREA is simply any bounded area on the earth. If it is not a GEOPOLITICAL AREA , a GEOGRAPHIC AREA may be either an ADMINISTRATIVE AREA (whose boundaries are defined by an enterprise's policies), a SURVEYED AREA (whose boundaries are defined by a survey, in terms of townships, sections, and the like), or a NATURAL AREA ( whose boundaries are determined by a natural phenomenon , such as a lake or habitat).

Products

So, what does the company make? What does it use? What does it otherwise manipulate? These fundamentally are products. The word "product" is troublesome , however, because while in principle it refers to anything tangible that can be bought, sold, or handled, in fact many companies' products (like those of banks) are intangible. In many ways, though, even these behave the same as tangible ones, such as computers and steam compressors.

In this model we will constrain PRODUCT to mean something discrete that is bought or sold, distinguishing it from EQUIPMENT which is used in the manufacturing process and MATERIAL such as powder or goo. The super-type we will define that encompasses all of these is ITEM. ( In Data Model Patterns , this is called ASSET.)

We want to distinguish between ITEM TYPE , which is the definition of a thing, such as might be found in a catalogue or specification sheet, and ITEM, an occurrence or instance of the thing that exists in a physical place. In Figure 3.31 ITEM TYPE is shown with the sub-types PRODUCT TYPE, MATERIAL TYPE, EQUIPMENT TYPE , and OTHER ITEM TYPE , as were described above.

Figure 3.31. Items and Item Types.

graphics/03fig31.gif

I TEM, on the other hand, distinguishes between INVENTORY , whose primary attribute is "Quantity", and DISCRETE ITEM , whose primary attribute is "Serial number". That is, a DISCRETE ITEM can be identified, piece by piece, while INVENTORY is an undifferentiated quantity of things.

Each ITEM must be (currently) located at only one SITE. This implies that an INVENTORY is defined to be that quantity of a particular ITEM that is stored in a particular SITE, just as a DISCRETE ITEM is a single object stored in a particular SITE.

Also an ITEM or ITEM TYPE may refer to anythingfinished good, intermediate, or raw material. By itself, an occurrence of ITEM or ITEM TYPE tells you nothing of its composition. For this we need ITEM TYPE STRUCTURE ELEMENT and ITEM STRUCTURE ELEMENT .

At the ITEM TYPE level, an ITEM TYPE STRUCTURE ELEMENT is the fact that a particular ITEM TYPE is a component of another ITEM TYPE. This information is usually found in engineering specifications. That is, each ITEM TYPE may be part of one or more ITEM STRUCTURE ELEMENTS , each of which must be the use (of that ITEM TYPE ) in one and only one other ITEM TYPE. Looking at it from the other direction, each ITEM TYPE may be composed of one or more ITEM TYPE STRUCTURE ELEMENTS , each of which must be the use of one and only one other ITEM TYPE.

An ITEM STRUCTURE ELEMENT is similar. Instead of being concerned with ITEM TYPES , however, it is concerned with actual occurrences of ITEMS. That is, each ITEM STRUCTURE ELEMENT must be the use in a physical ITEM and the use of a physical ITEM. Actually, an ITEM may contain not only identifiable other ITEMS, but also a quantity of an unidentified ITEM TYPE , such as "water" or "natural gas". Consequently, what the model actually says is that an ITEM STRUCTURE ELEMENT may be either the use of an ITEM or the use of an ITEM TYPE. ( The arc across the two relationships denotes this "exclusive or" concept.)

Ideally, each ITEM STRUCTURE ELEMENT would be based on an ITEM TYPE STRUCTURE ELEMENT. Whether this is possible or not in your company is another question.

Note that attributes of both ITEM TYPE STRUCTURE ELEMENT and ITEM STRUCTURE ELEMENT include "Quantity per" (the amount of the component required to make one unit of the assembly), "Effective date", and "Until date".

Activity Type/Activity

The work of an enterprise is defined by its activity types . In Column Two (Chapter 4), we will model the nature of activity types. To the extent that activities and activity types are themselves things of significance to the business, however, with data describing them, they will show up here in the Column One model.

An ACTIVITY is an example of either a SERVICE or an ACTIVITY TYPE . If we are talking about something offered for sale, it is probably a SERVICE . If, instead, we are talking about something done inside the company only, it could be called an ACTIVITY TYPE or PROCEDURE . Structurally, these are identical. A SERVICE or ACTIVITY TYPE may be embodied in one or more ACTIVITIES. Each ACTIVITY must occur on a particular "Date" (and optionally , "Time"), and must be performed at a SITE. All of this is shown in Figure 3.32.

Figure 3.32. Activities.

graphics/03fig32.gif

Also shown in Figure 3.32 is the fact that an ACTIVITY may be part of a WORK ORDER . A WORK ORDER is a specific authorization for a relatively large effort to be carried out. A WORK ORDER is usually composed of one or more ACTIVITIES.

A WORK ORDER must be either a PRODUCTION WORK ORDER (to make an ITEM TYPE) or a MAINTENANCE WORK ORDER ( to fix, install, or replace a DISCRETE ITEM).

In those companies where the model is concerned primarily with SERVICES that are sold to customers (or bought, for that matter), it may be simpler to make SERVICE a sub-type of ITEM TYPE. This has some odd implications, but most relationships actually apply both to SERVICE and the more tangible kinds of ITEM TYPES.

Figure 3.33 expands on the ACTIVITY idea, adding the PARTIES who participate in a WORK ORDER or in a particular ACTIVITY and adding also the consumption of labor and other resources by an ACTIVITY.

Figure 3.33. Activity Management.

graphics/03fig33.jpg

Note the attributes shown in this diagram. You can see that ACTIVITY, for example, has a "Scheduled start date", "Scheduled end date", "Actual start date", and "Actual end date". These position the ACTIVITY in time. By specifying scheduled dates only, you can define a planned activity before it is actually carried out. Similarly, WORK ORDER had an "Order date" and a "Due date", as well as a "Completion date".

A WORK ORDER ROLE is the fact that a PARTY ( a PERSON or an ORGANIZATION) has something to do with a WORK ORDER. This could be its manager, someone contributing to its execution, or even a beneficiary of it. Each WORK ORDER ROLE must be played by a PARTY, for a WORK ORDER.

Similarly, an ACTIVITY ROLE is the fact that a PARTY has something to do with an ACTIVITY. Each ACTIVITY ROLE must be played by a PARTY, for an ACTIVITY.

W ORK ORDERS and ACTIVITIES consume both labor and other resources, such as materials. The fact that time is consumed for an ACTIVITY is called a TIMESHEET ENTRY. Each TIMESHEET ENTRY must be by one PERSON and charged to a single ACTIVITY. Its most interesting attribute is "Hours": the number of hours (including fractions of hours) spent by this PERSON on this particular ACTIVITY. An attribute of the PERSON is "Charge rate": the dollars per hour charged for this PERSON'S work. This allows us to calculate the derived attribute "(Value)" as the "Hours" from this TIMESHEET ENTRY times the "Charge rate" of the PERSON that is the source of this TIMESHEET ENTRY. ( Parentheses denote the fact that the attribute is derived.)

The "(Value)" of all the TIMESHEET ENTRIES that are charged to an ACTIVITY can then be summed up to yield the derived attribute "(Labor cost)" for the ACTIVITY. The "(Labor cost)" of all ACTIVITIES that are part of a WORK ORDER can similarly be summed up to yield the "(Labor cost)" of the WORK ORDER.

A RESOURCE USAGE is the fact that an ITEM or ITEM TYPE is consumed during the course of an ACTIVITY. That is, each RESOURCE USAGE must be of either a particular ITEM (a specific DISCRETE ITEM or an INVENTORY), or of a generic ITEM TYPE such as "natural gas" or "water".

As with the calculation of labor cost, the cost of resources used combines the "Unit cost" of an ITEM or the "Standard cost" of an ITEM TYPE with the "Quantity" used of the RESOURCE USAGE. The "(Value)" of the RESOURCE USAGE is calculated by multiplying its "Quantity" by either the "Standard cost" of the ITEM TYPE that is consumed as the RESOURCE USAGE or the "Unit cost" of the ITEM that is consumed as the RESOURCE USAGE . The attribute "(Value)" can then be summed across all the RESOURCE USAGES that are charged to an ACTIVITY , to get the "(Resource cost)" of that ACTIVITY. Similarly, the "(Resource cost)" of an ACTIVITY can be summed across all ACTIVITIES that are part of a WORK ORDER to compute the total "(Resource cost)" for that WORK ORDER.

The "(Total cost)" of either an ACTIVITY or a WORK ORDER can then be computed by adding together that entity type's "(Labor cost)" and "(Resource cost)".

Contract

The business of any enterprise is contractspurchase orders, sales orders, leases, and the like. As we have seen, all contracts fundamentally have the same structure. In the introduction to this section we saw that a SALES ORDER and a PURCHASE ORDER are simply examples of different kinds of ORDERS. Here we will generalize the concept even further to the idea of CONTRACT. ( O RDER is simply a kind of CONTRACT.) A CONTRACT can be any agreement between two PARTIES



Requirements Analysis. From Business Views to Architecture
Requirements Analysis: From Business Views to Architecture
ISBN: 0132762005
EAN: 2147483647
Year: 2001
Pages: 129
Authors: David C. Hay

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net