Recommendations | Requirements Analysis: From Business Views to Architecture



	Requirements Analysis: From Business Views to Architecture By David C. Hay

	Appendix B. A Comparison of Data Modeling Techniques (Syntactic Conventions)

Recommendations

Because the orientation and purposes of data modeling are very different when supporting analysis than when supporting design, no one modeling technique currently available is appropriate for both. The techniques with the best aesthetics describe fewer aspects of a model than the others do, and the more complete techniques are much less accessible.

The one exception to this is object role modeling, which is both rich in detail and relatively easy to read. It differs radically from the other modeling approaches, however, and has therefore been less successful in gaining acceptance.

Among those using the more common entity-type/relationship view of the world, Richard Barker's notation is clearly superior as a vehicle for discussing models with prospective system users, and the UML has advantages in supporting design, particularly object-oriented design.

For Analysis: Richard Barker's Notation

There are several arguments in favor of Mr. Barker's data-modeling syntax for use in requirements analysis:

Aesthetic simplicity

This notation is the easiest to present to a user audience. It is the simplest and clearest among those that are as complete. By using fewer kinds of symbols, Barker's technique keeps drawings relatively uncluttered, and fewer kinds of elements have to be understood. Simpler, less cluttered diagrams are more accessible to nontechnical managers and other end users.

It uses a line in two parts, each of which may be dashed or solid, to convey the entire set of optional or mandatory aspects of the relationship pair. The presence or absence of a crow's foot is all that is necessary to represent the upper limit of a relationship. The single symbol of a split line that is either solid or dashed, plus the presence or absence of a crow's foot, is aesthetically simpler than, say, information engineering, which requires combinations of four separate symbols to convey the same information.

In Barker's notation, the "dashedness" or solidness of a line (its most visible aesthetic quality) represents the optionality of the relationship, which is its most important characteristic to most users. IDEF1X, on the other hand, uses "dashedness" to represent whether a relationship is part of a unique identifier.

Other systems of notation add symbols unnecessarily. Chen's notation uses different symbols for objects that are implementations of relationships and objects that are tangible entity types, and it uses a separate symbol for each attribute. IDEF1X distinguishes between "dependent" entity types and "independent" ones, and it uses different symbols at the different ends of relationships. The UML designates certain kinds of relationships ("part of" and "member of") by either of two special symbols, depending on the referential integrity constraint in effect.

In each case, the additional symbols merely add to the complexity of a diagram and make it more impenetrable, without communicating anything that is not already contained in the simpler notation and names of Barker's notation.

James Martin's version of information engineering is the only notation other than Barker's that represents sub-types inside super-types, thereby reinforcing the fact that a sub-type is a subset of its super-type, and saving diagram space in the process.

Also, other techniques introduce extra complexity by allowing relationship lines to meander all over the diagram. Barker's notation calls for a specific approach to layout which keeps relationship lines short and straight.

Completeness

Most of the techniques show the same things that Barker's notation does, although some are more complete than others. Each of them lacks something that Barker's notation has.

Information engineering does not show attributes; IDEF1X does not show constraints; only Mr. Martin's version of information engineering shows sub-types within super-types. Mr. Chen's notation, information engineering, and UML do not show unique identifiers. Only ORM has all of the same features that the Barker method has, but with its external attributes and sub-types it uses far too much space on the diagram.

In fairness, some of the techniques do things that Barker's does not. IDEF1X, ORM, and the UML show nonexhaustive sub-types, where the sub-types do not represent all occurrences of the super-type. (Barker's technique deals with this only indirectly, by defining a sub-type called "OTHER ...".) The UML also shows nonexclusive sub-types, where an occurrence of the super-type can be an occurrence of more than one sub-type. Information engineering and the UML also show nonexclusive constraints between relationships, not available in Barker's technique.

These are all useful things.
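The two sub-type properties discussed above can be stated as simple set checks. The following is only a minimal sketch: the function name `classify_subtyping` and the sample occurrences are assumptions made for illustration, not part of any of the notations compared here.

```python
def classify_subtyping(supertype_members, subtypes):
    """Return (exclusive, exhaustive) for a set of sub-types.

    supertype_members: set of all occurrences of the super-type.
    subtypes: dict mapping a sub-type name to its set of occurrences.
    """
    # Exhaustive: every occurrence of the super-type is in some sub-type.
    covered = set().union(*subtypes.values())
    exhaustive = covered >= set(supertype_members)

    # Exclusive: no occurrence belongs to more than one sub-type.
    exclusive = True
    seen = set()
    for members in subtypes.values():
        if seen & members:
            exclusive = False
        seen |= members
    return exclusive, exhaustive


# Hypothetical occurrences of a PARTY super-type.
parties = {"alice", "bob", "acme"}

# The only arrangement Barker's notation draws directly.
barker_style = {"PERSON": {"alice", "bob"}, "ORGANIZATION": {"acme"}}

# Overlapping and incomplete sub-types, which the UML can also show.
uml_style = {"CUSTOMER": {"alice", "acme"}, "VENDOR": {"acme"}}
```

Here `classify_subtyping(parties, barker_style)` reports sub-types that are both exclusive and exhaustive, while `uml_style` is neither: "acme" falls in two sub-types and "bob" falls in none.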

The addition of processing logic to data models in the manner of object-modeling techniques (including behavior in the model) is also a very powerful idea. Clearly, provision for describing the behavior of an entity type is something that could be added to Barker's notation. Whether it is more appropriate to extend this notation, in the manner of the UML, or to use separate models, such as entity-type life histories and state/transition diagrams, remains to be seen.

Language

Barker's notation requires the analyst to describe relationships succinctly and in clear, grammatically sound, easy-to-understand English. As mentioned above, where all the other techniques use verbs and verb phrases as relationship names, Barker's notation uses prepositional phrases. This is more appropriate, since the preposition is the part of speech that describes relationships. Verbs describe not relationships but actions, which makes them more appropriate for function models than data models. To use a verb to describe a relationship is to say that the relationship is defined by actions taken on the two entity types. It is better simply to describe the nature of the relationship itself.

Using verbs makes it impossible to construct a clean, natural English sentence that completely describes the relationship. "Each party sells in zero, one, or more purchase orders" is not a sentence one would normally use in conversation.

Moreover, finding the right prepositional phrase to capture the precise meaning of the relationship is often more difficult than finding a verb that approximately gets the idea across. The requirement to use prepositions then adds a level of discipline to the analyst's assignment. The analyst must understand the relationship very well to come up with exactly the right name for it. Correctly naming relationships often reveals that in fact there is more than one.

This requirement for well-built relationship sentences, then, improves the precision of the resulting model. In each modeling technique, Mr. Barker's naming conventions could be used, but analysts are not encouraged to do so.
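To make the sentence convention concrete, here is a minimal sketch of how one half of a relationship line could be read aloud. It assumes the usual Barker reading, in which a dashed half-line is read "may be", a solid one "must be", a crow's foot "one or more", and its absence "one and only one". The class, field names, and sample relationship are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RelationshipEnd:
    subject: str          # entity type the sentence starts from
    optional: bool        # True for a dashed half-line next to the subject
    phrase: str           # prepositional phrase naming the relationship
    many: bool            # True if a crow's foot appears at the far end
    target: str           # entity type at the far end (singular)
    target_plural: str    # plural form, used when `many` is True


def read_aloud(end: RelationshipEnd) -> str:
    """Build the English sentence for one direction of a relationship."""
    modality = "may be" if end.optional else "must be"
    if end.many:
        quantity, noun = "one or more", end.target_plural
    else:
        quantity, noun = "one and only one", end.target
    return f"Each {end.subject} {modality} {end.phrase} {quantity} {noun}."


# Illustrative relationship between PARTY and PURCHASE ORDER.
forward = RelationshipEnd("PARTY", True, "vendor in", True,
                          "PURCHASE ORDER", "PURCHASE ORDERS")
backward = RelationshipEnd("PURCHASE ORDER", False, "from", False,
                           "PARTY", "PARTIES")

print(read_aloud(forward))
print(read_aloud(backward))
```

Because the name is a prepositional phrase, it slots in after "may be" or "must be" without further rewording, which is what lets every relationship be read as an ordinary sentence.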

For Object-Oriented Design: The UML

While Mr. Barker's notation is preferred as a requirements analysis tool, UML is more complete and detailed and therefore the most suited to support design, particularly object-oriented design.

Its method for annotating optionality and cardinality is much more expressive of different circumstances than that of any of the other techniques. It can specifically say that an occurrence of an entity type is related to 1, 7-9, or 10 occurrences of another entity type.

The UML can describe many more constraints between relationships than can other notations. With proper annotation, it can describe both exclusive-or and inclusive-or relationships, or any other that can be named.
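As an illustration of what such an annotation expresses, the sketch below checks an occurrence count against a multiplicity made up of single values and ranges. It assumes a UML-style text form with `..` for ranges, `*` for "unbounded", and commas between alternatives; the function names are hypothetical, and this is not a parser for the full UML syntax.

```python
from typing import List, Optional, Tuple

# A range is (low, high); high is None when the upper bound is unlimited.
Range = Tuple[int, Optional[int]]


def parse_multiplicity(spec: str) -> List[Range]:
    """Parse a spec such as "1, 7..9, 10" or "0..*" into ranges."""
    ranges: List[Range] = []
    for part in spec.split(","):
        part = part.strip()
        if part == "*":
            ranges.append((0, None))
        elif ".." in part:
            low, high = part.split("..")
            upper = None if high.strip() == "*" else int(high)
            ranges.append((int(low), upper))
        else:
            ranges.append((int(part), int(part)))
    return ranges


def allows(spec: str, count: int) -> bool:
    """True if `count` related occurrences satisfy the multiplicity."""
    return any(
        low <= count and (high is None or count <= high)
        for low, high in parse_multiplicity(spec)
    )
```

For example, `allows("1, 7..9, 10", 8)` is true and `allows("1, 7..9, 10", 5)` is false. A crow's-foot notation can only distinguish "one" from "many", so it cannot record a rule of this kind on the diagram.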

For business rules that are not simple relationships between two associations, UML introduces a small flag that can include text describing any business rule.

Attributes can be described in more detail than in other notations.

Overlapping and incomplete configurations of sub-types are allowed.

"Multiple inheritance", where a sub-type may belong to more than one super-type, is permitted, as are multiple type hierarchies. While these may not be desirable in analysis models, they could be useful as solutions to particular design problems.

In an object-oriented environment, the extra symbols address specific object-oriented situations.

For Relational Design: IDEF1X

For the reasons described above, it is not advisable to use IDEF1X in an analysis project, since the notation is far too complex to present to a non-technical audience. This complexity, however, is exactly what makes it a good tool for representing relational database design. Its notation highlights the existence of foreign keys, and these are documented explicitly. The differences in annotating optionality and cardinality reflect the different way these could be implemented.

Summary

The ideal CASE tool, then, will be one which supports Mr. Barker's techniques for doing requirements analysis, then has the facilities for converting entity-type definitions into either (1) table definitions or (2) class definitions that can be used by C++ or a similar language. It would then have the ability to represent these design artifacts in IDEF1X or the UML for further refinement.
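The conversion step can be sketched in a few lines. The example below is only an illustration under stated assumptions: the dictionary layout for an entity type, the `SQL_TYPES` mapping, and the functions `to_table` and `to_class` are all hypothetical, and Python stands in for "C++ or a similar language". A real CASE tool would also handle relationships, foreign keys, and sub-types.

```python
from dataclasses import make_dataclass

# Hypothetical entity-type definition: (attribute, type, mandatory?).
PARTY = {
    "name": "Party",
    "attributes": [
        ("party_id", int, True),
        ("name", str, True),
        ("phone", str, False),
    ],
    "identifier": ["party_id"],
}

# Assumed mapping from attribute types to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "VARCHAR(255)"}


def to_table(entity):
    """(1) Render the entity type as a CREATE TABLE statement."""
    columns = []
    for attr, py_type, mandatory in entity["attributes"]:
        null = " NOT NULL" if mandatory else ""
        columns.append(f"  {attr} {SQL_TYPES[py_type]}{null}")
    key = ", ".join(entity["identifier"])
    columns.append(f"  PRIMARY KEY ({key})")
    body = ",\n".join(columns)
    return f"CREATE TABLE {entity['name'].lower()} (\n{body}\n);"


def to_class(entity):
    """(2) Build a class with one field per attribute.

    Optionality is not enforced here; every field is a plain
    positional field of the generated dataclass.
    """
    fields = [(attr, py_type) for attr, py_type, _ in entity["attributes"]]
    return make_dataclass(entity["name"], fields)


print(to_table(PARTY))
Party = to_class(PARTY)
print(Party(1, "Acme Supply", None))
```

Both outputs are derived from the same analysis-level definition, which is the point of the recommendation: the entity-type model stays the single source, and the relational and object-oriented designs are generated from it and then refined.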

