Appendix B. A Comparison of Data Modeling Techniques (Syntactic Conventions)



	Requirements Analysis: From Business Views to Architecture By David C. Hay


Peter Chen first introduced entity/relationship modeling in 1976 [Chen 1976, 1977]. It was a brilliant idea that has revolutionized the way we represent data. It was a first version only, however, and many people since then have made improvements on it. A veritable plethora of data-modeling techniques have been developed.

Things became more complicated in the late 1980s with the advent of a variation on this theme called "object modeling". Now there were even more ways to model the structure of data. This was mitigated somewhat in the mid-1990s with the introduction of the UML, a modeling technique intended to replace at least all the "object-modeling" ones. As will be seen in this appendix, it is not quite up to replacing other entity/relationship approaches, but it has had a dramatic effect on the object-modeling world.

This appendix presents the most important of these techniques and provides a basis for comparing them with each other.

Regardless of the symbols used, data or object modeling is intended to do one thing: describe the things about which an organization wishes to collect data, along with the relationships among them. For this reason, all of the commonly used systems of notation are fundamentally convertible one to another. The major differences among them are aesthetic, although some make distinctions that others do not, and some lack symbols for certain situations.

This is true for object-modeling notations as well as entity/relationship notations.
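To make the convertibility claim concrete, here is a minimal Python sketch. It stores the cardinality of one relationship end in notation-neutral terms (optional or mandatory, one or many) and renders it two ways: as a UML multiplicity string and as a Barker-style relationship sentence. The `End` class, the function names, and the ORDER/CUSTOMER example are hypothetical illustrations, not part of any technique discussed here, and the pluralization is deliberately naive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class End:
    """One end of a binary relationship, in notation-neutral terms (hypothetical structure)."""
    entity: str      # entity type at this end, e.g. "CUSTOMER"
    role: str        # verb phrase read from the other end toward this one
    mandatory: bool  # must every instance at the other end take part?
    many: bool       # may an instance at the other end relate to many of these?


def uml_multiplicity(end: End) -> str:
    """Render the end as a UML multiplicity, written beside `end.entity` on the association."""
    lower = "1" if end.mandatory else "0"
    upper = "*" if end.many else "1"
    return lower if lower == upper else f"{lower}..{upper}"


def barker_sentence(source: str, end: End) -> str:
    """Render the same end as a Barker-style sentence read from `source`."""
    modality = "must be" if end.mandatory else "may be"
    quantity = "one or more" if end.many else "one and only one"
    target = end.entity + ("s" if end.many else "")  # naive plural, for readability only
    return f"Each {source} {modality} {end.role} {quantity} {target}"


# Hypothetical example: one relationship between ORDER and CUSTOMER, read both ways.
customer_end = End("CUSTOMER", "placed by", mandatory=True, many=False)
order_end = End("ORDER", "the source of", mandatory=False, many=True)

print(barker_sentence("ORDER", customer_end))  # Each ORDER must be placed by one and only one CUSTOMER
print(uml_multiplicity(customer_end))          # 1
print(barker_sentence("CUSTOMER", order_end))  # Each CUSTOMER may be the source of one or more ORDERs
print(uml_multiplicity(order_end))             # 0..*
```

The same two facts (optionality and cardinality) drive both renderings. Only the presentation differs, which is the sense in which the notations are interchangeable.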

There are actually three levels of conventions to be defined in the data-modeling arena. The first is syntactic: conventions about the symbols to be used. These are the primary focus of this appendix. The second is positional: conventions that define the organization of model diagrams by dictating how entity types are laid out. Richard Barker has defined a very effective set of positional conventions [Barker 1990]; these are described in Chapter 3 (page 113). Finally, semantic conventions govern how the meaning of a model is conveyed, describing standard ways to represent common business situations. These are described briefly in Chapter 4 (pages 114-132). You can find more information about them in books by David Hay [Hay 1996] and Martin Fowler [Fowler 1997].

These three sets of conventions are, in principle, completely independent of each other. Given any of the syntactic conventions described here, you can follow any of the available positional or semantic conventions. In practice, however, promoters of each syntactic convention typically also promote at least particular positional conventions, if not the semantic ones as well.

In evaluating syntactic conventions, it is important to remember that data modeling has two audiences. The first is the business community that uses the models and their descriptions to verify that the analysts in fact understand their environment and their requirements. The second audience is the set of systems designers, who use the structures in the models and the business rules implied by them as the basis for computer system designs.

Different techniques are better suited to one audience or the other. Models that analysts use with the business community must be clear and easy to read, which often means that they omit some of the available detail. First and foremost, they must be accessible to a non-technical viewer. Models for designers, on the other hand, must be as complete and rigorous as possible.

The evaluation, then, will be based both on the technical completeness of each technique and on its readability.

Technical completeness is judged by how well each technique represents:

Entity types and attributes
Relationships
Unique identifiers
Sub-types and super-types
Constraints between relationships
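The five items above can be read as a notation-neutral metamodel, and a notation is technically complete to the extent that it has symbols for each of them. The minimal Python sketch below records each item in a data structure and checks that the pieces refer to each other consistently. It makes two simplifying assumptions: unique identifiers are made up of attributes only (relationship components are ignored), and the only constraint between relationships is an exclusive ("exactly one of") constraint. All class names and the example entity types are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityType:
    name: str
    attributes: list[str]
    # Assumption: identifiers consist of attributes only in this sketch.
    unique_identifier: list[str] = field(default_factory=list)
    supertype: Optional[str] = None  # set when this entity type is a sub-type


@dataclass
class Relationship:
    name: str
    from_entity: str
    to_entity: str


@dataclass
class ExclusiveConstraint:
    """Constraint between relationships: each `entity` instance takes part in exactly one of them."""
    entity: str
    relationships: list[str]


@dataclass
class Model:
    entity_types: dict[str, EntityType]
    relationships: dict[str, Relationship]
    constraints: list[ExclusiveConstraint] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return descriptions of dangling references; an empty list means the model is consistent."""
        found = []
        for e in self.entity_types.values():
            for attr in e.unique_identifier:
                if attr not in e.attributes:
                    found.append(f"{e.name}: identifier {attr!r} is not an attribute")
            if e.supertype is not None and e.supertype not in self.entity_types:
                found.append(f"{e.name}: unknown super-type {e.supertype!r}")
        for r in self.relationships.values():
            for end in (r.from_entity, r.to_entity):
                if end not in self.entity_types:
                    found.append(f"{r.name}: unknown entity type {end!r}")
        for c in self.constraints:
            for rname in c.relationships:
                r = self.relationships.get(rname)
                if r is None or r.from_entity != c.entity:
                    found.append(f"constraint on {c.entity}: {rname!r} does not start there")
        return found


# Hypothetical example: PARTY with two sub-types, and a LINE ITEM that is
# for either a PRODUCT or a SERVICE, never both.
model = Model(
    entity_types={
        "PARTY": EntityType("PARTY", ["party id", "name"], ["party id"]),
        "PERSON": EntityType("PERSON", ["birth date"], supertype="PARTY"),
        "ORGANIZATION": EntityType("ORGANIZATION", ["tax number"], supertype="PARTY"),
        "PRODUCT": EntityType("PRODUCT", ["product code"], ["product code"]),
        "SERVICE": EntityType("SERVICE", ["service code"], ["service code"]),
        "LINE ITEM": EntityType("LINE ITEM", ["line number", "quantity"], ["line number"]),
    },
    relationships={
        "for product": Relationship("for product", "LINE ITEM", "PRODUCT"),
        "for service": Relationship("for service", "LINE ITEM", "SERVICE"),
    },
    constraints=[ExclusiveConstraint("LINE ITEM", ["for product", "for service"])],
)
print(model.problems())  # []
```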

A technique's readability is characterized by its graphic treatment of relationship lines and entity-type boxes, as well as its adherence to the general principles of good graphic design. Among the most important of these principles is that each symbol should have only one meaning, which applies wherever that symbol is used, and that each concept should be represented by only one symbol. Moreover, a diagram should not be cluttered with more symbols than are absolutely necessary, and the graphics in a diagram should be intuitively expressive of the concepts involved. Your author has written several articles on this subject [e.g., Hay 1998].
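The first two principles amount to saying that a notation's legend should map symbols to concepts one-to-one. The minimal sketch below assumes a legend is recorded as a list of (symbol, concept) pairs. It flags any symbol with more than one meaning and any concept drawn with more than one symbol. The function name and the sample legends are hypothetical and do not describe any particular notation.

```python
from collections import defaultdict


def legend_violations(legend: list[tuple[str, str]]) -> list[str]:
    """Check (symbol, concept) pairs against 'one meaning per symbol' and 'one symbol per concept'."""
    meanings = defaultdict(set)  # symbol -> concepts it is used for
    symbols = defaultdict(set)   # concept -> symbols used for it
    for symbol, concept in legend:
        meanings[symbol].add(concept)
        symbols[concept].add(symbol)
    problems = []
    for symbol, concepts in meanings.items():
        if len(concepts) > 1:
            problems.append(f"symbol {symbol!r} has several meanings: {sorted(concepts)}")
    for concept, syms in symbols.items():
        if len(syms) > 1:
            problems.append(f"concept {concept!r} has several symbols: {sorted(syms)}")
    return problems


# Hypothetical legend in which a dashed line is overloaded.
legend = [
    ("dashed line", "optional relationship end"),
    ("solid line", "mandatory relationship end"),
    ("dashed line", "dependency"),
    ("crow's foot", "many"),
]
print(legend_violations(legend))
# ["symbol 'dashed line' has several meanings: ['dependency', 'optional relationship end']"]
```

A clean legend returns an empty list. Clutter and intuitive expressiveness, the other two principles, are matters of judgment that a check like this cannot capture.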

Each technique has strengths and weaknesses in the way it addresses each audience. As it happens, most are oriented more toward designers than toward the user community. They produce very intricate models that focus on ensuring that all possible constraints are described. Alas, this is often at the expense of readability.

This document presents seven notation schemes:

Peter Chen: He's the man who started it all.
Information Engineering: Clive Finkelstein and James Martin combined data modeling with an approach to systems development.
Richard Barker: His is the notation used in Europe's SSADM methodology and by the Oracle Corporation.
IDEF1X: This technique is supported and extensively used by the United States Department of Defense.
Object Role Modeling (ORM): This is a different approach to modeling facts and data.
The Unified Modeling Language (UML): This is the latest technique supported in the object-oriented world.
The Extensible Markup Language (XML): This is not strictly a data-modeling language, but it demonstrates some interesting data-structure ideas.

For comparison purposes, the same example model is presented in the following sections using each technique. Note that the UML is billed as an "object modeling" technique, rather than as a data (entity/relationship) modeling technique, but as you will see, its structure is fundamentally the same. This comparison is in terms of each technique's symbols for describing entity types (or "object classes", for the UML), attributes, relationships (or object-oriented "associations"), unique identifiers, sub-types, and constraints between relationships.

At the end of the individual discussions is your author's argument in favor of Mr. Barker's approach for use in requirements analysis, along with his argument in favor of UML to support object-oriented design and IDEF1X to support relational database design.

