How Was the Schema Defined? | Semantics in Business Systems: The Savvy Managers Guide (The Savvy Managers Guides)

People (process owners, developers, database analysts) build models of the data they would like to store. The more detailed and precise they make their model, the wider the range of useful things that can be done with it. The flip side is that the more complete, complex, detailed, and explicit the model is, the more difficult it is to change. Developers are constantly faced with this tradeoff.

Explicit models come in two major varieties: descriptive and prescriptive. A descriptive model describes some aspect of the world. For example, a map is a descriptive model of a region of the world. A prescriptive model describes something that we wish to bring into existence, such as the blueprints for a house, or the model that predicts its ability to withstand wind shear.

Vocabulary, taxonomy, ontology, and categories are primarily descriptive; that is, they enable us to describe what we find in the world. The models that we will deal with in the remainder of this chapter are all prescriptive; their primary purpose is to do something.

Another aspect of models is that each model addresses some aspect of the thing to be built. With the models described here, we generally start from abstract, high-level models and refine them until we reach models that can be implemented. The following is a summary of some of the key models that you might encounter as you begin the process of codifying the semantics of an application into a database model.

As we examine these models, we need to keep in mind that each model is generated from a set of requirements (which themselves may be models). We take a set of requirements, add a set of constraints for the target we're trying to instantiate the requirements into, and then, through a design process, render a model that attempts to balance the needs of the requirements with the constraints of the target environment.

Semantic Model

A semantic model is a conceptual model whose goal, in both the modeling process and the resulting model, is to convey unambiguously what the items being modeled mean. The main difference between a conceptual model and a semantic model is the effort spent on resolving meaning. Typically, a conceptual model would document the definition of concepts in terms that the business would understand, whereas a semantic model would attempt to resolve the concepts to semantic primitives or some other means of reducing ambiguity.

Unfortunately, you rarely see either in practice. I believe there are several reasons for this. First, very few tools focus on conceptual or semantic modeling. Second, many practitioners saw conceptual modeling as a stepping stone to their more refined logical and physical models, and as soon as they could figure out how to safely skip a step, they did. Third, because there is no implementation of a semantic model, all maintenance is done to the physical model, further cutting the conceptual models out of the loop. Chapters 6 through 9 go into much more detail on how and why to do semantic modeling, and Chapters 10 through 14 discuss tools that are becoming available for semantic modeling.

We'll start with a simple conceptual model such as the one in Figure 5.6, where we model that customers can order products.

Figure 5.6: Conceptual model.

In Figure 5.7 we embellish the model with some previously well-defined semantic primitives (shown in gray). In practice we would further embellish this, for example, by indicating the constraints on the parties in the role relationship that make them eligible to be customers.

Figure 5.7: Semantic model.
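
One way to picture the step from a conceptual to a semantic model is as a set of statements that tie each business concept to a primitive. The sketch below is only an illustration under stated assumptions: the primitives (Party, Role, Agreement, Item), the predicate names, and the `unresolved` helper are hypothetical and are not a transcription of Figure 5.7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# Business concepts from the conceptual model: customers order products.
conceptual = [
    Triple("Customer", "places", "Order"),
    Triple("Order", "requests", "Product"),
]

# Assumed semantic primitives (hypothetical names, for illustration only).
primitives = {"Party", "Role", "Agreement", "Item"}

# The semantic model adds statements that resolve each concept to a primitive.
semantic = conceptual + [
    Triple("Customer", "is_a", "Role"),
    Triple("Customer", "played_by", "Party"),
    Triple("Order", "is_a", "Agreement"),
    Triple("Product", "is_a", "Item"),
]


def unresolved(triples, primitives):
    """Return business concepts that are not yet tied to any primitive."""
    concepts = {t.subject for t in triples} | {t.obj for t in triples}
    concepts -= primitives
    resolved = {t.subject for t in triples if t.obj in primitives}
    return sorted(concepts - resolved)
```

In this sketch the conceptual model leaves all three concepts unresolved, while the embellished model leaves none. The eligibility constraints mentioned above would be a further layer on the Customer role.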

Most of the semantic modeling texts use ovals rather than boxes, but there isn't any real difference at that level. There is also a mixed message in the literature on semantic modeling. Some suggest, as per this example, that semantic modeling be done at a high or conceptual level. Others seek to model all the properties, which makes for a much more complex graph at this point in the analysis.

There is nothing inherently semantic about customers, orders, and products, as drawn here. We will return to this and describe where the semantics come from in Chapters 6 through 9; for now we will focus on the evolution of a model, from conceptual modeling through logical and physical modeling.

Logical (Entity Relationship) Modeling

Entity relationship modeling can be used for conceptual and logical modeling. In the example in Figure 5.8 we use the earlier Chen notation.^[22] There are many variations, but they share the identification of entities and relationships, and especially the cardinality of relationships (in this case the 1's and the M's, signifying that each order must have one customer, but a customer can place more than one order).

Figure 5.8: High-level logical model.

Most data modelers add attributes to the model at this point and introduce the entities that will be needed to resolve the many-to-many relationships (in this case product to order will need an order line entity to be implementable in a relational model, which we show in Figure 5.9).

Figure 5.9: Logical model, normalized with some attributes.
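
The role of the order line entity can be sketched in a few lines of Python. This is a minimal illustration, not the book's model: the attribute names, the `add_line` method, and the `orders_for_product` helper are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    product_id: int
    description: str


@dataclass
class OrderLine:
    # Junction entity: each line refers to exactly one order and one product.
    order_id: int
    product_id: int
    quantity: int


@dataclass
class Order:
    order_id: int
    customer_id: int
    lines: List[OrderLine] = field(default_factory=list)

    def add_line(self, product: Product, quantity: int) -> OrderLine:
        line = OrderLine(self.order_id, product.product_id, quantity)
        self.lines.append(line)
        return line


def orders_for_product(orders, product_id):
    """Follow the junction records from a product back to its orders."""
    return sorted(
        {o.order_id for o in orders for l in o.lines if l.product_id == product_id}
    )


# Illustrative data: one product appears on two orders,
# and one order carries two products.
widget = Product(1, "Widget")
gadget = Product(2, "Gadget")
first = Order(100, customer_id=7)
second = Order(101, customer_id=8)
first.add_line(widget, 3)
first.add_line(gadget, 1)
second.add_line(widget, 5)
```

The many-to-many relationship between orders and products is now expressed as two one-to-many relationships, each ending at an order line.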

It is also at this point that most modelers "normalize" their data models. Normalization is a process of refining the design such that each table contains only attributes that depend on the key of that table and on nothing else. This arrangement greatly improves the integrity of the data, especially when updating or deleting information (you don't have to traipse through all the rest of the data to see what might have been affected).
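
The integrity benefit can be seen in a small sketch. The row layout, the `normalize` function, and the sample data are assumptions made for illustration. The customer name depends on the customer key, not on the order key, so it moves to its own table.

```python
# Denormalized rows: the customer name is repeated on every order, although it
# depends on customer_id rather than on order_id.
flat_orders = [
    {"order_id": 100, "customer_id": 7, "customer_name": "Acme"},
    {"order_id": 101, "customer_id": 7, "customer_name": "Acme"},
]


def normalize(rows):
    """Split the rows so each attribute is stored with the key it depends on."""
    customers = {}
    orders = {}
    for row in rows:
        customers[row["customer_id"]] = {"customer_name": row["customer_name"]}
        orders[row["order_id"]] = {"customer_id": row["customer_id"]}
    return customers, orders


customers, orders = normalize(flat_orders)

# Renaming the customer is now one update instead of one per order row.
customers[7]["customer_name"] = "Acme Corp"


def customer_name_for(order_id):
    """Look up the name through the order's customer key."""
    return customers[orders[order_id]["customer_id"]]["customer_name"]
```

In the flat layout the same rename would have to touch every order row for that customer, and missing one would leave the data inconsistent.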

In Figure 5.9 we introduce some of the key attributes and normalize the OrderLine entity. Note that by introducing the OrderLine entity, the cardinality on the relationship to product goes from M to 1. This is because the OrderLine entity acts as a junction record and removes the many-to-many relationship. Extended entity-relationship modeling also includes the concept of subtypes and inheritance, but the use of inheritance is much more widespread in object-oriented design.

Physical Modeling

The physical model is a transformation of the logical model into a form that has acceptable access and performance characteristics for the intended use, using a specific target technology.

In Figure 5.10 we have added the attributes to the entities on which they will reside (the entities now represent physical tables). The inverted triangles represent indexes, which have been added for performance reasons. Several of the tables have multiple indexes, to speed access on different types of queries. At this stage we also introduce calculated attributes that we expect to be used often enough to "cache." We consider QtyOnHand to be a cached value: it could be recalculated from issues and receipts, but it is generally easier to maintain as a stored value that is refreshed whenever any of the factors change.

Figure 5.10: Physical data model.
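
A rough sketch of these two physical-design decisions, using SQLite through Python's standard `sqlite3` module. The table and column names (`product`, `stock_movement`, `qty_on_hand`) and both helper functions are assumptions for illustration and are not taken from Figure 5.10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        qty_on_hand INTEGER NOT NULL DEFAULT 0  -- cached, derivable from movements
    );
    CREATE TABLE stock_movement (
        movement_id INTEGER PRIMARY KEY,
        product_id  INTEGER NOT NULL REFERENCES product(product_id),
        kind        TEXT NOT NULL CHECK (kind IN ('receipt', 'issue')),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    );
    -- Secondary index added purely to speed up lookups by product.
    CREATE INDEX idx_movement_product ON stock_movement(product_id);
    """
)


def record_movement(conn, product_id, kind, quantity):
    """Store a receipt or issue and refresh the cached quantity together."""
    delta = quantity if kind == "receipt" else -quantity
    with conn:  # commits both statements, or rolls both back on error
        conn.execute(
            "INSERT INTO stock_movement (product_id, kind, quantity) VALUES (?, ?, ?)",
            (product_id, kind, quantity),
        )
        conn.execute(
            "UPDATE product SET qty_on_hand = qty_on_hand + ? WHERE product_id = ?",
            (delta, product_id),
        )


def recalculated_qty(conn, product_id):
    """Recompute the quantity from the movements, ignoring the cached column."""
    row = conn.execute(
        "SELECT COALESCE(SUM(CASE kind WHEN 'receipt' THEN quantity "
        "ELSE -quantity END), 0) FROM stock_movement WHERE product_id = ?",
        (product_id,),
    ).fetchone()
    return row[0]


conn.execute("INSERT INTO product (product_id, description) VALUES (1, 'Widget')")
record_movement(conn, 1, "receipt", 10)
record_movement(conn, 1, "issue", 4)
cached = conn.execute(
    "SELECT qty_on_hand FROM product WHERE product_id = 1"
).fetchone()[0]
```

Reading `qty_on_hand` is a single-row lookup, while `recalculated_qty` has to scan every movement for the product. The cost is that every write path must remember to refresh the cached column, which is why the sketch routes all movements through one function.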

^[22]Peter P. Chen, "The Entity-Relationship Model—Toward a Unified View of Data," ACM Transactions on Database Systems, March 1976, pp 9–36. Available at http://bit.csc.lsu.edu/~chen/chen.html.