Chapter 5: Data and Object Modeling


In this chapter we begin to explore how to bring semantics into systems design. For enterprise applications, the primary mechanism is the design of databases, and the primary tool is the modeling of those databases. As mentioned in Chapter 3, semantics is a combination of data and behavior, and this chapter covers a modeling domain in which data and behavior are tightly interlinked: object-oriented modeling.

Although data modeling and object modeling are both well-established fields, the vast majority of practitioners conduct their modeling implicitly, following semantic leads, without being aware that they are doing so. To make this modeling more explicit, and thus more accessible, we begin this discussion by examining the semantic differences between documents and databases. From there we investigate where the semantics in a database application reside. We review what schemas are and how they are defined, and then discuss how semantics relate to normalization. We conclude by looking at the normalization of logic and how object-oriented modeling has shaped semantic understanding.

Semantic Differences between a Database and a Document

At one semantic level, documents and databases are alike. Both contain information, not physical objects or events. (By "document" I mean the information content and not the rendering onto paper, which, once rendered, is a physical object.) Once you get past this abstract similarity, though, databases and documents appear to be very different. Let's explore this with a thought experiment.

Imagine a simple document such as the letter in Figure 5.1.

    August 1, 2002

    Mr. Smith
    Perpetual Inc.
    1 Main ST
    NYC, NY 10000

    Dear Mr. Smith,

    Enclosed is the gum you ordered. Please remit $9.95 to include postage and handling.

    Sincerely,
    Ms Jones
    House of Gum
    123 Broadway
    SFO, CA 99999

Figure 5.1: Business correspondence.

Data

Data is intellectual property. It shares with other types of property the fact that it can be owned and ownership rights can be transferred. As intellectual property it was created by humans or by a device created by humans. It is distinct from other forms of intellectual property, such as ideas, inventions, or performances, in that it has been rendered into symbolic representation. We are primarily interested in data that has been rendered to electronic or magnetic media, because that makes it easier to process by computer. Information is a type of data made relevant through aggregation or conversion.

The letter, as data, could be stored in a database, but we typically don't think of a letter as the product of a database. Practitioners usually refer to this kind of content as "unstructured data." The data referred to in the definition box is data stored in a business system; the act of storing it is what elevates its status to intellectual property.

Figure 5.2 is a three-table database that has data similar to that contained in the letter in Figure 5.1.

Figure 5.2: A "structured data" version of Figure 5.1, as cast in a database.

Not Two Views of the Same Thing

First, let's dispense with the obvious. These are not (as they currently stand) two views of the same thing.

Database-centric people will look at this and say, "We can generate the letter from the tables." What they mean is that they can write a program that would read the database and insert the appropriate fields into a document, filling in the rest with boilerplate and layout information supplied by the program.
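As a rough sketch of what such a program might look like, the Python below fills a letter template from three small in-memory "tables." The table names (customers, products, orders), their columns, and the function generate_letter are all assumptions made for illustration; they are not taken from Figure 5.2.

```python
from string import Template

# Hypothetical stand-ins for three tables; names and columns are assumed.
customers = {1: {"contact": "Mr. Smith", "company": "Perpetual Inc.",
                 "street": "1 Main ST", "city_line": "NYC, NY 10000"}}
products = {10: {"name": "gum"}}
orders = {100: {"customer_id": 1, "product_id": 10,
                "amount_due": "9.95", "date": "August 1, 2002"}}

# Boilerplate and layout are supplied by the program, not by the data.
# In a Template, "$$" is a literal dollar sign and "$name" is a placeholder.
LETTER = Template(
    "$date\n"
    "$contact\n$company\n$street\n$city_line\n\n"
    "Dear $contact,\n\n"
    "Enclosed is the $product you ordered. "
    "Please remit $$$amount_due to include postage and handling.\n\n"
    "Sincerely,\nMs Jones\nHouse of Gum\n"
)


def generate_letter(order_id):
    """Look up one order and its related rows, then fill in the template."""
    order = orders[order_id]
    customer = customers[order["customer_id"]]
    product = products[order["product_id"]]
    return LETTER.substitute(
        date=order["date"],
        product=product["name"],
        amount_due=order["amount_due"],
        **customer,
    )


print(generate_letter(100))
```

Notice how much of the letter lives in the template rather than in the tables: the wording, the sender, and the implication that the gum is enclosed are all supplied by the program.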

But what if the letter were not generated? What if the letter were the record of the transaction and the database were something we populated from the letter?

Both the database and the letter imply several things:

  • That "we" are Ms. Jones, affiliated with House of Gum

  • That we have some knowledge of a Mr. Smith (enough, we believe, to ship gum and requests for payment)

  • That we offered gum for sale

  • That Mr. Smith ordered a pack of gum

  • That we shipped it to him

  • That we expect to be paid

What is interestingly different is that in the letter we can tell (but only by being human and interpreting) that the gum is traveling with the notification, which is also the invoice. In the database example we can't tell where in the process we are. Has the gum been picked? Shipped? Is the database updated before or after the shipment? And so on.

If the letter were generated from the database, what is the difference between generating the letter automatically without allowing subsequent changes and generating the letter and then allowing someone to edit it? As we will see, this seemingly innocuous question has a profound impact on what can be inferred from the structured data we have after the fact.

The two versions cover approximately the same semantic scope, so it is more accurate to say that they are two different ways of recording the same event. They are not, however, equivalent.

Another interesting area of investigation is the hybridization of documents and databases. Before we examine these hybrids, let's clarify how documents and databases differ. In particular, they differ along two key dimensions:

  • Timing of the semantic interpretation

  • Ability to be used by programs

Timing of the Semantic Interpretation

A major difference between a database and a document is the timing of semantic evaluation and enforcement. We can write whatever we want in a document. Whether it makes sense, is true, conveys meaning, or memorializes some event is left for a person, or potentially a program, to interpret when the document is read.

In a database (really a database application) the semantics are enforced as the data is entered. As we enter the data, the semantic rules (implemented in application code or in database constraints) ensure that the data in the database semantically conforms.
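To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer and gum_order tables, their columns, and the helper try_insert are hypothetical, invented only to show rules being checked at the moment of entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key checking off unless it is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE gum_order (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount_due  REAL NOT NULL CHECK (amount_due > 0)
    );
""")

# Rows that obey the rules are accepted.
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Perpetual Inc.')")
conn.execute("INSERT INTO gum_order (customer_id, amount_due) VALUES (1, 9.95)")


def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# An order for a customer we have no knowledge of is refused on entry.
unknown_customer_ok = try_insert(
    "INSERT INTO gum_order (customer_id, amount_due) VALUES (42, 9.95)")
# So is an order whose amount makes no business sense.
negative_amount_ok = try_insert(
    "INSERT INTO gum_order (customer_id, amount_due) VALUES (1, -5)")

print(unknown_customer_ok, negative_amount_ok)
```

A letter, by contrast, would happily accept a negative amount or an unknown customer; nobody would find out until someone read it.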

If we go back to our discussion of contracts in Chapter 2, we will note that a contract is a document. Although there is some interpretation going on while the contract is being drafted and reviewed, the fact that it typically is not reduced to a schema suggests that much of the semantics of the document are left to be interpreted much later, often in a court of law. We might then notice that the contract did not describe the property to be transferred or that no consideration was indicated. If contracts were expressed as well-defined databases, such flaws would be evident when we wrote the contract.

Ability to Be Used by Programs

The other difference of note between a document and a database is that with a database, application programs are able to use the data in a way that is not available to applications that process documents. In the previous example, a database application programmer could write a program that would sum up the gum sales for a customer, a time period, or a particular type of gum.
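A sketch of such a program, again using Python's sqlite3 module, might look like the following. The sale table, its columns, the sample rows, and the helper functions are all invented for illustration; amounts are stored in cents to keep the arithmetic exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sale (
        customer     TEXT,
        gum_type     TEXT,
        sale_date    TEXT,     -- ISO dates sort correctly as text
        amount_cents INTEGER
    )""")
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [
        ("Perpetual Inc.", "spearmint", "2002-08-01", 995),
        ("Perpetual Inc.", "cinnamon", "2002-09-15", 500),
        ("Acme Co.", "spearmint", "2002-08-20", 250),
    ],
)


def total_by(column):
    """Sum sales grouped by one of a fixed set of columns."""
    if column not in {"customer", "gum_type"}:
        raise ValueError(f"cannot group by {column!r}")
    rows = conn.execute(
        f"SELECT {column}, SUM(amount_cents) FROM sale GROUP BY {column}")
    return dict(rows)


def total_for_period(start, end):
    """Sum sales whose date falls between start and end, inclusive."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM sale "
        "WHERE sale_date BETWEEN ? AND ?", (start, end)).fetchone()
    return row[0]


print(total_by("customer"))
print(total_by("gum_type"))
print(total_for_period("2002-08-01", "2002-08-31"))
```

The program never has to guess which number is an amount or which string is a customer; the schema has already settled that.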

Without doing some semantic interpretation, the document-based application programmer really can't do much of anything.
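The sketch below suggests why. It uses a Python regular expression to pull a dollar amount out of the body of the letter in Figure 5.1; the pattern and variable names are my own assumptions, not part of any standard approach.

```python
import re

# The body of the letter from Figure 5.1, as plain text.
letter_body = (
    "Enclosed is the gum you ordered. "
    "Please remit $9.95 to include postage and handling."
)

# Find anything that looks like a dollar amount with two decimal places.
amounts = re.findall(r"\$(\d+\.\d{2})", letter_body)
print(amounts)
```

The program finds a number, but the interpretation is ours: nothing in the text tells the program that this is an amount owed rather than a price, a refund, or a balance, nor who owes it to whom. A differently worded letter would defeat the pattern entirely.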

Hybrids between Documents and Databases

Life has become more interesting since we began "tagging" data in documents. "Tagged" languages are covered in more depth in Chapter 11, but you are probably familiar enough with them to follow the example in Figure 5.3.

    <letter>
      <Customer> Perpetual Inc
        <Contact> Mr. Smith </Contact>
        <Address> 1 Main St NYC, NY 10000 </Address>
      </Customer>
      ...
    </letter>

Figure 5.3: A tagged form of the letter in Figure 5.1.

This tagging essentially creates a hybrid between a document and a database. (The evidence that it is a hybrid is that a database person will say, "The tags are really just the schema," whereas an HTML programmer will say, "It's just a document.")
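A short sketch shows both sides of the hybrid. It uses Python's standard xml.etree.ElementTree module on the customer portion of the tagged letter (the elided remainder of the figure is simply left out); the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Only the customer portion of the tagged letter; the rest is omitted.
TAGGED = """
<letter>
  <Customer> Perpetual Inc
    <Contact> Mr. Smith </Contact>
    <Address> 1 Main St NYC, NY 10000 </Address>
  </Customer>
</letter>
"""

root = ET.fromstring(TAGGED.strip())
customer = root.find("Customer")

# Database-like: a program can ask for a field by name.
company = customer.text.strip()            # text before the first child tag
contact = customer.findtext("Contact").strip()
address = customer.findtext("Address").strip()

print(company)
print(contact)
print(address)
```

The tags let a program pick out the contact and the address without guessing, which is database-like. Yet the address is still one free-text string, and nothing stopped the author from typing anything at all between the tags, which is document-like.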

As we will discuss later in much more depth, the presence of these tags creates not only an interesting hybrid between document and database, but also a hybrid between transaction and document. However, before we discuss the fuzzy boundaries of the hybrids, let's take a longer look at some of the differences between a document and a database.




Semantics in Business Systems(c) The Savvy Manager's Guide
Semantics in Business Systems: The Savvy Managers Guide (The Savvy Managers Guides)
ISBN: 1558609172
EAN: 2147483647
Year: 2005
Pages: 184
Authors: Dave McComb
