The Relational Model Versus Others | Database in Depth: Relational Theory for Practitioners

To repeat something I said in Chapter 4, it's my opinion that the relational model is rock solid, and "right," and will endure. A hundred years from now, I fully expect database systems still to be based on Codd's relational model. Why? Because the foundations of that model (namely, set theory and predicate logic) are themselves rock solid in turn. Elements of predicate logic in particular go back well over 2,000 years, at least as far as Aristotle (384-322 BCE).

So what about other data models (the "object-oriented model," for example, or the "hierarchic model," or the CODASYL "network model," or the "semistructured model")? In my view, these other models just aren't in the same ballpark. Indeed, I seriously question whether they deserve to be called models at all.^[*] The hierarchic and network models in particular never really existed in the first place! (As abstract models, I mean, predating any implementations.) Instead, they were invented after the fact; that is, commercial hierarchic and network products were built first, and the corresponding models were defined subsequently by a process of induction (here just a polite term for guesswork) from those products. As for the object-oriented and semistructured models, it's entirely possible that the same criticism applies, though it's hard to be sure. One problem is that there doesn't seem to be any consensus on what those models might consist of. It certainly can't be claimed, for example, that there's a unique, clearly defined, and universally accepted object-oriented model; and similar remarks apply to the semistructured model. (Actually, some people might claim there isn't a unique relational model, either! I'll deal with that argument a little later.)

^[*] Which is why I set them all in quotation marks. I'll drop the quotation marks from this point forward because I know how annoying they can be, but you should think of them as still being there in some virtual kind of sense.

Another important reason why I don't believe those other models really deserve to be called models at all is the following. First, I hope you agree it's undeniable that the relational model is indeed a model and is thus not, by definition, concerned with implementation issues. By contrast, the other models all fail, much of the time, to make a clear distinction between issues that truly are model issues and issues that are better regarded as implementation matters; at the very best, they muddy that distinction considerably (they're all much "closer to the metal," as it were). As a consequence, they're harder to use and understand, and they give implementers far less freedom (far less than the relational model does, I mean) to adopt inventive or creative approaches to questions of implementation.

So what of the claims to the effect that there are several "relational" models, too? For example, a recent (well, fairly recent) book^[] has a chapter titled "Different Relational Models," in which we find this:

There is no such thing as the relational model for databases anymore [sic] than there is just one geometry.

And to bolster his argument, the author then goes on to identify what he claims are six "different relational models."

Now, I wrote a response to these claims soon after I first encountered them. Here's an edited version of what I said at the time:

Of course it's true there are several different geometries (euclidean, elliptic, hyperbolic, and so forth). But is the analogy a valid one? That is, do those "different relational models" differ in the same way those different geometries differ? It seems to me the answer to this question is no. Elliptic and hyperbolic geometries are often referred to, quite explicitly, as noneuclidean geometries; for the analogy to be valid, therefore, it would seem that at least five of those "six different relational models" would have to be nonrelational models, and hence, by definition, not "relational models" at all. (Actually, I would agree that several of those "six different relational models" are indeed not relational. But then it can hardly be claimed, or at least it can't be claimed consistently, that they're "different relational models.")

And I went on to say this (again somewhat edited here):

But I have to admit that Codd did revise his own definitions of what the relational model was, somewhat, throughout the 1970s and 1980s. One consequence of this fact is that critics have been able to accuse Codd in particular, and relational advocates in general, of "moving the goalposts" far too much. For example, Mike Stonebraker has written^[*] that "one can think of four different versions" of the model:

Version 1: Defined by the 1970 CACM paper
Version 2: Defined by the 1981 Turing Award paper
Version 3: Defined by Codd's 12 rules and scoring system
Version 4: Defined by Codd's book

^[*] In his introduction to Chapter 1 ("The Roots"), Readings in Database Systems, Second Edition (Morgan Kaufmann, 1994).

Let me interrupt myself briefly to explain the references here. They're all by Codd. The 1970 CACM paper is "A Relational Model of Data for Large Shared Data Banks," CACM 13, No. 6 (June 1970). The 1981 Turing Award paper is "Relational Database: A Practical Foundation for Productivity," CACM 25, No. 2 (February 1982). The 12 rules and the accompanying scoring system are described in Codd's Computerworld articles "Is Your DBMS Really Relational?" and "Does Your DBMS Run By The Rules?" (October 14th and October 21st, 1985). Finally, Codd's book is The Relational Model For Database Management Version 2 (Addison-Wesley, 1990). Now back to my response:

Perhaps because we're a trifle sensitive to such criticisms, Hugh Darwen and I have tried to provide, in our book Foundation for Future Database Systems: The Third Manifesto (2nd edition, Addison-Wesley, 2000), our own careful statement of what we believe the relational model is (or ought to be!).^[] Indeed, we'd like our Manifesto to be seen in part as a definitive statement in this regard. I refer you to the book itself for the details; here just let me say that we see our contribution in this area as primarily one of dotting a few i's and crossing a few t's that Codd himself left undotted or uncrossed in his own original work. We most certainly don't want to be thought of as departing in any major respect from Codd's original vision; indeed, the whole of the Manifesto is very much in the spirit of Codd's ideas and continues along the path that he originally laid down.

^[] That book is now superseded by our book

To all of the preceding points I'd now like to add another, which I think clearly refutes the author's original argument. I agree there are several different geometries. But the reason why those geometries are all different is this: they start from different axioms. By contrast, we've never changed the axioms for the relational model. We have made a number of changes over the years to the model itself (for example, we've added relational comparisons), but the axioms (which are basically those of classical predicate logic) have remained unchanged ever since Codd's first papers. Moreover, what changes have occurred have all been, in my view, evolutionary, not revolutionary, in nature. Thus, I really do claim there's only one relational model, even though it has evolved over time and will presumably continue to do so. As I said in Chapter 1, the model can be seen as a small branch of mathematics; as such, it grows over time as new theorems are proved and new results discovered.

So what are those evolutionary changes? Here are some of them:

As already mentioned, we've added relational comparisons.
We've clarified the importance of the logical difference between relations and relvars.
We have a better understanding of the nature of relational algebra, including the relative significance of various operators and an appreciation of the importance of relations of degree zero, and we've identified certain new operators (for example, extend).
We also have a better understanding of updating, including view updating in particular.
We've clarified the concept of first normal form; as a consequence, we've embraced the concept of relation-valued attributes in particular.
We have a better understanding of the fundamental significance of integrity constraints in general, and we have many good theoretical results regarding certain important special cases.
We've clarified the nature of the relationship between the model and predicate logic.
Finally, we have a clearer understanding of the relationship between the relational model and type theory (more specifically, we've clarified the nature of domains).
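
To make a few of the items above concrete, here is a minimal Python sketch of relational comparisons, the extend operator, relations of degree zero, and a relation-valued attribute. It is an illustration only, not notation from the book or from Codd's papers. The `Relation` class, the `relation` and `extend` helpers, and the supplier data are all hypothetical names invented for this example; `TABLE_DEE` and `TABLE_DUM` are the names Date conventionally uses for the two relations of degree zero.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Iterable, Tuple

# One tuple of a relation: a set of (attribute name, value) pairs.
Tup = FrozenSet[Tuple[str, Any]]


@dataclass(frozen=True)
class Relation:
    """Hypothetical toy relation value: a heading plus a set of tuples."""
    heading: FrozenSet[str]
    body: FrozenSet[Tup]

    def __le__(self, other: "Relation") -> bool:
        # Relational comparison "is included in"; headings must match.
        return self.heading == other.heading and self.body <= other.body


def relation(heading: Iterable[str], rows: Iterable[Dict[str, Any]]) -> Relation:
    """Build a Relation, rejecting rows that do not match the heading."""
    attrs = frozenset(heading)
    body = frozenset(frozenset(row.items()) for row in rows)
    for t in body:
        if {name for name, _ in t} != attrs:
            raise ValueError("tuple does not conform to heading")
    return Relation(attrs, body)


def extend(r: Relation, name: str, fn: Callable[[Dict[str, Any]], Any]) -> Relation:
    """Sketch of extend: add attribute `name`, computed from each tuple."""
    if name in r.heading:
        raise ValueError("attribute already exists")
    rows = [{**dict(t), name: fn(dict(t))} for t in r.body]
    return relation(r.heading | {name}, rows)


# The only two relations of degree zero: no attributes, and either
# exactly one (empty) tuple or no tuples at all.
TABLE_DEE = relation([], [{}])
TABLE_DUM = relation([], [])

# Hypothetical sample data.
suppliers = relation(["sno", "status"], [{"sno": "S1", "status": 20},
                                          {"sno": "S2", "status": 10}])
subset = relation(["status", "sno"], [{"sno": "S1", "status": 20}])

# A relation-valued attribute: each `parts` value is itself a relation.
shipments = relation(["sno", "parts"], [
    {"sno": "S1", "parts": relation(["pno"], [{"pno": "P1"}, {"pno": "P2"}])},
    {"sno": "S2", "parts": relation(["pno"], [])},
])

doubled = extend(suppliers, "double_status", lambda t: t["status"] * 2)
```

Because headings and bodies are sets, neither attribute order nor tuple order affects equality, so `subset <= suppliers` holds even though `subset` lists its attributes in a different order. Making `Relation` an immutable, hashable value is what lets one relation appear as an attribute value inside another.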