What Remains to Be Done? | Database in Depth: Relational Theory for Practitioners

All of the above is not to say we won't continue to make progress or there isn't still work to be done. In fact, I see at least four areas, somewhat interrelated, where developments are either under way or are needed: implementation, foundations, higher-level abstractions, and higher-level interfaces.

Implementation

In some ways the message of this book can be summed up very simply:

Let's implement the relational model!

I think it's clear from earlier chapters that it's being extremely charitable to describe SQL as a relational language, and hence that SQL products can be considered relational only to a first approximation. The truth is, the relational model has never been properly implemented in commercial form, and users have never really enjoyed the benefits that a truly relational product would bring. Indeed, that's one of the reasons why Hugh Darwen and I have been working for so long on The Third Manifesto. The Third Manifesto (the Manifesto for short) is a formal proposal for a solid foundation for future DBMSs. And it goes without saying that what it really does, in as careful and precise a manner as the authors are capable of, is define the relational model and spell out some of the implications of that definition. (It also goes into a great deal of detail on the impact of type theory on that model; in particular, it proposes a comprehensive model of type inheritance as a logical consequence of that type theory.)

So we'd really like to see the ideas of the Manifesto implemented properly in commercial form ("we" here meaning Darwen and myself). We believe such an implementation would serve as a solid basis on which to build so many other things: for example, "object/relational" DBMSs; spatiotemporal DBMSs; DBMSs used in connection with the World Wide Web; and "rule engines" (also known as "business logic servers"), which some people see as the next generation of general-purpose DBMS products. We further believe we would then have the right framework for supporting the other items that are suggested in the rest of this section as also being desirable. Personally, in fact, I would go further: I would suggest that trying to implement those items in any other kind of framework is likely to prove more difficult than doing it correctly. To quote the well-known mathematician Gregory Chudnovsky: "If you do it the stupid way, you will have to do it again" (from an article in The New York Times, December 24, 1997).

To repeat, I want the model to be properly implemented. And in the previous chapter, I tried to suggest that a promising new implementation technology called The TransRelational™ Model looks as if it might be well suited to that task. This possibility is under active investigation.

Foundations

There's still much interesting work to be done on theoretical foundations (it's certainly not the case that all of the foundation problems have been solved). Here are three examples:

Let rx be some relational expression. By definition, the relation r denoted by rx satisfies a constraint rc that's derived from the constraints satisfied by the relations in terms of which rx is expressed. Can that constraint rc be computed?
Can we inject more science into the database design process? In particular, can we come up with a precise characterization of the notion of redundancy?
In the previous chapter I sketched an approach to the missing information problem based on 6NF. What are the implications of that approach?

Higher-Level Abstractions

One way we make progress in computer languages and applications is by raising the level of abstraction. For example, I pointed out in Chapter 4 that the familiar KEY and FOREIGN KEY specifications are really just shorthand for constraints that can be expressed more longwindedly using the general integrity features of any relationally complete language like Tutorial D. But those shorthands are useful: quite apart from the fact that they save us some writing, they also serve to raise the level of abstraction, by allowing us to talk in terms of certain bundles of concepts that naturally belong together. In a sense, they make it easier for us to see the forest as well as the trees.
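The shorthand idea can be made concrete with a minimal sketch, written in Python rather than Tutorial D. It assumes a relation is modeled as a set of rows, and every name in it (row, project, satisfies_key, satisfies_foreign_key, the sample data) is hypothetical and chosen only for illustration. It checks only the uniqueness half of a key (not irreducibility) and assumes the foreign key and the referenced key use the same attribute names.

```python
# Hypothetical helpers: a relation is a set of rows, and each row is a
# frozenset of (attribute, value) pairs. None of these names come from the book.

def row(**attrs):
    """Build one row from keyword arguments."""
    return frozenset(attrs.items())

def project(rel, attrs):
    """Projection; duplicate rows collapse because the result is a set."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}

def satisfies_key(rel, key_attrs):
    # Longhand form of the uniqueness part of KEY {key_attrs}:
    # projecting on the key attributes must not lose any rows.
    return len(rel) == len(project(rel, key_attrs))

def satisfies_foreign_key(referencing, referenced, attrs):
    # Longhand form of FOREIGN KEY {attrs}: every attrs value in the
    # referencing relation must also appear in the referenced relation.
    return project(referencing, attrs) <= project(referenced, attrs)

# Illustrative data only (assumed, not taken from the book's figures).
suppliers = {row(SNO="S1", SNAME="Smith"), row(SNO="S2", SNAME="Jones")}
shipments = {row(SNO="S1", PNO="P1"), row(SNO="S2", PNO="P1")}
duplicated = suppliers | {row(SNO="S1", SNAME="Blake")}   # two rows for S1
dangling = shipments | {row(SNO="S9", PNO="P2")}          # no supplier S9
```

Under these assumptions, satisfies_key(suppliers, ["SNO"]) holds while satisfies_key(duplicated, ["SNO"]) does not, and the dangling shipment fails the foreign key check. The point of the shorthand is that nobody should have to spell out such comparisons by hand each time.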

By way of another illustration, consider the relational algebra. I showed in Chapter 5 that many of the operators of the algebra, including ones we use all the time (even if we don't realize it) such as semijoin, are really shorthand for certain combinations of other operators.^[*] Indeed, there are other useful operators that I didn't discuss in that chapter at all, for space reasons, for which these remarks might be regarded as "even more true," in a sense. Again, what's really going on here is a raising of the level of abstraction (rather like macros raise the level of abstraction in a conventional programming language).

^[*] As a matter of fact, Darwen and I show in our Manifesto that every algebraic operator can be expressed in terms of just two primitives, remove (which is basically project) and either nand or nor.
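As a small illustration of an operator that is "really shorthand," semijoin can be sketched as the natural join of two relations projected back onto the attributes of the first. The Python below is a sketch under assumptions, not the book's notation: relations are sets of rows, rows are frozensets of (attribute, value) pairs, and the helper names and sample data are hypothetical. It does not attempt the two-primitive reduction mentioned in the footnote.

```python
# Hypothetical model: a relation is a set of rows, and each row is a
# frozenset of (attribute, value) pairs.

def row(**attrs):
    return frozenset(attrs.items())

def heading(rel):
    # Attribute names, read off any one row. For an empty relation this
    # returns an empty set, which is a simplification made for this sketch.
    for t in rel:
        return {a for a, _ in t}
    return set()

def project(rel, attrs):
    """Projection; duplicate rows collapse because the result is a set."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}

def join(r, s):
    """Natural join: combine rows that agree on all common attributes."""
    common = heading(r) & heading(s)
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in common):
                out.add(frozenset({**dt, **du}.items()))
    return out

def semijoin(r, s):
    # The shorthand, expanded: join r with s, then keep only r's attributes.
    return project(join(r, s), heading(r))

# Illustrative data only (assumed, not taken from the book).
suppliers = {row(SNO="S1", CITY="London"),
             row(SNO="S2", CITY="Paris"),
             row(SNO="S3", CITY="Paris")}
shipments = {row(SNO="S1", PNO="P1"), row(SNO="S2", PNO="P1")}

with_shipments = semijoin(suppliers, shipments)  # suppliers that ship something
```

Here with_shipments contains the rows for S1 and S2 only, since S3 has no matching shipment.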

Raising the level of abstraction in the relational world can be regarded as a kind of building on top of the relational model; it doesn't change the model, but it does make it more directly useful for certain tasks. And one area where this approach looks as if it's going to prove really fruitful is temporal databases. In our book Temporal Data and the Relational Model (Morgan Kaufmann, 2003), Hugh Darwen, Nikos Lorentzos, and I (building on original work by Lorentzos) introduce interval types as a basis for supporting temporal data in a relational framework. For example, consider the "temporal relation" in Figure 8-1, which shows that certain suppliers supplied certain parts during certain intervals of time (you can read d04 as "day 4," d06 as "day 6," and so on; likewise, you can read [d04:d06] as "the interval from day 4 to day 6 inclusive," and so on). Attribute DURING in that relation is interval-valued.

Figure 8-1. A relation with an interval attribute

Support for interval attributes (and hence for temporal databases) involves, among other things, support for generalized versions of the regular algebraic operators. For reasons that aren't important here, we call those generalized operators U_ operators; thus, there's a U_restrict operator, a U_join operator, a U_union operator, and so on. But (and here comes the point) those U_ operators are all, in the last analysis, nothing but shorthand for certain combinations of regular algebraic operators. Once again, then, what's fundamentally going on is a raising of the level of abstraction.
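One way to picture a U_ operator as shorthand is the following sketch of U_union in Python. It rests on several assumptions that go beyond the text above: that U_union can be read as "unpack both operands into single-day intervals, take the ordinary union, then pack the result back into maximal intervals"; that days are plain integers; and that rows are simple (SNO, PNO, DURING) triples. The function names unpack, pack, and u_union and the sample data are hypothetical.

```python
# Assumed representation: a row is (sno, pno, (begin, end)), where the
# DURING interval is closed and its bounds are integer day numbers.

def unpack(rel):
    """Split every interval into unit intervals, one per day."""
    return {(sno, pno, (d, d))
            for sno, pno, (b, e) in rel
            for d in range(b, e + 1)}

def pack(rel):
    """Merge overlapping or adjacent intervals into maximal intervals."""
    days_by_key = {}
    for sno, pno, (b, e) in rel:
        days_by_key.setdefault((sno, pno), set()).update(range(b, e + 1))
    out = set()
    for (sno, pno), days in days_by_key.items():
        ordered = sorted(days)
        start = prev = ordered[0]
        for d in ordered[1:]:
            if d != prev + 1:          # gap found: close the current run
                out.add((sno, pno, (start, prev)))
                start = d
            prev = d
        out.add((sno, pno, (start, prev)))
    return out

def u_union(r1, r2):
    # The shorthand, expanded into regular operators plus pack and unpack.
    return pack(unpack(r1) | unpack(r2))

# Illustrative data only (assumed, not the contents of Figure 8-1).
r1 = {("S1", "P1", (4, 6))}
r2 = {("S1", "P1", (5, 9)), ("S2", "P1", (2, 3))}

merged = u_union(r1, r2)
```

With this data an ordinary union would keep [4:6] and [5:9] as two separate, overlapping rows for S1 and P1, whereas merged holds a single row with the interval [4:9] alongside the S2 row.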

Two more points on this topic. First, our relational approach to temporal data involves not just "U_" versions of the algebraic operators but also (a) "U_" keys and foreign keys, (b) "U_" comparison operators, and (c) "U_" versions of INSERT, DELETE, and UPDATE; but, again, all of these constructs turn out to be essentially just shorthand. Second, it also turns out that the Manifesto's type inheritance model has a crucial role to play in that temporal support; so once again we see an example of the interconnectedness of all of these issues.

Higher-Level Interfaces

There's another way in which we can build on top of the relational model, and that's by means of various kinds of applications that run above the relational interface and provide various specialized services. One example might be decision support; another might be data mining; another might be a natural-language frontend. For the users of such applications, the relational model obviously disappears under the covers, at least to some degree. (Though even if it does, and even if most users interact with the database only through some such frontend, I think database design and the like will still necessarily be based on solid relational principles.)

By the way, suppose it's your job to implement one of those frontend applications. Which would you prefer as a target: a relational DBMS or some other kind, say an object-oriented DBMS? And if you opt for the former, as I obviously think you should, which would you prefer: a DBMS that supports the relational model or one that supports SQL?

In case it's not clear, my point is this: we've come a long way from the early days when SQL was being touted as a language that end users could use for themselves, and I know many people will dismiss my numerous criticisms of SQL as mere carping for that very reason. Real users don't use it anyway, right? Only programmers use it. And in any case, much of the SQL code that's actually executed is never written by a human programmer at all but is generated by some frontend application. However, it seems to me that SQL is bad as a target language for all of the same reasons that it's bad as a source language. And it further seems to me, therefore, that my criticisms are still germane.

So What About SQL?

SQL is incapable of providing the kind of firm foundation we need for future growth and development. Instead, it's the relational model that has to provide that foundation. In The Third Manifesto, therefore, Darwen and I reject SQL as such; in its place, we argue that some truly relational language like Tutorial D should be implemented as soon as possible. Of course, we aren't so naïve as to think that SQL will ever disappear. Rather, we hope that Tutorial D, or some other true relational language, will be sufficiently superior that it will become the database language of choice (by a process of natural selection), and SQL will become "the database language of last resort." In fact, we see a parallel with the world of programming languages, where COBOL has never disappeared (and never will); but COBOL has become "the programming language of last resort" for developing applications, because better alternatives exist. We see SQL as a kind of database COBOL, and we would like to see some other language become available as a better alternative to it.

Of course, we do realize that SQL databases and applications are going to be with us for a long time (to think otherwise would be quite unrealistic), and so we do have to pay some attention to the question of what to do about today's SQL legacy. The Manifesto therefore does include some specific proposals in this regard. In particular, it offers some suggestions for implementing SQL on top of a true relational language, so that existing SQL applications can continue to work. Detailed discussion of those proposals would be out of place here, however.