Before going any further, there's one very important point I need to explain, because it underpins everything else in this book. The relational model is, of course, a data model. Unfortunately, however, this latter term has two quite distinct meanings in the database world. The first and more fundamental meaning is this:
This is the meaning we have in mind when we talk about the relational model in particular. And, armed with this definition, we can usefully (and importantly) go on to distinguish a data model in this first sense from its implementation, which can be defined as follows:
I'll illustrate these definitions in terms of the relational model specifically. First, and obviously enough, the concept of relation is itself part of the model: users have to know what relations are, they have to know they're made up of tuples and attributes, they have to know how to interpret them, and so on. All that is part of the model. But they don't have to know how relations are physically stored on the disk, or how individual data values are physically encoded, or what indexes or other access paths exist; all that is part of the implementation, not part of the model.
Or consider the concept join: users have to know what a join is, they have to know how to invoke a join, they have to know what the result of a join looks like, and so on. Again, all that is part of the model. But they don't have to know how joins are physically implemented, or what expression transformations take place under the covers, or what indexes or other access paths are used, or what physical I/Os occur; all that is part of the implementation, not part of the model.
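The distinction can be made concrete with a small Python sketch. It assumes relations are represented as lists of dicts with a uniform heading and no duplicates, which is a simplification. Both function names are hypothetical. join_by_definition follows the model-level definition of the result almost word for word and evaluates it in the most naive way (nested loops). hash_join reaches the same result by a different route. A user who knows only what a join returns can predict the output of either one without knowing which was used.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

Row = Dict[str, Hashable]                      # one tuple: attribute name -> value
Result = Set[FrozenSet[Tuple[str, Hashable]]]  # a set of tuples, in hashable form


def join_by_definition(r: List[Row], s: List[Row]) -> Result:
    """Pair every r-tuple with every s-tuple; keep the merged pair
    if the two agree on all common attributes (nested loops)."""
    result: Result = set()
    for t1 in r:
        for t2 in s:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                result.add(frozenset({**t1, **t2}.items()))
    return result


def hash_join(r: List[Row], s: List[Row]) -> Result:
    """Same result, different strategy: index s on the common
    attributes, then probe that index once per r-tuple."""
    if not r or not s:
        return set()
    common = sorted(r[0].keys() & s[0].keys())  # assumes a uniform heading
    buckets: Dict[Tuple[Hashable, ...], List[Row]] = defaultdict(list)
    for t2 in s:
        buckets[tuple(t2[a] for a in common)].append(t2)
    result: Result = set()
    for t1 in r:
        for t2 in buckets.get(tuple(t1[a] for a in common), []):
            result.add(frozenset({**t1, **t2}.items()))
    return result


# Hypothetical sample data in the style of suppliers (S) and shipments (SP).
S = [
    {"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"SNO": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
]
SP = [
    {"SNO": "S1", "PNO": "P1", "QTY": 300},
    {"SNO": "S1", "PNO": "P2", "QTY": 200},
    {"SNO": "S2", "PNO": "P1", "QTY": 300},
]

# Same answer either way; only the work done to get it differs.
assert join_by_definition(S, SP) == hash_join(S, SP)
print(len(hash_join(S, SP)))  # 3
```

Which strategy runs faster depends on data volumes and physical details. That is exactly the kind of question that belongs to the implementation, not the model.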
In a nutshell, then:
(Of course, I'm not saying users aren't allowed to know about the implementation; I'm just saying they don't have to. In other words, everything to do with the implementation should be, at least potentially, hidden from the user.)
Here are some important consequences of the foregoing definitions. First, note that performance is fundamentally an implementation issue, not a model issue, despite extremely common misconceptions to the contrary. We're often told, for example, that "joins are slow." But such remarks make no sense! Join is part of the model, and the model as such can't be said to be either fast or slow; only implementations can be said to possess any such quality. Thus, we might reasonably say that some specific product X has a faster or slower implementation of some specific join than some other specific product Y, but that's all.
I don't want to give the wrong impression here. It's true that performance is basically an implementation issue; but that doesn't mean a good implementation will perform well if you use the model badly! Indeed, this is precisely one of the reasons why you need to know the model (I mean, so that you don't use it badly). If you write an expression such as S JOIN SP, you're within your rights to expect the implementation to do a good job on it; but if you insist on (in effect) hand-coding the join yourself, perhaps like this:
do for all tuples in S ;
   fetch S tuple into TNO, TN, TS, TC ;
   do for all tuples in SP with SNO = TNO ;
      fetch SP tuple into TNO, TP, TQ ;
      emit tuple TNO, TN, TS, TC, TP, TQ ;
   end ;
end ;
then there's no way you're going to get good performance. Relational systems should not be used like simple access methods.
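By way of contrast with the hand-coded loop, here is a minimal sketch of the declarative alternative. Python's built-in sqlite3 module stands in for "the system". SQLite, like other SQL products, is only an approximation of a relational DBMS, and the table contents are hypothetical. The point is that the request states only what result is wanted (the join of S and SP). Loop order, index use and join algorithm are left to the optimizer.

```python
import sqlite3

# Hypothetical miniature suppliers-and-shipments database, held in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT NOT NULL,
                     STATUS INTEGER NOT NULL, CITY TEXT NOT NULL);
    CREATE TABLE SP (SNO TEXT NOT NULL REFERENCES S, PNO TEXT NOT NULL,
                     QTY INTEGER NOT NULL, PRIMARY KEY (SNO, PNO));
""")
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
])
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300),
    ("S1", "P2", 200),
    ("S2", "P1", 300),
])

# One declarative request: say WHAT is wanted (S JOIN SP), not HOW to get it.
# DISTINCT guards against duplicate rows, which SQL would otherwise permit.
rows = conn.execute(
    "SELECT DISTINCT SNO, SNAME, STATUS, CITY, PNO, QTY "
    "FROM S NATURAL JOIN SP ORDER BY SNO, PNO"
).fetchall()

for row in rows:
    print(row)
```

Because the request carries no procedural detail, the system remains free to choose whatever evaluation strategy it judges best for the current data and access paths.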
Second, as you probably realize, it's precisely the fact that model and implementation are logically distinct that enables us to achieve data independence. Data independence (not a great term, by the way, but we're probably stuck with it) means we have the freedom to change the way the data is physically stored and accessed without having to make corresponding changes in the way the data is perceived by the user. The reason we might want to change those storage and access details is, of course, performance; and the fact that we can make such changes without having to change the way the data looks to the user means that existing application programs, queries, and so on can still work. Very importantly, therefore, data independence means protecting your investment in user training and applications.
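A small illustration of data independence follows, again using SQLite purely as a convenient stand-in under the same caveat. The index name SP_QTY_IX and the data are hypothetical. The application's query text is fixed. A purely physical change (adding an access path) is then made underneath it, and the query needs no change and returns the same result.

```python
import sqlite3

# The query an application would hold; it mentions no indexes or storage details.
QUERY = "SELECT SNO, PNO, QTY FROM SP WHERE QTY >= ? ORDER BY SNO, PNO"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, "
             "QTY INTEGER NOT NULL, PRIMARY KEY (SNO, PNO))")
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S2", "P1", 300), ("S3", "P2", 200),
])

before = conn.execute(QUERY, (250,)).fetchall()

# Physical-level change only: a new access path the optimizer may or may not use.
conn.execute("CREATE INDEX SP_QTY_IX ON SP (QTY)")

after = conn.execute(QUERY, (250,)).fetchall()

# Same query text, same result: the change is invisible at the logical level.
assert before == after
print(after)
```

Whether the new index actually speeds anything up is an implementation matter. What the sketch shows is only that the application did not have to be touched.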
As you can see from the foregoing definitions, the distinction between model and implementation is really just a special case (a very important special case) of the familiar distinction between logical and physical. Sadly, however, most of today's database systems, even those that claim to be relational, don't make those distinctions as clearly as they should. As a direct consequence, they deliver far less data independence than they should, and far less than relational systems are theoretically capable of. I'll come back to this issue in the next section, as well as in Chapter 7.
Now I want to turn to the second meaning of the term data model, which I dare say you're very familiar with. It can be defined thus:
In other words, a data model in the second sense is just a (possibly somewhat abstract) database design. For example, we might speak of the data model for some bank, or some hospital, or some government department.
Having now explained these two different meanings, I'd like to draw your attention to an analogy that I think nicely illuminates the relationship between them:
By the way, it follows from all of the above that if we're talking about data models in the second sense, we might reasonably speak of "relational models" in the plural or "a" relational model (with an indefinite article). But if we're talking about data models in the first sense, then there's only one relational model, and it's the model (with the definite article). I'll have more to say on this issue in Chapter 8.
For the rest of this book I'll use the term data model (or, usually, just model for short) exclusively in its first sense.