The Relational Model Defined | Database in Depth: Relational Theory for Practitioners

The relational model consists of five components:

An open-ended collection of scalar types, including in particular the type BOOLEAN (truth values)
A relation type generator and an intended interpretation for relations of types generated thereby
Facilities for defining relation variables of such generated relation types
A relational assignment operator for assigning relation values to such relation variables
An open-ended collection of generic relational operators for deriving relation values from other relation values

The following subsections elaborate on each of these components in turn.

Scalar Types

Scalar types can be either system-defined (built-in) or user-defined, in general; thus, a means must be available for users to define their own scalar types (this requirement is implied, partly, by the fact that the set of scalar types is open-ended). A means must therefore also be available for users to define their own scalar operators, since types without operators are useless. The only required system-defined scalar type is type BOOLEAN the most fundamental type of all but a real system will surely support others as well (INTEGER, CHAR, and so on).

Support for type BOOLEAN implies support for the usual logical operators NOT, AND, OR, and so on as well as other operators (system- or user-defined) that return boolean values. In particular, the equality comparison operator "=" (equality comparison) must be available in connection with every type, nonscalar as well as scalar, because without it we couldn't even define the values that constitute the type in question. What's more, the model prescribes the semantics of that operator, too. To be specific, if v1 and v2 are values of the same type, v1 = v2 evaluates to TRUE if v1 and v2 are the very same value and FALSE otherwise.

By the way, SQL fails seriously on this requirement (that "=" be supported for every type, with prescribed semantics). For example:

For the built-in types CHAR and VARCHAR, it's possible in fact, common for "=" to give TRUE even if the comparands are distinct.
For the built-in type XML (and the built-in types BLOB and CLOB, in certain products), "=" isn't defined at all.
For user-defined types, "=" can be defined only when "<" is defined as well, and for some types "<" makes no sense; and even when "=" is defined, the semantics are arbitrary, in the sense that they're left to the type definer.
For tables, no comparison operators are defined at all.
Last, it's quite common for "=" not to give TRUE even if the comparands are indistinguishable; in particular, this situation arises if both comparands are null.

One consequence of such deficiencies is that SQL violates The Assignment Principle (see the later section "Some Database Principles")--ubiquitously so, in fact.

Relation Types

The relation type generator allows users to specify individual relation types as desired: in particular, as the type for some relation variable or some relation-valued attribute (see Chapter 2 for further explanation). The intended interpretation for a given relation of a given type in a given context is as a set of true propositions; each such proposition constitutes an instantiation of some predicate that (a) corresponds to the relation heading and (b) is represented by a tuple in the relation body. If the context in question is some relvar that is, if we're talking about the relation that happens to appear as the current value of some relvar then the predicate in question is the relvar predicate for that relvar. If a tuple plausibly could appear in that relvar at some time but doesn't, the corresponding proposition is assumed to be false at that time.

Since the equality comparison operator "=" is available in connection with every type, it's available in connection with every relation type in particular.

Relation Variables

As noted in the previous subsection, a particularly important use for the relation type generator is in specifying the type of a relation variable, or relvar, when that relvar is defined. The only kind of variable permitted in a relational database is the relvar (in particular, scalar and tuple variables are prohibited, even though they're not prohibited in fact, they're probably required in programs that access such a database).

The statement that the database contains nothing but relvars is one possible formulation of what Codd originally called The Information Principle, though I don't think it's a formulation he ever used himself. Instead, he usually stated the principle like this:

The entire information content of the database (at any given time) is represented in one and only one way: namely, as explicit values in attribute positions in tuples in relations.

I heard Codd refer to this principle on more than one occasion as the fundamental principle underlying the relational model. Why is it so important? The answer is bound up with the observations I made in Chapter 4 to the effect that, along with types, relations are both necessary and sufficient to represent any data whatsoever at the logical level. In other words, the relational model gives us exactly what we do need in this respect, and it doesn't give us anything we don't need.

I'd like to pursue this point a moment longer. In general, it's axiomatic that if we have n different ways of representing data, then we need n different sets of operators. For example, if we had arrays as well as relations, we'd need a full complement of array operators as well as a full complement of relational ones. If n is greater than one, therefore, we have more operators to implement, document, teach, learn, remember, and use. But those extra operators add complexity, not power! There's nothing useful that can be done if n is greater than one that can't be done if n equals one (and in the relational model, of course, n does equal one).

What's more, not only does the relational model give us just one construct, the relation itself, for representing data, but that construct is to quote Codd himself (see the next section, "Objectives of the Relational Model")--of spartan simplicity: it has no ordering to its tuples, no ordering to its attributes, no duplicate tuples, no pointers, and (at least as far as I'm concerned) no nulls. Any contravention of these properties is tantamount to introducing another way of representing data, and therefore to introducing more operators as well. In fact, SQL is living proof of this observation; for example, SQL has eight different union operators,^[*] while (as we know) the relational model has just one.

^[*] By rights that eight ought really to be twelve-- SQL's so-called "multiset union," which applies to the "multisets of rows" that are permitted as values within columns of tables, supports only two of the six options that are supported for its regular table union. To make matters worse, SQL doesn't support the true multiset union operator at all; the operator it calls "multiset union" corresponds to what's called "union plus" in the literature. See Donald E. Knuth's The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Third Edition (Addison-Wesley, 1997) for further explanation.

As you can see, The Information Principle is certainly important but it has to be said that its name hardly does it justice. Other names that have been proposed, mainly by Hugh Darwen or myself or both, include The Principle of Uniform Representation and The Principle of Uniformity of Representation. (This latter is clumsy, I admit, but at least it's accurate.)

Relational Assignment

Like the equality comparison operator "=", the assignment operator ":=" must be available in connection with every type, for without it we would have no way of assigning an arbitrary value to a variable of the type in question and again, relation types are no exception to this rule. INSERT, DELETE, and UPDATE shorthands are permitted and indeed useful, but strictly speaking they're only shorthands. What's more, support for relational assignment must include support for multiple relational assignment in particular.

Relational Operators

The "generic relational operators" are the operators that make up the relational algebra, and they're therefore built-in (though there's no inherent reason why users shouldn't be allowed to define additional operators of their own, if desired).

Now, there seems to a widespread misconception concerning the purpose of the algebra. To be specific, many people seem to think it's meant just for writing queries but it's not; rather, it's for writing relational expressions. Those expressions in turn serve many purposes, including but certainly not limited to query. Here are some other important ones:

Defining views and snapshots
Defining the set of tuples to be inserted into, deleted from, or updated in some relvar (or, more generally, defining the set of tuples to be assigned to some relvar)
Defining constraints (though here the relational expression is just a subexpression of some boolean expression, frequently though not invariably an IS_EMPTY invocation)

And so on (this isn't an exhaustive list).

The algebra also serves as a kind of yardstick against which the expressive power of database languages can be measured. Essentially, a language is said to be relationally complete if and only if it's at least as powerful as the algebra, meaning its expressions permit the definition of every relation that can be defined by means of expressions of the algebra. Relational completeness is a basic measure of the expressive capability of a language; if a language is relationally complete, it means (among other things, and speaking a trifle loosely) that queries of arbitrary complexity can be formulated without having to resort to loops or recursion. In other words, it's relational completeness that allows end users at least in principle, though possibly not in practice to access the database directly, without having to go through the potential bottleneck of the IT department.