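The relational model consists of five components:

1. An open-ended collection of scalar types, including in particular the type BOOLEAN (truth values)
2. A relation type generator and an intended interpretation for relations of types generated thereby
3. Facilities for defining relation variables of such generated relation types
4. A relational assignment operator for assigning relation values to such relation variables
5. An open-ended collection of generic relational operators for deriving relation values from other relation values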
The following subsections elaborate on each of these components in turn.
In general, scalar types can be either system-defined (built-in) or user-defined; thus, a means must be available for users to define their own scalar types (this requirement is implied, in part, by the fact that the set of scalar types is open-ended). A means must therefore also be available for users to define their own scalar operators, since types without operators are useless. The only scalar type the system is required to provide is type BOOLEAN, the most fundamental type of all, but a real system will surely support others as well (INTEGER, CHAR, and so on).
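As a rough illustration (in Python, not in a relational language), a user-defined scalar type can be sketched as an immutable class whose constructor admits only the type's legal values and which comes with its own operators. The type name `Qty` and its operators are hypothetical, chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qty:
    """Hypothetical user-defined scalar type: non-negative integer quantities."""
    value: int

    def __post_init__(self) -> None:
        # Enforce the set of legal values that constitutes the type.
        if not isinstance(self.value, int) or isinstance(self.value, bool) or self.value < 0:
            raise ValueError(f"not a legal Qty: {self.value!r}")

    # User-defined operators that come with the type.
    def __add__(self, other: "Qty") -> "Qty":
        return Qty(self.value + other.value)

    def __lt__(self, other: "Qty") -> bool:  # a boolean-valued operator
        return self.value < other.value
```

For example, `Qty(3) + Qty(4)` yields `Qty(7)`, while `Qty(-1)` raises `ValueError`. Without the operators, values of type `Qty` could be created but not usefully manipulated.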
Support for type BOOLEAN implies support for the usual logical operators (NOT, AND, OR, and so on), as well as for other operators, system- or user-defined, that return boolean values. In particular, the equality comparison operator "=" must be available in connection with every type, nonscalar as well as scalar, because without it we couldn't even define the values that constitute the type in question. What's more, the model prescribes the semantics of that operator, too: if v1 and v2 are values of the same type, v1 = v2 evaluates to TRUE if v1 and v2 are the very same value and to FALSE otherwise.
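To see why the semantics of "=" needs to be prescribed rather than left to chance, consider Python's default equality for user-defined classes, which compares object identity unless the class defines `__eq__`. Both classes below are hypothetical: the first violates the rule that v1 = v2 is TRUE exactly when v1 and v2 are the same value, and the second satisfies it.

```python
class PointByIdentity:
    """Default object equality: compares identity, not value."""

    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y


class PointByValue:
    """Equality defined on the value itself, as the model prescribes."""

    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, PointByValue):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        return hash((self.x, self.y))


# The same value, represented by two separate objects.
print(PointByIdentity(1, 2) == PointByIdentity(1, 2))  # False: violates the rule
print(PointByValue(1, 2) == PointByValue(1, 2))        # True
```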
By the way, SQL fails seriously on this requirement (that "=" be supported for every type, with prescribed semantics). For example:
One consequence of such deficiencies is that SQL violates The Assignment Principle (see the later section "Some Database Principles"), and ubiquitously so, in fact.
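The Assignment Principle says that after a value v is assigned to a variable V, the comparison V = v must evaluate to TRUE. Python's IEEE 754 floats offer a convenient analogy (this is not an SQL example): NaN compares unequal to itself, much as SQL's comparison of a null with a null yields UNKNOWN rather than TRUE. The helper `satisfies_assignment_principle` is hypothetical.

```python
import math


def satisfies_assignment_principle(v: object) -> bool:
    """Assign v to a variable, then check that the variable compares equal to v."""
    V = v          # the assignment
    return V == v  # The Assignment Principle requires this to be True


print(satisfies_assignment_principle(42))        # True
print(satisfies_assignment_principle(math.nan))  # False: NaN is not equal to itself
```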
The relation type generator allows users to specify individual relation types as desired: in particular, as the type for some relation variable or some relation-valued attribute (see Chapter 2 for further explanation). The intended interpretation for a given relation of a given type in a given context is as a set of true propositions; each such proposition (a) constitutes an instantiation of the predicate that corresponds to the relation heading and (b) is represented by a tuple in the relation body. If the context in question is some relvar (that is, if we're talking about the relation that happens to be the current value of some relvar), then the predicate in question is the relvar predicate for that relvar. If a tuple could plausibly appear in that relvar at some time but doesn't, the corresponding proposition is assumed to be false at that time.
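Here's a minimal sketch, assuming a hypothetical relvar EMP whose predicate is "Employee ENO works in department DNO," of how a relation body can be read as a set of true propositions, with the closed-world interpretation applied to tuples that don't appear. Representing each tuple as a frozenset of (attribute, value) pairs is an assumption made to keep tuples unordered and hashable.

```python
def tup(**attrs: object) -> frozenset:
    """A tuple as an unordered set of (attribute, value) pairs."""
    return frozenset(attrs.items())


# Hypothetical relvar predicate: "Employee ENO works in department DNO."
def proposition(t: frozenset) -> str:
    d = dict(t)
    return f"Employee {d['ENO']} works in department {d['DNO']}."


# Current value of the hypothetical relvar EMP.
EMP = frozenset({tup(ENO="E1", DNO="D1"), tup(ENO="E2", DNO="D1")})


def is_true(t: frozenset, body: frozenset) -> bool:
    # Closed-world interpretation: a tuple's presence means its proposition is
    # true; its absence means the proposition is false.
    return t in body


for t in EMP:
    print(proposition(t))  # each tuple asserts one true proposition
print(is_true(tup(ENO="E3", DNO="D2"), EMP))  # False
```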
Since the equality comparison operator "=" is available in connection with every type, it's available in connection with every relation type in particular.
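For relations, "=" amounts to equality of headings and equality of bodies, where both are sets. The sketch below uses a hypothetical `Relation` class and `relation` selector with the same pair-set representation of tuples. Because heading and body are sets, neither the order in which tuples are supplied nor the order of attributes within a tuple affects the result, and a duplicate row simply collapses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    heading: frozenset  # attribute names
    body: frozenset     # tuples, each a frozenset of (attribute, value) pairs


def relation(heading: set, *rows: dict) -> Relation:
    """Hypothetical selector: build a relation, checking each row against the heading."""
    for row in rows:
        if set(row) != set(heading):
            raise ValueError(f"row {row!r} does not match heading {heading!r}")
    return Relation(frozenset(heading), frozenset(frozenset(r.items()) for r in rows))


r1 = relation({"SNO", "CITY"}, {"SNO": "S1", "CITY": "London"}, {"SNO": "S2", "CITY": "Paris"})
r2 = relation({"CITY", "SNO"}, {"CITY": "Paris", "SNO": "S2"}, {"SNO": "S1", "CITY": "London"})

print(r1 == r2)  # True: same heading, same set of tuples
```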
As noted in the previous subsection, a particularly important use for the relation type generator is in specifying the type of a relation variable, or relvar, when that relvar is defined. The only kind of variable permitted in a relational database is the relvar (in particular, scalar and tuple variables are prohibited in the database, even though they're not prohibited in programs that access such a database; in fact, they're probably required there).
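A minimal sketch, with a hypothetical `Relvar` class, of a relation variable whose type (its heading) is fixed when the relvar is defined, so that only relation values of that type can be assigned to it. In this sketch the database is modeled as nothing more than a dictionary of named relvars.

```python
class Relvar:
    """Hypothetical relation variable: its heading is fixed when it is defined."""

    def __init__(self, heading: set) -> None:
        self.heading = frozenset(heading)
        self.value = frozenset()  # initially the empty relation of that type

    def assign(self, body: frozenset) -> None:
        # Only relation values of the declared type may be assigned; every tuple
        # (a frozenset of (attribute, value) pairs) must match the heading.
        for t in body:
            if frozenset(a for a, _ in t) != self.heading:
                raise TypeError("tuple does not match the relvar's heading")
        self.value = body


# The database, in this sketch, contains nothing but named relvars.
db = {"S": Relvar({"SNO", "CITY"})}
db["S"].assign(frozenset({frozenset({("SNO", "S1"), ("CITY", "London")})}))
```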
The statement that the database contains nothing but relvars is one possible formulation of what Codd originally called The Information Principle, though I don't think it's a formulation he ever used himself. Instead, he usually stated the principle like this:
I heard Codd refer to this principle on more than one occasion as the fundamental principle underlying the relational model. Why is it so important? The answer is bound up with the observations I made in Chapter 4 to the effect that, along with types, relations are both necessary and sufficient to represent any data whatsoever at the logical level. In other words, the relational model gives us exactly what we do need in this respect, and it doesn't give us anything we don't need.
I'd like to pursue this point a moment longer. In general, it's axiomatic that if we have n different ways of representing data, then we need n different sets of operators. For example, if we had arrays as well as relations, we'd need a full complement of array operators as well as a full complement of relational ones. If n is greater than one, therefore, we have more operators to implement, document, teach, learn, remember, and use. But those extra operators add complexity, not power! There's nothing useful that can be done if n is greater than one that can't be done if n equals one (and in the relational model, of course, n does equal one).
What's more, not only does the relational model give us just one construct, the relation itself, for representing data, but that construct is, to quote Codd himself (see the next section, "Objectives of the Relational Model"), of spartan simplicity: it has no ordering to its tuples, no ordering to its attributes, no duplicate tuples, no pointers, and (at least as far as I'm concerned) no nulls. Any contravention of these properties is tantamount to introducing another way of representing data, and therefore to introducing more operators as well. In fact, SQL is living proof of this observation; for example, SQL has eight different union operators, while (as we know) the relational model has just one.
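The effect of admitting duplicates can be made concrete with Python's `collections.Counter`, used here as a stand-in for a bag (multiset) of rows. This is an analogy, not a model of SQL's actual operators: once duplicates are allowed, more than one "union" makes sense, whereas for sets there is exactly one.

```python
from collections import Counter

rows_a = Counter(["S1", "S1", "S2"])  # a bag: S1 appears twice
rows_b = Counter(["S1", "S3"])

# Bag "union" that keeps all duplicates (counts add up, as with UNION ALL).
union_all = rows_a + rows_b
# Another plausible bag "union": the larger count for each element.
union_max = rows_a | rows_b
# Set union: duplicates eliminated; the only union relations need.
union_set = set(rows_a) | set(rows_b)

print(union_all["S1"], union_max["S1"], sorted(union_set))  # 3 2 ['S1', 'S2', 'S3']
```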
As you can see, The Information Principle is certainly important, but it has to be said that its name hardly does it justice. Other names that have been proposed, mainly by Hugh Darwen or myself or both, include The Principle of Uniform Representation and The Principle of Uniformity of Representation. (This latter is clumsy, I admit, but at least it's accurate.)
Like the equality comparison operator "=", the assignment operator ":=" must be available in connection with every type, for without it we would have no way of assigning an arbitrary value to a variable of the type in question; and again, relation types are no exception to this rule. INSERT, DELETE, and UPDATE shorthands are permitted and indeed useful, but strictly speaking they're only shorthands for certain relational assignments. What's more, support for relational assignment must include, in particular, support for multiple relational assignment.
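Here's a minimal sketch, with hypothetical names throughout, of how INSERT and DELETE reduce to relational assignment (INSERT r INTO R as R := R UNION r, DELETE as R := R MINUS r), and of multiple assignment, in which every right-hand side is evaluated against the old state before any target is updated. Relations are simplified to plain sets, and the database is a dictionary from relvar names to values.

```python
from typing import Callable, Dict

Database = Dict[str, frozenset]


def insert(db: Database, name: str, r: frozenset) -> None:
    # INSERT shorthand: R := R UNION r
    db[name] = db[name] | r


def delete(db: Database, name: str, r: frozenset) -> None:
    # DELETE shorthand: R := R MINUS r
    db[name] = db[name] - r


def multiple_assign(db: Database, **exprs: Callable[[Database], frozenset]) -> None:
    """Evaluate every right-hand side against the old state, then install all results."""
    new_values = {name: expr(db) for name, expr in exprs.items()}
    db.update(new_values)


db: Database = {
    "S": frozenset({"S1", "S2"}),
    "SP": frozenset({("S1", "P1"), ("S2", "P1")}),
}

# Remove supplier S1 and its shipments in one multiple assignment:
# both expressions see the old state, and both results are installed together.
multiple_assign(
    db,
    S=lambda d: d["S"] - {"S1"},
    SP=lambda d: frozenset(t for t in d["SP"] if t[0] != "S1"),
)
```

Because evaluation precedes assignment, `multiple_assign(db, A=lambda d: d["B"], B=lambda d: d["A"])` swaps the values of A and B, which two single assignments executed in sequence would not do.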
The "generic relational operators" are the operators that make up the relational algebra, and they're therefore built-in (though there's no inherent reason why users shouldn't be allowed to define additional operators of their own, if desired).
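To give the flavor of "generic," here's a sketch of three algebraic operators (restrict, project, and natural join) that work on relations of any type. Relations are represented as hypothetical (heading, body) pairs. Each operator takes relations and returns a relation, so operator invocations can be nested to any depth; and since bodies are sets, projection eliminates duplicates automatically.

```python
from typing import Callable, FrozenSet, Tuple

Tup = FrozenSet[Tuple[str, object]]          # a tuple: set of (attribute, value) pairs
Rel = Tuple[FrozenSet[str], FrozenSet[Tup]]  # (heading, body)


def rel(heading: set, *rows: dict) -> Rel:
    return frozenset(heading), frozenset(frozenset(r.items()) for r in rows)


def restrict(r: Rel, pred: Callable[[dict], bool]) -> Rel:
    heading, body = r
    return heading, frozenset(t for t in body if pred(dict(t)))


def project(r: Rel, attrs: set) -> Rel:
    heading, body = r
    keep = frozenset(attrs)
    if not keep <= heading:
        raise ValueError("can only project on attributes of the heading")
    return keep, frozenset(frozenset((a, v) for a, v in t if a in keep) for t in body)


def join(r1: Rel, r2: Rel) -> Rel:
    """Natural join: combine tuples that agree on all common attributes."""
    (h1, b1), (h2, b2) = r1, r2
    common = h1 & h2
    out = set()
    for t1 in b1:
        d1 = dict(t1)
        for t2 in b2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in common):
                out.add(frozenset({**d1, **d2}.items()))
    return h1 | h2, frozenset(out)


S = rel({"SNO", "CITY"}, {"SNO": "S1", "CITY": "London"}, {"SNO": "S2", "CITY": "Paris"})
SP = rel({"SNO", "PNO"}, {"SNO": "S1", "PNO": "P1"}, {"SNO": "S1", "PNO": "P2"},
         {"SNO": "S2", "PNO": "P1"})

# Part numbers for parts supplied by a supplier in London.
london_parts = project(restrict(join(S, SP), lambda t: t["CITY"] == "London"), {"PNO"})
```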
Now, there seems to be a widespread misconception concerning the purpose of the algebra. To be specific, many people seem to think it's meant just for writing queries, but it's not; rather, it's for writing relational expressions. Those expressions in turn serve many purposes, including but certainly not limited to query. Here are some other important ones:
And so on (this isn't an exhaustive list).
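As one illustration of a purpose other than query, a relational expression can define an integrity constraint: the constraint is satisfied if and only if the relation the expression denotes is empty. The sketch below uses hypothetical relvars S and SP, with plain Python sets standing in for relations, to express the constraint that every shipment refers to an existing supplier.

```python
S = frozenset({"S1", "S2"})                   # supplier numbers
SP = frozenset({("S1", "P1"), ("S2", "P2")})  # (supplier, part) shipments


def sp_not_matching_s(sp: frozenset, s: frozenset) -> frozenset:
    """Relational expression: shipments whose supplier isn't in S (a semidifference)."""
    return frozenset(t for t in sp if t[0] not in s)


def constraint_holds(sp: frozenset, s: frozenset) -> bool:
    # Satisfied exactly when the expression yields the empty relation.
    return len(sp_not_matching_s(sp, s)) == 0


print(constraint_holds(SP, S))                   # True
print(constraint_holds(SP | {("S9", "P1")}, S))  # False: S9 is not a known supplier
```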
The algebra also serves as a kind of yardstick against which the expressive power of database languages can be measured. Essentially, a language is said to be relationally complete if and only if it's at least as powerful as the algebra, meaning its expressions permit the definition of every relation that can be defined by means of expressions of the algebra. Relational completeness is a basic measure of the expressive capability of a language; if a language is relationally complete, it means (among other things, and speaking a trifle loosely) that queries of arbitrary complexity can be formulated without having to resort to loops or recursion. In other words, it's relational completeness that allows end users (at least in principle, though possibly not in practice) to access the database directly, without having to go through the potential bottleneck of the IT department.
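The contrast between writing loops and writing a single expression can be sketched for the query "suppliers who supply no parts," using hypothetical sets S and SP and a hypothetical helper `project_sno`. Python of course still iterates under the covers; the point is what the user has to write. The algebraic version is just S MINUS (SP projected on SNO).

```python
S = frozenset({"S1", "S2", "S3"})
SP = frozenset({("S1", "P1"), ("S1", "P2"), ("S2", "P1")})


def project_sno(sp: frozenset) -> frozenset:
    """Hypothetical helper: projection of SP on SNO."""
    return frozenset(sno for sno, _pno in sp)


# Procedural formulation: the user spells out the iteration.
def idle_suppliers_procedural(s: frozenset, sp: frozenset) -> frozenset:
    result = set()
    for sno in s:
        found = False
        for sno2, _pno in sp:
            if sno2 == sno:
                found = True
                break
        if not found:
            result.add(sno)
    return frozenset(result)


# Algebraic formulation: one expression, S MINUS (SP projected on SNO).
def idle_suppliers_algebraic(s: frozenset, sp: frozenset) -> frozenset:
    return s - project_sno(sp)


print(sorted(idle_suppliers_algebraic(S, SP)))  # ['S3']
```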