Chapter Five. Relational Algebra | Database in Depth: Relational Theory for Practitioners

I'll begin this chapter by briefly reviewing a few important points from Chapter 1. First, I remind you that each algebraic operator takes at least one relation as input and produces another relation as output. Second, I remind you that the fact that the output is the same kind of thing as the input they're all relations is the closure property of the algebra, and it's that property that lets us write nested relational expressions. Third, I gave outline descriptions in Chapter 1 of what I called "the original eight operators" (restrict, project, product, intersect, union, difference, join, and divide); now I want to define those operators, as well as a number of others, much more carefully. Before I can do so, however, I need to make a few general points:

First, the operators are generic: they apply, in effect, to all possible relations. For example, we don't need one specific join operator to join employees and departments and another, different, join operator to join suppliers and shipments. (Incidentally, do you think an analogous observation applies to object-oriented systems?)
Second, the operators are read-only: they "read" their operands and return a result, but they don't update anything. In other words, they operate on relations, not relvars.
Of course, the previous point doesn't mean that relational expressions can't refer to relvars. For example, R UNION S, where R and S are relvar names, is certainly a valid relational expression in Tutorial D. In the context of such an expression, however, a relvar name doesn't denote the corresponding relvar as such; rather, it denotes the relation that happens to be the current value of that relvar at that time. In other words, we can certainly use a relvar name to denote a relation operand, and such a relvar reference thus constitutes a valid relational expression^[*] but in principle we could equally well denote the very same operand by means of an appropriate relation literal instead. (An analogy might help clarify this point. Suppose N is a variable of type INTEGER, and at time t it has the value 3. Then N+2 is certainly a valid expression, but at time t it means exactly the same as 3+2, no more and no less.)
^[*] Not in SQL, though! For example, if R and S are SQL table names, we typically can't write things like R UNION S; instead, we have to write something like SELECT R.* FROM R UNION SELECT S.* FROM S.
Last, given that the operators of the algebra are all read-only, it follows that INSERT, DELETE, and UPDATE (and relational assignment), though they are relational operators, aren't part of the algebra as such.

I also need to say something about the design of Tutorial D, because its support for the algebra differs in certain significant respects from that of SQL (perhaps it would be more accurate to say that its support for the algebra is deliberately, of course much more direct than that of SQL). The overriding point is that, in operations such as UNION or JOIN that need some correspondence to be established between operand attributes, Tutorial D does so by requiring the attributes in question to have the same names (as well as, necessarily, the same types). For example, here's a Tutorial D expression for the join of parts and suppliers on cities:

   P JOIN S

By definition, this join is performed on the basis of part and supplier cities, P and S having just the CITY attribute in common. Here by contrast is the same operation in SQL (note the last line in particular):

   SELECT P.PNO, P.PNAME, P.COLOR, P.WEIGHT, P.CITY,        S.SNO, S.SNAME, S.STATUS   FROM   P,S   WHERE  P.CITY = S.CITY

Note: Actually, this example can be formulated in many different ways in SQL, and here are three more. As you'll observe, the second and third are a little closer to the spirit of Tutorial D (note in particular that the result of the join in those two formulations has a column called just CITY and no columns called either S.CITY or P.CITY):

   SELECT P.PNO, P.PNAME, P.COLOR, P.WEIGHT, P.CITY,        S.SNO, S.SNAME, S.STATUS   FROM   P JOIN S   ON   P.CITY = S.CITY   SELECT P.PNO, P.PNAME, P.COLOR, P.WEIGHT, CITY,        S.SNO, S.SNAME, S.STATUS   FROM   P JOIN S   USING  ( CITY )   SELECT P.PNO, P.PNAME, P.COLOR, P.WEIGHT, CITY,        S.SNO, S.SNAME, S.STATUS   FROM   P NATURAL JOIN S

I chose the particular formulation I did partly because it was the only one supported in SQL as originally defined and partly, and more importantly, because it allows me to make a number of additional points concerning differences between SQL and the algebra (at least as realized in Tutorial D):

SQL permits (and sometimes requires) dot-qualified names. Tutorial D doesn't.
Tutorial D sometimes needs to rename attributes in order to avoid what would otherwise be naming clashes or mismatches. SQL usually doesn't (though it does support an analog of the RENAME operator that Tutorial D uses for the purpose, as we'll see in the next section).
Partly as a consequence of the previous point, Tutorial D has no need for SQL's "correlation names" (in effect, it replaces SQL's correlation-name concept by the idea that attributes occasionally need to be renamed in order to "disambiguate" what would otherwise be a syntactically invalid expression).
In addition to either explicitly or implicitly supporting certain features of the relational algebra, SQL also explicitly supports certain features of the relational calculus (EXISTS is a case in point see Appendix A). Tutorial D doesn't. One consequence of this difference is that SQL tends to be a rather redundant language, in that it often provides numerous different ways of formulating the same query (a fact that can have serious negative implications for the optimizer).^[*]
^[*] I once wrote a paper on this topic called "Fifty Ways to Quote Your Query" (www.dbpd.com, July 1998), in which I showed that even a query as simple as "Get names of suppliers who supply part P2" can be expressed in well over 50 different ways in SQL.
SQL requires most queries to conform to its SELECT - FROM - WHERE template. Tutorial D has no analogous requirement. I'll have more to say on this particular issue in the section Extend and Summarize," later in this chapter.

In what follows, I'll show most examples in both Tutorial D and SQL.