Miscellaneous Issues | Database in Depth: Relational Theory for Practitioners

In this section, I want to address a few integrity-related issues that don't fit very well in any of the preceding sections.

First, since it's basically a boolean expression that must evaluate to TRUE, it follows that from a formal perspective a constraint is a proposition. Here for example is constraint C1 from the earlier section "Database Constraints":

   CONSTRAINT C1 IS_EMPTY ( S WHERE STATUS < 1 OR STATUS > 100 ) ;

Given a particular value for relvar S, the boolean expression:

   IS_EMPTY ( S WHERE STATUS < 1 OR STATUS > 100 )

(which might be thought of, loosely, as the constraint proper) is certainly either true or false, unconditionally, and that's the definition of what it means to be a proposition (see Chapter 4).

Second, suppose relvar S already contains a tuple that violates constraint C1 when the CONSTRAINT statement just shown is executed. Then that execution must fail, of course. More generally, whenever we try to define a new database constraint, the system must first check to see whether that constraint is satisfied by the database at that time. If it isn't, the constraint must be rejected; otherwise, it must be accepted and enforced from that point forward.

Third, recall from Chapter 1 that the relational model includes what I called a "generic" integrity rule: namely, the referential integrity rule (I deliberately ignore the entity integrity rule). But it should be clear that the referential integrity rule is different in kind from the constraints we've been examining in this chapter. It's really a metaconstraint, in a sense; it says, for example, that in the specific database containing relvars S, P, and SP, there must be certain specific constraints (foreign key constraints) between SP and S and between SP and P because if there aren't, then that database might violate the referential integrity metaconstraint. Likewise, in the specific database containing relvars EMP and DEPT (see Chapter 1), there must be a specific foreign key constraint between EMP and DEPT, because if there isn't, then again that database might violate the referential integrity metaconstraint.

Fourth, I never mentioned the point explicitly, but I trust it's obvious that we want constraints to be stated declaratively. Although the SQL standard does include fairly extensive support for declarative constraints, at least some of the major SQL products don't; instead, they assume you'll use triggered procedures also known as just triggers to enforce integrity. (The standard includes explicit support for triggers, too.) As I pointed out in Chapter 1, however, declarative solutions are always to be preferred over procedural ones, if they're available. Also, declarative constraints in particular open the door to the possibility of doing semantic optimization, which triggers don't.

Another issue I didn't mention previously is the possibility of supporting transition constraints. A transition constraint is a constraint on the legal transitions that variables of some kind (relvars in particular) can make from one value to another. For example, a person's marital status can change from "never married" to "married" but not the other way around. Here's an example ("No supplier's status must ever decrease"):

   CONSTRAINT C8 IS_EMPTY      ( ( ( S' { SNO, STATUS } RENAME ( STATUS AS STATUS' ) )        JOIN        ( S { SNO, STATUS } ) )      WHERE STATUS' > STATUS ) ;

Explanation: I'm adopting the convention that a primed relvar name such as S' refers to the indicated relvar as it was prior to the update under consideration. Constraint C8 thus says: "If we join the old value of S and the new one and restrict the result to just those tuples where the old status is greater than the new one, the final result must be empty." (Since the join is on SNO, any tuple in the join for which the old status is greater than the new one would represent a supplier whose status had decreased.)

Last, I hope you agree from everything we've covered in this chapter that constraints are vital and yet they seem to be very poorly supported in commercial products; indeed, they seem to be underappreciated at best, if not completely misunderstood. The emphasis in the commercial world always seems to be on performance, performance, performance; other objectives, such as ease of use, data independence, and in particular integrity, seem so often to be sacrificed to or at best to take a back seat to that overriding goal.^[*] But what's the point of a system performing well if we can't be sure the information we're getting from it is correct? Frankly, I don't care how fast a system runs if I don't feel I can trust it to give me the right answers to my queries.

^[*] I don't mean to suggest that proper support for integrity checking implies bad performance; in fact, I believe quite strongly that it should improve performance. All I mean is that there tends to be a huge emphasis in vendor development effort on performance issues, to the exclusion of other matters such as data integrity.