The main focus of this chapter has been on logical database design theory, by which I mean, essentially, normalization and orthogonality (the scientific part of the design discipline). The point is that logical design, unlike physical design, is or should be quite DBMS-independent, and as we've seen there are some solid theoretical principles that can usefully be applied to the problem. One point I didn't call out explicitly in the body of the chapter is that logical design should generally aim to be application-independent, too, as well as DBMS-independent. The aim is to produce a design that concentrates on what the data means, rather than on how it will be used and I emphasized the significance of constraints and predicates ("business rules") in this connection. The database is supposed to be a faithful representation of the semantics of the situation, and constraints are what represent semantics. Abstractly, then, the logical design process goes like this:
Incidentally, one reason application independence is desirable is that we never know all of the uses to which the data will be put. It follows that we want a design that will be robust, one that won't be invalidated by new processing requirements. Much of design theory has to do with reducing redundancy: normalization reduces redundancy within relvars, orthogonality reduces it across relvars. My discussion of normalization concentrated on BCNF and 5NF, which are the normal forms with respect to FDs and JDs, respectively. (However, I did at least mention other normal forms, including 6NF in particular.) I pointed out that normalization makes certain constraints easier to state (and perhaps enforce); equivalently this is something else I didn't say earlier it makes more single-tuple updates logically acceptable than would otherwise be the case (because unnormalized designs imply redundancy, and redundancy implies that sometimes we have to update several things at once). I explained that normalization is really formalized common sense. I also gave a logical and possibly unfamiliar argument, having to do with retrieval rather than update, for not denormalizing; here let me add that although the cry is always "denormalize for performance," denormalization can actually be bad for performance (both retrieval and update performance, too), as you probably know. In fact, "denormalizing for performance" usually means improving the performance of one application at the expense of others. I also described The Principle of Orthogonal Design (more formalized common sense), and I offered a few remarks on physical design. First, the physical design should be derived from the logical design and not the other way around. Second, it would be nice to get away from the currently ubiquitous direct-image style of implementation. Third, it would also be nice if physical design could be fully automated, and I held out some hope in this regard. One last point: I want to stress that the principles of normalization and orthogonality are always, in a sense, optional. They aren't hard and fast rules, never to be broken. As we know, sometimes there are sound reasons for not normalizing "all the way" (sound logical reasons, I mean; I'm not talking about "denormalizing for performance"). Well, the same is true of orthogonality although, just as a failure to normalize all the way implies redundancy and can lead to certain anomalies, so too can a failure to adhere to orthogonality. Even with the design theory I've been describing in this chapter, database design usually involves trade-offs and compromises. |