Data Value Atomicity | Database in Depth: Relational Theory for Practitioners

I hope the previous section succeeded in convincing you that domains are indeed types, no more and no less. Now I want to turn to the issue of data value atomicity and the related notion of first normal form (1NF for short). In Chapter 1, I said that 1NF meant that every tuple in every relation contains just a single value (of the appropriate type, of course) in every attribute position and it's usual to add that those "single values" are supposed to be atomic. But this latter requirement raises the obvious question: what does it mean for data to be atomic?

Well, on page 6 of the book mentioned earlier, Codd defines atomic data as data that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)." But even if we ignore that parenthetical exclusion, this definition is a trifle puzzling, and not very precise. For example, what about character strings? Are character strings atomic? Every product I know provides several operators on such strings LIKE, SUBSTR (substring), "||" (concatenate), and so on that clearly rely on the fact that character strings in general can be decomposed by the DBMS. So are those strings atomic? What do you think?

Here are some other examples of values whose atomicity is at least open to question and yet we would certainly want to be able to include as attribute values in tuples in relations:

Integers, which might be regarded as being decomposable into their prime factors (I know this isn't the kind of decomposability we usually consider in this context I'm just trying to show that the notion of decomposability itself is open to a variety of interpretations)
Fixed-point numbers, which might be regarded as being decomposable into integer and fractional parts
Dates and times, which might be regarded as being decomposable into year/month/day and hour/minute second components, respectively

Now I'd like to move on to a potentially more startling example. Refer to Figure 2-1. Relation R1 in that figure is a reduced version of the shipments relation from our running example; it shows that certain suppliers supply certain parts, and it contains one tuple for each legitimate SNO-PNO combination. For the sake of the example, let's agree that supplier numbers and part numbers are indeed "atomic"; then we can presumably agree that R1, at least, is in 1NF.

Figure 2-1. Relations R1, R2, and R3

Now suppose we replace R1 by R2, which shows that certain suppliers supply certain groups of parts (attribute PNO in R2 is what some would call multivalued, and values of that attribute are groups of part numbers). Then most people would surely say that R2 is not in 1NF; in fact, it looks like an example of "repeating groups," and repeating groups are the one thing that almost everybody agrees 1NF is supposed to prohibit (because such groups are obviously not atomic, right?).

Well, let's assume for the sake of the argument that R2 isn't in 1NF. But suppose we now replace R2 by R3. Then I claim that R3 is in 1NF!^[*] For consider:

^[*] I don't claim it's well designed indeed, it probably isn't but that's a separate issue. I'm concerned here with what's legal, not with questions of good design. The design of R3 is legal.

First, note that of course deliberately I've renamed the attribute PNO_SET, and I've shown the groups of part numbers that are PNO_SET values enclosed in set braces "{" and "}", to emphasize the fact that each such group is indeed a single value: a set value, to be sure, but a set is still, at a certain level of abstraction, a single value.
Second (and regardless of what you might think of my first argument), the fact is that a set like {P2,P4,P5} is no more and no less decomposable by the DBMS than a character string is. Like character strings, sets do have some inner structure; as with character strings, however, it's convenient to ignore that structure for certain purposes. In other words, if character strings are compatible with the requirements of 1NF that is, if character strings are atomic then sets must be, too.

The real point I'm getting at here is that the notion of atomicity has no absolute meaning;it just depends on what we want to do with the data. Sometimes we want to deal with an entire set of part numbers as a single thing, and sometimes we want to deal with individual part numbers within that set but then we're descending to a lower level of detail (a lower level of abstraction). The following analogy might help. In physics (which after all is where the terminology of atomicity comes from) the situation is exactly parallel: sometimes we want to think about individual physical atoms as indivisible things, and sometimes we want to think about the protons, neutrons, and electrons that go to make up those atoms. What's more, protons and neutrons, at least, aren't really indivisible, either they contain a variety of "subsubatomic" particles called quarks. And so on, possibly.

Let's return for a moment to relation R3. In Figure 2-1, I showed PNO_SET values as general sets. But it would be more useful in practice if they were, more specifically, relations (see Figure 2-2, where I've changed the attribute name to PNO_REL). Why would it be more useful? Because relations, not general sets, are what the relational model is all about.^[*] As a consequence, the full power of the relational algebra immediately becomes available for the relations in question they can be restricted, projected, joined, and so on. By contrast, if we use general sets instead of relations, then we need to introduce new operators (set union, set intersection, and so on) for dealing with those sets. Much better to get as much mileage as we can out of the operators we already have!

^[*] In case you're wondering, the crucial difference is that general sets can contain anything, whereas relations contain tuples specifically. Note, however, that a relation certainly resembles a general set in that it too can be regarded as a single value.

Figure 2-2. Relation R4 (a revised version of R3)

Attribute PNO_REL in Figure 2-2 is an example of a relation-valued attribute (RVA). Of course, the underlying domain is relation-valued too (that is, the values it's made up of are relations). I'll have more to say about RVAs in Chapters 5 and 7; here let me just note that SQL doesn't support them. (More precisely, it doesn't support what would be its analog of RVAs, table-valued columns though oddly enough it does support both (a) columns whose values are arrays and (b) columns whose values are "multisets of rows." A multiset, also known as a bag, is like a set except that it permits duplicates. Columns whose values are multisets of rows thus do look a bit like "table-valued columns" in some respects; however, they aren't table-valued columns, because the values they contain can't be operated upon by means of SQL's regular table operators.)

Now, I chose the foregoing example deliberately, for its shock value. After all, relations with RVAs do look rather like relations with repeating groups, and you've probably always heard that repeating groups are a no-no in the relational world. But I could have used any number of different examples to make my point: I could have shown attributes (and therefore domains) that contained arrays; or bags; or lists; or photographs; or audio or video recordings; or X rays; or fingerprints; or XML documents; or any other kind of value, "atomic" or "nonatomic," that you might care to think of. Attributes, and therefore domains, can contain anything (any values, that is). All of which goes a long way, incidentally, toward explaining why a true "object/relational" system would be nothing more nor less than a true relational system which is to say, a system that supports the relational model, with all that such support entails. After all, the whole point of an "object/relational" system is precisely that we can have attribute values in relations that are of arbitrary complexity. Perhaps a better way to say it is this: a proper object/relational system is just a relational system with proper type support which just means it's a proper relational system, no more and no less.