Clusters | SQL Performance Tuning

In the last section, we discussed affairs that relate to heap storage structures. We won't talk about the other important kindclustersin detail until we've given you a good grounding in indexes, in Chapter 9. We will take a moment, though, to mention clusters here.

A cluster is a structure for storing data in a specific order. ^[5] The main idea behind clusters is that rows should be kept in order according to some column value, such as the primary keythis is called the cluster key. A sequenced file has obvious advantages for access, so let's just consider the impacts on file organization.

^[5] In this section, we are talking about a type of cluster that we call "weak" clustering; see Chapter 9, "Indexes."

To begin with, it looks like INSERTs are harder with a cluster (because you have to make room between existing rows to fit in the new value according to cluster key order). But there is an offsetting effect: Two subsequent INSERTs probably won't go to the same place, so the contention problem that heaps have isn't as great.

Consider Oracle's clusters. Besides the two effects we've already mentioned, Oracle throws in a mixing feature. If two tables have columns with values in the same rangethink of primary key and foreign key tablesthen you can order them both in the same file. The upshot is that a primary key row and its matching foreign key row will probably end up in the same page, because the primary key and the foreign key have the same value and therefore sort to the same place. So when you access two joined rows, there is only one disk access. This destroys the claim that joins are inherently slow. Other DBMSs (e.g., IBM) allow mixing, but mixing makes no sense unless you can combine it with a feature that allows rows to be clustered near each other when they have similar contents.

You would expect Oracle clustering to be popular. It is not. It has a severe flaw: To use clustering, you must allocate a fixed size (e.g., two pages) for each cluster key. If the size is too big, space is wasted . If the size is too small, overflows happen. (Oracle calls this problem "collisions.") Oracle clustering is thus inappropriate except for a narrow set of applications. Still, this is an implementation problem onlythe idea itself is excellent . We regard the concept as worth mentioning because eventually Oracle gets it right, and if they don't, then somebody else will.

The Bottom Line: Clusters

INSERTs are harder with a cluster because you have to make room between existing rows to fit in the new value according to cluster key order. But two subsequent INSERTs probably won't go to the same place, so clusters reduce contention.