Section 5.7. Partitioning and Data Distribution

5.7. Partitioning and Data Distribution

You may be tempted to believe that if we have a very large table and want to avoid contention when many sessions are simultaneously writing to the database, then we are necessarily better off partitioning the data in one way or another. This is not always true.

Suppose that we have a large table storing the details of orders passed by our customers. If, as sometimes happens, a single customer represents the bulk of our activity, partitioning on the customer identifier is not going to help us very much. We can very roughly divide our queries into two families: queries relating to our big customer and queries relating to the other, smaller customers. When we query the data relating to one small customer, an index on the customer identifier will be very selective and therefore efficient, without any compelling need for partitioning. A clever optimizer fed with suitable statistics about the distribution of keys will be able to detect the skewness and use the index. There will be little benefit to having those small customers stored into smallish partitions next to the big partition holding our main customer.

Conversely, when querying the data attached to our major customer, the very same clever optimizer will understand that scanning the table is by far the most efficient way of proceeding. In that case, fully scanning a partition that comprises, for example, 80% of the total volume will not be much faster than doing a full table scan. The end users will hardly notice the performance advantage, whereas the purchasing department will most certainly notice the extra cost of the separately priced partitioning option.

The biggest benefits to queries of table partitioning are obtained when data is uniformly spread in respect to the partitioning key.