The natural question is: which of these many techniques is best? Well, that all depends. You must weigh the options against the nature of your data, because what works best in one case may not work at all in another. That said, Table 8-1 shows what I found on the 7-Eleven data warehouse.

Table 8-1. Performance Characteristics for Various Table Implementation Options
From these results, we see that simple partitioning performed best. But let me reiterate that these results are specific to one data warehouse's data and to the nature of its end users' queries. You should run similar benchmarks against your own data to be sure. Remember that what looks good on paper often underperforms in reality. So don't go into this with any preconceived favorites or other prejudices. Let the chips fall where they may, and implement the choice that works best for your data. When in doubt, or if you don't have time to benchmark, just go with simple range-based partitioning along a time dimension. In most cases, range partitioning will be a safe and near-optimal choice.
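To make the recommendation concrete, here is a minimal Python sketch (not database DDL) of why range partitioning along a time dimension tends to be a safe default. Each row is routed to the partition whose upper bound it falls below, in the spirit of a "VALUES LESS THAN" range clause, and a query restricted to a date range only needs to scan the partitions that overlap it (partition pruning). The monthly bounds, partition names, and the functions `partition_for` and `prune` are hypothetical and exist purely for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical monthly partitions. Each bound is an exclusive upper limit,
# so a row lands in the first partition whose bound is greater than its date.
BOUNDS = [date(1998, 2, 1), date(1998, 3, 1), date(1998, 4, 1)]
NAMES = ["p_1998_01", "p_1998_02", "p_1998_03"]


def partition_for(d: date) -> str:
    """Return the partition that stores a row dated d."""
    i = bisect_right(BOUNDS, d)  # index of first bound strictly greater than d
    if i >= len(BOUNDS):
        raise ValueError(f"{d} is beyond the highest partition bound")
    return NAMES[i]


def prune(start: date, end: date) -> list[str]:
    """Partitions a query on start <= d <= end must scan; all others are skipped."""
    if start > end:
        return []
    first = bisect_right(BOUNDS, start)
    last = min(bisect_right(BOUNDS, end), len(NAMES) - 1)
    return NAMES[first:last + 1]


if __name__ == "__main__":
    print(partition_for(date(1998, 2, 1)))                # boundary date goes to p_1998_02
    print(prune(date(1998, 2, 10), date(1998, 3, 5)))     # only two of three partitions scanned
```

Because most warehouse queries filter on a time range, this kind of pruning lets the database ignore whole partitions, which is one plausible reason time-based range partitioning rarely performs badly even without benchmarking.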