


Frequent Need for Repartitioning

When data access requirements change over time, as they frequently do in real life, data has to be repartitioned to maintain acceptable performance levels. Repartitioning is also required with any of the following:

  • Significant growth in data volumes
  • Addition of processing nodes
  • Shifts in data distribution patterns resulting in skewed partitions

Such frequent repartitioning of data adds substantial administrative cost and application downtime, limiting the practical applicability of such a scheme. Partitioning-based parallel execution is an inappropriate approach on shared-everything SMP systems. Some level of data partitioning may be desirable on these systems from a manageability point of view as data volumes grow, and Oracle8 delivers partitioned tables and indexes to meet these challenges. However, the need to partition all data, irrespective of administrative needs, is an artificial requirement imposed solely by the architectural limitations of pure shared-nothing database systems. Such a requirement introduces unneeded administrative complexity and prevents the vast majority of users on SMP systems from realizing the benefits of parallel technology. In short, the static parallel execution strategy requires frequent data reorganizations, at substantial cost and downtime, just to achieve reasonable performance for a limited range of queries.
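
The cost of adding processing nodes under static placement can be seen with a small illustrative sketch (not Oracle code; the modulo placement function and figures are assumptions for illustration): when rows are assigned to nodes by hashing, changing the node count relocates most of the data.

```python
# Illustrative sketch (not Oracle code): why adding a processing node
# to a statically hash-partitioned system forces wholesale repartitioning.
# With modulo placement, changing the node count relocates most rows.

def partition(key: int, nodes: int) -> int:
    """Assign a row key to an owning node by simple modulo hashing."""
    return key % nodes

keys = range(10_000)
before = {k: partition(k, 4) for k in keys}  # 4 processing nodes
after = {k: partition(k, 5) for k in keys}   # one node added

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of rows must be relocated")  # 80%
```

Four of every twenty keys happen to land on the same node before and after, so roughly 80% of the data must move to use the new node at all.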

In contrast, Oracle's dynamic parallel execution strategy delivers the full potential of parallel processing for a much broader range of access patterns, including ad hoc queries, without any need for frequent, expensive repartitioning of data. The manageability benefits of data partitioning can be achieved today using a manual partitioning scheme that combines multiple underlying physical tables with a UNION ALL view. Comprehensive support for static data partitioning will be available in the next version of the Oracle server. The key difference, however, is that Oracle's dynamic parallel execution strategy will provide all the benefits of data partitioning, namely improved VLDB manageability and optimization, without incurring the costs of a strategy that's based exclusively on partitioning.

Leverage of Processing Power

The tight data ownership scheme that is central to the shared-nothing architecture prevents the exploitation of all available processing resources. There are many cases during normal operation where the available processing power of a pure shared-nothing system is limited to a subset of nodes. Data distribution across partitions typically tends to get skewed over time, with additions and purges. In such cases, the largest partition, or equivalently the slowest processing node, dominates the response time, resulting in suboptimal resource utilization. Even in cases where data is fairly uniformly spread across partitions, each query may not access all partitions. An obvious worst-case example is a query that does a full scan of a single partition: this will be processed exclusively by the owning node, resulting in serial execution, even on massively parallel hardware!
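
The skew effect can be made concrete with a small sketch (the partition sizes and scan rate are assumed figures, not measurements): under static ownership a parallel scan finishes only when the node owning the largest partition finishes, whereas shared access lets idle capacity absorb the skew.

```python
# Illustrative sketch: response time under skewed partitions.
# Figures below are assumptions chosen for illustration.

rows_per_node = [100, 100, 100, 700]  # skewed partition sizes (rows)
scan_rate = 100.0                     # rows per second per node (assumed)

# Shared-nothing: each node can scan only the partition it owns,
# so the largest partition dictates the response time.
shared_nothing_time = max(r / scan_rate for r in rows_per_node)

# Shared-disk with dynamic work distribution: the total work can be
# spread evenly across all nodes regardless of ownership.
shared_disk_time = sum(rows_per_node) / (scan_rate * len(rows_per_node))

print(shared_nothing_time)  # 7.0 seconds, dominated by the 700-row node
print(shared_disk_time)     # 2.5 seconds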


This fundamental limitation of pure shared-nothing systems is particularly significant in the following common situation. Data warehouse environments frequently maintain rolling windows of data partitioned by some time unit, say one month. For example, sales data from a fixed number of months may be kept in a table; new data is rolled in on a monthly basis while the oldest month's data is rolled out. Drill-down queries against such data typically involve a particular month or subset of months. For example, a query might generate a report of revenue and profit by product line for a particular fiscal quarter. In a pure shared-nothing system, the processing power available for this query would be restricted to just the nodes that own the three months' data, leveraging only a fraction of the full processing power of the system.
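
The fraction of wasted capacity is easy to quantify under assumed figures (a 24-month window on a 24-node system, one month per node, which is a hypothetical layout consistent with the scenario above):

```python
# Illustrative arithmetic with assumed figures: a 24-month rolling
# window spread one month per node across a 24-node system. A
# fiscal-quarter drill-down touches 3 months, so only the 3 owning
# nodes can contribute to the query in a pure shared-nothing system.

total_nodes = 24
months_touched = 3  # one fiscal quarter
usable_fraction = months_touched / total_nodes
print(f"{usable_fraction:.1%} of the system's processing power engaged")
```

Here seven-eighths of the machine sits idle for the duration of the query, purely as a consequence of static ownership.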

Some shared-nothing systems attempt to rectify their lack of intrapartition parallelism using a multilevel partitioning scheme, sometimes called "hybrid partitioning." In this scheme, data is initially distributed across all nodes using a hash function. At the next level, data within each node is partitioned by key range into a number of subpartitions. The idea is that the hash step will ensure the participation of all nodes in processing a query because the distribution is somewhat random. Within each node, if a significant number of partitions can be eliminated based on the query predicate, each node will process only a subset of the range partitions, improving response time. This may seem like a good idea in principle until you look at the underlying administration issues. On an MPP system with 32 nodes, for example, if each hash partition is subdivided into 25 ranges, there will be a total of 800 partitions to create and manage! What's worse, as the partitions get skewed over time, as they often do, the administrator has to repartition all 800 pieces all over again. This is nothing short of an administrative nightmare. Furthermore, these 800 partitions relate to only a single table; a typical system will likely have several thousand such partitions to manage.
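
The partition-count arithmetic from the text is worth spelling out (the table count in the second step is an assumed figure, not from the source):

```python
# The hybrid-partitioning arithmetic described above: two levels of
# partitioning multiply the number of pieces an administrator manages.

nodes = 32            # first level: hash partitions, one per MPP node
ranges_per_node = 25  # second level: range subpartitions per node
partitions_per_table = nodes * ranges_per_node
print(partitions_per_table)  # 800 pieces for a single table

tables = 10  # assumed table count for a typical system
print(partitions_per_table * tables)  # 8000 pieces across the database
```

Every repartitioning pass must touch each of these pieces, which is why the scheme scales so poorly from an administrative standpoint.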

Oracle's parallel architecture, with its intelligent combination of data locality and shared access strategies, is far less prone to these data skew problems. Initial partitioning of data is based on maximizing locality. In cases of data skew, however, idle or underutilized nodes can transparently compensate for busy nodes, utilizing shared-disk access, to deliver optimal performance.

Ease of Administration

The excessive dependence of shared-nothing systems on static data partitioning introduces substantial administrative complexity. Users have just one control, static data partitioning, for achieving two potentially conflicting objectives: manageability and parallel execution performance. What's desirable from a manageability and availability point of view may not yield acceptable performance, and vice versa, forcing users to compromise one or the other. Further, the frequent need to perform expensive offline data repartitioning imposes an additional administrative burden, potentially rendering this approach impractical in a number of situations. The rigid data ownership model in shared-nothing systems also prevents incremental system growth: data has to be repartitioned before any additional processing resources can be utilized, negating one of the major benefits of parallel hardware systems. In contrast, Oracle's real-world architecture delivers the true potential of parallel processing with minimal administrative complexity.


Robust Scalability

In a pure shared-nothing architecture, the ideal speedup goal of processing a query in time (T/N) is seriously compromised by the failure of a single processor. In the simplest pure shared-nothing configuration, each processing unit masters a set of disks and no other processing unit has direct access to that set of disks. If the work of executing a query has truly been spread over the N processors, then the failure of a single processor stops the execution of the query for the duration of the outage because no other processing unit has access to the failed processor's set of disks.

To get around this problem, some shared-nothing implementations utilize dual-ported disks, and each processing unit serves as both a master to a particular set of disks and as a backup for another processing unit. If a processor fails, the associated backup does double duty, taking over the work of the failed processor as well as processing its own normal workload. Therein lies the rub. Because the backup now has 2/N of the total work to do, it will finish in time 2T/N. The completion time for the query is really the completion time of the slowest processing unit, so the entire query will finish in time 2T/N instead of T/N. In other words, during periods of outage of even a single processor, one gets only half the performance one paid for. To make matters worse, the time that some processing unit in the system is down is roughly proportional to the number of processing units, so this problem actually gets worse as one adds processors, a kind of reverse scalability feature. Oracle's unrestricted access implementation does not have this problem. Because every processing unit has logical access to every disk, work associated with a failed processor can be spread among all the remaining processors, providing robust scalability and virtually no degradation in performance.
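
The failover arithmetic above can be captured in a small model (the function and figures are illustrative, not from any Oracle implementation): total work T split across N nodes, with the failed node's share either dumped on one dual-ported backup or spread across all survivors.

```python
# Illustrative model of the failover arithmetic described above:
# work T is split across N nodes; on a failure, either one dual-ported
# backup absorbs the failed node's share (2T/N), or, with shared
# access, the work is spread over the N-1 survivors (T/(N-1)).

def completion_time(total_work, n, failed=False, shared_disk=False):
    if not failed:
        return total_work / n        # ideal case: T/N
    if shared_disk:
        return total_work / (n - 1)  # failed node's work spread evenly
    return 2 * total_work / n        # one backup node does double duty

T, N = 100.0, 10
print(completion_time(T, N))                                # 10.0 (T/N)
print(completion_time(T, N, failed=True))                   # 20.0 (2T/N)
print(round(completion_time(T, N, failed=True, shared_disk=True), 1))  # 11.1
```

On a 10-node system, a single failure doubles the response time of the dual-ported scheme, while spreading the orphaned work over the nine survivors costs only about 11%.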

Emerging Trends in Technology

Oracle's parallel technology, with its ability to provide transparent shared data access, is consistent with emerging trends in hardware technology. The next generation of mass storage products in the open systems market will most likely be based on high-performance switched protocols such as Fibre Channel, capable of supporting 100MB per second of bandwidth in each direction (200MB/sec total) per connection. Fibre Channel is gaining considerable momentum in the mass storage world, and large-scale, shared-disk physical connectivity is likely to be available in the near future.

Arbitrated-loop Fibre Channel configurations connecting large numbers of processors to a shared set of disks are being developed even now. Switch-based Fibre Channel configurations will offer even more flexibility and scalability, with aggregate bandwidth proportional to the number of ports on the switch. This greatly enhanced physical connectivity can be transparently exploited by the unrestricted access solution provided by Oracle's parallel architecture. In fact, in the face of ever-increasing shared-disk physical connectivity, it will become increasingly difficult to defend rigid shared-nothing implementations in which each processor can get to the vast majority of the data on the system only via a networked request to another processor.



Oracle Unleashed
Oracle Development Unleashed (3rd Edition)
ISBN: 0672315750
Year: 1997
Pages: 391
