Running queries in parallel, as opposed to running them serially, can improve performance. However, when defining parallel queries, the following should be taken into consideration (a brief sketch of how a parallel query is specified follows this list):
Parallel query has a significant startup cost.
Multiple slave processes consume more CPU than a single process.
Each slave process must have its own address space (memory allocation).
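To make these considerations concrete, here is a minimal sketch of how parallel execution is requested; the table name sales and the degree of 4 are illustrative, but the PARALLEL hint and the PARALLEL table attribute are standard Oracle syntax:

    -- Request parallel execution for a single statement via a hint
    -- (table name "sales" and degree 4 are illustrative)
    SELECT /*+ FULL(s) PARALLEL(s, 4) */ COUNT(*)
    FROM   sales s;

    -- Alternatively, set a default degree of parallelism on the table
    ALTER TABLE sales PARALLEL (DEGREE 4);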
Parallel query was designed to reduce execution time for queries that had no option but to read large quantities of data. It maximizes data throughput by spreading the read workload across multiple processes, or slaves. Sort operations are also managed by the slaves using table queue (TQ) structures. This has the effect of increasing the CPU load on the system as a whole. If the system is already running at maximum CPU utilization, parallel query will not get any more out of the query; when no more CPU is available, the optimizer will, based on the resources available, decide to serialize the parallel operations.
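Slave activity can be observed while queries run. As a sketch, the standard dynamic performance view V$PQ_SYSSTAT reports parallel execution server statistics:

    -- How many parallel execution servers are started, busy, or idle
    SELECT statistic, value
    FROM   v$pq_sysstat
    WHERE  statistic IN ('Servers Started', 'Servers Busy', 'Servers Idle');

If 'Servers Busy' sits consistently at the configured maximum, further parallel requests may be downgraded or serialized, as noted above.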
The additional I/O requirements of the slave processes reading the data can also stress a heavily loaded I/O subsystem. Distributing data across multiple disk spindles may help alleviate situations where disk I/O becomes a bottleneck. Additionally, some queries simply run quicker serially; typically, queries that use index lookups are not suited to the parallel query option (PQO).
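One way to see what the optimizer actually chose is to compare execution plans for the serial and parallel forms of a statement. A minimal sketch, assuming a release that includes DBMS_XPLAN (Oracle9i Release 2 onward) and the same illustrative sales table:

    EXPLAIN PLAN FOR
    SELECT /*+ PARALLEL(s, 4) */ *
    FROM   sales s
    WHERE  sale_date > SYSDATE - 7;

    -- Display the plan; parallel distribution details appear in the output
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

If the plan shows a serial index range scan rather than a parallel full table scan, the optimizer has judged the serial path cheaper despite the hint.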
Nested loops vs. hash/sort merge joins: Typically, parallel query tries to use a full table scan (FTS) to access data, with the lookups divided between slaves on a ROWID range basis. Nested loop joins are characteristically inefficient when driven by an FTS, whereas hash joins and sort merge joins are usually much more efficient at processing large amounts of data. However, there is a downside: hash joins and sort merge joins do not allow row elimination based on data from a driving row source. Such elimination can drastically reduce the data sets involved, and can mean that a serial access path using index lookups is quicker than a parallel operation simply because of the volume of data eliminated.
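The join method can be steered with hints for experimentation. A sketch using hypothetical customers and orders tables and an illustrative index name orders_cust_idx; the USE_HASH, USE_NL, INDEX, and PARALLEL hints themselves are standard Oracle:

    -- Parallel hash join: both tables scanned in full and joined by hash
    SELECT /*+ USE_HASH(o) PARALLEL(c, 4) PARALLEL(o, 4) */
           c.cust_name, o.order_total
    FROM   customers c, orders o
    WHERE  c.cust_id = o.cust_id;

    -- Serial nested loop join driven by an index: can win when the
    -- driving row source eliminates most of the data
    SELECT /*+ ORDERED USE_NL(o) INDEX(o orders_cust_idx) */
           c.cust_name, o.order_total
    FROM   customers c, orders o
    WHERE  c.cust_id = o.cust_id
    AND    c.region  = 'WEST';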
Data splitting: Dividing the data up, passing it through multiple processes, and collating the results may make the cost of slave generation and coordination greater than the cost of retrieving the data serially.
Data skew: Parallel queries divide data between reader slaves on a ROWID range basis. Although the same number of blocks is given to each slave, those blocks do not necessarily contain identical numbers of rows; some of the blocks could be completely empty. This can be especially problematic where large quantities of data are archived and deleted, as this leaves many empty, or sparsely filled, blocks. The effect of this non-uniform data distribution is that queries run slower than they otherwise might, because the slaves assigned the densely populated ranges end up doing all the work while the others finish early. In such situations, the only available option is to reorganize the data.
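A sketch for detecting and correcting such sparseness, again using illustrative object names; the view, package, and commands shown are standard Oracle:

    -- Refresh optimizer statistics, then compare blocks below the
    -- high-water mark with the actual row count and average row length
    EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'SALES');

    SELECT table_name, num_rows, blocks, avg_row_len
    FROM   user_tables
    WHERE  table_name = 'SALES';

    -- Rebuild the segment to compact the rows into fewer blocks
    ALTER TABLE sales MOVE;

    -- MOVE marks dependent indexes UNUSABLE; rebuild them afterward
    ALTER INDEX sales_pk REBUILD;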
In this chapter we explored the details of parallel processing and the various options and features of parallel execution available under the Oracle RDBMS. We demonstrated that parallel processing in RAC is not much different from the traditional stand-alone implementation; it is an extension of the functionality of a single stand-alone instance. The major difference, and advantage, in a RAC environment is that a parallel operation can be accomplished not only on a single instance but also on the other nodes participating in the cluster configuration.
Parallel processing in RAC is accomplished not by using the cluster interconnect to transfer data back and forth, but rather by transferring functional packets, the work requests themselves, across the nodes participating in the cluster.
In the next chapter we will discuss some of the criteria to be considered when designing databases for a RAC implementation. As part of that discussion, we will also cover the Oracle features that should be weighed during the design process to take advantage of the availability and scalability of RAC.