Parallelism | Oracle9iR2 Data Warehousing

3.8 Parallelism

Many operations in a data warehouse involve processing large amounts of data. Bulk loads, creating indexes and summaries, sorting, and joining data from multiple tables can all take a considerable amount of time. Parallel execution can be used to reduce the time it takes to execute these operations.

Parallelism allows certain SQL statements to be divided transparently into several concurrently executing operations. By dividing the work among several processes on different processors, the statement can be completed faster than with only a single process.

In Oracle, parallel execution is performed using a parallel execution coordinator process and a pool of parallel execution servers. The Oracle process that handles the user's query becomes the coordinator process for that query. The coordinator process partitions the work to be done among the required number of parallel execution servers. It ensures that the load is balanced among the processes and redistributes work to any process that may have finished before the others. The coordinator receives the results from the parallel execution servers and assembles them into the final result.

The type of query whose performance improves the most by executing it in parallel is one that processes large amounts of data and returns a small result, such as an aggregation. For example, obtaining the total sales for the first four months of the year is shown in Figure 3.10.

click to expand
Figure 3.10: Parallel query.

In Oracle 9i, many operations can benefit from parallelism, including DDL statements (such as CREATE TABLE, CREATE INDEX, and CREATE MATERIALIZED VIEW), partition maintenance operations (such as TRUNCATE, EXCHANGE, MERGE, and SPLIT PARTITION), SELECT statements, and DML statements. Various operations such as joins, aggregation, sorting, and DISTINCT within a SELECT statement, can be parallelized. Partition pruning and partition-wise join, discussed earlier, greatly improve performance when used with parallel execution.

With Oracle 9i Release 2, parallel DML is now possible on partitioned as well as nonpartitioned tables. In prior releases, all DML types were supported for partitioned tables; however, only parallel INSERT was supported on nonpartitioned tables.

You must use the ALTER SESSION ENABLE PARALLEL DML in the session to allow DML statements to be parallelized. PL/SQL functions and table functions can also be parallelized in some cases.

The operation to be executed in parallel can be broken up in various ways. If a table or index is partitioned, as in the previous example, the coordinator may assign each parallel execution server to work on one or more partitions. Alternatively, a table or index may be divided up into ranges of physical data blocks on disk. Each parallel execution server works in parallel on different ranges of blocks.

You can set the degree of parallelism for a table or an index using the PARALLEL clause on CREATE or ALTER TABLE, for a SQL statement using optimizer hints and for a session using the ALTER SESSION FORCE PARALLEL statement. The following statement sets the degree of parallelism to four for the purchases table.

 ALTER TABLE purchases PARALLEL 4;

If not specified using these methods, Oracle will determine the degree of parallelism using the number of CPUs or number of partitions. For good performance, the number of partitions should be a multiple of the degree of parallelism. Otherwise, some parallel execution servers may remain idle waiting for others to complete a certain operation before beginning the next one. The query optimizer will try to use the specified degree of parallelism, but if several queries are competing for resources, it may adjust it to a lower value accordingly.

At database startup, Oracle will start as many parallel execution servers as specified by the PARALLEL_MIN_SERVERS initialization parameter. These are available for use by any parallel operation. The query coordinator obtains the required number of parallel execution servers from the pool when needed to execute a parallel operation. When processing is complete, the coordinator returns the parallel execution servers to the server pool.

If there are a large number of concurrent users executing parallel statements, additional parallel execution servers can be created. The initialization parameter, PARALLEL_MAX_SERVERS, specifies the maximum number of server processes to create. When they are no longer needed, the parallel execution servers that have been idle for a period of time are terminated. The pool is never reduced below the PARALLEL_MIN_SERVERS parameter. If parallel execution servers are unavailable, the operation will execute in serial. Sometimes a parallel operation cannot be executed efficiently unless a certain number of servers are available. The initialization parameter, PARALLEL_MIN_PERCENT, is used to specify the minimum percentage of requested parallel execution servers needed for the operation to succeed. If this percentage is not met, Oracle will return an error and you can try the operation again later.

In Oracle 9i, a new initialization parameter, PARALLEL_AUTOMATIC_TUNING, has been added to set default values for several parameters for tuning parallel execution. The new automatic memory management feature (section 3.6) eliminates the need to manually tune the work area sizes for many operations such as a hash join and sorting. This greatly eases the use of the parallel execution feature.

3.8.1 Hardware requirements for parallel execution

The data throughput requirements of a warehouse can often only be met with parallel execution. It is important to have the right hardware and system characteristics to get the benefits of parallel execution. The hardware selected for the data warehouse or data mart must be a multiprocessor machine, either a symmetric multiprocessor (SMP), cluster, or massively parallel processor (MPP) system. Uniprocessors have only a single CPU and are, therefore, appropriate only for small data marts.

MPP systems have several nodes connected together by a high-speed interconnect. In this "shared nothing" architecture, each node consists of a CPU, memory, and local disks and runs its own version of the operating system. To run on an MPP system, data must be partitioned appropriately across the disks. MPP systems can provide good performance but are not fault tolerant, and loss of a node can cause part of the data to be unavailable. They are also hard to manage, since addition of new nodes may require repartitioning of the data. Further, since all queries may not access all the data, the nodes may not be utilized uniformly.

SMP systems have multiple CPUs that share memory and disks via a common bus. There is only one version of the operating system running, which schedules tasks to execute on each CPU. Shared-disk systems require a distributed lock manager to coordinate data access by various nodes, which adds some overhead. SMP systems, however, do not suffer from the maintenance and fail-over problems of MPP systems and are a good choice for a warehouse. However, SMP systems are only available in configurations up to a certain number of processors, usually 64.

Clusters combine multiple nodes of a uniprocessor, or SMP system, together. Clusters are a good way to scale your data warehouse with increasing data needs. Clusters also provide excellent reliability, since failure of a node does not make data unavailable. Oracle 9i introduced Real Application Clusters (RAC) technology, which provides improved cluster support for the Oracle database. In a clustered database system, each node of the cluster runs an instance of the Oracle database server, all accessing the same underlying data. In addition to sharing the data on disk, RAC also provides sharing of buffer caches of the instances, thereby reducing disk I/O and improving performance. If parallel execution is desired, a single SQL query can be distributed across slave processes running on multiple database instances. Oracle will determine at run time whether to run the query on a single instance or multiple instances to keep internode communication to a minimum.

Parallelism improves performance only if you have the necessary hardware resources available. If your CPUs or disks are already heavily loaded, upgrade your hardware configuration before enabling parallelism.