Query Parallelism

Team-Fly    

 
DB2 Universal Database for OS/390 v7.1 Application Certification Guide
By Susan Lawson
Table of Contents
Chapter 17.  Application Performance and Optimization

Query Parallelism

In order to reduce elapsed time for a query, DB2 can provide a query with parallel resources, such as several I/O paths or processors. By taking advantage of these resources, queries can run in a shorter period of time, allowing more work to be pushed through the system. Parallelism can help improve the performance of I/O and CPU bound read-only queries. It can help queries that are reading large amounts of data, regardless of the filtration.

There is some overhead associated with the use of parallelism in terms of CPU. DB2 scales processor- intensive work across all available processors. Parallelism can average less than 1 percent additional CPU overhead for long-running queries and less than 10 percent for short-running queries.

I/O and CPU Parallelism

DB2 can utilize two different methods for achieving query parallelism: I/O or CPU. With I/O parallelism the goal is to move elapsed time towards CPU time by splitting data access into equal, sequential prefetch streams to bring I/O time down to estimated CPU time. If CPU is estimated at 1 second, and I/O at 3 seconds, the three I/O parallel streams of approximately equal size will be started. Each I/O stream should cost about 1 second. This is implemented with a round- robin type of GET paging. With current releases of DB2, I/O parallelism is very infrequently chosen , instead the more preferred method is CPU parallelism.

The goal of CPU parallelism is to move elapsed time towards CPU time, by splitting queries into equal multiple smaller queries and process those queries in multiple execution units, or parallel tasks . At execution time, DB2 will take into consideration the number of CPUs available, and if there are not enough CPUs to support the degree of parallelism initially chosen by the optimizer, the degree will be degraded.

When Parallelism Is Chosen

DB2 parallelism is decided at both the time of the bind and at runtime. If parallelism is not chosen at bind time, there is no possibility of it being chosen at runtime. Even if parallelism is chosen at bind time, it may not be used at runtime due to several factors. If there is not enough space in the virtual buffer pool to support the requested degree of parallelism, the degree can be reduced from that chosen at bind time or parallelism can be turned all together. If host variables are used in the SQL query, this can prevent DB2 from determining which partitions will qualify in a query; therefore the degree chosen for parallelism will be decided at runtime. If the machine on which DB2 is running does not have hardware sort at runtime, parallelism will be disabled. If DB2 determines that an ambiguous cursor can be updated, then parallelism will be disabled. If parallelism is disabled, the query does not failDB2 simply uses a sequential plan for access to the data.

During BIND or PREPARE, DB2 chooses the access path best suited for the query and then does a postoptimization step to identify the sections of the access path that will benefit most from parallelism. It then identifies the part of the actual query that can be executed in parallel, and then determines the degree of parallelism to be used.

Sysplex Query Parallelism

Sysplex query parallelism works in much the same multitasking way as CPU parallelism; in addition, it gives us the ability to take a complex query and run across multiple members in a data sharing group .

Sysplex query parallelism is best used with isolation level UR (uncommitted read) to avoid excess lock propagation.

A query is issued by a coordinator, who sends the query to the assistant members in the group. The data is then processed and returned to the coordinator either by a work file (the coordinator will read each assistant's work files) or by cross-system coupling facility (XCF) links when a work file is not necessary.

Queries Best Suited for Parallelism

Queries with the following characteristics will be able to take advantage of parallelism:

  • Long-running, read-only queries, both static and dynamic SQL, from both local and remote sites, and when using either private or DRDA protocols

  • Tablespace scans and index scans

  • Joins

  • Nested loop

  • Merge scan

  • Hybrid without sort on new tables

  • Sorts

  • Aggregate functions

There are only a few places where parallelism will not be considered :

  • Queries that use materialization of views

  • Queries that perform materialization because of nested table expressions

  • Queries performing a merge scan join of more than one column

  • Queries using direct row access

The following will cause only Sysplex parallelism not to be considered:

  • Queries with list prefetch and multiple index access

  • Queries accessing LOB data

Parallelism should not be used if a system is already CPU-constrained, because parallelism would only add to the problem in most situations.

NOTE

graphics/note_icon.jpg

CPU parallelism cannot be used when a cursor is defined WITH HOLD, since this cursor's use is potentially interrupted by a commit, which causes a stoppage in processing.


Short-running queries are usually not going to see a great benefit from parallelism ( generally , a short-running query is one that is subsecond). But how often are long-running queries separated from short-running queries? Well, if you are trying to get the benefits from parallelism without placing unnecessary overhead where it does not belong, consideration for this type of granularity of processing will need to be done.

There are two potential options: You could separate the long-running queries into a separate package and bind it DEGREE(ANY) in a different collection, and then use the SET CURRENTPACKAGE statement to switch between it and a program bound with DEGREE(1) for shorter queries that are better to let run sequentially. Or, you can set the macro SPRMPTH to disable parallelism for short-running queries. The default for this value is 120; thus, any query with an estimated cost of less than 120 milliseconds will have parallelism disabled. Parallelism is enabled for any query above this threshold.


Team-Fly    
Top


DB2 Universal Database for OS. 390 v7. 1 Application Certification Guide
DB2(R) Universal Database for OS/390 V7.1 Application Certification Guide (IBM DB2 Certification Guide Series)
ISBN: 0131007718
EAN: 2147483647
Year: 2002
Pages: 163
Authors: Susan Lawson

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net