Distributed Optimization

 <  Day Day Up  >  

Optimization in DB2 is usually a clear-cut matter. The DB2 optimizer is a state-of-the-art optimizer that, more often than not, can be relied upon to produce properly optimized access paths for SQL statements. The rule of thumb is to code as much work as possible into the SQL and let the optimizer figure out the best way to access the data. However, in a distributed environment, optimization is not quite so simple.

To understand this difference, consider a distributed implementation of the DB2 sample tables PROJ , PROJACT , and ACT . A project ( PROJ ) can have many activities, and each activity ( ACT ) can be a part of many projects. The PROJACT table resolves the many-to-many relationship. For more information on these tables, refer to Appendix D, "DB2 Sample Tables."

Assume that the PROJ and PROJACT tables exist at one location (say, Pittsburgh), and the ACT table exists at a different location (say, Chicago).

The task at hand is to retrieve a list of documentation activities for projects started after January 1, 2000. If DB2 provides distributed request support, the following query would satisfy this request:

 

 SELECT   A.ACTNO, A.ACTDESC FROM     ACT      A, PROJ     P, PROJACT  J WHERE    A.ACTNO = J.ACTNO AND      J.PROJNO = P.PROJNO AND      A.ACTKWD = "DOC" AND      P.PRSTDATE > "01/01/2000"; 

However, DB2 does not provide distributed request. Therefore, issuing this particular join is not possible. Lacking distributed request, what is the best way to satisfy this request? You can optimize this three-table join in (at least) six different ways:

  • Join PROJ and PROJACT at Pittsburgh, selecting only projects starting after January 1, 2000. For each qualifying row, move it to Chicago to be joined with ACT to see whether any design activities exist.

  • Join PROJ and PROJACT at Pittsburgh, selecting only projects starting after January 1, 2000. Then move the entire result set to Chicago to be joined with ACT , checking for design activities only.

  • At Chicago, select only design activities from ACT . For each of them, examine the join of PROJ and PROJACT at Pittsburgh for post-January 1, 2000 projects.

  • Select only design activities from ACT at Chicago. Then move the entire result set to Pittsburgh to be joined with PROJ and PROJACT , checking for projects started after January 1, 2000 only.

  • Move ACT to Pittsburgh and proceed with a local three-table join.

  • Move PROJ and PROJACT to Chicago and proceed with a local three-table join.

Determining which of these six optimization choices will perform best is a difficult task. Usually, performing multiple smaller requests to a remote location is worse than making a single larger request to the remote location. In general, the fewer messages, the better performance will be. However, this rule of thumb is not always true. Try different combinations at your site to arrive at the optimal method of performing distributed queries. The optimal choice will depend on the following:

  • The size of the tables

  • The number of qualifying rows

  • The type of distributed request being made

  • The efficiency of the network

 <  Day Day Up  >  


DB2 Developers Guide
DB2 Developers Guide (5th Edition)
ISBN: 0672326132
EAN: 2147483647
Year: 2004
Pages: 388

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net