7.2 Broad Categories of Parallel Algorithms

Table of content



		sorting integers, and its sequential performance could be improved (for this special case) by emulating the behavior of the parallel program, i.e., first pre-sorting the data into bins of fixed width and then sorting the bins.



		9.4 Example: Sorting with User-supplied Comparator



		The integer sort function discussed in the previous section performs reasonably well on a Beowulf. Performance improvements are most likely to come from use of a faster sequential sort routine. While this is a fascinating topic, it is not particularly relevant to programming Beowulfs. The interested reader can consult any of the standard texts on algorithms.⁵'⁶



		Another way to improve the sort routine would be to relax some of the restrictions that it imposes on the input data. In particular, an interface more like the standard ANSI C qsort function would be far more flexible. The caller can still specify the input and output data with a parray, but now one also specifies a comparison function similar to that passed to qsort to define the order relationship between objects.



		Since it was successful for integers, it seems reasonable to adopt the same overall strategy for sorting more general objects. There will be an initial scan in which objects are categorized only to the extent of deciding in which process they belong. Then a call to MPI_Alltoallv will shuffle the data between processes so that every process has its final compliment of objects. Finally, a call to qsort will finish the procedure. The crucial difference between this problem and the previous one is that it is no longer obvious how to decide which objects belong to which processors. We cannot simply divide an integer key by a range to get a target processor number. In fact, there is no integer key at all. The only way to determine the ordering of objects is to call the compar function. This suggests an approach based on somehow selecting a set of ordered, uniformly spaced elements, "fenceposts," and using them to define the range of elements destined for each processor. When each element is tested, we perform a binary search over the fenceposts to determine which processor the element is destined for.



		We are still left with the problem of choosing the fenceposts. We could just choose one element at random from each processor, but that runs the risk of introducing very bad load imbalance. There is no guarantee (or even likelihood) that the

⁵D. E. Knuth, The Art of Computer Programming volume 3: Sorting and Searching, Addison Wesley, 1973



		⁶Aho, Hopcroft and Ullman, Design and Analysis of Computer Algorithms, Adison-Wesley, 1974

Table of content

How to Build a Beowulf

How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters (Scientific and Engineering Computation)

ISBN: 026269218X
EAN: 2147483647

Year: 1999
Pages: 134

Authors: Thomas Sterling, Daniel F. Savarese, Donald J. Becker, John Salmon

Interprocess Communications in Linux: The Nooks and Crannies

Identifying and Managing Project Risk: Essential Tools for Failure-Proofing Your Project

WebLogic: The Definitive Guide

802.11 Wireless Networks: The Definitive Guide, Second Edition

Special Edition Using FileMaker 8

Java All-In-One Desk Reference For Dummies

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net