7.2 Broad Categories of Parallel Algorithms

sorting integers, and its sequential performance could be improved (for this special case) by emulating the behavior of the parallel program, i.e., first pre-sorting the data into bins of fixed width and then sorting the bins.
9.4 Example: Sorting with User-supplied Comparator
The integer sort function discussed in the previous section performs reasonably well on a Beowulf. Performance improvements are most likely to come from use of a faster sequential sort routine. While this is a fascinating topic, it is not particularly relevant to programming Beowulfs. The interested reader can consult any of the standard texts on algorithms.5'6
Another way to improve the sort routine would be to relax some of the restrictions that it imposes on the input data. In particular, an interface more like the standard ANSI C qsort function would be far more flexible. The caller can still specify the input and output data with a parray, but now one also specifies a comparison function similar to that passed to qsort to define the order relationship between objects.
Since it was successful for integers, it seems reasonable to adopt the same overall strategy for sorting more general objects. There will be an initial scan in which objects are categorized only to the extent of deciding in which process they belong. Then a call to MPI_Alltoallv will shuffle the data between processes so that every process has its final compliment of objects. Finally, a call to qsort will finish the procedure. The crucial difference between this problem and the previous one is that it is no longer obvious how to decide which objects belong to which processors. We cannot simply divide an integer key by a range to get a target processor number. In fact, there is no integer key at all. The only way to determine the ordering of objects is to call the compar function. This suggests an approach based on somehow selecting a set of ordered, uniformly spaced elements, "fenceposts," and using them to define the range of elements destined for each processor. When each element is tested, we perform a binary search over the fenceposts to determine which processor the element is destined for.
We are still left with the problem of choosing the fenceposts. We could just choose one element at random from each processor, but that runs the risk of introducing very bad load imbalance. There is no guarantee (or even likelihood) that the
5D. E. Knuth, The Art of Computer Programming volume 3: Sorting and Searching, Addison Wesley, 1973
6Aho, Hopcroft and Ullman, Design and Analysis of Computer Algorithms, Adison-Wesley, 1974

 



How to Build a Beowulf
How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters (Scientific and Engineering Computation)
ISBN: 026269218X
EAN: 2147483647
Year: 1999
Pages: 134

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net