6.1.2 The Universally Accessible Machine

Irregular structures often require careful consideration for parallelization. Usually, an irregular data structure is chosen to exploit or accommodate some feature of the underlying problem: sparseness, hierarchy, a need for unpredictable updates, and so on. It is crucial that the parallel implementation continue to exploit or accommodate these features, or overall performance may suffer tremendously. On the other hand, the precise form of a data structure chosen by a sequential implementation is often unimportant. A tree, for example, may be implemented sequentially with pointers, but parallel tree search may be possible without implementing the full semantics of distributed pointers. Achieving this can require significant programming effort, as well as a deep understanding of both the computational methods and the underlying problem domain.
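As a minimal sketch of this idea (my illustration, not an example from the book), a complete binary tree can be stored in implicit heap order, where the children of node i sit at indices 2i+1 and 2i+2. The tree's structure then lives entirely in index arithmetic, so each MPI rank can own a contiguous slice of the node array and search it with no distributed pointers at all. The key values, sizes, and search target below are illustrative assumptions.

/* A complete binary tree stored in implicit heap order: the children of
 * node i are nodes 2*i+1 and 2*i+2, so no pointers are needed.  Each
 * rank owns a contiguous slice of the node array and searches it
 * independently; key values and sizes here are illustrative.          */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;               /* total tree nodes           */
    int chunk = n / size;                /* nodes owned per rank       */
    int lo = rank * chunk;
    int hi = (rank == size - 1) ? n : lo + chunk;

    int *keys = malloc((hi - lo) * sizeof(int));
    for (int i = lo; i < hi; i++)        /* placeholder key data       */
        keys[i - lo] = i % 997;

    int target = 42, local_hit = -1, hit;
    for (int i = lo; i < hi; i++)
        if (keys[i - lo] == target) { local_hit = i; break; }

    /* Any rank that found the key reports a node index; MPI_MAX picks
     * one hit (or -1 if no rank found the key).                       */
    MPI_Allreduce(&local_hit, &hit, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    if (rank == 0)
        printf("key %d found at node %d\n", target, hit);

    free(keys);
    MPI_Finalize();
    return 0;
}

The point is not the search itself but that partitioning by index ranges sidesteps the bookkeeping that a distributed pointer-based tree would require.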
7.2.2 Synchronous and Asynchronous
In a synchronous algorithm, the parallel parts of the calculation are required to remain in "lockstep." Conversely, in an asynchronous algorithm, different parts of the calculation may "get ahead" without adverse effect. Asynchronous algorithms often require less bookkeeping than synchronous ones, and for that reason they are sometimes much easier to parallelize. They may even be preferred when a slightly superior sequential alternative exists: if the performance penalty is not too great, they allow one to save the most valuable commodity of all in parallel computing, programmer time. Genetic algorithms, data analysis, the rendering of animations, and Monte Carlo methods, for example, often (though not always) allow asynchronous execution.
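As a minimal illustration of asynchronous execution (my sketch, not an example from the book), the Monte Carlo estimate of pi below lets every rank compute entirely on its own; the only synchronization point is a single reduction at the very end, so no rank ever waits for another during the computation. The sample count and seeding scheme are illustrative assumptions.

/* Asynchronous Monte Carlo estimate of pi: each rank runs completely
 * independently and ranks arrive at the final reduction whenever they
 * happen to finish; no lockstep is required anywhere.                 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long n_local = 1000000, hits = 0, total_hits;
    unsigned int seed = 1234u + rank;    /* independent stream per rank */

    for (long i = 0; i < n_local; i++) {
        double x = rand_r(&seed) / (double)RAND_MAX;   /* POSIX rand_r */
        double y = rand_r(&seed) / (double)RAND_MAX;
        if (x * x + y * y <= 1.0)
            hits++;
    }

    /* The sole synchronization point in the whole program. */
    MPI_Reduce(&hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0,
               MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~= %f\n", 4.0 * total_hits / (n_local * (double)size));

    MPI_Finalize();
    return 0;
}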
7.2.3 Coarse and Fine Grained
In parallel computation, grain size refers to how much work is performed by each process. There are often very elegant fine-grained formulations of computational algorithms. For example, a vector dot product can be formulated with one logical processor responsible for each pair of elements, so that the time required is proportional to the logarithm of the number of elements. While elegant, this formulation places very high demands on the communication network of the parallel computer. Alternatively, a coarse-grained formulation would follow the structure of Program 8.7 and assign many elements of the vectors to each logical process. If the grain size is large enough, then the interprocessor communication, MPI_Allreduce in this case, does not dominate the total time required by the algorithm.
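A minimal coarse-grained sketch in the spirit of that structure (not a reproduction of Program 8.7; the vector length and contents are illustrative assumptions) looks like this:

/* Coarse-grained dot product: each rank owns a block of the vectors,
 * computes a local partial sum, and a single MPI_Allreduce combines
 * the partial sums across all ranks.                                  */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n_local = 100000;                /* elements per rank (the grain) */
    double *x = malloc(n_local * sizeof(double));
    double *y = malloc(n_local * sizeof(double));
    for (int i = 0; i < n_local; i++) { x[i] = 1.0; y[i] = 2.0; }

    double local = 0.0, global;
    for (int i = 0; i < n_local; i++)    /* O(n_local) local work ...    */
        local += x[i] * y[i];

    /* ... amortizes this one collective communication.                 */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);
    if (rank == 0)
        printf("dot = %f\n", global);

    free(x); free(y);
    MPI_Finalize();
    return 0;
}

The larger n_local is, the more arithmetic each rank performs per collective call, which is exactly what "large grain size" buys.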
Frequently, the relationship between grain size, computation, and communication can be understood in terms of the ratio of surface area to volume. Communication costs typically scale with the surface area of a process's subdomain, while computation scales with its volume, so enlarging the grain improves the ratio of useful work to communication.
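As a concrete check of this scaling (my worked example, not the book's), consider a square n-by-n subdomain of a two-dimensional grid computation. Computation scales with the subdomain's area (the "volume" in 2D) and communication with its perimeter (the "surface"):

\[
  \frac{T_{\mathrm{comm}}}{T_{\mathrm{comp}}}
  \;\propto\; \frac{4n}{n^{2}} \;=\; \frac{4}{n} ,
\]

so doubling the linear grain size n halves the relative cost of communication.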