9.2 Analysis of Integer Sort
The conventional way to assess the quality of a parallel implementation is to measure its speedup or one of the algebraically related quantities, efficiency and overhead. If T(P) is the time to execute a particular problem on P processors, then
speedup:     s(P) = T(1) / T(P)                                    (9.2.1)
efficiency:  e(P) = T(1) / (P * T(P)) = s(P) / P                   (9.2.2)
overhead:    v(P) = (P * T(P) - T(1)) / T(1) = (1 - e(P)) / e(P)   (9.2.3)
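For example, with hypothetical timings, a sort that takes 100 seconds on one processor and completes in 8 seconds on 16 processors has s(16) = 12.5, e(16) = 0.78, and v(16) = 0.28.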

In these formulae, T(1) should be understood to be the time of the best available implementation on a single processor. This is not necessarily the same as running the parallel code with, e.g., mpirun -np 1. Overhead is particularly useful because it is roughly additive; that is, the overhead in an implementation is approximately the sum of distinct contributions. Total overhead measures the difference between the time on P processors and the time one would expect from perfect speedup of a single-processor implementation. If one can identify and measure a time interval in the parallel implementation that is not present in the sequential case, that time is a contribution to overhead. The sum of all such times, normalized by the overall single-processor time, is the total overhead. Thus, one can independently assess different kinds of overhead in a parallel implementation to develop a predictive model for how that implementation will perform.
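As a rough illustration of this additivity, the following sketch (not from the book; the overhead values and names below are invented for illustration) sums independently estimated overhead contributions and rearranges equation (9.2.3) to predict the parallel run time from the single-processor time.

/* Sketch (not from the book) of using the additivity of overhead as a
 * predictive model.  The overhead values below are illustrative; in
 * practice each contribution would be measured independently. */
#include <stdio.h>

/* Rearranging equation (9.2.3), v = (P*T(P) - T(1))/T(1), gives the
 * predicted parallel time  T(P) = T(1) * (1 + v) / P. */
double predict_time(double t1, double v_total, int nproc)
{
    return t1 * (1.0 + v_total) / nproc;
}

int main(void)
{
    double t1 = 100.0;            /* best single-processor time, seconds */
    int    nproc = 16;

    /* Independently estimated overhead contributions (illustrative). */
    double v_comm      = 0.20;    /* interprocessor communication */
    double v_imbalance = 0.05;    /* load imbalance               */
    double v_extra     = 0.03;    /* extra or redundant work      */
    double v_total     = v_comm + v_imbalance + v_extra;

    double tp = predict_time(t1, v_total, nproc);
    printf("predicted T(%d) = %.2f s, speedup = %.2f\n",
           nproc, tp, t1 / tp);
    return 0;
}

The most important sources of parallel overhead are: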
9.2.1 Communication
Interprocessor communication does not take place in a sequential implementation, so any time spent communicating in the parallel code contributes to overhead. Fortunately, it is easy to estimate, and it is also very likely the largest contribution to the overhead in the sort example. Each processor calls MPI_Alltoall to communicate an array of nproc integers, and then calls MPI_Alltoallv to transmit and receive arrays approximately nelem_local in size (this calling pattern is sketched below). While the underlying implementation of MPI_Alltoall is hidden, the interested Beowulf user can pierce the veil and examine the source code. In MPICH at least, MPI_Alltoall and MPI_Alltoallv are implemented fairly naively as loops over point-to-point communication calls. Therefore, we can estimate the total time in communication as:
T_comm = 2 * P * t_latency + sizeof(local arrays) / bandwidth
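For concreteness, here is a minimal sketch of the communication pattern just described: an MPI_Alltoall that exchanges the per-destination counts (an array of nproc integers), followed by an MPI_Alltoallv that moves the keys themselves. The function and variable names (exchange_keys, send_counts, and so on) are illustrative and not taken from the book's sort code; the sketch assumes the local keys have already been arranged contiguously by destination rank.

/* Sketch of the communication pattern described in the text: exchange
 * per-destination counts with MPI_Alltoall, then exchange the keys
 * themselves with MPI_Alltoallv.  Names are illustrative only. */
#include <mpi.h>
#include <stdlib.h>

void exchange_keys(int *keys,            /* local keys, grouped by destination rank */
                   int *send_counts,     /* how many keys go to each rank           */
                   int nproc,
                   int **recv_keys, int *nrecv)
{
    int *recv_counts = malloc(nproc * sizeof(int));
    int *sdispls     = malloc(nproc * sizeof(int));
    int *rdispls     = malloc(nproc * sizeof(int));
    int i;

    /* Each process tells every other process how many keys to expect:
     * an array of nproc integers, as in the text. */
    MPI_Alltoall(send_counts, 1, MPI_INT, recv_counts, 1, MPI_INT,
                 MPI_COMM_WORLD);

    /* Build displacement arrays from the counts. */
    sdispls[0] = rdispls[0] = 0;
    for (i = 1; i < nproc; i++) {
        sdispls[i] = sdispls[i-1] + send_counts[i-1];
        rdispls[i] = rdispls[i-1] + recv_counts[i-1];
    }
    *nrecv = rdispls[nproc-1] + recv_counts[nproc-1];
    *recv_keys = malloc(*nrecv * sizeof(int));

    /* Move the keys themselves: roughly nelem_local integers per process. */
    MPI_Alltoallv(keys, send_counts, sdispls, MPI_INT,
                  *recv_keys, recv_counts, rdispls, MPI_INT,
                  MPI_COMM_WORLD);

    free(recv_counts); free(sdispls); free(rdispls);
}

With this structure, each collective amounts to on the order of P point-to-point messages per process, which is where the 2 * P * t_latency term in the estimate above comes from.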
