9.2 Analysis of Integer Sort
The conventional way to assess the quality of a parallel implementation is to measure its speedup or one of the algebraically related quantities, efficiency and overhead. If T(P) is the time to execute a particular problem on P processors, then
speedup:     s(P) = T(1) / T(P)                                    (9.2.1)
efficiency:  e(P) = T(1) / (P * T(P)) = s(P) / P                   (9.2.2)
overhead:    v(P) = (P * T(P) - T(1)) / T(1) = (1 - e(P)) / e(P)   (9.2.3)
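For example, with hypothetical timings, a sort that takes 100 seconds on one processor and completes in 8 seconds on 16 processors has s(16) = 12.5, e(16) = 0.78, and v(16) = 0.28.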

In these formulae, T(1) should be understood to be the time of the best available implementation on a single processor. This is not necessarily the same as running the parallel code with, e.g., mpirun -np 1. Overhead is particularly useful because it is roughly additive; that is, the overhead in an implementation is approximately the sum of distinct contributions. Total overhead measures the difference between the time on P processors and the time one would expect from perfect speedup of a single-processor implementation. If one can identify and measure a time interval in the parallel implementation that is not present in the sequential case, that time is a contribution to overhead. The sum of all such times, normalized by the overall single-processor time, is the total overhead. Thus, one can independently assess different kinds of overhead in a parallel implementation to develop a predictive model for how that implementation will perform.
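As a rough illustration of this additivity, the following sketch (not from the book; the overhead values and names below are invented for illustration) sums independently estimated overhead contributions and rearranges equation (9.2.3) to predict the parallel run time from the single-processor time.

/* Sketch (not from the book) of using the additivity of overhead as a
 * predictive model.  The overhead values below are illustrative; in
 * practice each contribution would be measured independently. */
#include <stdio.h>

/* Rearranging equation (9.2.3), v = (P*T(P) - T(1))/T(1), gives the
 * predicted parallel time  T(P) = T(1) * (1 + v) / P. */
double predict_time(double t1, double v_total, int nproc)
{
    return t1 * (1.0 + v_total) / nproc;
}

int main(void)
{
    double t1 = 100.0;            /* best single-processor time, seconds */
    int    nproc = 16;

    /* Independently estimated overhead contributions (illustrative). */
    double v_comm      = 0.20;    /* interprocessor communication */
    double v_imbalance = 0.05;    /* load imbalance               */
    double v_extra     = 0.03;    /* extra or redundant work      */
    double v_total     = v_comm + v_imbalance + v_extra;

    double tp = predict_time(t1, v_total, nproc);
    printf("predicted T(%d) = %.2f s, speedup = %.2f\n",
           nproc, tp, t1 / tp);
    return 0;
}

The most important sources of parallel overhead are: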
9.2.1 Communication
Interprocessor communication does not take place in a sequential implementation, so any time spent communicating in the parallel code contributes to overhead. Fortunately, it is easy to estimate, and it is also very likely the largest contribution to the overhead in the sort example. Each processor calls MPI_Alltoall to communicate an array of nproc integers, and then calls MPI_Alltoallv to transmit and receive arrays approximately nelem_local in size (this calling pattern is sketched below). While the underlying implementation of MPI_Alltoall is hidden, the interested Beowulf user can pierce the veil and examine the source code. In MPICH at least, MPI_Alltoall and MPI_Alltoallv are implemented fairly naively as loops over point-to-point communication calls. Therefore, we can estimate the total time in communication as:
T_comm = 2 * P * t_latency + sizeof(local arrays) / bandwidth
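For concreteness, here is a minimal sketch of the communication pattern just described: an MPI_Alltoall that exchanges the per-destination counts (an array of nproc integers), followed by an MPI_Alltoallv that moves the keys themselves. The function and variable names (exchange_keys, send_counts, and so on) are illustrative and not taken from the book's sort code; the sketch assumes the local keys have already been arranged contiguously by destination rank.

/* Sketch of the communication pattern described in the text: exchange
 * per-destination counts with MPI_Alltoall, then exchange the keys
 * themselves with MPI_Alltoallv.  Names are illustrative only. */
#include <mpi.h>
#include <stdlib.h>

void exchange_keys(int *keys,            /* local keys, grouped by destination rank */
                   int *send_counts,     /* how many keys go to each rank           */
                   int nproc,
                   int **recv_keys, int *nrecv)
{
    int *recv_counts = malloc(nproc * sizeof(int));
    int *sdispls     = malloc(nproc * sizeof(int));
    int *rdispls     = malloc(nproc * sizeof(int));
    int i;

    /* Each process tells every other process how many keys to expect:
     * an array of nproc integers, as in the text. */
    MPI_Alltoall(send_counts, 1, MPI_INT, recv_counts, 1, MPI_INT,
                 MPI_COMM_WORLD);

    /* Build displacement arrays from the counts. */
    sdispls[0] = rdispls[0] = 0;
    for (i = 1; i < nproc; i++) {
        sdispls[i] = sdispls[i-1] + send_counts[i-1];
        rdispls[i] = rdispls[i-1] + recv_counts[i-1];
    }
    *nrecv = rdispls[nproc-1] + recv_counts[nproc-1];
    *recv_keys = malloc(*nrecv * sizeof(int));

    /* Move the keys themselves: roughly nelem_local integers per process. */
    MPI_Alltoallv(keys, send_counts, sdispls, MPI_INT,
                  *recv_keys, recv_counts, rdispls, MPI_INT,
                  MPI_COMM_WORLD);

    free(recv_counts); free(sdispls); free(rdispls);
}

With this structure, each collective amounts to on the order of P point-to-point messages per process, which is where the 2 * P * t_latency term in the estimate above comes from.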
