
9.6 Summary
The implementation of sorting functions in this chapter has exposed a number of important techniques and guidelines that are generally useful when working with high-performance computers. We may wish it were otherwise, but parallel computers are difficult and idiosyncratic. This is equally true of Beowulfs and their more expensive commercial cousins. The following is a synopsis of a few of the lessons exposed in this chapter.
Trust no one. Libraries, compilers, device drivers, and operating systems were all written by mere mortals. Occasionally one finds genuine bugs, but more frequently one encounters unexpectedly poor performance for certain inputs or environmental conditions. In this chapter, we saw that the performance of MPI_Alltoallv could be improved by the counter-intuitive step of issuing a sequence of smaller requests. Presumably even better performance could be obtained by diagnosing and repairing the cause rather than the symptom. The existence of open source implementations of critical system components makes this latter course conceivable.
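The idea of replacing one large all-to-all request with a sequence of smaller ones can be sketched as a planning step. This is only an illustration, not the chapter's implementation: the function name chunked_send_plan and the parameter max_chunk are hypothetical, the send buffer is assumed to be packed by destination rank, and no MPI call is made. Each (counts, displs) pair is the kind of argument one would hand to a single smaller MPI_Alltoallv call.

```python
def chunked_send_plan(send_counts, max_chunk):
    """Split one large all-to-all exchange into several smaller rounds.

    send_counts[d] is the number of elements this rank sends to rank d.
    Each returned round is a (counts, displs) pair in which no destination
    gets more than max_chunk elements. displs index into the original send
    buffer, which is ASSUMED to be packed contiguously by destination.
    """
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")

    # Starting offset of each destination's block in the packed buffer.
    base = []
    offset = 0
    for c in send_counts:
        base.append(offset)
        offset += c

    # Enough rounds to drain the largest per-destination block.
    # NOTE: in a real collective, every rank must use the same number of
    # rounds, so this local value would first have to be agreed globally.
    if send_counts:
        rounds = max((c + max_chunk - 1) // max_chunk for c in send_counts)
    else:
        rounds = 0

    plan = []
    for r in range(rounds):
        already_sent = r * max_chunk
        counts = [min(max_chunk, max(0, c - already_sent)) for c in send_counts]
        displs = [b + min(c, already_sent) for b, c in zip(base, send_counts)]
        plan.append((counts, displs))
    return plan
```

For example, chunked_send_plan([5, 0, 3], 2) yields three rounds whose per-destination counts sum back to 5, 0, and 3. The receive side would need a matching plan built from the receive counts. Whether smaller requests actually help on a given system is something only measurement can show.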
A performance model. It is crucial to have a qualitative model of how the program should behave with different problem parameters, system parameters, etc. Without such a model, one cannot identify sources of inefficiency or effectively tune or improve performance.
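A minimal sketch of what such a model might look like for a parallel sort is given below. The cost terms are assumptions chosen for illustration, not the model used in the chapter: a local comparison sort costing t_compare per comparison, one all-to-all exchange in which each processor sends about (p - 1)/p of its keys, and a fixed latency per message. All parameter names are hypothetical.

```python
import math

def predicted_sort_time(n, p, t_compare, bytes_per_key, bandwidth, latency):
    """Rough predicted wall-clock time (seconds) to sort n keys on p processors.

    ASSUMED cost terms (illustrative only):
      compute: (n/p) * log2(n/p) comparisons at t_compare seconds each
      comm:    (p - 1) messages paying `latency` each, plus the time to move
               the fraction (p - 1)/p of the local keys at `bandwidth` bytes/s
    """
    local = n / p
    compute = t_compare * local * math.log2(local) if local > 1 else 0.0
    messages = p - 1
    bytes_sent = local * (p - 1) / p * bytes_per_key
    comm = latency * messages + bytes_sent / bandwidth
    return compute + comm
```

Evaluating the function over a range of n and p gives the curves against which measured times can be compared. With p = 1 the communication term vanishes and the model reduces to the sequential cost.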
Instrumentation and graphs. Program instrumentation allows one to compare the actual behavior of the program with the model. Plots of overhead with respect to problem size and machine size should be studied for deviations, anomalies, unexpected bumps, wiggles, etc. Where possible, quantitative comparisons should be made. For example, we found that the bandwidth delivered by MPI_Alltoallv was far below what we expected based on previous measurements of point-to-point communication performance.
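The comparison of measured behavior against a model can be sketched as follows. The helper names (timed, delivered_bandwidth, flag_deviations) and the 25% tolerance are assumptions made for this illustration. The model is passed in as any callable that returns a predicted time for a given problem size and machine size.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def delivered_bandwidth(bytes_moved, seconds):
    """Bandwidth actually achieved by one communication step, in bytes/s."""
    return bytes_moved / seconds

def flag_deviations(measurements, model, tolerance=0.25):
    """Return the runs whose measured time strays from the model.

    measurements: iterable of (n, p, measured_seconds)
    model:        callable model(n, p) -> predicted seconds
    tolerance:    allowed relative deviation (0.25 is an arbitrary choice)
    """
    flagged = []
    for n, p, seconds in measurements:
        ratio = seconds / model(n, p)
        if abs(ratio - 1.0) > tolerance:
            flagged.append((n, p, ratio))
    return flagged
```

The flagged runs are the ones worth examining more closely on a plot. A measured-to-predicted ratio that grows steadily with machine size points to an overhead the model does not account for.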
Graphical tools. Graphical tools can be helpful in certain cases, but they are no substitute for a semi-analytic performance model. Graphical performance analysis tools can give a good overview of what is happening in a single run, but they usually cannot help with trend analysis, that is, with how performance changes as one varies problem parameters, machine size, etc.
Superlinear speedup. Superlinear speedup is not impossible, but it probably means there is an opportunity for further sequential optimization. The fact that we see superlinear speedup in sorting indicates that our sequential sort is not optimal.
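Checking for superlinear speedup is simple arithmetic: speedup on p processors is the sequential time divided by the parallel time, and efficiency is speedup divided by p. An efficiency above 1 is the superlinear case. The sketch below uses a hypothetical function name and made-up timings purely for illustration.

```python
def speedup_table(t_sequential, parallel_times):
    """Compute speedup and efficiency for each processor count.

    t_sequential:   time of the best available sequential run (seconds)
    parallel_times: dict mapping processor count p -> measured time (seconds)
    Returns a list of (p, speedup, efficiency, is_superlinear) tuples
    ordered by p.
    """
    rows = []
    for p, t in sorted(parallel_times.items()):
        speedup = t_sequential / t
        efficiency = speedup / p
        rows.append((p, speedup, efficiency, efficiency > 1.0))
    return rows

# Illustrative numbers only: 100 s sequentially, 40 s on 2 processors.
for row in speedup_table(100.0, {2: 40.0, 4: 30.0}):
    print(row)
```

With these made-up timings, the 2-processor run shows a speedup of 2.5 and is flagged as superlinear, which would suggest revisiting the sequential code before celebrating the parallel result.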

 



How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters (Scientific and Engineering Computation)
ISBN: 026269218X
Year: 1999
Pages: 134
