9 Programming with MPI: A Detailed Example
As discussed in the previous chapter, MPI presents the programmer with a distributed address space model. The programmer must therefore partition or decompose the data so that individual computational objects are assigned to individual processors. This decomposition can be either static, i.e., fixed once and for all, or dynamic, i.e., changing in response to the simulation or the system on which it is running.
In the CA example in Chapter 8 the state of the system was described by a vector of integers. We partitioned that vector over a set of MPI processes so that each process was responsible for a contiguous block of data. The important feature of that partitioning was that it minimized communication: the only data that needed to be shared between processes were the endpoints of each block.
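The following sketch illustrates that partitioning in C. It is not the book's actual code: the names (N, state, local_n) are illustrative, and it assumes the global vector length divides evenly among the processes. Each rank owns a contiguous slice plus one ghost cell at each end, and exchanges only its endpoints with its left and right neighbors.

#include <mpi.h>
#include <stdlib.h>

#define N 1024  /* global number of cells (assumed divisible by size) */

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local_n = N / size;             /* cells owned by this rank */
    /* local_n interior cells plus one ghost cell at each end */
    int *state = calloc(local_n + 2, sizeof(int));

    /* MPI_PROC_NULL turns the boundary exchanges into no-ops */
    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    /* Exchange endpoints: send my first owned cell left while
       receiving my right ghost cell from the right neighbor ... */
    MPI_Sendrecv(&state[1],           1, MPI_INT, left,  0,
                 &state[local_n + 1], 1, MPI_INT, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* ... then send my last owned cell right while receiving my
       left ghost cell from the left neighbor. */
    MPI_Sendrecv(&state[local_n],     1, MPI_INT, right, 1,
                 &state[0],           1, MPI_INT, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* ... update the local cells using the ghost values ... */

    free(state);
    MPI_Finalize();
    return 0;
}

Because only two integers cross each inter-process boundary per step, communication stays constant while computation grows with the block size, which is exactly the favorable ratio the design steps below call for.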
In this chapter we give a detailed example of an approach to implementing high-performance, parallel applications with MPI. The basic approach to designing a good parallel application is as follows:

1. Choose an algorithm with sufficient parallelism and a favorable ratio of communication to computation.

2. Optimize a sequential version of the algorithm. This may mean finding library implementations of key steps; for example, well-tested and extremely fast implementations of Fourier transforms[1] and linear algebra[2] are widely available. High-quality software archives[3] are an excellent starting place for locating both sequential and parallel implementations of key algorithmic components.

3. Implement the parallel version using the simplest possible MPI operations, usually blocking, standard-mode procedures.

4. Profile and analyze the program to find which operations or activities take the most time (a minimal timing sketch follows this list).

5. Attack the most time-consuming components, and be selective. Remember the corollary to Amdahl's law: if a component accounts for 10% of the run time, then no matter how much you optimize it, the total run time can shrink by at most 10%, so the overall speedup can never exceed 1/(1 - 0.1), or about 1.11.
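As a minimal illustration of the profiling step, the sketch below brackets each phase of a time-step loop with MPI_Wtime(). The phase structure (a compute phase and a communication phase over 100 steps) is an assumption for illustration, not taken from the book; in practice a profiling tool can serve the same purpose.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    double t0, t_compute = 0.0, t_comm = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int step = 0; step < 100; step++) {
        t0 = MPI_Wtime();
        /* ... local computation on this rank's block ... */
        t_compute += MPI_Wtime() - t0;

        t0 = MPI_Wtime();
        /* ... blocking sends/receives with neighbors ... */
        t_comm += MPI_Wtime() - t0;
    }

    /* Per-rank totals show which phase dominates and whether
       the load is balanced across processes. */
    printf("rank %d: compute %.3f s, communicate %.3f s\n",
           rank, t_compute, t_comm);
    MPI_Finalize();
    return 0;
}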
Experience has shown that the most effective way to improve the performance of a parallel program is to improve the single-node performance. This is at odds with a
[1] The FFTW library: http://theory.lcs.mit.edu/~fftw/ ; RPM at ftp://contrib.redhat.com/libc6/i386/fftw-2.0.1-3.i386.rpm
[2] LAPACK home page: http://www.netlib.org/lapack/ ; RPM at ftp://rhcn.redhat.com/pub/rhcn/RPMS/i386/lapack-2.0-12.i386.rpm
[3] http://netlib.org/ and http://www.nhse.org