Parallel Computing on Heterogeneous Networks, by Alexey Lastovetsky
ISBN 0-471-22982-2 Copyright 2003 by John Wiley & Sons, Inc.
In Section 4.2 we presented MPI as the standard message-passing library for parallel programming of homogeneous distributed-memory architectures. In practice, MPI is also very often used for parallel programming of NoCs. There are several reasons for the popularity of MPI among programmers developing parallel applications for NoCs:
There are two free, high-quality implementations of MPI, LAM MPI and MPICH, that support cross-platform MPI applications. For example, if an NoC consists of computers running different Unix clones, such as Linux, Solaris, HP-UX, and IRIX, then after installing such an MPI implementation on each computer of the network, users can develop and execute MPI programs that run across the computers of the heterogeneous NoC.
The standard MPI encapsulates the problem of different data representations in processors of different architectures: MPI can properly convert data communicated between such processors. On the sender side, MPI converts the data to a machine-independent form. The data are transferred to the receiver in this machine-independent form and are then converted to the receiver's machine-specific form.
Although it is very well designed and easy to understand, the MPI communication model is low-level enough to allow efficient code to be written for any NoC.
However, the standard MPI does not address the additional challenges posed by heterogeneous NoCs. We analyzed some of these challenges in Chapter 5 and repeat them here:
Heterogeneity of processors. A good parallel application for heterogeneous NoCs must distribute computations unevenly, in accordance with the speeds of the processors. The efficiency of the application also depends on how accurately the processor speeds are estimated. Estimating a processor's speed is difficult because a processor may run at different speeds for different applications, owing to differences in instruction sets, the number of instruction execution units, the number of registers, the structure of the memory hierarchy, and so on.
Ad hoc communication network. The common communication network is heterogeneous, so the speed and bandwidth of communication links between different pairs of processors may differ significantly. This makes the optimal distribution of computations and communications across a heterogeneous NoC more difficult than across a dedicated cluster of workstations interconnected by a homogeneous high-performance communication network. Another issue is that a common communication network can use multiple network protocols for communication between different pairs of processors. A good parallel application should be able to use multiple network protocols between different pairs of processors within the same application to speed up communication operations.
Multiple-user decentralized computer system. Unlike dedicated clusters and supercomputers, NoCs are not strongly centralized computer systems. A typical NoC consists of relatively autonomous computers, each of which may be used and administered independently by its user. The first implication of the multiple-user, decentralized nature of NoCs is unstable performance during the execution of a parallel program, since the computers may be used for other computations and communications at the same time. The second implication is a much higher probability of resource failures in NoCs than in dedicated clusters of workstations, which makes fault tolerance a desirable feature for parallel applications running on NoCs.
These three main challenges posed by NoCs are not addressed by a standard MPI library. First, the standard MPI library does not employ multiple network protocols between different pairs of processors for efficient communication within the same MPI application. The only exception is the use of shared memory and TCP/IP in the MPICH implementation of MPI: if two processes of an MPI program run on the same SMP computer, they communicate via shared memory; if they run on different computers, they communicate via TCP/IP. There has been some research effort to address this challenge, such as the Nexus research implementation of MPI.
Second, the standard MPI library does not allow programmers to write fault-tolerant parallel applications for NoCs. In Section 5.3.2 we outlined the research efforts to add fault tolerance to MPI applications. The most recent research result is FT-MPI, an explicitly fault-tolerant MPI that extends the standard MPI interface and semantics. FT-MPI offers application programmers more ways of dealing with failures within an MPI application than just checkpoint and restart. It allows an application to explicitly control the semantics and associated modes of failure through the modified MPI API. FT-MPI also allows for atomic communications, and the level of correctness can be varied for individual communicators. This enables users to fine-tune for coherency or performance as system and application conditions dictate.
Third, the standard MPI library does not provide features that facilitate writing parallel programs that distribute computations and communications unevenly, taking into account the processor speeds and the speeds and bandwidths of communication links. In this chapter we present a research effort in this direction: a small set of extensions to MPI, called HMPI (Heterogeneous MPI), aimed at efficient parallel computing on heterogeneous NoCs. HMPI is, in effect, an adaptation of the mpC language to the MPI programming level.