6.2.1 Dynamically Assigned Addresses | How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters (Scientific and Engineering Computation)



		Let n be the message length for which the time to send a message is about half latency and half bandwidth, that is, the fixed startup cost and the size-dependent transfer cost contribute roughly equally. Much longer messages may be said to be bandwidth-dominated, while much shorter messages may be said to be latency-dominated. For Fast Ethernet networks in Beowulf systems, n is about 1500 bytes. It is no coincidence that this is close to the size of the fundamental unit of transmission implemented by the underlying hardware.
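The half-latency, half-bandwidth point can be illustrated with a simple linear cost model. The sketch below assumes that the time to send a message is a fixed latency plus the message size divided by the bandwidth; the function names, the 12.5 MB/s bandwidth figure (the peak rate of 100 Mbit/s Fast Ethernet), and the 120 microsecond latency are illustrative assumptions, chosen so that the crossover falls at the 1500 bytes quoted above. They are not measurements from a particular cluster.

```python
# Assumed linear cost model: time = latency + size / bandwidth.
BANDWIDTH = 12.5e6   # bytes per second; assumed peak of 100 Mbit/s Fast Ethernet
LATENCY = 120e-6     # seconds; assumed so that the crossover is 1500 bytes


def transfer_time(n_bytes, latency_s, bandwidth):
    """Total time to send one message of n_bytes under the linear model."""
    return latency_s + n_bytes / bandwidth


def half_point(latency_s, bandwidth):
    """Message length at which latency and transfer time are equal."""
    return latency_s * bandwidth


def latency_fraction(n_bytes, latency_s, bandwidth):
    """Share of the total send time that is due to latency."""
    return latency_s / transfer_time(n_bytes, latency_s, bandwidth)


# A 100-byte message is latency-dominated, a 100,000-byte message is
# bandwidth-dominated, and a message near half_point() is split evenly.
short_share = latency_fraction(100, LATENCY, BANDWIDTH)
long_share = latency_fraction(100_000, LATENCY, BANDWIDTH)
```

Under this model the crossover length is simply the product of latency and bandwidth, which is why a lower-latency network shifts the crossover toward shorter messages.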



		High latencies are probably the most conspicuous shortcoming of Beowulf systems, and hence successful algorithms are usually latency tolerant. Such algorithms "don't care" about the high latency for one reason or another. There are several approaches to tolerating latency. First, the total number of messages should be minimized. Many short messages (shorter than n) are much more expensive than a few long ones (longer than n), because each message pays the full latency regardless of its size. Second, one can work on some other task while the long-latency operation is under way. For example, overlapping communication and computation is supported by the asynchronous communication functions of MPI. Finally, results may be recomputed, or computed redundantly, rather than communicated. Time to solution may be reduced even if the operation count increases. With a communication latency of 60,000 clock cycles, there is plenty of opportunity to recompute rather than obtain a result from a distant processor.
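The first and last of these approaches can be made concrete with a rough cost estimate. The sketch below reuses an assumed linear cost model (fixed latency plus size divided by bandwidth) with illustrative values of 120 microseconds and 12.5 MB/s; these numbers, and the helper names, are assumptions for illustration only. The 60,000-cycle latency is the figure quoted in the text.

```python
# Illustrative parameters (assumptions, not measurements).
LATENCY = 120e-6          # seconds per message
BANDWIDTH = 12.5e6        # bytes per second
LATENCY_CYCLES = 60_000   # communication latency in clock cycles, from the text


def total_time(num_messages, bytes_each, latency_s=LATENCY, bandwidth=BANDWIDTH):
    """Time to send num_messages messages, each paying the full latency."""
    return num_messages * (latency_s + bytes_each / bandwidth)


def cheaper_to_recompute(recompute_cycles, latency_cycles=LATENCY_CYCLES):
    """True if recomputing locally costs fewer cycles than one message latency."""
    return recompute_cycles < latency_cycles


# Same 100,000 bytes of payload, sent two different ways.
many_short = total_time(1000, 100)      # 1000 messages of 100 bytes
one_long = total_time(1, 100_000)       # 1 aggregated message
speedup = many_short / one_long
```

With these assumed values the aggregated message is more than ten times faster, since the 1000 short messages pay the latency 1000 times. Likewise, a result that takes a few thousand cycles to recompute is cheaper to regenerate locally than to request from another node.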



		*7.2.6 Distributed and Shared Address Spaces*



		The MPI programming model discussed in Chapter 8 defines a distributed address space model with message passing. The only way for separate processes to share data is for them to communicate via explicit message passing procedure calls. In a shared address space system, there is a common, unified address space which may be accessed by any of the processors. In some cases, this can greatly simplify the design of parallel programs. With a shared address space, processes need not explicitly agree to transfer data, but may simply read and write a common address. On the other hand, there is considerable danger from race conditions and non-determinacy. Special efforts must be made in both hardware and software to guarantee that when one processor writes to a location and another reads from that location, the desired ordering is preserved. Parallel compilers exist that can exploit (to some extent) shared address space architectures, while designing languages and compilers to exploit message passing systems has proven much more difficult.
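The contrast between the two models can be sketched on a single machine. In the example below, Python threads stand in for processors: the first function has all workers update one shared variable, guarded by a lock to avoid a race condition, while the second has each worker send its partial result as an explicit message through a queue. This is only an analogy under stated assumptions; it is not MPI, and the function names are hypothetical.

```python
import queue
import threading


def shared_sum(chunks):
    """Shared address space style: all workers update one common variable."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        # Without the lock, concurrent read-modify-write updates could race.
        with lock:
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks):
    """Message passing style: workers share nothing and send results explicitly."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # explicit "send" of the partial result

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Explicit "receive": one message per worker.
    return sum(mailbox.get() for _ in chunks)


chunks = [list(range(0, 100)), list(range(100, 200))]
```

Both functions return the same sum for the same input. The shared version is shorter to write but depends on the lock for correctness, whereas the message passing version makes every data transfer visible in the code.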