6.1.3 The Guarded Beowulf

occurs on the boundaries (suitably defined) of the computational volume assigned to each logical processor. Operation counts, on the other hand, are proportional to the volume of data assigned to each logical processor. Therefore, the relative amounts of communication and computation are proportional to the ratio of surface area to volume. Since large grains have a more favorable, i.e., lower, ratio of surface area to volume, they reduce the relative importance of networking performance with respect to computational performance.
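The surface-to-volume argument can be made concrete with a small sketch. It assumes, purely for illustration, that each logical processor owns a cubic block of grid cells and that only the outermost one-cell-thick layer must be communicated; the function name `boundary_fraction` is hypothetical and does not come from the text.

```python
def boundary_fraction(side: int) -> float:
    """Fraction of cells on the boundary of a cubic grain of side**3 cells.

    Assumption: communication involves only the outermost layer of cells,
    while computation touches every cell in the grain.
    """
    if side < 1:
        raise ValueError("side must be at least 1")
    volume = side ** 3                   # cells that must be computed
    interior = max(side - 2, 0) ** 3     # cells with no boundary face
    surface = volume - interior          # cells that must be communicated
    return surface / volume


if __name__ == "__main__":
    # Larger grains communicate a smaller fraction of their data.
    for side in (4, 10, 100):
        print(f"side={side:4d}  boundary fraction={boundary_fraction(side):.4f}")
```

Under these assumptions the fraction falls roughly as 6/side once the grain is large, which is the sense in which large grains make network performance less important relative to computational performance.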
7.2.4 Bandwidth Greedy and Frugal
Communication of data over a network takes time. While variations in system load, interfering traffic, interrupt handlers, etc. make it impossible to exactly predict the amount of time required to deliver a message, the following formula provides a useful approximation:
t_comm = t_latency + (message length) / bandwidth
That is, the time to deliver a message consists of a startup time, t_latency, after which data flows at a roughly constant rate, the bandwidth. Beowulf systems employing Fast Ethernet deliver network bandwidths in the neighborhood of 10 MByte/s and latencies in the neighborhood of 200 µsec.
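As a rough illustration of the formula, the sketch below plugs in the Fast Ethernet figures quoted above, reading the latency as about 200 microseconds and the bandwidth as 10 MByte/s. The function names and the idea of reporting an "effective bandwidth" are assumptions made for this example, not part of the text.

```python
LATENCY_S = 200e-6        # startup time, about 200 microseconds (assumed reading)
BANDWIDTH_BPS = 10e6      # sustained rate, about 10 MByte/s


def comm_time(message_bytes: float,
              latency_s: float = LATENCY_S,
              bandwidth_bps: float = BANDWIDTH_BPS) -> float:
    """Approximate delivery time: latency plus message length over bandwidth."""
    return latency_s + message_bytes / bandwidth_bps


def effective_bandwidth(message_bytes: float) -> float:
    """Bytes per second actually achieved once the startup cost is included."""
    return message_bytes / comm_time(message_bytes)


if __name__ == "__main__":
    for size in (100, 2_000, 100_000, 10_000_000):
        t = comm_time(size)
        rate = effective_bandwidth(size) / 1e6
        print(f"{size:>10d} bytes  {t * 1e3:9.3f} ms  {rate:6.2f} MByte/s")
```

With these numbers, a message of latency × bandwidth = 2000 bytes spends as long starting up as it does transferring data, so it reaches only half the nominal bandwidth. Shorter messages are dominated by latency, and much longer ones approach the full 10 MByte/s.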
Now let's compare the rate at which data can be communicated with the rate at which data can be processed by the CPU. For concreteness, consider a 300 MHz Pentium II processor. If data is in registers, and if it is possible to keep pipelines full, then the processor can perform an arithmetic operation involving two 8-byte quantities every clock cycle. Such favorable situations are extremely rare in practice, and even the most carefully coded loops rarely exceed half this performance. Therefore, we estimate the maximum rate at which data can be consumed from registers to be about 0.5 × 300 MHz × 16 bytes = 2400 MByte/s, about 240 times faster than the network speed!
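The estimate above is simple enough to check directly. The sketch below repeats the arithmetic with the quantities named in the paragraph; the constant and function names are illustrative, and the network figure is the 10 MByte/s quoted earlier for Fast Ethernet.

```python
CLOCK_HZ = 300e6          # 300 MHz Pentium II
BYTES_PER_OP = 2 * 8      # two 8-byte operands per arithmetic operation
EFFICIENCY = 0.5          # well-tuned loops rarely exceed half of peak
NETWORK_BPS = 10e6        # Fast Ethernet, about 10 MByte/s


def register_bandwidth(clock_hz: float = CLOCK_HZ,
                       bytes_per_op: int = BYTES_PER_OP,
                       efficiency: float = EFFICIENCY) -> float:
    """Estimated rate (bytes/s) at which the CPU consumes register data."""
    return efficiency * clock_hz * bytes_per_op


if __name__ == "__main__":
    cpu_rate = register_bandwidth()
    print(f"register bandwidth: {cpu_rate / 1e6:.0f} MByte/s")
    print(f"ratio to network:   {cpu_rate / NETWORK_BPS:.0f}x")
```

Changing `EFFICIENCY` or `CLOCK_HZ` shows how the gap between processor and network scales with the assumptions; the 240-fold figure holds only for the values used in the text.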
A processor's internal bandwidth is usually not the most important one for determining overall system performance. Modern microprocessors cannot deliver data from main memory at anywhere near the rates required by the CPU. This simple fact has driven a series of extremely complicated developments in microprocessor design over the last decade, including caches, wide memory buses, pipelined architectures, speculative execution, out-of-order execution, etc. While these developments have mitigated the effect of memory bandwidth on overall performance, there is no escaping the fact that memory bandwidth is often the limiting factor in overall system performance. Partly because of the presence of so many advanced

 



How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters (Scientific and Engineering Computation)
ISBN: 026269218X
Year: 1999
Pages: 134
