17.1 PVM History, Terminology, and Architecture | UNIX Systems Programming: Communication, Concurrency and Threads

Grace Murray Hopper, a vocal early advocate of parallel computing, was fond of reminding her audiences that the way to pull a heavier load was not to grow a bigger ox but to hitch more oxen to the load. Seymour Cray, a pioneer in computer architecture, is reported to have later countered, "If you were plowing a field, which would you rather use, two strong oxen or 1024 chickens?" The chickens versus oxen debate continues to rage. IBM's Blue Gene Project involves the building of a 64,000-processor machine with petaflop capabilities (a thousand trillion operations per second) based on relatively low-powered, embedded PowerPC chips [14]. On the other hand, the NEC Earth-Simulator, which was rated as the world's fastest computer in 2002, uses only 640 nodes. Each "NEC oxen node" consists of 8 tightly coupled vector processors [135].

Another important development in the parallel/distributed computing arena is the move to harness cheap workstations to solve large problems. Programming libraries such as PVM (Parallel Virtual Machine) [118] and MPI (Message Passing Interface) [43] supply a cross-platform message-passing facility with higher-level services built on top, allowing groups of heterogeneous, interconnected machines to act as a transparent parallel-computing environment. These systems let users solve large problems on networks of workstations by presenting the illusion of a single parallel machine. PVM operates at the task level and presents a message-passing abstraction that hides the details of the network and of the individual machines that make up the virtual machine. PVM and MPI libraries have become the mainstay of distributed scientific computing because they allow researchers to develop platform-independent software. However, programs based on this paradigm are hard for nonexperts to debug and optimize.

A new notion of "computing as a utility" has recently emerged in the form of grid computing [38]. The Open Grid Services Architecture provides a higher-level layer of services built over message-passing libraries and native host runtime systems. These higher-level abstractions are quickly bringing distributed computing into the mainstream.

This chapter project develops a PVM-like library for managing tasks. We begin by introducing PVM terminology and providing an overview of the PVM architecture.

The basic unit of computation in PVM is called a task and is analogous to a UNIX process. A PVM program calls PVM library functions to create and coordinate tasks. The tasks can communicate by passing messages to other tasks through calls to PVM library functions. Tasks that cooperate, either through communication or synchronization, are organized into groups called computations. PVM supports direct communication, broadcast, and barriers within a computation.
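The relationship between tasks and computations can be sketched in C. The following is a minimal illustration only, not PVM's actual data structures or API: the names task_t, same_computation, broadcast_targets, and barrier_complete are hypothetical, and the sketch assumes that a broadcast skips the sender and that a barrier is satisfied once every task in the computation has arrived.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names; none of them come from the PVM library. */
typedef struct {
    int compid;   /* computation (group) the task belongs to */
    int taskid;   /* task number within that computation */
} task_t;

/* Two tasks cooperate only if they belong to the same computation. */
int same_computation(const task_t *a, const task_t *b) {
    assert(a != NULL && b != NULL);
    return a->compid == b->compid;
}

/* Broadcast within a computation: store in recipients[] the index of
   every task that shares the sender's computation, excluding the sender
   (an assumption of this sketch). Returns the number of recipients. */
size_t broadcast_targets(const task_t *tasks, size_t ntasks,
                         size_t sender, size_t *recipients) {
    size_t count = 0;
    assert(sender < ntasks);
    for (size_t i = 0; i < ntasks; i++) {
        if (i != sender && same_computation(&tasks[i], &tasks[sender]))
            recipients[count++] = i;
    }
    return count;
}

/* Barrier within a computation: returns 1 once every task of the
   computation compid has arrived (arrived[i] != 0), and 0 otherwise. */
int barrier_complete(const task_t *tasks, const int *arrived,
                     size_t ntasks, int compid) {
    for (size_t i = 0; i < ntasks; i++) {
        if (tasks[i].compid == compid && !arrived[i])
            return 0;
    }
    return 1;
}
```

For example, given four tasks with (compid, taskid) pairs (1,0), (2,0), (2,1), and (3,0), a broadcast from the task at index 1 reaches only the task at index 2, the other member of computation 2.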

Figure 17.1 shows a logical view of a typical PVM system. A PVM application generally starts with an input and partitioning task that controls the problem solution. The user specifies in this task how other tasks cooperate to solve the problem. The input and partitioning task creates several computations. Tasks within each computation share data and communicate with each other. The PVM application also has a dedicated task to handle output and user display. The other tasks in the PVM application forward their output to this task for display on the application's console.

Figure 17.1. Logical view of an application running on a PVM virtual machine.

To run a PVM application, a user first designates the pool of machines or hosts that make up the virtual machine and then starts the PVM control daemon, pvmd, on each of these hosts. Each control daemon communicates with the user's console, handles message traffic for its host, and controls the tasks running on that host. To send input to a particular task, PVM sends the data to the pvmd daemon on the destination host, which then forwards it to the appropriate task. Similarly, a task produces output by sending a message to its pvmd, which in turn forwards it to the console's pvmd and on to the application's output task. The underlying message passing is transparent, so the user sees only that a particular task has sent a message to the console.
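The forwarding decision each daemon makes can be sketched as follows. This is an illustration under stated assumptions, not the pvmd implementation: the header layout msg_header_t, the route_t values, and the functions route_message and hops_to_deliver are all hypothetical, and hosts and tasks are identified by plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message header; the real pvmd wire format is not shown
   in the text and is not reproduced here. */
typedef struct {
    int dest_host;   /* host on which the destination task runs */
    int dest_task;   /* destination task on that host */
} msg_header_t;

typedef enum { DELIVER_TO_TASK, FORWARD_TO_DAEMON } route_t;

/* Decision made by the daemon on my_host for a message it receives:
   deliver to a local task, or forward to the destination host's daemon. */
route_t route_message(int my_host, const msg_header_t *hdr) {
    assert(hdr != NULL);
    return (hdr->dest_host == my_host) ? DELIVER_TO_TASK
                                       : FORWARD_TO_DAEMON;
}

/* Number of hops a message takes when sent by a task on src_host:
   task -> local daemon, optionally daemon -> remote daemon,
   then daemon -> destination task. */
int hops_to_deliver(int src_host, const msg_header_t *hdr) {
    int hops = 1;                          /* sending task -> its daemon */
    if (route_message(src_host, hdr) == FORWARD_TO_DAEMON)
        hops++;                            /* daemon -> remote daemon */
    hops++;                                /* daemon -> destination task */
    return hops;
}
```

In this model a message between tasks on the same host passes through one daemon (two hops), while a message to a task on another host passes through two daemons (three hops), matching the output path described above: task, local pvmd, console's pvmd, output task.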

Figure 17.2 shows how an application might be mapped onto the virtual machine. The tasks that make up a logical computation are not necessarily mapped to the same host but might be spread across all the hosts on the virtual machine. Host 1 of Figure 17.2 runs tasks from three computations: one consisting of a single task, one consisting of two tasks, and one that also has tasks on host 2.
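The mapping of tasks to hosts can be modeled as a small placement table. The sketch below is hypothetical: placement_t, computations_on_host, and spans_hosts are names invented for illustration, and the number of tasks in the computation shared between host 1 and host 2 is an assumption, since the text gives only the per-host description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placement record: which host runs a given task. */
typedef struct {
    int compid;   /* computation the task belongs to */
    int taskid;   /* task number within the computation */
    int host;     /* host on which the task runs */
} placement_t;

/* Count the distinct computations that have at least one task on host. */
size_t computations_on_host(const placement_t *p, size_t n, int host) {
    size_t count = 0;
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        if (p[i].host != host)
            continue;
        /* Skip this computation if an earlier entry already counted it. */
        for (size_t j = 0; j < i; j++) {
            if (p[j].host == host && p[j].compid == p[i].compid) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}

/* Return 1 if the tasks of computation compid run on more than one host. */
int spans_hosts(const placement_t *p, size_t n, int compid) {
    int found = 0, first_host = 0;
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (p[i].compid != compid)
            continue;
        if (!found) {
            first_host = p[i].host;
            found = 1;
        } else if (p[i].host != first_host) {
            return 1;
        }
    }
    return 0;
}
```

With a table holding one task of computation 1 on host 1, two tasks of computation 2 on host 1, and computation 3 split between host 1 and host 2, computations_on_host reports three computations for host 1, and spans_hosts is true only for computation 3.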

Figure 17.2. Schematic of a PVM.
