- Figure 1.1: Serial scalar processor architecture.

- Figure 2.1: Vector processor architecture.
- Figure 2.2: Superscalar processor architecture.

- Figure 3.1: Shared memory multiprocessor.

- Figure 4.1: Distributed memory multiprocessor.
- Figure 4.2: Matrix-matrix multiplication with matrices *A*, *B*, and *C* evenly partitioned in one dimension. The slices mapped onto a single processor are shaded in black. During execution, this processor requires all of matrix *B* (shown shaded in gray).
- Figure 4.3: Matrix-matrix multiplication with matrices *A*, *B*, and *C* evenly partitioned in two dimensions. The blocks mapped onto a single processor are shaded black. During execution, this processor requires the corresponding rows of matrix *A* and columns of matrix *B* (shown shaded in gray).
- Figure 4.4: Optimal broadcast tree for *P* = 8, *L* = 6, *g* = 4, and *o* = 2. The number shown for each node is the time at which it has received the data unit and can begin sending it.
- Figure 4.5: Communication tree for optimal summing for *T* = 28, *P* = 8, *L* = 5, *g* = 4, and *o* = 2.
- Figure 4.6: Storage schemes used in the target message-passing program to store a row of blocks of matrix *A* and a column of blocks of matrix *B* in local arrays Arows and Bcols, respectively.
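Figures 4.4 and 4.5 are parameterized by the LogP model (*L* latency, *o* send/receive overhead, *g* gap between sends). As a sketch of how such an optimal broadcast tree can be constructed, here is a greedy simulation; the function name and the exact timing assumptions (stated in the docstring) are ours, not the book's:

```python
import heapq

def broadcast_times(P, L, g, o):
    """Greedy broadcast schedule for one data unit under the LogP model.

    Assumptions (ours): a send started at time t occupies the sender for
    o, travels for L, and is fully received at t + o + L + o; a sender
    may start its next send g time units after the previous send started.
    Returns the sorted receive times of all P processors (0 = root).
    """
    recv = [0]                       # the root holds the data at time 0
    # heap of candidate sends: (arrival time if this send is used, send start)
    heap = [(o + L + o, 0)]
    while len(recv) < P:
        arrival, start = heapq.heappop(heap)
        recv.append(arrival)         # a new processor receives the data
        # the sender may send again g after its previous send started
        heapq.heappush(heap, (start + g + o + L + o, start + g))
        # the new receiver immediately becomes a sender itself
        heapq.heappush(heap, (arrival + o + L + o, arrival))
    return sorted(recv)

# Parameters of Figure 4.4: P = 8, L = 6, g = 4, and o = 2.
print(broadcast_times(8, 6, 4, 2))   # → [0, 10, 14, 18, 20, 22, 24, 24]
```

Under these assumptions the greedy schedule finishes the broadcast at time 24; the per-node receive times are the kind of annotation the figure attaches to each tree node.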

- Figure 5.1: Local network of computers.
- Figure 5.2: Matrix-matrix multiplication with matrices *A*, *B*, and *C* unevenly partitioned in one dimension. The area of the slice mapped to each processor is proportional to its speed. The slices mapped onto a single processor are shaded black. During execution, this processor requires all of matrix *A* (shown shaded in gray).
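The uneven partitioning of Figure 5.2 gives each processor a slice proportional to its speed. A minimal sketch of one common rounding scheme (the function name and the largest-fractional-part rule are our illustration, not code from the book):

```python
def partition_1d(n, speeds):
    """Uneven one-dimensional partitioning (cf. Figure 5.2): processor i
    receives a number of columns roughly proportional to speeds[i]."""
    total = sum(speeds)
    ideal = [n * s / total for s in speeds]   # real-valued shares
    alloc = [int(x) for x in ideal]           # round every share down
    # hand the leftover columns to the largest fractional parts
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - alloc[i], reverse=True)
    for i in order[: n - sum(alloc)]:
        alloc[i] += 1
    return alloc

print(partition_1d(12, [3, 2, 1]))   # → [6, 4, 2]
```

All columns are assigned exactly once, and a processor twice as fast as another receives (up to rounding) twice as many of them.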

- Figure 6.1: Matrix operation *C* = *A* x *B*^{T} with matrices *A*, *B*, and *C* unevenly partitioned in one dimension. The slices mapped onto a single processor are shaded in black.
- Figure 6.2: One step of parallel multiplication of matrices *A* and *B*^{T}. The pivot row of blocks of matrix *B* (shown slashed) is first broadcast to all processors. Then each processor computes, in parallel with the others, its part of the corresponding column of blocks of the resulting matrix *C*.
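The step in Figure 6.2 works because for *C* = *A* x *B*^{T}, element c[i][j] is the dot product of row i of *A* and row j of *B*, so a broadcast row of *B* lets every processor fill in its piece of one column of *C*. A toy sketch from one processor's point of view (names and data layout are our illustration):

```python
def mxm_abt_step(A_local, b_pivot):
    """One step of C = A x B^T on one processor (cf. Figure 6.2): given
    the broadcast pivot row b_pivot of B, compute this processor's
    elements of the corresponding column of C from its local rows of A."""
    return [sum(a * b for a, b in zip(row_a, b_pivot)) for row_a in A_local]

A_local = [[1, 2], [3, 4]]              # this processor's slice of rows of A
b_pivot = [5, 6]                        # broadcast pivot row of B
print(mxm_abt_step(A_local, b_pivot))   # → [17, 39]
```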

- Figure 7.1: The system of bodies consists of large groups of bodies, with different groups at a good distance from each other. The bodies move under the influence of Newtonian gravitational attraction.
- Figure 7.2: The point of view of each individual process. The system of bodies includes all bodies of its group while each remote group is approximated by a single equivalent body.
- Figure 7.3: A star communication pattern.
- Figure 7.4: A fully connected network.
- Figure 7.5: A ring communication pattern.
- Figure 7.6: A four-level binary tree. Node 0 is a *root*. Nodes 5, 6, 7, and 8 are *leaves*. Nodes 1, 2, 3, and 4 are *interior*.
- Figure 7.7: A four-level complete binary tree. Level 3 is not full.
- Figure 7.8: A four-level perfect binary tree, since all levels are full. An *n*-level perfect binary tree has 2^{n} - 1 nodes.
- Figure 7.9: Hierarchical model of a heterogeneous network of five computers.
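The node-count formula in Figure 7.8 follows from summing the levels: level k of a perfect binary tree holds 2^{k} nodes. A one-line check (the function name is ours):

```python
def perfect_tree_nodes(n):
    """Nodes of an n-level perfect binary tree, counted level by level:
    sum of 2**k for k = 0 .. n-1, which equals 2**n - 1."""
    return sum(2 ** k for k in range(n))

# The four-level tree of Figure 7.8: 2**4 - 1 = 15 nodes.
print(perfect_tree_nodes(4))   # → 15
```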

- Figure 9.1: One step of the modified algorithm of parallel matrix-matrix multiplication based on two-dimensional block distribution of matrices *A*, *B*, and *C*. First, the pivot column *a*_{•k} of *r* x *r* blocks of matrix *A* (shown shaded in gray) is broadcast horizontally, and the pivot row *b*_{k•} of *r* x *r* blocks of matrix *B* (shown shaded in gray) is broadcast vertically. Then each *r* x *r* block *c*_{ij} of matrix *C* (also shown shaded in gray) is updated, *c*_{ij} = *c*_{ij} + *a*_{ik} x *b*_{kj}.
- Figure 9.2: A matrix with 18 x 18 blocks is distributed over a 3 x 3 processor grid. The numbers at the left and at the top of the matrix represent indexes of a row of blocks and a column of blocks, respectively. (*a*) The labeled squares represent blocks of elements, and the label indicates at which location in the processor grid the block is stored; all blocks labeled with the same name are stored in the same processor. Each shaded and unshaded area represents a different generalized block. (*b*) Each processor has 6 x 6 blocks.
- Figure 9.3: Example of two-step distribution of a 6 x 6 generalized block over a 3 x 3 processor grid. See below for complete explanation.
- Figure 9.4: A matrix with 18 x 18 blocks is distributed over a 3 x 3 processor grid. See below for complete explanation.
- Figure 9.5: One step of the algorithm of parallel matrix-matrix multiplication based on heterogeneous two-dimensional block distribution of matrices *A*, *B*, and *C*. First, each *r* x *r* block of the pivot column *a*_{•k} of matrix *A* (shown shaded in dark gray) is broadcast horizontally, and each *r* x *r* block of the pivot row *b*_{k•} of matrix *B* (shown shaded in dark gray) is broadcast vertically. Then each *r* x *r* block *c*_{ij} of matrix *C* is updated, *c*_{ij} = *c*_{ij} + *a*_{ik} x *b*_{kj}.
- Figure 9.6: Different combinations of rectangles in a generalized block. (*a*) No *r* x *r* block of rectangle *R*_{31} is a horizontal neighbor of rectangle *R*_{23}; therefore h[3][1][2][3] = 0. (*b*) All *r* x *r* blocks of rectangle *R*_{31} are horizontal neighbors of rectangle *R*_{33}; h[3][1][3][3] = 3. (*c*) Neighbors of rectangle *R*_{22} in rectangle *R*_{21} make up a 3 x 6 rectangular area (shaded in dark gray); h[2][1][2][2] = 3. (*d*) Neighbors of rectangle *R*_{33} in rectangle *R*_{21} make up the last row of this rectangle (shaded in dark gray); h[2][1][3][3] = 1.
- Figure 9.7: Execution times of the heterogeneous and homogeneous 2D algorithms and the heterogeneous 1D algorithm. All algorithms are performed on the same heterogeneous network.
- Figure 9.8: Speedups of the heterogeneous 1D and 2D algorithms compared with that of the homogeneous 2D block cyclic algorithm. All algorithms are performed on the same heterogeneous network.
- Figure 9.9: Execution times of the 2D heterogeneous block cyclic algorithm on a heterogeneous network and of the 2D homogeneous block cyclic algorithm on a homogeneous network. The networks have approximately the same total power of processors and share the same communication network.
- Figure 9.10: One step of the block Cholesky factorization algorithm. See below for complete explanation.
- Figure 9.11: Speedup achieved by the heterogeneous block cyclic Cholesky factorization compared to its homogeneous counterpart on a heterogeneous network of eight workstations.
- Figure 9.12: Speedups achieved by the heterogeneous Cholesky factorizations based on the data distribution and process distribution compared to their homogeneous prototype on a heterogeneous network of eight workstations.
- Figure 9.13: Speedups for different permutations of the numbers of bodies in groups.
- Figure 9.14: The seven-point “honeycomb” scheme of oil/water well disposition.
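The homogeneous mapping underlying Figure 9.2 assigns block (i, j) to processor (i mod P, j mod Q) of the grid; the generalized-block scheme of Figures 9.3 and 9.4 is a heterogeneous refinement of it. A sketch of the homogeneous case only (the function name is ours):

```python
from collections import Counter

def block_cyclic_owner(i, j, P, Q):
    """Owner of block (i, j) under a homogeneous two-dimensional
    block-cyclic distribution over a P x Q processor grid (cf. Figure
    9.2).  Blocks cycle over grid rows and columns independently."""
    return (i % P, j % Q)

# The layout of Figure 9.2: 18 x 18 blocks over a 3 x 3 grid.
counts = Counter(block_cyclic_owner(i, j, 3, 3)
                 for i in range(18) for j in range(18))
print(all(c == 36 for c in counts.values()))   # each processor: 6 x 6 blocks → True
```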

Parallel Computing on Heterogeneous Networks (Wiley Series on Parallel and Distributed Computing)

ISBN: B000W7ZQBO

EAN: N/A


Year: 2005

Pages: 95


Authors: Alexey L. Lastovetsky
