
One of the types of computation that is easiest to parallelize is the Monte Carlo family of algorithms. In such computations, a random number generator is used to create a number of independent trials. Statistics done with the outcomes of the trials provide a solution to the problem.
We illustrate this technique with another computation of the value of π. If we select points at random in the unit square [0, 1] [0, 1] and compute the percentage of them that lies inside the quarter circle of radius 1, then we will be approximating . (See [48] for a more detailed discussion together with an approach that does not use a parallel random number generator.) We use the SPRNG parallel random number generator (sprng.cs.fsu.edu). The code is shown in Figure 8.12.
#include "mpi.h" #include <stdio.h> #define SIMPLE_SPRNG /* simple interface */ #define USE_MPI /* use MPI */ #include "sprng.h" /* SPRNG header file */ #define BATCHSIZE 1000000 int main( int argc, char *argv[] ) { int i, j, numin = 0, totalin, total, numbatches, rank, numprocs; double x, y, approx, pi = 3.141592653589793238462643; MPI_Init( &argc, &argv ); MPI_Comm_size( MPI_COMM_WORLD, &numprocs ); MPI_Comm_rank( MPI_COMM_WORLD, &rank ); if ( rank == 0 ) { numbatches = atoi( argv[1] ); } MPI_Bcast( &numbatches, 1, MPI_INT, 0, MPI_COMM_WORLD ); for ( i = 0; i < numbatches; i++ ) { for ( j = 0; j < BATCHSIZE; j++) { x = sprng( ); y = sprng( ); if ( x * x + y * y < 1.0 ) numin++; } MPI_Reduce( &numin, &totalin, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD ); if ( rank == 0 ) { total = BATCHSIZE * ( i + 1 ) * numprocs; approx = 4.0 * ( (double) totalin / total ); printf( "pi = %.16f; error = %.16f, points = %d\n", approx, pi  approx, total ); } } MPI_Finalize( ); return 0; }
The defaults in SPRNG make it extremely easy to use. Calls to the sprng function return a random number between 0.0 and 1.0, and the stream of random numbers on the different processes is independent. We control the grain size of the parallelism by the constant BATCHSIZE, which determines how much computation is done before the processes communicate. Here a million points are generated, tested, and counted before we collect the results to print them. We use MPI_Bcast to distribute the commandline argument specifying the number of batches, and we use MPI_Reduce to collect at the end of each batch the number of points that fell inside the quarter circle, so that we can print the increasingly accurate approximations to π.
