3.4. PARALLEL LANGUAGES | Parallel Computing on Heterogeneous Networks (Wiley Series on Parallel and Distributed Computing)

3.3. THREAD LIBRARIES

Thread libraries are used to directly implement a thread parallel programming model and allow programmers to write MT programs explicitly, without relying on optimizing compilers. The basic paradigm of multithreading implemented in different thread libraries (POSIX, NT, Solaris, OS/2, etc.) is the same as that briefly summarized in Section 3.1. The libraries differ only in the implementation details of this basic model.

The POSIX thread library we use is also known as Pthreads. The main reason we focus on this library is that in 1995 the specification of Pthreads became a part of the IEEE POSIX standard, and hence it is considered the standard for all Unix systems. Most hardware vendors now offer Pthreads in addition to their proprietary thread libraries.

Pthreads were originally defined as a C language library. The standard Fortran interface to this library is not yet complete. Pthreads introduce three classes of objects, together with operations on them:

Threads
Mutexes
Condition variables

Any thread of an MT program is represented by its ID, which is a reference to an opaque data object holding full information about the thread. This information is used and modified by operations on threads. The operations can create threads, terminate threads, join threads, and so on. There are also operations to set and query thread attributes.

Mutex is an abbreviation for mutual exclusion. Mutex variables (or simply mutexes) are one of the primary means of thread synchronization, normally used when several threads update the same global data. A mutex acts as a lock protecting access to the shared data resource. Only one thread can lock a mutex at any given time. Thus, even if several threads simultaneously try to lock a mutex, only one of them will succeed and claim ownership of that mutex. No other thread can lock that mutex until the owning thread unlocks it. In this way mutexes serialize access to the shared resource by concurrent threads. Operations on mutexes include creating, destroying, locking, and unlocking mutexes. There are also operations that set and modify the attributes associated with mutexes.
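A minimal sketch of this serialization, assuming a hypothetical shared counter that several threads increment. The names `counter`, `counter_lock`, `increment_worker`, `run_counter_demo` and the limit `MAX_WORKERS` are illustrative only; the thread calls used here are described in Section 3.3.1 and the mutex calls in Section 3.3.2.

```c
#include <pthread.h>

#define MAX_WORKERS 16               /* hypothetical upper bound for this sketch */

static long counter = 0;             /* shared data updated by all threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker adds 1 to the shared counter *(long *)arg times. */
static void *increment_worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);    /* only one thread at a time gets past here */
        counter++;                            /* the read-modify-write is now serialized */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter value, or -1 on error. */
long run_counter_demo(int nthreads, long iterations)
{
    pthread_t tids[MAX_WORKERS];
    int created = 0;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;
    counter = 0;
    for (; created < nthreads; created++)
        if (pthread_create(&tids[created], NULL, increment_worker, &iterations) != 0)
            break;
    for (int i = 0; i < created; i++)     /* wait for every thread actually started */
        pthread_join(tids[i], NULL);
    return created == nthreads ? counter : -1;
}
```

Without the lock/unlock pair, increments from different threads could interleave and some updates would be lost, so the final value could fall short of nthreads × iterations.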

Condition variables are yet another way for threads to synchronize their work. A condition variable is a global variable shared by several threads and used by those threads to signal each other that some condition is satisfied. A thread that waits for some condition to be satisfied may block itself on a condition variable and then do nothing until some other thread, as soon as the condition is satisfied, unblocks it by performing a corresponding operation on the same condition variable. Without condition variables the waiting thread would have to poll constantly to check whether the condition is met, which wastes resources because the thread keeps a processor busy doing nothing useful. Thus condition variables allow the programmer to use “cold” waiting, which does not keep processors busy, instead of “hot” waiting. Operations on condition variables include creating, waiting on, signaling, and destroying condition variables. There are also operations to set and query condition variable attributes.

3.3.1. Operations on Threads

An MT program starts up with one initial thread running the function main. All other threads must be explicitly created. Creation of a new thread is performed by the following Pthreads function:

 int pthread_create(pthread_t *thread,
                    const pthread_attr_t *attr,
                    void *(*start_routine)(void*),
                    void *arg)

This function creates a new thread that runs concurrently with the calling thread. The new thread executes the function start_routine. Only one argument can be passed to this function, via arg. When multiple arguments must be passed, this limitation is easily overcome by creating a structure that contains all of the arguments and passing a pointer to that structure to pthread_create.

The attr argument specifies thread attributes to be applied to the new thread. The attr argument can also be NULL, in which case default attributes are used. On success, the ID of the newly created thread is stored in the location pointed to by the thread argument, and 0 is returned. On error, a nonzero error code is returned. Once created, threads are peers, and so any of them can create other threads.
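The structure technique described above can be sketched as follows: two values are packed into a struct, a pointer to it is passed as arg, and the start routine casts it back. The type `range_args_t` and the functions `sum_range` and `sum_in_thread` are hypothetical names chosen for this illustration; attr is NULL, so default attributes are used.

```c
#include <pthread.h>

/* All "arguments" for the start routine travel in one struct. */
typedef struct {
    long first;      /* first integer to add */
    long last;       /* last integer to add */
    long result;     /* filled in by the new thread */
} range_args_t;

/* Start routine: the single void* argument is cast back to the struct. */
static void *sum_range(void *arg)
{
    range_args_t *a = arg;
    a->result = 0;
    for (long k = a->first; k <= a->last; k++)
        a->result += k;
    return NULL;
}

/* Creates one thread with default attributes and waits for it.
   Returns 0 on success or the nonzero error code from pthread_create. */
int sum_in_thread(long first, long last, long *out)
{
    range_args_t args = { first, last, 0 };
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, sum_range, &args);
    if (rc != 0)
        return rc;                 /* an error code is returned, not set in errno */
    pthread_join(tid, NULL);       /* args must stay alive until the thread is done */
    *out = args.result;
    return 0;
}
```

Because args lives on the caller's stack, the caller must not return before the new thread has finished using it; here the call to pthread_join guarantees that.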

The caller can use the thread ID stored by pthread_create to perform various operations on the thread. Any thread may learn its own ID with the function

pthread_t pthread_self(void)

which returns the unique, system-assigned thread ID of the calling thread. The function

 int pthread_equal(pthread_t t1, pthread_t t2)

compares two thread IDs t1 and t2. The pthread_equal function returns a nonzero value if t1 and t2 are equal. Otherwise, 0 is returned.
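A small sketch of both functions, with hypothetical names (`id_check_t`, `check_ids`, `run_id_check`): the creator records its own ID with pthread_self, and the new thread compares that ID with its own using pthread_equal. IDs are compared with pthread_equal rather than `==` because pthread_t is an opaque type that need not be an integer.

```c
#include <pthread.h>

typedef struct {
    pthread_t creator;     /* ID of the thread that calls pthread_create */
    int same_as_creator;   /* set by the new thread */
    int equals_itself;     /* set by the new thread */
} id_check_t;

static void *check_ids(void *arg)
{
    id_check_t *c = arg;
    pthread_t me = pthread_self();                         /* this thread's own ID */
    c->same_as_creator = pthread_equal(me, c->creator) != 0;
    c->equals_itself = pthread_equal(me, pthread_self()) != 0;
    return NULL;
}

/* Returns 0 on success, or the error code from pthread_create. */
int run_id_check(id_check_t *c)
{
    pthread_t tid;
    int rc;

    c->creator = pthread_self();
    rc = pthread_create(&tid, NULL, check_ids, c);
    if (rc != 0)
        return rc;
    pthread_join(tid, NULL);     /* after the join, the results are safe to read */
    return 0;
}
```

After a successful call, same_as_creator is 0 (two distinct threads) and equals_itself is 1.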

The new thread terminates either explicitly, by calling the function

 void pthread_exit(void *status)

or implicitly, by returning from the start_routine function. The latter is equivalent to calling pthread_exit with the value returned by start_routine as the exit status. If the main function finishes before the threads it has created and exits by calling pthread_exit, the other threads continue to execute. Otherwise (for example, if main simply returns), they are terminated automatically when main finishes. A termination status may optionally be specified through status. If the thread is not detached (see below), this exit status is made available to any successful join with the terminating thread. The pthread_exit function does not close files: any file opened inside the thread remains open after the thread has terminated.

Joining is a synchronization operation on threads. The operation is implemented by function

 int pthread_join(pthread_t t, void **status)

which blocks the calling thread until the specified thread t terminates. There are two types of threads: joinable and detached. It is impossible to join a detached thread (see below). Several threads should not wait for the same thread to complete: in some implementations one join succeeds and the others return with an error, and POSIX leaves the outcome undefined. The pthread_join function does not block the calling thread if the specified thread t has already terminated. This function returns successfully when thread t terminates. If a pthread_join call returns successfully with a non-null status argument, the value passed to the pthread_exit function by the terminating thread is placed in the location referenced by status.
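A minimal sketch of returning a result through the exit status, with hypothetical names `square_worker` and `square_via_thread`: the thread passes a pointer to pthread_exit, and the joining thread receives that pointer through the status argument of pthread_join. The result is allocated on the heap because a pointer to a local variable of the exiting thread would dangle once the thread has terminated.

```c
#include <pthread.h>
#include <stdlib.h>

/* Computes the square of *(int *)arg and exits with a pointer to the result. */
static void *square_worker(void *arg)
{
    int n = *(int *)arg;
    int *result = malloc(sizeof *result);   /* must outlive this thread */
    if (result != NULL)
        *result = n * n;
    pthread_exit(result);                   /* same effect as: return result; */
}

/* Returns 0 and stores n*n in *out on success, or -1 on failure. */
int square_via_thread(int n, int *out)
{
    pthread_t tid;
    void *status = NULL;

    if (pthread_create(&tid, NULL, square_worker, &n) != 0)
        return -1;
    /* Blocks until tid terminates, then receives the pthread_exit value. */
    if (pthread_join(tid, &status) != 0 || status == NULL)
        return -1;
    *out = *(int *)status;
    free(status);                           /* the joining thread owns the result now */
    return 0;
}
```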

When a thread is created, one of its possible attributes defines whether or not the thread may be joined. If the thread is created detached, then it cannot be joined; that is, passing its ID to the pthread_join function is an error. If the thread is created joinable, then it can be joined. The detached or joinable state of the thread is set by using the attr argument in the pthread_create function. The argument points to a thread attribute variable, that is, a variable of the pthread_attr_t type.

The attribute variable is an opaque data object holding all attributes of the thread. Typically this variable is first initialized by the function

 int pthread_attr_init(pthread_attr_t *attr)

to set the default value for all of the individual attributes used by a given implementation. Then the detached status attribute is set in the attribute object with the function

 int pthread_attr_setdetachstate(pthread_attr_t *attr,
                                 int detstat)

where detstat can be set to either PTHREAD_CREATE_DETACHED or PTHREAD_CREATE_JOINABLE. A value of PTHREAD_CREATE_DETACHED causes all threads created with attr to be in the detached state, whereas using a value of PTHREAD_CREATE_JOINABLE causes all threads created with attr to be in the joinable state. The default value of the detached status attribute is PTHREAD_CREATE_JOINABLE. The detached status attribute can be retrieved using the function

 int pthread_attr_getdetachstate(const pthread_attr_t *attr,
                                 int *pdetstat)

which stores the value of the detached status attribute in the location pointed to by pdetstat, if successful.
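The sequence just described (initialize an attribute object, query and set the detached status attribute) can be sketched as below; the function name `query_detach_states` is hypothetical. It also releases the attribute object with pthread_attr_destroy, which is described a little further on.

```c
#include <pthread.h>

/* Reports the detached status stored in a freshly initialized attribute
   object (*before) and after setting it to detached (*after).
   Returns 0 on success or the first nonzero error code. */
int query_detach_states(int *before, int *after)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);      /* every attribute gets its default value */
    if (rc != 0)
        return rc;
    rc = pthread_attr_getdetachstate(&attr, before);
    if (rc == 0)
        rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_attr_getdetachstate(&attr, after);
    pthread_attr_destroy(&attr);            /* release the attribute object */
    return rc;
}
```

Before the change the reported value is PTHREAD_CREATE_JOINABLE, the default; any thread created with attr after the change would start in the detached state.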

A single attribute object can be used in multiple simultaneous calls to the pthread_create function. Other attributes held in the attribute object may specify, for example, the address and size of the thread’s stack (allocated by the system by default) and the priority of the thread (0 by default).

The function

 int pthread_attr_destroy(pthread_attr_t *attr)

releases resources used by the attribute object attr, which cannot be reused until it is reinitialized. The function

 int pthread_detach(pthread_t t)

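can be used to detach thread t explicitly, even though it was created joinable. The main reason for using detached threads is some reduction in the Pthreads overhead: the resources of a detached thread are reclaimed automatically when it terminates, so no other thread has to join it.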
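A short sketch of explicit detaching, with hypothetical names `background_task` and `start_and_detach`: the thread is created with default (joinable) attributes and then detached. Note that pthread_detach takes the thread ID itself, not a pointer to it. Once detached, the thread must not be passed to pthread_join, so any result it produced would have to be communicated by other means, for example through a mutex-protected variable.

```c
#include <pthread.h>

/* Hypothetical background task; it does no real work in this sketch. */
static void *background_task(void *arg)
{
    (void)arg;
    return NULL;   /* no one joins this thread: its resources are reclaimed automatically */
}

/* Starts background_task as a joinable thread, then detaches it.
   Returns 0 on success or the first nonzero error code. */
int start_and_detach(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, background_task, NULL);
    if (rc != 0)
        return rc;
    return pthread_detach(tid);   /* valid even if the thread has already finished */
}
```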

3.3.2. Operations on Mutexes

A typical sequence of operations on a mutex is as follows:

Create a mutex.
Initialize the mutex.
Lock the mutex by one of several competing threads. The winner starts owning the mutex, and the losers act depending on the type of lock operation used. The blocking lock operation blocks the calling thread until the mutex becomes available and this thread locks the mutex. The nonblocking lock operation returns immediately even if the mutex has already been locked by another thread; on completion, it informs the calling thread whether or not it has managed to lock the mutex.
Unlock the mutex by its current owner. The mutex becomes available for locking by other competing threads.
Destroy the mutex. The mutex cannot be re-used without re-initialization.

To create a mutex, a variable of the pthread_mutex_t type must be declared. The variable is an opaque data object that contains all information about the mutex in an implementation-dependent form.

The mutex can be initialized dynamically by the function

 int pthread_mutex_init(pthread_mutex_t *mutex,
                        const pthread_mutexattr_t *mutexattr)

which initializes the mutex referenced by mutex with attributes specified by mutexattr. Typically mutexattr is specified as NULL to accept the default mutex attributes. Upon successful initialization, the state of the mutex becomes initialized and unlocked.

The mutex may also be initialized statically, when the mutex variable is declared, by the macro PTHREAD_MUTEX_INITIALIZER. For example,

pthread_mutex_t a_mutex = PTHREAD_MUTEX_INITIALIZER;

The effect is equivalent to dynamic initialization by a call to the pthread_mutex_init function with argument mutexattr specified as NULL, except that no error checks are performed.

The function

 int pthread_mutex_destroy(pthread_mutex_t *mutex)

destroys the mutex object, freeing the resources it might hold.

The function

 int pthread_mutex_lock(pthread_mutex_t *mutex);

locks the mutex object referenced by mutex. If the mutex is already locked, the calling thread blocks until the mutex becomes available. This operation returns with the mutex object referenced by mutex in the locked state with the calling thread as its owner.

The function

 int pthread_mutex_trylock(pthread_mutex_t *mutex);

behaves identically to the pthread_mutex_lock function except that if the mutex object referenced by mutex is currently locked, the call returns immediately instead of blocking. The pthread_mutex_trylock function returns 0 if it succeeds in locking the mutex. Otherwise, a nonzero error code is returned (EBUSY if the mutex is already locked).

The function

 int pthread_mutex_unlock(pthread_mutex_t *mutex);

releases the mutex object referenced by mutex, making the mutex available to other threads. If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread resumes waiting for the mutex as if it had not been interrupted.
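The difference between the blocking and nonblocking lock operations can be observed even from a single thread, as in the sketch below (the function name `trylock_demo` is hypothetical). While the mutex is held, pthread_mutex_trylock returns EBUSY at once; after the unlock it returns 0 and locks the mutex. Calling pthread_mutex_lock a second time instead would not be safe here, since relocking a mutex of the default type by its owner is not allowed. In real programs trylock is typically used by a thread that has other useful work to do instead of blocking.

```c
#include <errno.h>
#include <pthread.h>

/* Stores the result of pthread_mutex_trylock while the mutex is held
   (*while_locked) and after it has been unlocked (*after_unlock). */
void trylock_demo(int *while_locked, int *after_unlock)
{
    pthread_mutex_t m;

    pthread_mutex_init(&m, NULL);                /* default attributes */
    pthread_mutex_lock(&m);                      /* this thread now owns m */
    *while_locked = pthread_mutex_trylock(&m);   /* m is busy: returns EBUSY at once */
    pthread_mutex_unlock(&m);                    /* m becomes available again */

    *after_unlock = pthread_mutex_trylock(&m);   /* m is free: returns 0 and locks it */
    if (*after_unlock == 0)
        pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);                   /* m must be unlocked before destroying */
}
```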

3.3.3. Operations on Condition Variables

A condition variable (or simply a condition) is a synchronization device that allows threads to suspend execution and relinquish the processors until some predicate on shared data is satisfied. A condition variable must always be associated with a mutex, to avoid a race whereby a thread prepares to wait on a condition variable, and another thread signals the condition variable just before the first thread actually waits on it.

A typical sequence of operations on a condition variable can be depicted as follows:

To create the condition, a variable of the pthread_cond_t type must be declared. The variable is an opaque data object that contains all information about the condition in an implementation-dependent form.

The condition variable can be initialized dynamically with the function

 int pthread_cond_init(pthread_cond_t *cond,
                       const pthread_condattr_t *condattr)

which initializes the condition variable pointed to by cond, using the condition attributes specified in condattr. Typically condattr is specified as NULL to accept the default condition attributes.

The condition variable may also be initialized statically, when it is declared, with the macro PTHREAD_COND_INITIALIZER. For example,

pthread_cond_t a_cond = PTHREAD_COND_INITIALIZER;

The effect is equivalent to dynamic initialization by a call to the pthread_cond_init function with argument condattr specified as NULL, except that no error checks are performed.

The function

 int pthread_cond_signal(pthread_cond_t *cond)

restarts one of the threads that are waiting on the condition variable pointed by cond. If no threads are waiting on this condition variable, nothing happens. If several threads are waiting on the condition, only one is restarted, but it is not specified which thread should proceed.

The function

 int pthread_cond_broadcast(pthread_cond_t *cond)

restarts all the threads that are waiting on the condition variable pointed by cond. Nothing happens if no threads are waiting on this condition.
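The sketch below (all names hypothetical) uses pthread_cond_broadcast as a start gate: several threads wait until a shared flag is set, and a single broadcast releases all of them. With pthread_cond_signal in its place, only a single waiter would be guaranteed to wake, so the others could remain blocked. The waiting side uses pthread_cond_wait, described below; each waiter tests the flag in a loop, which also covers threads that reach the gate after it has been opened.

```c
#include <pthread.h>

#define GATE_MAX 8                    /* hypothetical limit for this sketch */

static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_open = PTHREAD_COND_INITIALIZER;
static int go = 0;                    /* predicate: has the gate been opened? */
static int passed = 0;                /* how many threads got through the gate */

static void *wait_at_gate(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&gate_lock);
    while (!go)                                   /* recheck after every wakeup */
        pthread_cond_wait(&gate_open, &gate_lock);
    passed++;                                     /* gate_lock is held again here */
    pthread_mutex_unlock(&gate_lock);
    return NULL;
}

/* Starts n waiting threads, opens the gate with one broadcast, and
   returns how many threads passed it, or -1 on error. */
int open_gate_for(int n)
{
    pthread_t tids[GATE_MAX];
    int created = 0;

    if (n < 1 || n > GATE_MAX)
        return -1;
    go = 0;                                       /* no gate threads exist yet */
    passed = 0;
    for (; created < n; created++)
        if (pthread_create(&tids[created], NULL, wait_at_gate, NULL) != 0)
            break;

    pthread_mutex_lock(&gate_lock);
    go = 1;                                       /* change the predicate ... */
    pthread_cond_broadcast(&gate_open);           /* ... then wake every waiter */
    pthread_mutex_unlock(&gate_lock);

    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return created == n ? passed : -1;
}
```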

The function

 int pthread_cond_wait(pthread_cond_t *cond,
                       pthread_mutex_t *mutex)

atomically unlocks the mutex pointed to by mutex (as per pthread_mutex_unlock) and waits for the condition variable pointed to by cond to be signaled. The thread execution is suspended and does not consume any processor time until the condition variable is signaled. The mutex must be locked by the calling thread on entrance to the pthread_cond_wait function. Before returning to the calling thread, pthread_cond_wait re-acquires the mutex (as per pthread_mutex_lock). Unlocking the mutex and suspending on the condition variable is done atomically. Thus, if all threads always acquire the mutex before signaling the condition, the condition cannot be signaled (and the signal thereby missed) between the time a thread locks the mutex and the time it waits on the condition variable.
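A minimal producer/consumer sketch (the type `mailbox_t` and the functions `producer` and `receive_from_producer` are hypothetical) showing the usual pattern around pthread_cond_wait: the waiter locks the mutex, tests a predicate on shared data, and calls pthread_cond_wait inside a while loop, while the signaler changes the predicate and signals with the mutex held. The loop matters because POSIX permits spurious wakeups, that is, pthread_cond_wait may return without a matching signal, and because the predicate may already be true before the waiter arrives.

```c
#include <pthread.h>

/* A one-slot mailbox: a value plus a flag saying whether it is full. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  filled;   /* signaled when has_value becomes 1 */
    int has_value;            /* the predicate being waited for */
    int value;
} mailbox_t;

static void *producer(void *arg)
{
    mailbox_t *mb = arg;
    pthread_mutex_lock(&mb->lock);          /* acquire the mutex before signaling */
    mb->value = 42;
    mb->has_value = 1;
    pthread_cond_signal(&mb->filled);
    pthread_mutex_unlock(&mb->lock);
    return NULL;
}

/* Starts a producer and waits ("cold") until it deposits a value.
   Returns the received value, or -1 if the thread could not be created. */
int receive_from_producer(void)
{
    mailbox_t mb;
    pthread_t tid;
    int v;

    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.filled, NULL);
    mb.has_value = 0;
    if (pthread_create(&tid, NULL, producer, &mb) != 0) {
        pthread_cond_destroy(&mb.filled);
        pthread_mutex_destroy(&mb.lock);
        return -1;
    }

    pthread_mutex_lock(&mb.lock);                 /* must hold the mutex before waiting */
    while (!mb.has_value)                         /* wakeups can be spurious */
        pthread_cond_wait(&mb.filled, &mb.lock);  /* unlocks, sleeps, relocks */
    v = mb.value;
    pthread_mutex_unlock(&mb.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&mb.filled);             /* no thread is waiting any more */
    pthread_mutex_destroy(&mb.lock);
    return v;
}
```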

The function

 int pthread_cond_timedwait(pthread_cond_t *cond,
                            pthread_mutex_t *mutex,
                            const struct timespec *abstime)

atomically unlocks the mutex pointed to by mutex and waits on the condition variable pointed to by cond, as pthread_cond_wait does, but it also bounds the duration of the wait. If the condition variable has not been signaled by the time specified by abstime, the mutex is re-acquired and pthread_cond_timedwait returns the error code ETIMEDOUT. Note that abstime specifies an absolute time (a point in time on the system clock), not a relative interval.
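Because abstime is absolute, a relative timeout must first be converted into a deadline. The sketch below (the function `wait_flag_with_timeout` and its parameters are hypothetical) does this with clock_gettime on CLOCK_REALTIME, the clock that condition variables use by default, and keeps the same deadline across spurious wakeups so that the total wait stays bounded.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Waits up to timeout_ms milliseconds for *flag to become nonzero.
   Returns 0 if it did, ETIMEDOUT if the deadline passed first,
   or another nonzero error code. */
int wait_flag_with_timeout(pthread_mutex_t *m, pthread_cond_t *c,
                           const int *flag, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    /* abstime is absolute: "now" on the realtime clock plus the timeout. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {        /* normalize the nanoseconds */
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(m);
    while (!*flag && rc == 0)                     /* same deadline on every retry */
        rc = pthread_cond_timedwait(c, m, &deadline);
    if (*flag)
        rc = 0;                                   /* flag was set just as time ran out */
    pthread_mutex_unlock(m);
    return rc;
}
```

If no thread ever sets the flag and signals c, the call returns ETIMEDOUT after roughly timeout_ms milliseconds, with the mutex released again.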

The function

 int pthread_cond_destroy(pthread_cond_t *cond)

destroys the condition variable pointed to by cond, freeing the resources it might hold. No threads may be waiting on the condition variable when pthread_cond_destroy is called.

3.3.4. Example of MT Application: Multithreaded Dot Product

We consider a Pthreads application computing the dot product of two real m-length vectors x and y to illustrate parallel programming of an n-processor SMP computer with thread libraries. This MT application divides vectors x and y into n subvectors. The first n - 1 subvectors all have the same length ⌊m/n⌋, while the last, nth subvector is longer if m is not a multiple of n, because it also takes the remaining elements. The application uses n parallel threads, with the ith thread computing its fraction of the total dot product by multiplying subvectors x_i and y_i. The n parallel threads share a data object that accumulates the dot product, and synchronize their access to this object with a mutex. The main thread creates the n threads, waits for them to complete their computations by joining with each of the threads, and then prints the result.

The source code of the application is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define MAXTHRDS 124

/* Per-thread data: the thread's subvectors and their length, its partial
   result, and pointers to the shared accumulator and its mutex. */
typedef struct {
  double *my_x;
  double *my_y;
  double my_dot_prod;
  double *global_dot_prod;
  pthread_mutex_t *mutex;
  int my_vec_len;
} dot_product_t;

/* Computes the partial dot product of the thread's subvectors and adds it
   to the global sum; only this update needs the mutex. */
void *serial_dot_product(void *arg) {
  dot_product_t *dot_data;
  int i;

  dot_data = arg;
  dot_data->my_dot_prod = 0.0;      /* the partial sum must start at zero */
  for(i=0; i<dot_data->my_vec_len; i++)
    dot_data->my_dot_prod += dot_data->my_x[i]*dot_data->my_y[i];
  pthread_mutex_lock(dot_data->mutex);
  *(dot_data->global_dot_prod) += dot_data->my_dot_prod;
  pthread_mutex_unlock(dot_data->mutex);
  pthread_exit(NULL);
}

int main(void) {
  double *x, *y, dot_prod = 0.0;    /* the global sum must start at zero */
  pthread_t *working_thread;
  dot_product_t *thrd_dot_prod_data;
  void *status;
  pthread_mutex_t *mutex_dot_prod;
  int num_of_thrds;
  int vec_len;
  int subvec_len;
  int i;

  printf("Number of processors = ");
  if(scanf("%d", &num_of_thrds) < 1 || num_of_thrds < 1 || num_of_thrds > MAXTHRDS) {
    printf("Check input for number of processors. Bye.\n");
    return -1;
  }
  printf("Vector length = ");
  if(scanf("%d", &vec_len) < 1 || vec_len < 1) {
    printf("Check input for vector length. Bye.\n");
    return -1;
  }
  subvec_len = vec_len/num_of_thrds;
  x = malloc(vec_len*sizeof(double));
  y = malloc(vec_len*sizeof(double));
  for(i=0; i<vec_len; i++) {
    x[i] = 1.;
    y[i] = 1.;
  }
  working_thread = malloc(num_of_thrds*sizeof(pthread_t));
  thrd_dot_prod_data = malloc(num_of_thrds*sizeof(dot_product_t));
  mutex_dot_prod = malloc(sizeof(pthread_mutex_t));
  pthread_mutex_init(mutex_dot_prod, NULL);

  /* Thread i starts at offset i*subvec_len; the last thread also
     takes the elements left over when vec_len % num_of_thrds != 0. */
  for(i=0; i<num_of_thrds; i++) {
    thrd_dot_prod_data[i].my_x = x + i*subvec_len;
    thrd_dot_prod_data[i].my_y = y + i*subvec_len;
    thrd_dot_prod_data[i].global_dot_prod = &dot_prod;
    thrd_dot_prod_data[i].mutex = mutex_dot_prod;
    thrd_dot_prod_data[i].my_vec_len = (i==num_of_thrds-1) ?
      vec_len-(num_of_thrds-1)*subvec_len : subvec_len;
    pthread_create(&working_thread[i], NULL, serial_dot_product,
                   (void*)&thrd_dot_prod_data[i]);
  }
  /* Wait for all threads before reading the shared result. */
  for(i=0; i<num_of_thrds; i++)
    pthread_join(working_thread[i], &status);
  printf("Dot product = %f\n", dot_prod);

  free(x);
  free(y);
  free(working_thread);
  free(thrd_dot_prod_data);
  pthread_mutex_destroy(mutex_dot_prod);
  free(mutex_dot_prod);
  return 0;
}