
Team-FLY

16.7 Parallel File Copy

This section revisits the parallel file copy of Program 12.8 on page 427. The straightforward implementation of the parallel file copy creates a new thread to copy each file and each directory. When called with a large directory tree, this implementation quickly exceeds system resources. This section outlines a worker pool implementation that regulates how many threads are active at any time. In a worker pool implementation, a fixed number of threads are available to handle the load. The workers block on a synchronization point (in this case, an empty buffer) and one worker unblocks when a request comes in (an item is put in the buffer). Chapter 22 compares the performance of worker pools to other server threading strategies.

16.7.1 Parallel file copy producer

Begin by creating a producer thread function that takes as a parameter an array of size 2 containing the pathnames of two directories. For each regular file in the first directory, the producer opens the file for reading and opens a file of the same name in the second directory for writing. If a file with the same name already exists in the destination directory, that file should be opened and truncated. If an error occurs in opening either file, the producer closes any file that it did open and sends an informative message to standard output. Otherwise, the producer puts the two open file descriptors and the name of the file into the buffer. Use the bufferconddone implementation so that the threads can be terminated gracefully. The buffer.h file contains the definition of buffer_t, the type of a buffer entry. Use the following definition for this project.

 typedef struct {
    int infd;
    int outfd;
    char filename[PATH_MAX];
 } buffer_t;
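The producer's per-file step described above might look like the following minimal sketch. The helper name openpair, its return convention, and the owner-only permissions on the destination file are assumptions made for illustration and are not part of the project specification. The sketch repeats the buffer_t definition so that it compiles on its own.

```c
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

typedef struct {
   int infd;
   int outfd;
   char filename[PATH_MAX];
} buffer_t;

/* Hypothetical helper: open srcdir/name for reading and dstdir/name for
   writing (created if missing, truncated if present). Returns 0 and fills
   *item on success. Returns -1 with neither descriptor left open on error. */
int openpair(const char *srcdir, const char *dstdir,
             const char *name, buffer_t *item) {
   char path[PATH_MAX];
   int n;

   if (strlen(name) >= sizeof(item->filename))
      return -1;
   n = snprintf(path, sizeof(path), "%s/%s", srcdir, name);
   if (n < 0 || (size_t)n >= sizeof(path))
      return -1;
   if ((item->infd = open(path, O_RDONLY)) == -1)
      return -1;
   n = snprintf(path, sizeof(path), "%s/%s", dstdir, name);
   if (n < 0 || (size_t)n >= sizeof(path) ||
       (item->outfd = open(path, O_WRONLY | O_CREAT | O_TRUNC,
                           S_IRUSR | S_IWUSR)) == -1) {
      close(item->infd);          /* do not leak the source descriptor */
      return -1;
   }
   strcpy(item->filename, name);  /* bare name only, no path */
   return 0;
}
```

On failure the producer would print its message and move on to the next directory entry without putting anything into the buffer.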

Only ordinary files will be copied for this version of the program. The filename member should contain the name of the file only, without a path specification. Use the opendir and readdir functions described in Section 5.2 on page 152 to access the source directory. These functions are not thread-safe, but there will be only one producer thread and only this thread will call these functions. Use the lstat function described in Section 5.2.1 on page 155 to determine if the file is a regular file. The file is a regular file if the S_ISREG macro returns true when applied to the st_mode field of the stat structure. Program 16.16 shows a function that returns true if filename represents a regular file and false otherwise.

This is a producer-driven bounded buffer problem. When the producer is finished filling the buffer with filenames from the given directory, it calls setdone in Program 16.11 and exits.

Program 16.16 isregular.c

A function that returns true if the filename parameter is a regular file.

 #include <sys/stat.h>
 #include <sys/types.h>

 int isregular(const char *filename) {
    struct stat buf;

    if (lstat(filename, &buf) == -1)   /* lstat needs the address of buf */
       return 0;
    return S_ISREG(buf.st_mode);
 }
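The directory scan performed by the producer can be sketched as follows. The function name scanregular and its callback interface are hypothetical. Because readdir returns bare entry names, the sketch builds the full path before calling lstat. A test on the bare name alone would be resolved relative to the current working directory rather than the source directory.

```c
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical helper: call visit(name, arg) for each regular file in
   dirname, passing the bare entry name. Returns the number of regular
   files found, or -1 if the directory cannot be opened. Intended to be
   called from a single thread, since readdir is not thread-safe. */
int scanregular(const char *dirname,
                void (*visit)(const char *name, void *arg), void *arg) {
   DIR *dirp;
   struct dirent *entry;
   struct stat buf;
   char path[PATH_MAX];
   int count = 0;
   int n;

   if ((dirp = opendir(dirname)) == NULL)
      return -1;
   while ((entry = readdir(dirp)) != NULL) {
      n = snprintf(path, sizeof(path), "%s/%s", dirname, entry->d_name);
      if (n < 0 || (size_t)n >= sizeof(path))
         continue;                       /* path too long: skip entry */
      if (lstat(path, &buf) == -1 || !S_ISREG(buf.st_mode))
         continue;                       /* not a regular file */
      if (visit != NULL)
         visit(entry->d_name, arg);
      count++;
   }
   closedir(dirp);
   return count;
}
```

In the producer, the visit callback would open the source and destination files and put the resulting entry into the buffer.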

16.7.2 Parallel file copy consumer

Each consumer thread reads an item from the buffer, copies the file from the source file descriptor to the destination file descriptor, closes the files, and writes a message to standard output giving the file name and the completion status of the copy.
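The copy step inside each consumer might be written along these lines. This is a sketch, not the book's own copy routine: the name copyfd, the block size, and the long return type are assumptions. It restarts calls interrupted by signals and finishes partial writes.

```c
#include <errno.h>
#include <unistd.h>

#define BLKSIZE 4096   /* assumed block size for this sketch */

/* Copy everything from infd to outfd. Returns the number of bytes copied,
   or -1 on error. Restarts read and write calls interrupted by signals and
   completes partial writes. */
long copyfd(int infd, int outfd) {
   char buf[BLKSIZE];
   char *p;
   ssize_t nread, nwritten;
   long total = 0;

   for ( ; ; ) {
      nread = read(infd, buf, sizeof(buf));
      if (nread == -1 && errno == EINTR)
         continue;                   /* interrupted: try the read again */
      if (nread == -1)
         return -1;
      if (nread == 0)
         return total;               /* end of file */
      p = buf;
      while (nread > 0) {
         nwritten = write(outfd, p, (size_t)nread);
         if (nwritten == -1) {
            if (errno == EINTR)
               continue;             /* interrupted: try the write again */
            return -1;
         }
         p += nwritten;
         nread -= nwritten;
         total += nwritten;
      }
   }
}
```

A consumer would call copyfd with the infd and outfd members of the item it removed, close both descriptors, and then report the result.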

Note that the producer and multiple consumers are writing to standard output and that this is a critical section that must be protected. Devise a method for writing these messages atomically.
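One possible method, shown here only as a sketch, is to format each message into a local buffer and write it while holding a mutex shared by all threads. The function name atomicmsg and the 1024-byte message limit are assumptions. Other designs, such as a dedicated output thread, would also satisfy the requirement.

```c
#include <errno.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t outlock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: format a message and write it to standard output
   while holding a mutex, so that messages from different threads do not
   interleave. Returns the number of bytes written, or -1 on error.
   Messages longer than the local buffer are truncated. */
int atomicmsg(const char *fmt, ...) {
   char buf[1024];
   va_list ap;
   int len;
   int written = 0;
   ssize_t n;

   va_start(ap, fmt);
   len = vsnprintf(buf, sizeof(buf), fmt, ap);
   va_end(ap);
   if (len < 0)
      return -1;
   if ((size_t)len >= sizeof(buf))
      len = (int)sizeof(buf) - 1;          /* message was truncated */
   if (pthread_mutex_lock(&outlock))
      return -1;
   while (written < len) {
      n = write(STDOUT_FILENO, buf + written, (size_t)(len - written));
      if (n == -1) {
         if (errno == EINTR)
            continue;                      /* interrupted: try again */
         break;
      }
      written += (int)n;
   }
   pthread_mutex_unlock(&outlock);
   return (written == len) ? len : -1;
}
```

Formatting before taking the lock keeps the critical section short: the lock is held only for the write itself.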

The consumers should terminate when they detect that a done flag has been set and no more entries remain in the buffer, as in Program 16.13.

16.7.3 Parallel file copy main program

The main program should take the number of consumers and the source and destination directories as command-line arguments. The application always has exactly one producer thread.

The main program should start the threads and use pthread_join to wait for the threads to complete, as in Program 16.15. Use gettimeofday to get the time before the first thread is created and after the last join. Display the total time to copy the files in the directory.
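The elapsed time can be computed from two gettimeofday readings as in this small sketch. The function name elapsedsec is an assumption.

```c
#include <sys/time.h>

/* Elapsed time in seconds between two gettimeofday readings.
   The microsecond difference may be negative. Adding it to the
   difference in whole seconds still gives the correct result. */
double elapsedsec(const struct timeval *start, const struct timeval *end) {
   return (double)(end->tv_sec - start->tv_sec) +
          (double)(end->tv_usec - start->tv_usec) / 1e6;
}
```

The main program would take the first reading immediately before creating the first thread and the second immediately after the last pthread_join returns.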

Experiment with different buffer sizes and different numbers of consumer threads. Which combinations produce the best results? Be careful not to exceed the per-process limit on the number of open file descriptors. The number of open file descriptors is determined by the size of the buffer and the number of consumers. Make sure that the consumers close the file descriptors after copying a file and before removing another item from the buffer.
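A rough check of a configuration against the descriptor limit might look like the following sketch. The worst-case estimate is an assumption about the basic design, not a figure from the text: two descriptors per buffer slot, two per consumer, two held by the producer while it waits for a free slot, and a few reserved for standard I/O and the open directory stream. The names fdsneeded, fdsfit and RESERVED_FDS are hypothetical.

```c
#include <sys/resource.h>

#define RESERVED_FDS 4   /* assumed: stdin, stdout, stderr, directory stream */

/* Assumed worst-case number of descriptors open at one time. */
long fdsneeded(int bufsize, int nconsumers) {
   return 2L * bufsize + 2L * nconsumers + 2L + RESERVED_FDS;
}

/* Returns 1 if the configuration fits within the soft RLIMIT_NOFILE limit,
   0 if it does not, and -1 if the limit cannot be determined. */
int fdsfit(int bufsize, int nconsumers) {
   struct rlimit rl;

   if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
      return -1;
   if (rl.rlim_cur == RLIM_INFINITY)
      return 1;
   return (rlim_t)fdsneeded(bufsize, nconsumers) <= rl.rlim_cur;
}
```

Under these assumptions, a buffer of size 10 with 4 consumers needs at most 34 descriptors.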

16.7.4 Parallel file copy enhancements

After the programs described above are working correctly, add the following enhancements.

  1. Copy subdirectories as well as ordinary files, but do not (at this time) copy the contents of the subdirectories. (Just create a subdirectory in the destination directory for each subdirectory in the source directory.) You can either have the producer do this (and not put a new entry into the buffer) or add a field in buffer_t giving the type of file to be copied. Read item 3 below before deciding which method to use.

  2. Copy FIFOs. For each FIFO in the source directory, make a FIFO with the same name in the destination directory. You can handle this as in item 1.

  3. Recursively copy subdirectories. This part should just require modifying the producer if the producer creates the subdirectory. If the consumers create the subdirectories, you need to figure out how to avoid having the producer try to open a destination file before its directory has been created. Store the path of the file relative to the source directory in the buffer slots so that the consumers can print relevant messages.

  4. Keep statistics about the number and types of files copied. Keep track of the total number of bytes copied. Keep track of the shortest and longest copy times.

  5. Add a signal thread that outputs the statistics accumulated so far when the process receives a SIGUSR1 signal. Make sure that the output of the signal thread is atomic with respect to the output generated by the producer and the consumers.



Unix Systems Programming
UNIX Systems Programming: Communication, Concurrency and Threads
ISBN: 0130424110
EAN: 2147483647
Year: 2003
Pages: 274
