7.9 Pipelined Preprocessor


The C preprocessor, cpp, preprocesses C source code so that the C compiler itself does not have to handle tasks such as macro substitution. For example, say a C program has a line such as the following.

 #define BUFSIZE 250 

In this case, cpp replaces all instances of the token BUFSIZE by 250. The C preprocessor deals with tokens, so it does not replace an occurrence of BUFSIZE1 with 2501. This behavior is clearly needed for C source code. It should not be possible to get cpp into a loop with something like the following.

 #define BUFSIZE (BUFSIZE + 1) 

Various versions of cpp handle this difficulty in different ways.
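
The token rule and the self-reference case can both be checked with a short C fragment. This is a minimal sketch that assumes a preprocessor conforming to ISO C, which does not expand a macro name again while that macro is itself being expanded; older versions of cpp may behave differently. The names COUNT, macro_value, identifier_value, in_string and expanded_once are hypothetical and chosen only for illustration.

```c
#define BUFSIZE 250

/* BUFSIZE1 is a different token from BUFSIZE, so cpp leaves it alone. */
static const int BUFSIZE1 = 7;

/* cpp rewrites the body to: return 250; */
int macro_value(void) {
    return BUFSIZE;
}

/* The identifier BUFSIZE1 is not rewritten to 2501. */
int identifier_value(void) {
    return BUFSIZE1;
}

/* Text inside a string literal is not a macro token and is untouched. */
const char *in_string(void) {
    return "BUFSIZE";
}

/* A variable that shares its name with the self-referential macro below.
   It is declared before the #define, so its declaration is not rewritten. */
static int COUNT = 10;

#define COUNT (COUNT + 1)

/* Under ISO C rules this becomes: return (COUNT + 1);
   The inner COUNT is not expanded again, so there is no loop. */
int expanded_once(void) {
    return COUNT;
}
```

With the assumed ISO C behavior, expanded_once reads the variable once and adds 1, rather than expanding without end.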

In other situations, the program may not be dealing with tokens and might replace any occurrence of a string, even if that string is part of a token or consists of several tokens. One method of handling the loops that may be generated by recursion is not to perform any additional test on a string that has already been replaced. This method fails on something as simple as the following statements.

 #define BUFSIZE 250
 #define BIGGERBUFSIZE (BUFSIZE + 1)
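
For comparison, an ISO C preprocessor rescans each replacement for further macro names, so these two definitions compose as intended. The fragment below is a minimal sketch of that behavior; the function name bigger_size is hypothetical.

```c
#define BUFSIZE 250
#define BIGGERBUFSIZE (BUFSIZE + 1)

/* First BIGGERBUFSIZE becomes (BUFSIZE + 1); the rescan then
   turns the inner BUFSIZE into 250, giving (250 + 1). */
int bigger_size(void) {
    return BIGGERBUFSIZE;
}
```

A substitution program that never looks again at replaced text would stop after the first step and leave the inner BUFSIZE unexpanded.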

Another way to handle this situation is to make several passes through the input file, one for each #define and to make the replacements sequentially. The processing can be done more efficiently (and possibly in parallel) with a pipeline. Figure 7.10 shows a four-stage pipeline. Each stage in the pipeline applies a transformation to its input and then outputs the result for input to the next stage. A pipeline resembles an assembly line in manufacturing.

Figure 7.10. Four-stage pipeline.


This section develops a pipeline of preprocessors based on the ring of Program 7.1. To simplify the programming, the preprocessors just convert single characters to strings of characters.

  1. Write a processchar function that has the following prototype.

     int processchar(int fdin, int fdout, char inchar, char *outstr); 

    The processchar function reads from file descriptor fdin until end-of-file and writes to file descriptor fdout, translating any occurrence of the character inchar into the string outstr. If successful, processchar returns 0. If unsuccessful, processchar returns -1 and sets errno. Write a driver to test this function before using it with the ring.

  2. Modify Program 7.1 to produce a program, ringpp, that takes four command-line arguments. Run the program by executing the following command.

     ringpp n conf.in file.in file.out 

    The value of the command-line argument n specifies the number of stages in the pipeline. It corresponds to nprocs-2 in Program 7.1. The original parent is responsible for generating pipeline input by reading file.in, and the last child is responsible for removing output from the pipeline and writing it to file.out. Before ringpp creates the ring, the original parent opens the file conf.in and reads n lines, each containing a character and a string. It stores this information in an array. The ringpp program reads the conf.in file before any forking, so the information in the array is available to all children.

  3. The original parent is responsible for copying the contents of the file.in input file to its standard output. When it encounters end-of-file on file.in, the process exits. The original parent generates the input for the pipeline and does not perform any pipeline processing.

  4. The last child is responsible for removing output from the pipeline. The process copies data from its standard input to file.out, but it does not perform any pipeline processing. The process exits when it encounters an end-of-file on its standard input.

  5. For i between 2 and n+1, child process i uses the information in the (i-1)-th entry of the translation array to translate a character to a string. Each child process acts like a filter, reading the input from standard input, making the substitution and writing the result to standard output. Call the processchar function to process the input. When processchar encounters an end-of-file on input, the process closes its standard input and standard output, then exits.

  6. After making sure that the program is working correctly, try it with a big file (many megabytes) and a moderate number (10 to 20) of processes.

  7. If possible, try the program on a multiprocessor machine to measure the speedup. (See Section 7.10 for a definition of speedup.)
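
As a starting point for step 1, the following is one possible sketch of processchar, not a definitive implementation. It assumes block-at-a-time reads, restarts calls interrupted by a signal, and handles partial writes. The helper writeall and the constant PC_BLKSIZE are hypothetical names introduced only for this sketch.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define PC_BLKSIZE 1024   /* hypothetical read-buffer size */

/* Hypothetical helper: write exactly n bytes, restarting after
   signals and continuing after partial writes. */
static int writeall(int fd, const char *buf, size_t n) {
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w == -1) {
            if (errno == EINTR)
                continue;
            return -1;            /* errno was set by write */
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Copy fdin to fdout until end-of-file, replacing each inchar by outstr.
   Returns 0 on success, or -1 with errno set by the failing call. */
int processchar(int fdin, int fdout, char inchar, char *outstr) {
    char buf[PC_BLKSIZE];
    size_t outlen = strlen(outstr);

    for ( ; ; ) {
        ssize_t nread = read(fdin, buf, sizeof(buf));
        if (nread == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (nread == 0)           /* end-of-file on input */
            return 0;

        ssize_t start = 0;        /* start of the run not yet written */
        for (ssize_t i = 0; i < nread; i++) {
            if (buf[i] != inchar)
                continue;
            /* flush the unchanged run, then the replacement string */
            if (writeall(fdout, buf + start, (size_t)(i - start)) == -1 ||
                writeall(fdout, outstr, outlen) == -1)
                return -1;
            start = i + 1;
        }
        if (writeall(fdout, buf + start, (size_t)(nread - start)) == -1)
            return -1;
    }
}
```

A simple driver could call processchar(STDIN_FILENO, STDOUT_FILENO, 'a', "xyz") and report an error if the return value is -1.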

Each stage of the pipeline reads from its standard input and writes to its standard output. You can generalize the problem by having each stage run execvp on an arbitrary process instead of calling the same function. The conf.in file could contain the command lines to execvp instead of the table of string replacements specific to this problem.
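
The generalization to arbitrary commands can be sketched as follows. This sketch assumes that each line of conf.in holds one command whose arguments are separated by blanks or tabs, with no quoting. The names split_command, run_stage and MAX_STAGE_ARGS are hypothetical. A stage would call run_stage after its standard input and standard output have been redirected to the ring.

```c
#include <string.h>
#include <unistd.h>

#define MAX_STAGE_ARGS 32   /* hypothetical limit on tokens per command */

/* Split cmdline in place on blanks, tabs and newlines.
   Fills argv with pointers into cmdline and NULL-terminates it.
   Returns the number of tokens, or -1 if they do not fit. */
int split_command(char *cmdline, char *argv[], int maxargs) {
    int argc = 0;
    char *tok = strtok(cmdline, " \t\n");

    while (tok != NULL) {
        if (argc >= maxargs - 1)   /* leave room for the final NULL */
            return -1;
        argv[argc++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;
    return argc;
}

/* Replace the calling process with the command in cmdline.
   Returns -1 only if the command is empty, too long, or execvp fails. */
int run_stage(char *cmdline) {
    char *argv[MAX_STAGE_ARGS];

    if (split_command(cmdline, argv, MAX_STAGE_ARGS) <= 0)
        return -1;
    execvp(argv[0], argv);
    return -1;                     /* execvp returns only on error */
}
```

Because strtok modifies its argument, each stage should pass a writable copy of its configuration line, for example a line such as "tr a-z A-Z".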

It is also possible to have the original parent handle both the generation of pipeline input and the removal of its output. In this case, the parent opens file.in and file.out after forking its child. The process must now handle input from two sources: file.in and its standard input. It is possible to use select to handle this, but the problem is more complicated than might first appear. The process must also monitor its standard output with select because a pipe can fill up and block additional writes. If the process blocks while writing to standard output, it is not able to remove output from the final stage of the pipeline. The pipeline might deadlock in this case. The original parent is a perfect candidate for threading. Threads are discussed in Chapters 12 and 13.
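
One possible shape for the select approach is sketched below. It is only a sketch under stated assumptions: srcfd and dstfd are regular files (so reading and writing them does not block indefinitely), ringout and ringin are the pipe ends that would be standard output and standard input in the ring, and ringout is switched to nonblocking mode so that a write never stalls the loop. The name pumpring and the constant PUMP_BLK are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

#define PUMP_BLK 4096   /* hypothetical transfer size */

/* Feed srcfd into ringout while draining ringin into dstfd.
   Returns 0 when ringin reaches end-of-file, -1 on error. */
int pumpring(int srcfd, int ringout, int ringin, int dstfd) {
    char obuf[PUMP_BLK], ibuf[PUMP_BLK];
    ssize_t have = 0, sent = 0;   /* obuf[sent..have) is still unsent */
    int feeding = 1;              /* becomes 0 after EOF on srcfd */
    int flags = fcntl(ringout, F_GETFL);

    if (flags == -1 || fcntl(ringout, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    for ( ; ; ) {
        fd_set rset, wset;
        int maxfd = ringin;

        FD_ZERO(&rset);
        FD_ZERO(&wset);
        FD_SET(ringin, &rset);
        if (feeding) {            /* watch ringout only while data remains */
            FD_SET(ringout, &wset);
            if (ringout > maxfd)
                maxfd = ringout;
        }
        if (select(maxfd + 1, &rset, &wset, NULL, NULL) == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (feeding && FD_ISSET(ringout, &wset)) {
            if (sent == have) {   /* buffer empty: refill from the file */
                have = read(srcfd, obuf, sizeof(obuf));
                sent = 0;
                if (have == -1)
                    return -1;
                if (have == 0) {  /* closing ringout passes EOF downstream */
                    feeding = 0;
                    if (close(ringout) == -1)
                        return -1;
                }
            }
            if (feeding) {
                ssize_t w = write(ringout, obuf + sent, (size_t)(have - sent));
                if (w > 0)
                    sent += w;
                else if (w == -1 && errno != EAGAIN && errno != EINTR)
                    return -1;
            }
        }
        if (FD_ISSET(ringin, &rset)) {
            ssize_t n = read(ringin, ibuf, sizeof(ibuf));
            ssize_t off = 0;

            if (n == -1) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (n == 0)           /* last stage closed its output */
                return 0;
            while (off < n) {     /* dstfd is assumed to be a regular file */
                ssize_t w = write(dstfd, ibuf + off, (size_t)(n - off));
                if (w == -1) {
                    if (errno == EINTR)
                        continue;
                    return -1;
                }
                off += w;
            }
        }
    }
}
```

In the ring, the parent might call pumpring(infd, STDOUT_FILENO, STDIN_FILENO, outfd) after opening file.in and file.out. The amount of bookkeeping needed even in this simplified form suggests why threads are a more natural fit for this process.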



Unix Systems Programming
UNIX Systems Programming: Communication, Concurrency and Threads
ISBN: 0130424110
EAN: 2147483647
Year: 2003
Pages: 274
