Another way to improve the performance of a build is to take advantage of the parallelism inherent in the problem the makefile is solving. Most makefiles perform many tasks that are easily carried out in parallel, such as compiling C source to object files or creating libraries out of object files. Furthermore, the very structure of a well-written makefile provides all the information necessary to automatically control the concurrent processes.
Example 10-1 shows our mp3_player program executed with the jobs option, --jobs=2 (or -j 2). Figure 10-1 shows the same make run in a pseudo-UML sequence diagram. Using --jobs=2 tells make to update two targets in parallel when that is possible. When make updates targets in parallel, it echoes commands in the order in which they are executed, interleaving them in the output. This can make the output of a parallel make more difficult to read. Let's look at this output more carefully.
Example 10-1. Output of make when --jobs=2
$ make -f ../ch07-separate-binaries/makefile --jobs=2
bison -y --defines ../ch07-separate-binaries/lib/db/playlist.y
flex -t ../ch07-separate-binaries/lib/db/scanner.l > lib/db/scanner.c
gcc -I lib -I ../ch07-separate-binaries/lib -I ../ch07-separate-binaries/include -M ../ch07-separate-binaries/app/player/play_mp3.c | \
sed 's,\(play_mp3\.o\) *:,app/player/ app/player/play_mp3.d: ,' > app/player/play_mp3.d.tmp
mv -f y.tab.c lib/db/playlist.c
mv -f y.tab.h lib/db/playlist.h
gcc -I lib -I ../ch07-separate-binaries/lib -I ../ch07-separate-binaries/include -M ../ch07-separate-binaries/lib/codec/codec.c | \
sed 's,\(codec\.o\) *:,lib/codec/ lib/codec/codec.d: ,' > lib/codec/codec.d.tmp
mv -f app/player/play_mp3.d.tmp app/player/play_mp3.d
gcc -I lib -I ../ch07-separate-binaries/lib -I ../ch07-separate-binaries/include -M lib/db/playlist.c | \
sed 's,\(playlist\.o\) *:,lib/db/ lib/db/playlist.d: ,' > lib/db/playlist.d.tmp
mv -f lib/codec/codec.d.tmp lib/codec/codec.d
gcc -I lib -I ../ch07-separate-binaries/lib -I ../ch07-separate-binaries/include -M ../ch07-separate-binaries/lib/ui/ui.c | \
sed 's,\(ui\.o\) *:,lib/ui/ lib/ui/ui.d: ,' > lib/ui/ui.d.tmp
mv -f lib/db/playlist.d.tmp lib/db/playlist.d
gcc -I lib -I ../ch07-separate-binaries/lib -I ../ch07-separate-binaries/include -M lib/db/scanner.c | \
sed 's,\(scanner\.o\) *:,lib/db/ lib/db/scanner.d: ,' > lib/db/scanner.d.tmp
mv -f lib/ui/ui.d.tmp lib/ui/ui.d
mv -f lib/db/scanner.d.tmp lib/db/scanner.d
gcc -I lib -I ../ch07-separate-binaries/lib -I ../ch07-separate-binaries/include -c -o app/player/play_mp3.o ../ch07-separate-binaries/app/player/play_mp3.c
gcc -I lib -I ../ch07-separate-binaries/lib -I ../ch07-separate-binaries/include -c -o lib/codec/codec.o ../ch07-separate-binaries/lib/codec/codec.c
gcc -I lib -I ../ch07-separate-binaries/lib -I ../ch07-separate-binaries/include -c -o lib/db/playlist.o lib/db/playlist.c
gcc -I lib -I ../ch07-separate-binaries/lib -I ../ch07-separate-binaries/include -c -o lib/db/scanner.o lib/db/scanner.c
../ch07-separate-binaries/lib/db/scanner.l: In function yylex:
../ch07-separate-binaries/lib/db/scanner.l:9: warning: return makes integer from pointer without a cast
gcc -I lib -I ../ch07-separate-binaries/lib -I ../ch07-separate-binaries/include -c -o lib/ui/ui.o ../ch07-separate-binaries/lib/ui/ui.c
ar rv lib/codec/libcodec.a lib/codec/codec.o
ar: creating lib/codec/libcodec.a
a - lib/codec/codec.o
ar rv lib/db/libdb.a lib/db/playlist.o lib/db/scanner.o
ar: creating lib/db/libdb.a
a - lib/db/playlist.o
a - lib/db/scanner.o
ar rv lib/ui/libui.a lib/ui/ui.o
ar: creating lib/ui/libui.a
a - lib/ui/ui.o
gcc app/player/play_mp3.o lib/codec/libcodec.a lib/db/libdb.a lib/ui/libui.a -o app/player/play_mp3
Figure 10-1. Diagram of make when --jobs=2
First, make must build the generated source and dependency files. The two generated source files are the output of yacc and lex. This accounts for commands 1 and 2. The third command generates the dependency file for play_mp3.c and is clearly begun before the dependency files for either playlist.c or scanner.c are completed (by commands 4, 5, 8, 11, 12, and 14). Therefore, this make is running three jobs in parallel, even though the command-line option requests two jobs.
The mv commands, 4 and 5, complete the playlist.c source code generation started with command 1. Command 6 begins another dependency file. Each command script is always executed by a single make, but each target and prerequisite forms a separate job. Therefore, command 7, which is the second command of the dependency generation script, is being executed by the same make process as command 3. Command 6, on the other hand, is probably being executed by a make spawned immediately after the completion of the make that executed commands 1, 4, and 5 (processing the yacc grammar), but before the generation of the dependency file in command 8.
The dependency generation continues in this fashion until command 14. All dependency files must be complete before make can move on to the next phase of processing, rereading the makefile. This forms a natural synchronization point that make automatically obeys.
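The dependency phase described above can be sketched roughly as follows. This is a simplified, single-directory illustration, not the makefile that produced Example 10-1: the names prog, sources, and objects are hypothetical, and it assumes gcc's -MT option in place of the sed filter seen in the output.

```makefile
# Hypothetical single-directory project; the real example spans several directories.
sources := $(wildcard *.c)
objects := $(sources:.c=.o)

# Hypothetical program name. The objects are compiled by make's built-in %.o rule.
prog: $(objects)
	$(CC) $^ -o $@

# Generate foo.d from foo.c. The two -MT options (a gcc feature) name both
# foo.o and foo.d as targets, so the dependency file is itself regenerated
# when a header changes. Writing to $@.tmp first means a failed or
# interrupted run never leaves a half-written .d file behind.
%.d: %.c
	$(CC) $(CPPFLAGS) -M -MT $*.o -MT $@ $< > $@.tmp
	mv -f $@.tmp $@

# make tries to update every included file first (in parallel under --jobs),
# then rereads the makefile before it builds anything else.
-include $(sources:.c=.d)
```

Because the include directive is what triggers the reread, no explicit ordering between the .d files and the object files is needed in this sketch.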
Once the makefile is reread with the dependency information, make can continue the build process in parallel again. This time make chooses to compile all the object files before building each of the archive libraries. This order is nondeterministic. That is, if the makefile is run again, the libcodec.a library might be built before playlist.c is compiled, since that library doesn't require any objects other than codec.o. Thus, the example represents one possible execution order among many.
Finally, the program is linked. For this makefile, the link phase is also a natural synchronization point and will always occur last. If, however, the goal were not a single program but many programs or libraries, the last command executed might also vary.
Running multiple jobs on a multiprocessor obviously makes sense, but running more than one job on a uniprocessor can also be very useful. This is because of the latency of disk I/O and the large amount of cache on most systems. For instance, if a process, such as gcc, is idle waiting for disk I/O, it may be that data for another task such as mv, yacc, or ar is currently in memory. In this case, it would be good to allow the task with available data to proceed. In general, running make with two jobs on a uniprocessor is almost always faster than running one job, and it is not uncommon for three or even four tasks to be faster than two.
The --jobs option can be used without a number. If so, make will spawn as many jobs as there are targets to be updated. This is usually a bad idea, because a large number of jobs can swamp the processor and run much slower than even a single job.
Another way to manage multiple jobs is to use the system load average as a guide. The load average is the number of runnable processes averaged over some period of time, typically 1 minute, 5 minutes, and 15 minutes. The load average is expressed as a floating point number. The --load-average (or -l) option gives make a threshold above which new jobs cannot be spawned. For example, the command:
$ make --load-average=3.5
tells make to spawn new jobs only when the load average is less than or equal to 3.5. If the load average is greater, make waits until the average drops below this number, or until all the other jobs finish.
When writing a makefile for parallel execution, attention to proper prerequisites is even more important. As mentioned previously, when --jobs is 1, a list of prerequisites will usually be evaluated from left to right. When --jobs is greater than 1, these prerequisites may be evaluated in parallel. Therefore, any dependency relationship that was implicitly handled by the default left-to-right evaluation order must be made explicit when the makefile is run in parallel.
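As a minimal sketch of making such a relationship explicit, consider a program whose source file includes a generated header. All of the names here (app, main.c, config.h, config.h.in) are hypothetical and chosen only for illustration.

```makefile
# With one job, listing config.h before main.o happens to generate the
# header before main.c is compiled. Under --jobs=2 the two prerequisites
# may be updated at the same time, and the compile can fail or pick up
# a stale header.
app: config.h main.o
	$(CC) main.o -o $@

# The fix: state the dependency on the target that actually needs it.
# make now finishes config.h before it starts compiling main.o,
# regardless of how many jobs are running.
main.o: main.c config.h
	$(CC) $(CPPFLAGS) $(CFLAGS) -c main.c -o $@

# Stand-in for whatever really generates the header.
config.h: config.h.in
	cp config.h.in $@
```

Without the explicit main.o rule, this makefile would probably still work with a single job, which is exactly why this kind of problem tends to surface only after parallel builds are turned on.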
Another hazard of parallel make is the problem of shared intermediate files. For example, if a directory contains both foo.y and bar.y, running yacc twice in parallel could result in one of them getting the other's instance of y.tab.c or y.tab.h or both moved into its own .c or .h file. You face a similar hazard with any procedure that stores temporary information in a scratch file that has a fixed name.
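One way to sidestep the shared y.tab.c and y.tab.h files is to have each invocation write directly to uniquely named outputs. The sketch below assumes GNU bison, whose --output and --defines options accept file names; a traditional yacc may not offer an equivalent. The second rule is a purely hypothetical example of the same idea applied to an ordinary scratch file.

```makefile
# Each grammar writes straight to its own .c and .h files, so foo.y and
# bar.y never touch a common y.tab.c or y.tab.h. In GNU make, a pattern
# rule with two targets runs its recipe once to produce both files.
%.c %.h: %.y
	bison --defines=$*.h --output=$*.c $<

# The same idea for any scratch file: derive its name from the target
# instead of using a fixed name shared by every invocation.
%.sorted: %.txt
	sort $< > $@.tmp
	mv -f $@.tmp $@
```

Deriving temporary names from $@ keeps concurrent jobs from colliding as long as no two rules build the same target.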
Another common idiom that hinders parallel execution is invoking a recursive make from a shell for loop:
dir:
	for d in $(SUBDIRS); \
	do \
	  $(MAKE) --directory=$$d; \
	done
As mentioned in Section 6.1 in Chapter 6, make cannot execute these recursive invocations in parallel. To achieve parallel execution, declare the directories .PHONY and make them targets:
.PHONY: $(SUBDIRS)
$(SUBDIRS):
	$(MAKE) --directory=$@
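Once the directories are targets, any ordering between them has to be stated as a prerequisite, because make is now free to enter them at the same time. A minimal sketch, assuming two hypothetical subdirectories named lib and app, where app links against what lib builds:

```makefile
# Hypothetical directory names, for illustration only.
SUBDIRS := lib app

.PHONY: all $(SUBDIRS)
all: $(SUBDIRS)

# One sub-make per directory. Using $(MAKE) lets the sub-makes share the
# parent's job slots instead of each starting its own set of jobs.
$(SUBDIRS):
	$(MAKE) --directory=$@

# app needs the libraries from lib, so lib must finish first.
# Directories with no such relationship still run in parallel.
app: lib
```

With the shell for loop, this ordering came for free from the order of SUBDIRS; here it is explicit, which is what allows make to schedule the remaining directories concurrently.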