Optimizing C and Fortran 77 compilers are widely used for parallel programming of SMP architectures. The main optimization they perform is loop parallelization, whereby different iterations of a loop are executed simultaneously by different parallel threads.
The most difficult problem an optimizing compiler must solve is recognizing which loops can be parallelized. The solution is based on an analysis of the data dependencies in the loops. Optimizing compilers for the SMP architecture share with those for vector and superscalar architectures the same methods and algorithms for recognizing parallelizable loops, as outlined in Section 2.4. The two groups of compilers therefore also share the advantages and disadvantages of the “optimizing C or Fortran 77 compilers” approach to the problem of efficiently portable programming of the target parallel architectures, as we saw in the analysis of Chapter 2.
To that analysis we want to add a few words about the problem of porting serial legacy code to parallel architectures. Most serial C and Fortran 77 programs cannot be efficiently implemented on SMP computers just by applying optimizing C and Fortran 77 compilers. A good serial algorithm maximizes the re-use of information, computed at each loop iteration, by all subsequent iterations, thus minimizing redundant computations. This results in strong and sophisticated inter-iteration data dependences, which prevent optimizing compilers from parallelizing the most important and time-consuming loops. Industrial C and Fortran 77 optimizing compilers parallelize only a very small fraction of the loops in such programs (e.g., about 3% for an actual Fortran 77 program modeling physical phenomena), and the parallelizable loops very often only initialize arrays and do not contribute much to the total execution time. Therefore most serial programs must be re-designed so that optimizing compilers can generate code that runs efficiently on SMP computers.