Table 1 reports some software cost measures from our experiments, which we review to underline the qualities of the structured approach: fast code development, code portability, and performance portability.
Development Costs and Code Expressiveness
When restructuring existing sequential code into a parallel application, most of the work is devoted to making the code modular. The amount of sequential code needed to develop the building blocks of the structured parallel applications is reported in Table 1 as modularization, separately from the parallel code proper. Once modularization has been accomplished, several prototypes of different parallel structures are usually developed and evaluated. The skeleton description of a parallel structure is shorter, quicker to write, and far more readable than its MPI equivalent. As a test, starting from the same sequential modules, we developed an MPI version of C4.5. Although it exploits simpler solutions than the skeleton program (a plain Master-Slave scheme, with no pipelined communications), the MPI code is longer, more complex, and more error-prone than the structured version. Despite the additional programming effort, its speed-up results showed no significant gain over the skeleton version.
The speed-up and scale-up results of the applications we have shown are not all breakthroughs, but they are comparable to those of similar solutions implemented with unstructured parallel programming (e.g., MPI). The Partitioned Apriori is fully scalable with respect to database size, like count-distribution implementations. The C4.5 prototype behaves better than other pure task-parallel implementations, but it still suffers from the limits of this parallelization scheme, because support for external objects is not yet complete. We know of no other results on spatial clustering that use our approach of parallelizing cluster expansion.
Code and Performance Portability
Skeleton code is by definition portable over all the architectures that support the programming environment. Since the SkIE two-level parallel compiler uses standard compilation tools to build the final application, the intermediate code and the run-time support of the language can exploit all the advantages of parallel communication libraries. We can enhance the parallel support with architecture-specific facilities when the performance gain is worthwhile, but as long as the intermediate code complies with industry standards, the applications remain portable to a broad set of architectures. The SMP and T3E tests of the ARM prototype were performed this way, with no extra development time, by compiling the MPI and C++ code produced by SkIE on the target machine. These results also show a good degree of performance portability.