In this chapter we have described how the performance of an application can be improved by applying techniques of system-level parallelism. We have also demonstrated how the complete application (consisting of the four hardware processes and one software test process) can be implemented on an Altera FPGA platform for in-system testing.
These techniques for system-level partitioning and parallelism are widely applicable to FPGA-based applications and are a critical part of performance optimization. While it's tempting to think that a large C application (one originally written for a traditional processor) can be adapted with little change to a parallel platform such as an FPGA, the reality is that most algorithms, like the image filter described here, require some new thinking about how to divide the work among parallel hardware processes so that both performance and size constraints are met. Using a streaming programming model for partitioning can be an effective way to create such parallelism.
In the following chapter, we'll continue our exploration of this image filter example. You will see how a software test bench can be used in conjunction with an embedded operating system to create a complete, single-board, FPGA-based computing system.