Chapter 11. Describing System-Level Parallelism
In the preceding chapters, we focused primarily on creating individual processes and described how they may be optimized for performance using techniques such as array splitting, loop pipelining, and loop unrolling. We also saw how additional processes may be used for testing, by creating both desktop and embedded software test benches that interact with Impulse C hardware processes during simulation or during actual operation within an FPGA.
In this chapter, we will return to the topic of communicating processes and show how the use of system-level parallelism can improve throughput for many types of applications. Applications designed in this way might reside entirely in hardware, as a collection of hardware processes communicating via streams that are implemented as FIFOs, or they might involve a combination of hardware processes that communicate with one or more software processes residing on an embedded or discrete microprocessor. Furthermore, these multiple, independent processes may reside on a single FPGA or be spread across multiple FPGAs to form a larger parallel computing grid.
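As a rough illustration of two processes communicating through a stream implemented as a FIFO, the following plain C sketch models the idea in software. It is not Impulse C code and does not use the Impulse C stream API. The names fifo_t, fifo_push, fifo_pop, and run_pipeline, the FIFO depth of 4, and the arithmetic performed by each stage are all hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 4  /* hypothetical depth, chosen for illustration */

/* A small bounded FIFO standing in for a hardware stream. */
typedef struct {
    uint32_t data[FIFO_DEPTH];
    size_t head;   /* index of the next element to read */
    size_t count;  /* number of elements currently stored */
} fifo_t;

void fifo_init(fifo_t *f) {
    assert(f != NULL);
    f->head = 0;
    f->count = 0;
}

/* Returns 1 on success, 0 if the FIFO is full (the writer must wait). */
int fifo_push(fifo_t *f, uint32_t v) {
    assert(f != NULL);
    if (f->count == FIFO_DEPTH) return 0;
    f->data[(f->head + f->count) % FIFO_DEPTH] = v;
    f->count++;
    return 1;
}

/* Returns 1 on success, 0 if the FIFO is empty (the reader must wait). */
int fifo_pop(fifo_t *f, uint32_t *v) {
    assert(f != NULL && v != NULL);
    if (f->count == 0) return 0;
    *v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* Each loop iteration gives both stages one step, modelling two
 * processes that run at the same time. Stage A doubles each input and
 * writes it to the FIFO; stage B reads from the FIFO and adds one.
 * Returns the number of values written to out. */
size_t run_pipeline(const uint32_t *in, size_t n, uint32_t *out) {
    fifo_t a_to_b;
    size_t fed = 0, produced = 0;
    fifo_init(&a_to_b);
    while (produced < n) {
        uint32_t v;
        /* Stage B: consume one value if any is available. */
        if (fifo_pop(&a_to_b, &v)) out[produced++] = v + 1;
        /* Stage A: produce one value if there is room. */
        if (fed < n && fifo_push(&a_to_b, in[fed] * 2)) fed++;
    }
    return produced;
}
```

In this model, once the FIFO holds its first value, both stages do useful work in every iteration, which is the overlap that a pipelined sequence of processes is meant to achieve.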
When designing for system-level parallelism, it is important to partition the application into processes that can operate in parallel or in a pipelined sequence, so that the design takes full advantage of the available hardware resources.
The example we will use for this purpose is an image-processing filter. This filter accepts a stream of pixels (which are assumed to be 24-bit RGB values) and performs an edge-detection operation, streaming the resulting modified pixel values to the output. Many similar image filters may be created using the methods described in this example. The general method of creating multiple parallel instances of hardware processes can be used for a wide variety of applications. We'll present three different ways of implementing this image filter, each of which demonstrates different aspects of system-level parallelism. In the latter half of this chapter, we will take one version of the image filter example all the way to hardware, using an Altera FPGA-based prototyping board.
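To make the pixel format and the general shape of such a filter concrete, here is a minimal plain C sketch. It is a stand-in, not the edge-detection algorithm developed in this chapter: it computes only a horizontal intensity difference along a single row. The packing order (red in bits 23 to 16, green in bits 15 to 8, blue in bits 7 to 0) and the names pack_rgb, intensity, and edge_detect_row are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack three 8-bit channels into the low 24 bits of a 32-bit word.
 * The channel order is an assumption, not taken from the text. */
uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Average of the three channels, in the range 0..255. */
unsigned intensity(uint32_t px) {
    unsigned r = (px >> 16) & 0xFFu;
    unsigned g = (px >> 8) & 0xFFu;
    unsigned b = px & 0xFFu;
    return (r + g + b) / 3u;
}

/* Simplified edge detector for one row of pixels: each interior output
 * pixel is the absolute intensity difference between its right and
 * left neighbours, written as a gray value. Border pixels are set to 0. */
void edge_detect_row(const uint32_t *in, uint32_t *out, size_t width) {
    assert(in != NULL && out != NULL);
    for (size_t x = 0; x < width; x++) {
        if (x == 0 || x + 1 == width) {
            out[x] = 0;
            continue;
        }
        int diff = (int)intensity(in[x + 1]) - (int)intensity(in[x - 1]);
        uint8_t mag = (uint8_t)(diff < 0 ? -diff : diff);
        out[x] = pack_rgb(mag, mag, mag);
    }
}
```

For a row that steps from black to white, the interior pixels next to the step come out white and a uniform row comes out black. A filter that also compares pixels in neighbouring rows would need access to those rows as well.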