Chapter 8. Porting a Legacy Application to Impulse C
In this and the following two chapters we will pull together some of the ideas presented in the preceding chapters and create a hardware/software implementation of a data encryption engine. While doing this we will demonstrate how legacy C code that was not originally written with a hardware implementation in mind can be iteratively ported, compiled, simulated, and refined to create a reasonably efficient hardware implementation.
The initial goal of this effort will not be to create the fastest possible or most compact implementation of this particular algorithm. Instead, our goal is to give you the skills and knowledge you'll need when analyzing and converting other types of algorithms that may already be implemented in standard C on standard processors.
An important part of any porting effort (whether you are porting to an FPGA or to another, more traditional processor architecture, such as one supporting hyperthreading) is analyzing how data moves through the application, in large part to ensure that your efforts at creating an efficient, high-performance implementation in hardware are not negated by simple bandwidth limitations.
To demonstrate how such an evaluation can be performed, consider the problem of data encryption, in which a stream of incoming data must be processed very quickly against a specified set of values (the key) to generate a resulting encrypted or decrypted data stream. Such a problem may involve substantial computation but is also bandwidth-intensive: the final implementation must not sacrifice data throughput in exchange for faster computation.
In this chapter we will focus on the initial design process, including the use of standard C debugging tools and application monitoring. We will also generate some HDL (using a generic VHDL hardware platform) and perform a simulation of the resulting hardware.
In Chapter 9 we'll target a specific FPGA and also discuss the creation of an embedded software test bench. This test bench will allow us to obtain some quantifiable results by actually running the application in hardware, under the control of an embedded processor.
In Chapter 10 we'll continue with this example, demonstrating through a series of steps how an application not originally optimized for a hardware implementation can be made to run faster and to require less hardware. As we go through these stepwise optimizations, we hope to show some of the techniques that can be used to accelerate many types of applications, both at the level of the process (the coding of inner code loops and so on) and at the level of the algorithm and its I/O requirements.