1.7. When Is C Language Appropriate for FPGA Design?
Experimenting with mixed hardware/software solutions can be a time-consuming process due to the historical disconnect between software development methods and the lower-level methods required for hardware design, including design for FPGAs. For many applications, the complete hardware/software design is represented by a collection of software and hardware source files that are not easily compiled, simulated, or debugged with a single tool set. In addition, because the hardware design process is relatively inefficient, hardware and software design cycles may be out of sync, requiring system interfaces and fundamental software/hardware partitioning decisions to be prematurely locked down.
With the advent of C-based FPGA design tools, however, it is now possible to use familiar software design tools and the standard C language for a much larger percentage of a given application, in particular those parts of the design that are computationally intensive. Later performance tweaks may introduce handcrafted HDL code as a replacement for the automatically generated hardware, just as DSP users often replace higher-level C code with handcrafted assembly language. Because the design can be compiled directly from C code to an initial FPGA implementation, however, the point at which a hardware engineer needs to be brought in to make such performance tweaks is pushed further back in the design cycle, and the system as a whole can be designed using more productive software design methods.
These emerging hardware compiler tools allow C-language applications to be processed and optimized to create hardware, in the form of FPGA netlists. They also include the C language extensions needed to describe highly parallel, multiple-process applications. For target platforms that include embedded processors, these tools can generate the necessary hardware/software interfaces as well as low-level hardware descriptions for specific processes.
One key to success with these tools, and with hardware/software approaches in general, is to partition the application appropriately between software and hardware resources. A good partitioning strategy must consider not only the computational requirements of a given algorithmic component, but also the data bandwidth requirements. This is because hardware/software interfaces may represent a significant performance bottleneck.
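The trade-off between computation and data bandwidth can be made concrete with a back-of-the-envelope estimate. The following is a minimal sketch, not part of any particular tool: the offload_candidate structure, its fields, and both function names are hypothetical, and the model assumes (pessimistically) that data transfer and hardware computation do not overlap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one function considered for hardware offload.
 * All four values are estimates or measurements supplied by the designer. */
typedef struct {
    double sw_seconds;                 /* measured software-only run time */
    double hw_speedup;                 /* assumed compute speedup in the FPGA */
    double bytes_transferred;          /* data crossing the HW/SW interface */
    double interface_bytes_per_second; /* assumed sustained interface bandwidth */
} offload_candidate;

/* Estimated run time if the function is moved to hardware.
 * Assumes transfer and computation happen one after the other. */
double offload_seconds(const offload_candidate *c)
{
    assert(c->hw_speedup > 0.0);
    assert(c->interface_bytes_per_second > 0.0);

    double compute  = c->sw_seconds / c->hw_speedup;
    double transfer = c->bytes_transferred / c->interface_bytes_per_second;
    return compute + transfer;
}

/* True when the estimated hardware time beats the software-only time. */
bool offload_is_worthwhile(const offload_candidate *c)
{
    return offload_seconds(c) < c->sw_seconds;
}
```

Under this model, a function that runs four times faster in hardware can still be a poor candidate if moving its data across the interface takes longer than the computation it saves.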
Making use of a programming model appropriate for highly parallel applications is also important. It is tempting to off-load specific functions to an FPGA using traditional programming methods such as remote procedure calls (whereby values are pushed onto some stack or stack equivalent, a hardware function is invoked, and the processor waits for a result) or by creating custom processor instructions that allow key calculations to be farmed out to hardware. Research has demonstrated, however, that alternative, more dataflow-oriented methods of programming are more efficient and less likely to introduce blockages or deadlocks into an application. In many cases, this means rethinking the application as a whole and finding new ways to express data movement and processing. The results of doing so, however, can be dramatic. By increasing application-level parallelism and taking advantage of programmable hardware resources, for example, it is possible to accelerate common algorithms by orders of magnitude over a software-only implementation.
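The dataflow style described above can be illustrated in plain C. The sketch below is an assumption-laden model, not the API of any specific C-to-hardware tool: the stream type, STREAM_DEPTH, and every function name are hypothetical. Three processes (a producer, a squaring stage standing in for a hardware process, and a consumer) communicate only through bounded FIFO streams, and a simple loop steps them in turn to simulate what would run concurrently in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STREAM_DEPTH 4 /* hypothetical FIFO depth */

/* A bounded FIFO connecting two processes. */
typedef struct {
    int32_t buf[STREAM_DEPTH];
    size_t head, count;
    bool closed; /* set by the writer when no more data will arrive */
} stream;

/* Non-blocking write: returns false if the FIFO is full. */
bool stream_write(stream *s, int32_t v)
{
    if (s->count == STREAM_DEPTH) return false;
    s->buf[(s->head + s->count) % STREAM_DEPTH] = v;
    s->count++;
    return true;
}

/* Non-blocking read: returns false if the FIFO is empty. */
bool stream_read(stream *s, int32_t *v)
{
    if (s->count == 0) return false;
    *v = s->buf[s->head];
    s->head = (s->head + 1) % STREAM_DEPTH;
    s->count--;
    return true;
}

typedef struct { const int32_t *data; size_t n, next; } producer_state;

/* Producer: pushes at most one input sample per step, closes when done. */
void producer_step(producer_state *p, stream *out)
{
    if (p->next < p->n && stream_write(out, p->data[p->next])) p->next++;
    if (p->next == p->n) out->closed = true;
}

/* Worker: squares one sample per step (small inputs assumed, no overflow
 * check). It never waits on a caller; it reacts only to its streams. */
void square_step(stream *in, stream *out)
{
    int32_t v;
    if (out->count < STREAM_DEPTH && stream_read(in, &v))
        stream_write(out, v * v);
    if (in->closed && in->count == 0) out->closed = true;
}

/* Consumer: accumulates results; returns true once the stream is drained. */
bool consumer_step(int64_t *sum, stream *in)
{
    int32_t v;
    if (stream_read(in, &v)) *sum += v;
    return in->closed && in->count == 0;
}

/* Steps the three processes round-robin until all data has flowed through. */
int64_t run_pipeline(const int32_t *data, size_t n)
{
    stream a = {0}, b = {0};
    producer_state p = { data, n, 0 };
    int64_t sum = 0;
    bool done = false;

    while (!done) {
        producer_step(&p, &a);
        square_step(&a, &b);
        done = consumer_step(&sum, &b);
    }
    return sum;
}
```

Calling run_pipeline on the array {1, 2, 3} returns 14. Unlike a remote procedure call, no process here invokes another and waits for its result; each stage proceeds whenever its input stream has data and its output stream has room, which is what allows the stages to overlap when mapped to parallel hardware.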
During the development of such applications, design tools can be used to visualize and debug the interconnections of multiple parallel processes. Application monitoring, for example, can provide an overall view of the application and its constituent processes as it runs under the control of a standard C debugger. Such instrumentation can help quantify the results of a given partitioning strategy by identifying areas of high data throughput that may represent application bottlenecks. When used in conjunction with familiar software profiling methods, these tools allow specific areas of code to be identified for more detailed analysis or performance tweaking. The use of cycle-accurate or instruction-set simulators later in the development process can help further optimize the application.