A.5. FPGA-Specific Optimization Techniques

Because the designer is actually building and creating the embedded processor system hardware in an FPGA, much can be done to improve the performance of the hardware itself. Additionally, with an FPGA embedded processor residing next to additional FPGA hardware resources, a designer can consider custom coprocessor designs specifically targeted at a design's core algorithm.

Increasing the FPGA's Operating Frequency

Employing FPGA design techniques to increase the operating frequency of the FPGA embedded processor system increases performance. Several methods are considered.

Logic Optimization and Reduction

Connect only the peripherals and buses that will be used. Here are a few examples:
Area and Timing Constraints

Xilinx FPGA place and route tools perform much better when given guidelines as to what is most important to the designer. In the Xilinx tools, a designer can specify the desired clock frequency, pin location, and logic element location. By providing these details, the tools can make smarter trade-offs during the hardware design implementation. Some peripherals require additional constraints to ensure proper operation. For example, both the DDR SDRAM controller and the 10/100 Ethernet MAC require additional constraints to guarantee that the tools create correct and optimized logic. The designer must read the datasheet for each peripheral and follow the recommended design guidelines.

Hardware Acceleration

Dedicated hardware outperforms software. The embedded designer who is serious about increasing performance must consider the FPGA's ability to accelerate the processor performance with dedicated hardware. Although this technique consumes FPGA resources, the performance improvements can be extraordinary.

Turn on the Hardware Divider and Barrel-Shifter

MicroBlaze can be customized to use a hardware divider and a hardware barrel-shifter rather than performing these functions in software. Enabling these processor capabilities consumes more logic but improves performance. In one example, enabling the hardware divider and barrel-shifter adds 414 LCs, but the performance is improved by 18.1%.

Software Bottlenecks Converted to Coprocessing Hardware

Custom hardware logic can be designed to offload an FPGA embedded processor. When a software bottleneck is identified, a designer can choose to convert the bottleneck algorithm into custom hardware. Custom software instructions can then be defined to operate the hardware coprocessor. Both MicroBlaze and Virtex-4 PowerPC include very low-latency access points into the processor, which are ideal for connecting custom coprocessing hardware.
Virtex-4 introduces the Auxiliary Processing Unit (APU) for the PowerPC. The APU provides a direct connection from the PowerPC to coprocessing hardware. In MicroBlaze, the low-latency interface is called the Fast Simplex Link (FSL) bus. The FSL bus contains multiple channels of dedicated, unidirectional, 32-bit interfaces. Because the FSL channels are dedicated, no arbitration or bus mastering is required. This allows an extremely fast interface to the processor.

Converting a software bottleneck into hardware may seem like a very difficult task. Traditionally, a software designer identifies the bottleneck, after which the algorithm is handed to an FPGA designer who writes VHDL or Verilog code to create the hardware coprocessor. Fortunately, this process has been greatly simplified by tools that can generate FPGA hardware from C code. One such tool is CoDeveloper from Impulse Accelerated Technologies. This tool allows one designer who is familiar with C to port a software bottleneck into a custom piece of coprocessing FPGA hardware using CoDeveloper's Impulse C libraries. Here are some examples of algorithms that could be targeted for hardware-based coprocessors:
Any operation that is algorithmic, mathematical, or parallel is a good candidate for a hardware coprocessor. FPGA logic consumption is traded for performance. The advantages can be enormous, improving performance by tens or hundreds of times.
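The offload pattern described in this section can be sketched as a small host-side model in C. This is an illustration under stated assumptions, not Xilinx's API or a real FSL driver: `channel_t`, `chan_put`, `chan_get`, `coproc_step`, `dot_software`, and `dot_offloaded` are all hypothetical names, and the software FIFO only mimics a dedicated, unidirectional, 32-bit channel. The multiply "coprocessor" is an ordinary C function standing in for FPGA logic, so the sketch shows the data flow only and says nothing about actual speedup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one unidirectional 32-bit channel. On real
 * hardware this would be a hardware FIFO; here it is a ring buffer. */
#define CHAN_DEPTH 16u

typedef struct {
    uint32_t data[CHAN_DEPTH];
    size_t head;   /* index of the next word to read */
    size_t count;  /* number of words currently queued */
} channel_t;

/* Queue one word; returns 1 on success, 0 if the channel is full. */
static int chan_put(channel_t *c, uint32_t word)
{
    if (c->count == CHAN_DEPTH)
        return 0;
    c->data[(c->head + c->count) % CHAN_DEPTH] = word;
    c->count++;
    return 1;
}

/* Dequeue one word; returns 1 on success, 0 if the channel is empty. */
static int chan_get(channel_t *c, uint32_t *word)
{
    if (c->count == 0)
        return 0;
    *word = c->data[c->head];
    c->head = (c->head + 1) % CHAN_DEPTH;
    c->count--;
    return 1;
}

/* Stand-in for the coprocessor: once two operands are waiting on the
 * inbound channel, consume them and emit their product on the
 * outbound channel. */
static void coproc_step(channel_t *to_hw, channel_t *from_hw)
{
    uint32_t a = 0, b = 0;
    if (to_hw->count < 2)
        return;
    chan_get(to_hw, &a);
    chan_get(to_hw, &b);
    chan_put(from_hw, a * b);
}

/* Pure-software reference: the "bottleneck" before offloading. */
uint32_t dot_software(const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Processor side after offloading: stream operands out on one channel
 * and read each product back on the other. No arbitration is modeled
 * because each channel has exactly one writer and one reader. */
uint32_t dot_offloaded(const uint32_t *a, const uint32_t *b, size_t n)
{
    channel_t to_hw = {0}, from_hw = {0};
    uint32_t acc = 0, prod = 0;
    for (size_t i = 0; i < n; i++) {
        int ok = chan_put(&to_hw, a[i]) && chan_put(&to_hw, b[i]);
        assert(ok);
        (void)ok;
        coproc_step(&to_hw, &from_hw);  /* models the hardware running */
        if (chan_get(&from_hw, &prod))
            acc += prod;
    }
    return acc;
}
```

Calling `dot_offloaded` and `dot_software` on the same inputs should give identical results, which is the property a designer would check before replacing the stand-in with real coprocessing hardware. Using two separate one-way channels, rather than one shared bus, mirrors the reason given above for why no arbitration is needed.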