Section 10.2. Refinement 1: Reducing Size by Introducing a Loop


As described earlier, a macro (F) was used to repeat the 16 steps of the core processing. In software, this straight-line code might run a little faster than a loop. In hardware, though, each use of the macro expands into its own copy of the round logic, so duplicating that code 16 times creates a potentially much larger implementation. The obvious solution is to reduce code repetition by introducing a loop.
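To make the size argument concrete, here is a minimal sketch of the straight-line form. It assumes a hypothetical toy round function, round_fn, and a hypothetical F macro that only mimic the shape of the DES round (no S-boxes or permutations). What matters is that the macro body is expanded 16 times, so a hardware compiler would typically see 16 separate copies of the round logic.

```c
#include <stdint.h>

/* Hypothetical toy round function standing in for the real DES round
   (S-boxes and permutations omitted). */
static uint32_t round_fn(uint32_t r, uint32_t k)
{
    uint32_t x = r ^ k;
    x = (x << 5) | (x >> 27);   /* rotate left by 5 bits */
    return x * 0x9E3779B1u;      /* multiplicative mixing */
}

/* Hypothetical round macro: updates l from r and the subkey k. */
#define F(l, r, k) ((l) ^= round_fn((r), (k)))

/* Straight-line version: F is expanded 16 times, one copy per round. */
void encrypt_unrolled(uint32_t *pl, uint32_t *pr, const uint32_t Ks[16])
{
    uint32_t left = *pl, right = *pr;
    F(left, right, Ks[0]);  F(right, left, Ks[1]);
    F(left, right, Ks[2]);  F(right, left, Ks[3]);
    F(left, right, Ks[4]);  F(right, left, Ks[5]);
    F(left, right, Ks[6]);  F(right, left, Ks[7]);
    F(left, right, Ks[8]);  F(right, left, Ks[9]);
    F(left, right, Ks[10]); F(right, left, Ks[11]);
    F(left, right, Ks[12]); F(right, left, Ks[13]);
    F(left, right, Ks[14]); F(right, left, Ks[15]);
    *pl = left;
    *pr = right;
}
```

Because each F step XORs into one half using the other, running the same steps in reverse order undoes them, which is a convenient way to check that the sketch is wired correctly.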

Without making major modifications, we can introduce a loop as follows:

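    for (i = 0; i < 16; i++) {
        F(left, right, Ks[i]);   /* even-numbered round */
        i++;                      /* advance to the next subkey */
        F(right, left, Ks[i]);   /* odd-numbered round, halves swapped */
    }                             /* the header's i++ moves to the next pair */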
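Because i is incremented both in the loop body and in the for header, the loop runs 8 iterations of two rounds each, so only two copies of F appear in the source. The compilable sketch below writes the two increments as a single i += 2, which has the same effect. It reuses the same hypothetical round_fn and F placeholders (not the real DES round) and adds a matching decryption loop so the structure can be checked.

```c
#include <stdint.h>

/* Hypothetical toy round function (not the real DES round). */
static uint32_t round_fn(uint32_t r, uint32_t k)
{
    uint32_t x = r ^ k;
    x = (x << 5) | (x >> 27);   /* rotate left by 5 bits */
    return x * 0x9E3779B1u;      /* multiplicative mixing */
}

/* Hypothetical round macro: updates l from r and the subkey k. */
#define F(l, r, k) ((l) ^= round_fn((r), (k)))

/* Looped version: two rounds per iteration, 8 iterations in total. */
void encrypt_loop(uint32_t *pl, uint32_t *pr, const uint32_t Ks[16])
{
    uint32_t left = *pl, right = *pr;
    for (int i = 0; i < 16; i += 2) {
        F(left, right, Ks[i]);       /* even round: update left  */
        F(right, left, Ks[i + 1]);   /* odd round: update right  */
    }
    *pl = left;
    *pr = right;
}

/* Inverse: same steps in reverse order. Each F undoes itself because XOR
   is its own inverse and the other half is unchanged at that point. */
void decrypt_loop(uint32_t *pl, uint32_t *pr, const uint32_t Ks[16])
{
    uint32_t left = *pl, right = *pr;
    for (int i = 15; i >= 1; i -= 2) {
        F(right, left, Ks[i]);
        F(left, right, Ks[i - 1]);
    }
    *pl = left;
    *pr = right;
}
```

The i += 2 form avoids modifying the loop counter inside the body, which some readers and tools find easier to follow; the generated sequence of rounds is identical.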

Regenerating the hardware with the Impulse C tools, we obtain an implementation that uses about 1,500 slices in the Xilinx device. The looping instructions introduce some extra delay, but the hardware is still 9.4 times faster than the unmodified software implementation. In short, for a small hit in performance, the design size has been cut roughly in half. Refinement 4 will shed a somewhat different light on this performance difference.

Tip

Evaluate the use of straight-line code versus loops to help balance cycle delays with hardware size.
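One way to act on this tip is partial unrolling: keep a loop, but place several rounds in each iteration. The sketch below, again with the hypothetical round_fn and F placeholders rather than the real DES round, puts four rounds in the body and runs four iterations, a middle point between 16 copies (no loop) and 2 copies (the loop shown earlier in this section). How slice count and cycle count actually change depends on the tools and device, so each variant should be measured.

```c
#include <stdint.h>

/* Hypothetical toy round function (not the real DES round). */
static uint32_t round_fn(uint32_t r, uint32_t k)
{
    uint32_t x = r ^ k;
    x = (x << 5) | (x >> 27);   /* rotate left by 5 bits */
    return x * 0x9E3779B1u;      /* multiplicative mixing */
}

/* Hypothetical round macro: updates l from r and the subkey k. */
#define F(l, r, k) ((l) ^= round_fn((r), (k)))

/* Partially unrolled version: four rounds per iteration, 4 iterations,
   so the source contains four copies of the round body. */
void encrypt_unroll4(uint32_t *pl, uint32_t *pr, const uint32_t Ks[16])
{
    uint32_t left = *pl, right = *pr;
    for (int i = 0; i < 16; i += 4) {
        F(left, right, Ks[i]);
        F(right, left, Ks[i + 1]);
        F(left, right, Ks[i + 2]);
        F(right, left, Ks[i + 3]);
    }
    *pl = left;
    *pr = right;
}
```

The unroll factor must divide the round count evenly (here 4 divides 16) for this simple form to compute exactly the same rounds as the fully unrolled code.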




    Practical FPGA Programming in C
    ISBN: 0131543180
    Year: 2005
    Pages: 208
