10.2. Refinement 1: Reducing Size by Introducing a Loop

As described earlier, the macro F was expanded 16 times to perform the 16 steps of the core processing. In software, this straight-line code might run a little faster than a loop. In hardware, though, using the macro this way means the same code is duplicated 16 times, which creates a potentially much larger implementation. The obvious solution is to reduce code repetition by introducing a loop.

Without making major modifications, we can introduce a loop as follows:

    for (i=0; i<16; i++) {
        F(left,right,Ks[i]);
        i++;
        F(right,left,Ks[i]);
    }

Regenerating hardware using the Impulse C tools, we obtain an implementation that uses about 1,500 slices in the Xilinx device. The looping instructions introduce some extra delay, but the hardware still runs 9.4 times faster than the unmodified software implementation. In short, for a small hit in performance, the design size has been cut roughly in half. Refinement 4 will shed a somewhat different light on this performance difference.
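Before comparing size and speed, it is worth confirming that the loop computes the same result as the straight-line code. The following is a minimal sketch of such a check in plain C, not the actual design. The body of F here is a hypothetical stand-in (it is not the real round function), the function names rounds_unrolled, rounds_looped, and rounds_agree are invented for illustration, and the straight-line version is assumed to alternate the left and right arguments the same way the loop does.

```c
#include <stdint.h>

/* Hypothetical stand-in for the round macro F. This is NOT the real
   round function; it only mixes b and the subkey k into a so that
   the order of the 16 steps affects the result. */
#define F(a, b, k) ((a) ^= ((((b) << 3) | ((b) >> 29)) + (k)))

/* Straight-line version: the macro is expanded 16 times
   (assumed to alternate left/right as the loop does). */
void rounds_unrolled(uint32_t *left, uint32_t *right, const uint32_t Ks[16])
{
    uint32_t l = *left, r = *right;
    F(l, r, Ks[0]);   F(r, l, Ks[1]);
    F(l, r, Ks[2]);   F(r, l, Ks[3]);
    F(l, r, Ks[4]);   F(r, l, Ks[5]);
    F(l, r, Ks[6]);   F(r, l, Ks[7]);
    F(l, r, Ks[8]);   F(r, l, Ks[9]);
    F(l, r, Ks[10]);  F(r, l, Ks[11]);
    F(l, r, Ks[12]);  F(r, l, Ks[13]);
    F(l, r, Ks[14]);  F(r, l, Ks[15]);
    *left = l;
    *right = r;
}

/* Looped version: two expansions of F per iteration, eight iterations.
   The second i++ inside the body is what makes each pass consume
   two subkeys. */
void rounds_looped(uint32_t *left, uint32_t *right, const uint32_t Ks[16])
{
    uint32_t l = *left, r = *right;
    int i;
    for (i = 0; i < 16; i++) {
        F(l, r, Ks[i]);
        i++;
        F(r, l, Ks[i]);
    }
    *left = l;
    *right = r;
}

/* Returns 1 if both versions produce the same left/right pair. */
int rounds_agree(uint32_t left, uint32_t right, const uint32_t Ks[16])
{
    uint32_t l1 = left, r1 = right;
    uint32_t l2 = left, r2 = right;
    rounds_unrolled(&l1, &r1, Ks);
    rounds_looped(&l2, &r2, Ks);
    return l1 == l2 && r1 == r2;
}
```

Calling rounds_agree from a desktop test program with a few input blocks and subkey sets gives a quick software-side check that the refactoring did not change behavior. The looped body contains only two expansions of F rather than 16, which is the source of the size reduction described above.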

Tip

Evaluate the use of straight-line code versus loops to help balance cycle delays with hardware size.