Section 6.6. Hardware Generation Notes | Practical FPGA Programming in C

6.6. Hardware Generation Notes

The following subsections describe a few important topics related to the generation of hardware. These topics will be expanded upon in subsequent chapters.

Instruction-Level Pragmas

Instruction- and block-level optimizations can be controlled directly in your C source code through a small set of predefined pragmas.

Pragma CO PIPELINE

Pipelining of instructions is not automatic and requires an explicit declaration in your C source code as follows:

 #pragma CO PIPELINE

This declaration must be included within the body of the loop and prior to any statements that are to be pipelined. For example:

 for (i=0; i<10; i++) {
 #pragma CO PIPELINE
     sum += i << 1;
 }

Note

The PIPELINE pragma must appear at the top of the loop to be pipelined, before any other statements within the loop, and the loop may not contain any nested loops.

Pragma CO UNROLL

Loop unrolling may be enabled with the use of the UNROLL pragma, which appears as follows:

 for (tap = 0; tap < TAPS; tap++) {
 #pragma CO UNROLL
     accum += firbuffer[tap] * coef[tap];
 }

Unrolling a loop causes the code within the loop to be duplicated in hardware as many times as needed to implement the operation being described. It is therefore important to consider the size of the resulting hardware and to unroll only loops that have a relatively small number of iterations. The iterations of the loop must also not include interdependent calculations or assignments that would prevent the loop from being implemented as a parallel (unrolled) structure in hardware.

Note that the UNROLL pragma must appear at the top of the loop, before any other statements in the loop, and the loop must be a for loop with a loop variable of type int and constant bounds.
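To make the duplication concrete, the following sketch compares the rolled FIR loop with a hand-unrolled equivalent. It is a plain C illustration of the structure that unrolling describes, not the output of the compiler. The function names fir_rolled and fir_unrolled and the value chosen for TAPS are assumptions made for this example.

```c
#define TAPS 4  /* assumed value, small enough to unroll by hand */

/* Rolled form. An ordinary desktop C compiler does not recognize the
   CO pragma and ignores it (possibly with a warning). */
int fir_rolled(const int firbuffer[TAPS], const int coef[TAPS])
{
    int accum = 0;
    int tap;
    for (tap = 0; tap < TAPS; tap++) {
#pragma CO UNROLL
        accum += firbuffer[tap] * coef[tap];
    }
    return accum;
}

/* Hand-unrolled equivalent: one multiply per tap and no loop control.
   In hardware this corresponds to TAPS multipliers feeding adders. */
int fir_unrolled(const int firbuffer[TAPS], const int coef[TAPS])
{
    return firbuffer[0] * coef[0]
         + firbuffer[1] * coef[1]
         + firbuffer[2] * coef[2]
         + firbuffer[3] * coef[3];
}
```

Both functions compute the same result. The second form shows why the hardware cost grows with the iteration count: every additional tap adds another multiplier.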

Pragma CO SET StageDelay

The general-purpose pragma CO SET is used to pass optimization information to the optimizers. One SET pragma is currently defined:

 #pragma CO SET StageDelay 32

The numeric argument is the maximum number of combinational gate delays permitted in an instruction stage. This pragma is described in more detail later in this section, under Controlling Stage Delays.

Understanding Latency and Rate

The latency and rate numbers reported by Stage Master apply to pipelines. In this context, latency refers to the number of cycles required for an input to reach the output of a pipeline, or, in other words, the length of the pipeline.

The rate is the number of cycles between successive inputs to the pipeline. (This is sometimes called the input rate or the introduction rate.) A rate of 1 means that the pipeline accepts a new input every cycle. A rate of 2 means that the pipeline accepts a new input every other cycle.
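As a rough worked example of how the two numbers combine, the sketch below estimates the total cycle count for a stream of inputs. It assumes an idealized pipeline with no stalls, in which the last of n inputs enters after (n - 1) * rate cycles and then needs latency cycles to reach the output. The function name pipeline_cycles is hypothetical and is not part of any tool.

```c
/* Cycles until the last of n inputs leaves the pipeline, assuming one
   new input every `rate` cycles, `latency` cycles from input to output,
   and no stalls. This is an idealized estimate, not a tool report. */
long pipeline_cycles(long latency, long rate, long n)
{
    if (n <= 0) {
        return 0;   /* nothing to process */
    }
    return latency + (n - 1) * rate;
}
```

Under this model, a pipeline with a latency of 4 processes 10 inputs in 13 cycles at a rate of 1 and in 22 cycles at a rate of 2. For long input streams the rate dominates the total, while the latency matters mainly for short bursts.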

Controlling Stage Delays

As stated earlier in this chapter, all statements within a stage are implemented in a single clock cycle. One implication is that the number of individual statements (operations and assignments) performed in a stage may have a direct, and potentially large, impact on the maximum clock rate of your application when it is synthesized to actual hardware. This impact can be mitigated with the StageDelay parameter, which specifies the maximum delay allowed in each stage of the generated hardware. StageDelay is set using the general-purpose CO SET pragma:

 #pragma CO SET StageDelay 32

The stage delay specified in the CO SET StageDelay pragma refers to the maximum number of combinational delays (levels of logic, or gate delays) that are allowed within a given pipelined stage. Note that optimizations performed by FPGA synthesis tools may further reduce (or in some cases expand) the number of combinational delays in the final implementation.

One delay unit is roughly equivalent to one gate delay in the target hardware. Depending on the capabilities of the synthesis and routing tools being used, a logic operation such as AND, OR, or SHIFT requires one delay unit, while an arithmetic or relational operation may require n or more delay units, where n is the bit width.
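The delay-unit arithmetic can be sketched as a small estimator. This is only an illustrative model built from the rules of thumb in the text (one unit for a logic operation, n units for an arithmetic or relational operation); it is not the algorithm used by the optimizer, and the names op_kind, op_delay, and estimate_stages are hypothetical.

```c
/* Rough delay model: a logic operation costs one delay unit; an
   arithmetic or relational operation costs about bit_width units.
   Real synthesis tools may do better or worse than this. */
enum op_kind { OP_LOGIC, OP_ARITH };

int op_delay(enum op_kind kind, int bit_width)
{
    return (kind == OP_LOGIC) ? 1 : bit_width;
}

/* Greedily pack a chain of dependent operations into stages so that no
   stage exceeds stage_delay units. An operation that is larger than the
   limit on its own still occupies a stage by itself.
   Returns the estimated number of stages. */
int estimate_stages(const enum op_kind *ops, int count,
                    int bit_width, int stage_delay)
{
    int stages = 0;
    int used = 0;   /* delay units consumed in the current stage */
    int i;
    for (i = 0; i < count; i++) {
        int d = op_delay(ops[i], bit_width);
        if (stages == 0 || used + d > stage_delay) {
            stages++;   /* start a new stage */
            used = 0;
        }
        used += d;
    }
    return stages;
}
```

For a statement such as sum += i << 1 with 32-bit operands, this model counts one unit for the shift and 32 for the addition. With a StageDelay of 32 the estimate is two stages, and with a StageDelay of 64 it is one. Treat these numbers as a guide to the trade-off rather than a prediction of the final implementation.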

Another way to explicitly control stage delays is to make use of the co_par_break function described in Appendix C. This function forces a new stage, allowing you to create sequential, multicycle operations in C code that would otherwise be generated as parallel logic.
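A minimal sketch of this usage follows. It assumes that co_par_break is called with no arguments; consult Appendix C for the actual declaration. The empty stand-in definition exists only so that the example compiles with an ordinary C compiler, and the function name two_step is hypothetical.

```c
/* Stand-in so the sketch compiles on a desktop compiler. In a real
   project the function is supplied by the tool's library (see
   Appendix C); the zero-argument form used here is an assumption. */
static void co_par_break(void) { }

/* Without the break, both assignments could be scheduled into the same
   stage as one long combinational path. With it, the second
   multiply-add begins in a new stage. */
int two_step(int a, int b, int c, int d)
{
    int x;
    int y;
    x = a * b + c;
    co_par_break();     /* force a stage boundary here */
    y = x * d + c;
    return y;
}
```

The computed result is unchanged by the break. Only the scheduling differs: the work is spread over more cycles in exchange for a shorter delay in each stage.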