6.5. Debugging the Generated Hardware
Hardware generated from C code should, in a perfect world, behave exactly as it does during software simulation, such as when running under the control of a C debugger. In practice, however, subtle coding errors in the C code (such as relying on variables being implicitly initialized, or making incorrect assumptions about process synchronization) can produce an application that operates perfectly in software simulation but fails in the actual hardware. To help guard against this, hardware debugging techniques and hardware simulators can be an important part of your design efforts.
Although debugging automatically generated HDL may seem daunting (particularly if you are a software engineer), it is not as bad as you might think. The generated outputs are dense with intermediate, low-level signals (perhaps hundreds or even thousands of them, most of which will be optimized away during hardware synthesis). Fortunately, the variables used in your C file remain, for the most part, intact, with their names preserved so that they can be monitored during debugging.
To help in analyzing control flow and cycle-by-cycle synchronization issues, it is useful to know that the hardware generator implements each process in your application as a separate state machine, with symbolic state names that can be traced back to specific blocks and stages of the original C code. Parallel operations appear within the state machine, in concurrent statements elsewhere in the generated HDL module, or in both.
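If the state-machine view is unfamiliar, it may help to see it in software terms. The following C sketch is only an analogy, not Impulse C output: it models one process as a registered current state (playing the role of thisState) and a combinational next-state function (the role of nextState), using a reduced, hypothetical set of states named in the generated bNsM (block N, stage M) style. The names process_t, next_state, clock_edge, and run_to_finish are our own, and we assume for illustration that deasserting the enable stalls the whole machine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much-reduced state set named after the generated
   convention: INIT/FINISHED plus bNsM (block N, stage M). */
typedef enum { INIT, B0S0, B1S0, B2S0, FINISHED } state_t;

typedef struct {
    state_t this_state;  /* like thisState: changes only on a clock edge */
    int count;           /* loop counter held in a register */
    int limit;           /* number of loop iterations to execute */
} process_t;

/* Combinational next-state logic (the role of nextState): a pure
   function of the current register values. */
static state_t next_state(const process_t *p)
{
    switch (p->this_state) {
    case INIT: return B0S0;                                   /* start block 0 */
    case B0S0: return B1S0;                                   /* enter the loop */
    case B1S0: return p->count < p->limit ? B2S0 : FINISHED;  /* loop test */
    case B2S0: return B1S0;                                   /* loop back */
    default:   return FINISHED;                               /* stay finished */
    }
}

/* One rising clock edge. With state_en false nothing changes; we assume
   here that the enable stalls the whole machine. */
static void clock_edge(process_t *p, bool state_en)
{
    if (!state_en)
        return;
    state_t ns = next_state(p);  /* evaluate before any register updates */
    if (p->this_state == B2S0)
        p->count++;              /* register update made in stage b2s0 */
    p->this_state = ns;
}

/* Clock the process until it finishes; return the number of edges used. */
static int run_to_finish(int limit)
{
    assert(limit >= 0);
    process_t p = { INIT, 0, limit };
    int edges = 0;
    while (p.this_state != FINISHED) {
        clock_edge(&p, true);
        edges++;
    }
    return edges;
}
```

In this model, leaving INIT, leaving b0s0, and the final failed loop test cost one edge each, and every iteration costs two (b1s0 and b2s0), so run_to_finish(limit) returns 3 + 2 * limit. Predicting cycle counts from a software model in this way gives you something concrete to compare against when stepping the real hardware in a simulator.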
The following excerpts from the generated FIR filter hardware description help illustrate this point. First, notice the following declaration in the declarations section of the lower-level HDL file:
type stateType is (init,b0s0,b0s1,b0s2,b0s3,b0s4,b0s5,b0s6,b0s7,b0s8,b0s9,b0s10,
    b0s11,b0s12,b0s13,b0s14,b0s15,b0s16,b0s17,b0s18,b0s19,b0s20,
    b0s21,b0s22,b0s23,b0s24,b0s25,b0s26,b0s27,b0s28,b0s29,b0s30,
    b0s31,b0s32,b0s33,b0s34,b0s35,b0s36,b0s37,b0s38,b0s39,b0s40,
    b0s41,b0s42,b0s43,b0s44,b0s45,b0s46,b0s47,b0s48,b0s49,b0s50,
    b0s51,b0s52,b0s53,b0s54,b0s55,b0s56,b0s57,b0s58,b0s59,b0s60,
    b0s61,b0s62,b0s63,b0s64,b0s65,b0s66,b0s67,b0s68,b0s69,b0s70,
    b0s71,b0s72,b0s73,b0s74,b0s75,b0s76,b0s77,b0s78,b0s79,b0s80,
    b0s81,b0s82,b0s83,b0s84,b0s85,b0s86,b0s87,b0s88,b0s89,b0s90,
    b0s91,b0s92,b0s93,b0s94,b0s95,b0s96,b0s97,b0s98,b0s99,b0s100,
    b0s101,b0s102,b1s0,b1s1,b2s0,finished);
signal thisState, nextState : stateType;
The generated type stateType symbolically represents all the blocks and stages of the generated process. In the case of the FIR filter there are quite a few of these states (107 of them, plus the init state), representing two major blocks of functionality in the expanded code. One of these states, b0s0 (the first stage of block 0), is shown here along with the clock logic that drives the machine:
if (clk'event and clk='1') then
  case thisState is
    when b0s0 =>
      if (stateEn = '1') then
        r_tap <= ni4126_tap;
      end if;
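Because the state names follow a regular b<block>s<stage> pattern, they can be decoded mechanically, for example when post-processing a state trace exported from a simulator. The helper below is a hypothetical sketch of such a decoder; its name is our own, and it assumes only the naming pattern visible in the stateType declaration, treating init and finished as special names that carry no block or stage numbers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: decode a generated state name of the form
   "b<block>s<stage>" (for example "b0s102" or "b1s0"). Returns 1 and
   fills *block and *stage on success; returns 0 for any other name,
   including "init" and "finished". */
static int decode_state_name(const char *name, int *block, int *stage)
{
    int b = 0, s = 0, consumed = 0;
    assert(name && block && stage);
    if (sscanf(name, "b%ds%d%n", &b, &s, &consumed) != 2)
        return 0;
    if (name[consumed] != '\0' || b < 0 || s < 0)
        return 0;                    /* trailing text or negative numbers */
    *block = b;
    *stage = s;
    return 1;
}
```

For example, decoding "b0s102" yields block 0, stage 102, the last cycle of the FIR initialization block.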
Comments found elsewhere in the generated HDL help identify the specific block and cycle with which a given operation is associated. For example, the following concurrent multiply-and-accumulate operations are associated with stage 1 of block 1, as indicated by the b1s1 comment line preceding them:
-- b1s1
ni4130_nSample <= r_filter_in;
ni4131_firbuffer_50 <= ni4130_nSample;
ni4132_accum <= X"00000000";
ni4133_tap <= X"00000000";
ni4134_accum <= add(ni4132_accum, mul(r_firbuffer_0, r_coef_0));
ni4135_tap <= X"00000001";
ni4136_accum <= add(ni4134_accum, mul(r_firbuffer_1, r_coef_1));
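To relate these statements back to C, it can help to write out the computation they perform. The sketch below is our own reconstruction, not the original FIR source: it assumes a 51-element delay line (inferred only from the name firbuffer_50), writes the newest sample into the last element as the b1s1 statements show, and assumes the delay line is shifted after each output. The function name fir_step and the TAPS constant are hypothetical; the comments tie individual C lines to the generated signals.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 51 taps, inferred only from the signal name firbuffer_50. */
#define TAPS 51

/* Process one sample. firbuffer is the delay line (scalarized into
   firbuffer_0 .. firbuffer_50 in the generated HDL); coef holds the
   filter coefficients. Values are assumed small enough that the 32-bit
   arithmetic does not overflow. */
static int32_t fir_step(int32_t firbuffer[TAPS], const int32_t coef[TAPS],
                        int32_t filter_in)
{
    assert(firbuffer && coef);
    int32_t nSample = filter_in;          /* ni..._nSample <= r_filter_in */
    firbuffer[TAPS - 1] = nSample;        /* ni..._firbuffer_50 <= nSample */
    int32_t accum = 0;                    /* ni..._accum <= X"00000000" */
    for (int tap = 0; tap < TAPS; tap++)  /* appears unrolled in the HDL */
        accum += firbuffer[tap] * coef[tap];  /* one add(mul(...)) per tap */
    /* Assumed: advance the delay line so each sample moves one slot
       toward firbuffer[0] before the next call. */
    for (int tap = 0; tap < TAPS - 1; tap++)
        firbuffer[tap] = firbuffer[tap + 1];
    return accum;
}
```

Each pass through the tap loop corresponds to one add(..., mul(...)) statement in the listing, with the intermediate accum signals (ni4134_accum, ni4136_accum, and so on) holding the running sum.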
Of the 107 states in the machine, those identified by the b0 prefix (b0s0 through b0s102) represent the initialization section of the FIR filter, which consisted of two unrolled loops in the original C code. This block has many stages, but because it is only initialization code, the overhead of all those cycles is of little importance.
The key routine of the FIR filter, the inner code loop that actually processes the data stream, is represented by the states prefixed by b1 and b2, of which there are only two (b1s0 and b2s0) when pipelining has not been enabled and only one (b1s0) when pipelining has been enabled through the use of the PIPELINE pragma. You can use these symbolic states as an aid to hardware debugging with an HDL simulator, as shown in Figure 6-8.
Figure 6-8. Debugging the hardware state machine using a VHDL simulator.
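The claim that the initialization overhead is of little importance can be checked with a quick calculation. Assuming one clock per state and no stalls, the 103 b0 states (b0s0 through b0s102) execute once, while the non-pipelined loop uses two states (b1s0 and b2s0) per sample. The hypothetical function below computes the fraction of cycles spent in initialization; these cycle counts are simplifying assumptions, not measurements.

```c
#include <assert.h>

/* Fraction of all clock cycles spent in one-time initialization.
   Simplifying assumptions: every state takes exactly one clock, the
   machine never stalls, init_states states run once, and loop_states
   states run once per sample. */
static double init_overhead(int init_states, int loop_states, long samples)
{
    assert(init_states >= 0 && loop_states > 0 && samples >= 0);
    double total = (double)init_states + (double)loop_states * (double)samples;
    return total > 0.0 ? (double)init_states / total : 0.0;
}
```

For 1,000 samples this gives 103 / 2,103, or roughly 4.9 percent; for a million samples it falls to about 0.005 percent. Note that pipelining, by shortening the loop to one state per sample, raises this fraction even as it reduces total run time.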
Figures 6-9 and 6-10 show another hardware debugging session, again using the FIR application as an example. Here the expanded source code of the original example, with its blocks and stages identified both graphically and in an expanded source listing, can be related without much difficulty to specific lines of the generated HDL. Notice that the variable firbuffer_50, which corresponds to one element of the scalarized firbuffer array, is easily identified in the HDL code during source-level hardware simulation and debugging. Comments embedded in the HDL code also help identify the blocks and stages of the original C code that correspond to the HDL statements being executed.
Figure 6-9. Viewing the expanded C code in the Impulse C Stage Master optimizer.
Figure 6-10. Debugging the same sequence of code in a hardware VHDL simulator.
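Tracing a generated signal such as ni4131_firbuffer_50 back to the C array element firbuffer[50] can be partly automated. The sketch below is hypothetical: it assumes that the only prefixes are the two visible in the FIR listing (r_, and ni followed by digits and an underscore) and that a trailing underscore-and-number marks a scalarized array element. Generated code may follow other conventions, so treat it as a starting point rather than a complete decoder.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: map a generated signal name back to the C
   variable it came from. Strips an "r_" or "ni<digits>_" prefix (the two
   prefixes seen in the FIR listing) and splits a trailing "_<digits>"
   off as the index of a scalarized array element (-1 if none).
   Caveat: a C scalar whose own name ends in "_<digits>" is
   indistinguishable from an array element under this rule.
   Returns 1 on success, 0 if base_len is too small. */
static int c_name_from_signal(const char *sig, char *base, size_t base_len,
                              int *index)
{
    assert(sig && base && index);
    const char *p = sig;

    if (strncmp(p, "r_", 2) == 0) {
        p += 2;                                   /* register prefix */
    } else if (p[0] == 'n' && p[1] == 'i' && isdigit((unsigned char)p[2])) {
        const char *q = p + 2;
        while (isdigit((unsigned char)*q))
            q++;
        if (*q == '_')
            p = q + 1;                            /* intermediate-node prefix */
    }

    size_t len = strlen(p);
    size_t end = len;
    while (end > 0 && isdigit((unsigned char)p[end - 1]))
        end--;                                    /* back over trailing digits */

    *index = -1;
    if (end > 1 && end < len && p[end - 1] == '_') {
        *index = atoi(p + end);                   /* scalarized element index */
        len = end - 1;                            /* drop "_<digits>" */
    }

    if (len + 1 > base_len)
        return 0;
    memcpy(base, p, len);
    base[len] = '\0';
    return 1;
}
```

Applied to ni4131_firbuffer_50, the helper returns the base name firbuffer and index 50; applied to r_tap, it returns tap with no index.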
The goal of performing hardware simulations at this level (after compilation from C) is to identify and verify correct cycle-by-cycle behavior. An example of this kind of debugging is stepping through the design one clock cycle at a time (or through some defined number of cycles) to zero in on a specific problem area, as defined by both space (the area of code) and time (the clock cycle in which a problem manifests itself). The Impulse design flow has three fundamental ways in which to perform cycle-accurate hardware simulations of this type:
Which of these methods you choose will depend on the nature of the problem you are attempting to debug, on your expertise as a hardware designer, and on your access to HDL simulation tools.
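Whichever method you choose, the underlying task is to find the first clock cycle at which the hardware departs from the expected behavior. Assuming you can obtain a per-cycle list of state names from both a reference (for example, a software model) and the HDL simulation, the comparison itself is simple, as this sketch shows. How such a trace is exported depends on your simulator and is not shown; the function name and trace format are our own.

```c
#include <assert.h>
#include <string.h>

/* Given two per-cycle traces of state names (one from a reference model,
   one exported from an HDL simulation), return the index of the first
   clock cycle at which they differ. If one trace is a prefix of the
   other, the divergence is where the shorter one ends. Returns -1 if
   the traces are identical. */
static long first_divergent_cycle(const char *const expected[], size_t n_expected,
                                  const char *const observed[], size_t n_observed)
{
    assert(expected && observed);
    size_t n = n_expected < n_observed ? n_expected : n_observed;
    for (size_t cycle = 0; cycle < n; cycle++) {
        if (strcmp(expected[cycle], observed[cycle]) != 0)
            return (long)cycle;
    }
    return n_expected == n_observed ? -1 : (long)n;
}
```

The returned cycle number, together with the state name recorded at that cycle, locates the problem in both time (the clock cycle) and space (the block and stage of C code).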