9.4. Testing the Triple-DES Algorithm in Hardware

In Chapter 8, we presented a triple-DES algorithm, showing how an iterative process of software simulation, compilation, and hardware simulation could be used to verify the proper function of the algorithm and to get some initial performance numbers in terms of cycle delays. To evaluate that performance we used a combination of information obtained from the compiler tools (the latency and rate) and hardware simulation (using a VHDL simulator). But while hardware simulation is an excellent way to both debug an application and analyze performance, there is really no substitute for programming an application in hardware and watching it work.

As we mentioned briefly in the preceding chapter, one way to test individual Impulse C processes (or collections of processes as appropriate) is to set up a hardware/software test environment. We'll discuss this procedure in more detail here, and show the complete path from software down to the mixed processor and hardware implementation.

Platform Selection

A number of different FPGA and processor combinations would allow us to create a mixed hardware/software test for our prototype encryption algorithm. To simplify the creation of such a platform, we have selected a widely available FPGA prototyping board, the V2MB1000 board available from Memec Design (www.memec.com). This board includes a Xilinx Virtex II device as well as a variety of useful on-board peripheral interfaces, including a network interface, two serial ports, a USB port, and other such hardware. The V2MB1000 board is pictured in Figure 9-1.

Figure 9-1. Insight/Memec V2MB1000 development board.

To create the embedded software test bench, we will make use of the MicroBlaze soft processor core and its FSL (Fast Simplex Link) interconnects.
This platform combination will provide us with a highly efficient software-to-hardware communication channel, allowing us to stream character data from MicroBlaze to the FPGA-based algorithm at a relatively high rate.

Software and Hardware Algorithm Comparison

The goal of this test is to compile the same algorithm (the encryption process) twice: once as a standard C function running on the MicroBlaze processor, and once as hardware on the FPGA. This will allow us to compare the hardware and software implementations, both in terms of accuracy of the algorithm and in terms of performance. In addition, the MicroBlaze will be used to run test producer and consumer processes that will pass text data into the algorithm and accept the results, as shown in Figure 9-2.

Figure 9-2. Embedded test bench block diagram.

The test that we will create for in-system validation will be much simpler than the test bench described in the previous chapter. Recall that in that test we created a software test bench that read a large number of characters from a text file, encrypted those characters, and then decrypted them to produce the identical characters for output. The test bench also performed the same encryption and decryption operations using legacy C function calls for comparison purposes.

One aspect of that earlier software test bench was that it made use of multiple processes, including a producer, a consumer, and processes that controlled the legacy encryption and decryption. In fact, the earlier test bench had six processes in total, which (when simulated under the control of Visual Studio) were implemented as six distinct threads. This allowed us to emulate the parallel behavior of those processes.

For this test we will need to create a simpler test bench, one that does not rely on threads to implement multiple processes.
Although we could make use of a threading library (such as pthreads) or an embedded operating system, for this example we would like to keep the test bench as simple as practical. The resulting test bench process, which is compatible with both the MicroBlaze processor and Windows desktop compilation (by virtue of #ifdef statements), is shown in Figure 9-3. Starting from the top of this file, notice the following:
Figure 9-3. Embedded test bench for the triple-DES hardware process.

    #define TIMED_TEST

    #include "xparameters.h"
    #ifdef TIMED_TEST
    #include "xtmrctr.h"
    #endif
    #include <stdio.h>
    #include "co.h"
    #include "des.h"

    #define BLOCKSIZE 8    // Unsigned characters per block
    #define KS_DEPTH 48    // Key pairs

    #ifndef WIN32
    #define printf xil_printf
    #ifdef TIMED_TEST
    XTmrCtr TimerCounter;
    #endif
    #endif

    extern co_architecture co_initialize(void *);
    extern void deskey(k,key,decrypt);  // Generates a key schedule for testing
    extern unsigned long Spbox[SPBOX_X][SPBOX_Y];  // Combined SP boxes
    extern DES3_KS Ks;  // Key schedule generated by deskey()

    // Sample data, this could be read iteratively from a file.
    static unsigned char Blocks[]={ 0x6f,0x98,0x26,0x35,0x02,0xc9,0x83,0xd7} ;

    void des_test(co_stream config_out, co_stream blocks_out,
                  co_stream input_stream)
    {
        int i, k;
        unsigned char block[8];
        uint8 blockElement;
        unsigned long data,err;
    #ifdef TIMED_TEST
        Xuint32 counter;
    #endif

        // Send the keyschedule and SPbox data
        HW_STREAM_OPEN(des_test,config_out, O_WRONLY, UINT_TYPE(32));
        for ( k = 0; k < 2; k++ ) {
            for ( i = 0; i < KS_DEPTH; i++ ) {
                data = Ks[i][k];
                HW_STREAM_WRITE(des_test,config_out,data);
            }
        }
        for ( i = 0; i < SPBOX_X; i++ ) {
            for ( k = 0; k < SPBOX_Y; k++ ) {
                data = Spbox[i][k];
                HW_STREAM_WRITE(des_test,config_out,data);
            }
        }
        HW_STREAM_CLOSE(des_test,config_out);

        // Send the test block of data to the encryption process
        HW_STREAM_OPEN(des_test,blocks_out, O_WRONLY, UINT_TYPE(8));
        HW_STREAM_OPEN(des_test,input_stream,O_RDONLY,UINT_TYPE(8));

    #ifdef TIMED_TEST
        XTmrCtr_Reset(&TimerCounter,0);
    #endif
        for ( k = 0; k < BLOCKSIZE; k++ ) {
            blockElement = Blocks[k];
            HW_STREAM_WRITE(des_test,blocks_out,blockElement);
        }
        for ( k = 0; k < BLOCKSIZE; k++ ) {
            HW_STREAM_READ(des_test,input_stream,blockElement,err);
            block[k] = blockElement;
        }
    #ifdef TIMED_TEST
        counter = XTmrCtr_GetValue(&TimerCounter,0);
    #endif

        HW_STREAM_CLOSE(des_test,blocks_out);
        HW_STREAM_CLOSE(des_test,input_stream);

    #ifdef TIMED_TEST
        xil_printf("FPGA processing done (%d ticks).\n\r",counter);
    #else
        xil_printf("FPGA processing done.\n\r");
    #endif

        printf("FPGA block out:");
        for (i=0; i<BLOCKSIZE; i++) {
            printf(" %02x",block[i]);
        }
        printf("\n\r");
    }

    int main(int argc, char *argv[])
    {
        // The key is 24 bytes
        unsigned char * key = (unsigned char *) "Gflk jqo40978J0dmm$%@878";
        co_architecture my_arch;

    #ifdef IMPULSE_C_TARGET
    #ifdef TIMED_TEST
        XTmrCtr_Initialize(&TimerCounter, XPAR_OPB_TIMER_0_DEVICE_ID);
        XTmrCtr_SetResetValue(&TimerCounter,0,0);
        XTmrCtr_Start(&TimerCounter,0);
    #endif
    #endif

        printf("Impulse C 3DES DEMO\n\r");
        des3key(Ks, key, 0);  /* Create a keyschedule for encryption */

        printf("Running encryption test on FPGA ...\n\r");
        my_arch = co_initialize((void *)Iterations);
        co_execute(my_arch);

        return(0);
    }
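The listing in Figure 9-3 prints the block returned by the FPGA and the elapsed tick count, but leaves the comparison itself to the reader. As a rough sketch of how the accuracy and performance comparison described earlier in this section might be automated, the following two helpers check a hardware result against a software result byte by byte and convert a tick count into microseconds. The names first_mismatch and ticks_to_us are hypothetical and are not part of Impulse C or the Xilinx drivers. The timer clock frequency is passed in by the caller because the text does not state it for this platform.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCKSIZE 8  /* Unsigned characters per block, as in Figure 9-3 */

/* Hypothetical helper: return the index of the first byte that differs
 * between the software-computed block and the block read back from the
 * FPGA, or -1 if the two blocks match over len bytes. */
int first_mismatch(const unsigned char *sw_block,
                   const unsigned char *hw_block,
                   size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (sw_block[i] != hw_block[i]) {
            return (int)i;
        }
    }
    return -1;
}

/* Hypothetical helper: convert a timer tick count to whole microseconds,
 * rounding down. clock_hz is an assumption supplied by the caller (it
 * must be non-zero); use the frequency of the clock that actually drives
 * the timer on your board. */
uint32_t ticks_to_us(uint32_t ticks, uint32_t clock_hz)
{
    /* A 64-bit intermediate keeps ticks * 1000000 from overflowing. */
    return (uint32_t)(((uint64_t)ticks * 1000000u) / clock_hz);
}
```

In the test bench, first_mismatch could be called with the block array filled by the HW_STREAM_READ loop and a reference block produced by the software version of the algorithm, and ticks_to_us could be applied to the counter value before it is printed. Reporting both a tick count and a time makes it easier to compare runs made at different clock rates.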