In this chapter we have described in general terms how the compiler parallelizes C code, and we have presented some useful techniques for writing C code that is "optimizer-friendly." Unlike writing C for a conventional software target, writing efficient C for hardware requires some additional thought and a basic awareness of how the compiler and optimizer extract parallelism. In the next three chapters we will apply this knowledge directly and show how the performance of a real-world C algorithm can be dramatically improved with relatively little effort. In subsequent chapters we will apply these techniques to larger, more interesting applications.

