17.7 Using Metaprograms to Unroll Loops

Ru-Brd

17.7 Using Metaprograms to Unroll Loops

One of the first practical applications of metaprogramming was the unrolling of loops for numeric computations , which is shown here as a complete example.

Numeric applications often have to process n -dimensional arrays or mathematical vectors. One typical operation is the computation of the so-called dot product . The dot product of two mathematical vectors a and b is the sum of all products of corresponding elements in both vectors. For example, if each vectors has three elements, the result is

 a[0]*b[0] + a[1]*b[1] + a[2]*b[2] 

A mathematical library typically provides a function to compute such a dot product. Consider the following straightforward implementation:

  // meta/loop1.hpp  #ifndef LOOP1_HPP  #define LOOP1_HPP  template <typename T>  inline T dot_product (int dim, T* a, T* b)  {      T result = 0;      for (int i=0; i<dim; ++i) {          result += a[i]*b[i];      }      return result;  }  #endif  // LOOP1_HPP  

When we call this function as follows

  // meta/loop1.cpp  #include <iostream>  #include "loop1.hpp"  int main()  {      int a[3] = { 1, 2, 3 };      int b[3] = { 5, 6, 7 };      std::cout << "dot_product(3,a,b) = " << dot_product(3,a,b)                << '\n';      std::cout << "dot_product(3,a,a) = " << dot_product(3,a,a)                << '\n';  } 

we get the following result:

 dot_product(3,a,b) = 38  dot_product(3,a,a) = 14 

This is correct, but it takes too long for serious high-performance applications. Even declaring the function inline is often not sufficient to attain optimal performance.

The problem is that compilers usually optimize loops for many iterations, which is counterproductive in this case. Simply expanding the loop to

 a[0]*b[0] + a[1]*b[1] + a[2]*b[2] 

would be a lot better.

Of course, this performance doesn't matter if we compute only some dot products from time to time. But, if we use this library component to perform millions of dot product computations, the differences become significant.

Of course, we could write the computation directly instead of calling dot_product() , or we could provide special functions for dot product computations with only a few dimensions, but this is tedious . Template metaprogramming solves this issue for us: We "program" to unroll the loops. Here is the metaprogram:

  // meta/loop2.hpp  #ifndef LOOP2_HPP  #define LOOP2_HPP  // primary template  template <int DIM, typename T>  class DotProduct {    public:      static T result (T* a, T* b) {          return *a * *b  +  DotProduct<DIM-1,T>::result(a+1,b+1);      }  };  // partial specialization as end criteria  template <typename T>  class DotProduct<1,T> {    public:      static T result (T* a, T* b) {          return *a * *b;      }  };  // convenience function  template <int DIM, typename T>  inline T dot_product (T* a, T* b)  {      return DotProduct<DIM,T>::result(a,b);  }  #endif  // LOOP2_HPP  

Now, by changing your application program only slightly, you can get the same result:

  // meta/loop2.cpp  #include <iostream>  #include "loop2.hpp" 
 int main()  {      int a[3] = { 1, 2, 3};      int b[3] = { 5, 6, 7};      std::cout << "dot_product<3>(a,b) = " << dot_product<3>(a,b)                << '\n';      std::cout << "dot_product<3>(a,a) = " << dot_product<3>(a,a)                << '\n';  } 

Instead of writing

 dot_product(3,a,b) 

we write

 dot_product<3>(a,b) 

This expression instantiates a convenience function template that translates the call into

 DotProduct<3,int>::result(a,b) 

And this is the start of the metaprogram.

Inside the metaprogram the result is the product of the first elements of a and b plus the result of the dot product of the remaining dimensions of the vectors starting with their next elements:

 template <int DIM, typename T>  class DotProduct {    public:      static T result (T* a, T* b) {          return *a * *b + DotProduct<DIM-1,T>::result(a+1,b+1);      }  }; 

The end criterion is the case of a one-dimensional vector:

 template <typename T>  class DotProduct<1,T> {    public:      static T result (T* a, T* b) {          return *a * *b;      }  }; 

Thus, for

 dot_product<3>(a,b) 

the instantiation process computes the following:

 DotProduct<3,int>::result(a,b)  = *a * *b  + DotProduct<2,int>::result(a+1,b+1)  = *a * *b  + *(a+1) * *(b+1)  + DotProduct<1,int>::result(a+2,b+2)  = *a * *b  + *(a+1) * *(b+1)  + *(a+2) * *(b+2) 

Note that this way of programming requires that the number of dimensions is known at compile time, which is often (but not always) the case.

Libraries, such as Blitz++ (see [Blitz++]), the MTL library (see [MTL]), and POOMA (see [POOMA]), use these kinds of metaprograms to provide fast routines for numeric linear algebra. Such metaprograms often do a better job than optimizers because they can integrate higher-level knowledge into the computations. [2] The industrial-strength implementation of such libraries involves many more details than the template- related issues we present here. Indeed, reckless unrolling does not always lead to optimal running times. However, these additional engineering considerations fall outside the scope of our text.

[2] In some situations metaprograms significantly outperform their Fortran counterparts, even though Fortran optimizers are usually highly tuned for these sorts of applications.

Ru-Brd


C++ Templates
C++ Templates: The Complete Guide
ISBN: 0201734842
EAN: 2147483647
Year: 2002
Pages: 185

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net