Item 48: Be aware of template metaprogramming | Effective C++ Third Edition 55 Specific Ways to Improve Your Programs and Designs

Template metaprogramming (TMP) is the process of writing template-based C++ programs that execute during compilation. Think about that for a minute: a template metaprogram is a program written in C++ that executes inside the C++ compiler. When a TMP program finishes running, its output (pieces of C++ source code instantiated from templates) is then compiled as usual.

If this doesn't strike you as just plain bizarre, you're not thinking about it hard enough.

C++ was not designed for template metaprogramming, but since TMP was discovered in the early 1990s, it has proven to be so useful that extensions are likely to be added to both the language and its standard library to make TMP easier. Yes, TMP was discovered, not invented. The features underlying TMP were introduced when templates were added to C++. All that was needed was for somebody to notice how they could be used in clever and unexpected ways.

TMP has two great strengths. First, it makes some things easy that would otherwise be hard or impossible. Second, because template metaprograms execute during C++ compilation, they can shift work from runtime to compile-time. One consequence is that some kinds of errors that are usually detected at runtime can be found during compilation. Another is that C++ programs making use of TMP can be more efficient in just about every way: smaller executables, shorter runtimes, lower memory requirements. (However, a consequence of shifting work from runtime to compile-time is that compilation takes longer. Programs using TMP may take much longer to compile than their non-TMP counterparts.)

Consider the pseudocode for STL's advance introduced on page 228. (That's in Item 47. You may want to read that Item now, because in this Item, I'll assume you are familiar with the material in that one.) As on page 228, the pseudo part of the code is the test in the first if statement:

    template<typename IterT, typename DistT>
    void advance(IterT& iter, DistT d)
    {
      if (iter is a random access iterator) {
        iter += d;                                     // use iterator arithmetic
      }                                                // for random access iters
      else {
        if (d >= 0) { while (d--) ++iter; }            // use iterative calls to
        else { while (d++) --iter; }                   // ++ or -- for other
      }                                                // iterator categories
    }

We can use typeid to make the pseudocode real. That yields a "normal" C++ approach to this problem, one that does all its work at runtime:

    template<typename IterT, typename DistT>
    void advance(IterT& iter, DistT d)
    {
      if (typeid(typename std::iterator_traits<IterT>::iterator_category) ==
          typeid(std::random_access_iterator_tag)) {
        iter += d;                                     // use iterator arithmetic
      }                                                // for random access iters
      else {
        if (d >= 0) { while (d--) ++iter; }            // use iterative calls to
        else { while (d++) --iter; }                   // ++ or -- for other
      }                                                // iterator categories
    }

Item 47 notes that this typeid-based approach is less efficient than the one using traits, because with this approach, (1) the type testing occurs at runtime instead of during compilation, and (2) the code to do the runtime type testing must be present in the executable. In fact, this example shows how TMP can be more efficient than a "normal" C++ program, because the traits approach is TMP. Remember, traits enable compile-time if...else computations on types.

I remarked earlier that some things are easier in TMP than in "normal" C++, and advance offers an example of that, too. Item 47 mentions that the typeid-based implementation of advance can lead to compilation problems, and here's an example where it does:

    std::list<int>::iterator iter;
    ...
    advance(iter, 10);                          // move iter 10 elements forward;
                                                // won't compile with above impl.

Consider the version of advance that will be generated for the above call. After substituting iter's and 10's types for the template parameters IterT and DistT, we get this:

    void advance(std::list<int>::iterator& iter, int d)
    {
      if (typeid(std::iterator_traits<std::list<int>::iterator>::iterator_category) ==
          typeid(std::random_access_iterator_tag)) {
        iter += d;                                        // error!
      }
      else {
        if (d >= 0) { while (d--) ++iter; }
        else { while (d++) --iter; }
      }
    }

The problem is the highlighted line, the one using +=. In this case, we're trying to use += on a list<int>::iterator, but list<int>::iterator is a bidirectional iterator (see Item 47), so it doesn't support +=. Only random access iterators support +=. Now, we know we'll never try to execute the += line, because the typeid test will always fail for list<int>::iterators, but compilers are obliged to make sure that all source code is valid, even if it's not executed, and "iter += d" isn't valid when iter isn't a random access iterator. Contrast this with the traits-based TMP solution, where code for different types is split into separate functions, each of which uses only operations applicable to the types for which it is written.
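For contrast, here is a minimal sketch of the traits-based approach in the spirit of Item 47. The names my_advance and do_advance are hypothetical (chosen to avoid clashing with std::advance), and the input-iterator overload simply ignores negative distances, which is a simplification of this sketch. Because each overload contains only the operations its iterator category supports, the += line is never instantiated for a list iterator:

```cpp
#include <iterator>
#include <list>
#include <vector>

// Hypothetical worker overloads, one per iterator category.
// Selected for random access iterators: += is valid here.
template<typename IterT, typename DistT>
void do_advance(IterT& iter, DistT d, std::random_access_iterator_tag)
{
    iter += d;
}

// Selected for bidirectional iterators: step one element at a time.
template<typename IterT, typename DistT>
void do_advance(IterT& iter, DistT d, std::bidirectional_iterator_tag)
{
    if (d >= 0) { while (d--) ++iter; }
    else        { while (d++) --iter; }
}

// Selected for input iterators (and forward iterators, because
// forward_iterator_tag inherits from input_iterator_tag).
// Simplification of this sketch: negative distances are ignored.
template<typename IterT, typename DistT>
void do_advance(IterT& iter, DistT d, std::input_iterator_tag)
{
    while (d-- > 0) ++iter;
}

// Hypothetical front end: the overload is chosen during compilation
// from the iterator's category tag, so no runtime type test remains.
template<typename IterT, typename DistT>
void my_advance(IterT& iter, DistT d)
{
    typedef typename std::iterator_traits<IterT>::iterator_category Category;
    do_advance(iter, d, Category());
}
```

With this arrangement, calling my_advance on a std::list<int>::iterator instantiates only the bidirectional overload, so the call that failed to compile above should now be accepted.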

TMP has been shown to be Turing-complete, which means that it is powerful enough to compute anything. Using TMP, you can declare variables, perform loops, write and call functions, etc. But such constructs look very different from their "normal" C++ counterparts. For example, Item 47 shows how if...else conditionals in TMP are expressed via templates and template specializations. But that's assembly-level TMP. Libraries for TMP (e.g., Boost's MPL; see Item 55) offer a higher-level syntax, though still not something you'd mistake for "normal" C++.
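To give a feel for what an "assembly-level" conditional looks like, here is a small sketch of an if...else on types written with a primary template and a partial specialization. IfThenElse and CounterFor are hypothetical names invented for this illustration; C++11 later standardized the same idea as std::conditional.

```cpp
// Hypothetical metafunction: IfThenElse<Cond, Then, Else>::type
// names Then when Cond is true and Else when Cond is false.
template<bool Cond, typename Then, typename Else>
struct IfThenElse {                       // primary template: the "if" branch
    typedef Then type;
};

template<typename Then, typename Else>
struct IfThenElse<false, Then, Else> {    // partial specialization: the "else" branch
    typedef Else type;
};

// Hypothetical "caller": pick a counter type based on how many
// distinct values (N) it must be able to hold.
template<unsigned long N>
struct CounterFor {
    typedef typename IfThenElse<(N <= 255),
                                unsigned char,
                                unsigned long>::type type;
};
```

Here CounterFor<100>::type is unsigned char and CounterFor<100000>::type is unsigned long. The choice is made entirely by the compiler, and no branch exists in the generated code.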

For another glimpse into how things work in TMP, let's look at loops. TMP has no real looping construct, so the effect of loops is accomplished via recursion. (If you're not comfortable with recursion, you'll need to address that before venturing into TMP. It's largely a functional language, and recursion is to functional languages as TV is to American pop culture: inseparable.) Even the recursion isn't the normal kind, however, because TMP loops don't involve recursive function calls, they involve recursive template instantiations.

The "hello world" program of TMP is computing a factorial during compilation. It's not a very exciting program, but then again, neither is "hello world," yet both are helpful as language introductions. TMP factorial computation demonstrates looping through recursive template instantiation. It also demonstrates one way in which variables are created and used in TMP. Look:

    template<unsigned n>                 // general case: the value of
    struct Factorial {                   // Factorial<n> is n times the value
                                         // of Factorial<n-1>
      enum { value = n * Factorial<n-1>::value };
    };

    template<>                           // special case: the value of
    struct Factorial<0> {                // Factorial<0> is 1
      enum { value = 1 };
    };

Given this template metaprogram (really just the single template metafunction Factorial), you get the value of factorial(n) by referring to Factorial<n>::value.

The looping part of the code occurs where the template instantiation Factorial<n> references the template instantiation Factorial<n-1>. Like all good recursion, there's a special case that causes the recursion to terminate. Here, it's the template specialization Factorial<0>.

Each instantiation of the Factorial template is a struct, and each struct uses the enum hack (see Item 2) to declare a TMP variable named value. value is what holds the current value of the factorial computation. If TMP had a real looping construct, value would be updated each time around the loop. Since TMP uses recursive template instantiation in place of loops, each instantiation gets its own copy of value, and each copy has the proper value for its place in the "loop."

You could use Factorial like this:

    #include <iostream>

    int main()
    {
      std::cout << Factorial<5>::value << '\n';    // prints 120
      std::cout << Factorial<10>::value << '\n';   // prints 3628800
    }
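Printing the values doesn't by itself prove that they were computed during compilation. One way to convince yourself is to use Factorial where the language demands a compile-time constant, such as an array bound. The sketch below repeats the Factorial template so that it stands alone; Permutations4 and slot_count are hypothetical names used only for this illustration.

```cpp
#include <cstddef>

template<unsigned n>
struct Factorial {
    enum { value = n * Factorial<n - 1>::value };
};

template<>
struct Factorial<0> {
    enum { value = 1 };
};

// An array bound must be a compile-time constant, so this declaration
// compiles only because Factorial<4>::value is already known to the
// compiler.
struct Permutations4 {
    int slots[Factorial<4>::value];
};

// Hypothetical helper: reports how many elements the array holds.
inline std::size_t slot_count()
{
    return sizeof(Permutations4().slots) / sizeof(int);
}
```

If Factorial<4>::value were computed at runtime, the declaration of slots would be rejected. Instead, the array ends up with 24 elements.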

If you think this is cooler than ice cream, you've got the makings of a template metaprogrammer. If the templates and specializations and recursive instantiations and enum hacks and the need to type things like Factorial<n-1>::value make your skin crawl, well, you're a pretty normal C++ programmer.

Of course, Factorial demonstrates the utility of TMP about as well as "hello world" demonstrates the utility of any conventional programming language. To grasp why TMP is worth knowing about, it's important to have a better understanding of what it can accomplish. Here are three examples:

Ensuring dimensional unit correctness. In scientific and engineering applications, it's essential that dimensional units (e.g., mass, distance, time, etc.) be combined correctly. Assigning a variable representing mass to a variable representing velocity, for example, is an error, but dividing a distance variable by a time variable and assigning the result to a velocity variable is fine. Using TMP, it's possible to ensure (during compilation) that all dimensional unit combinations in a program are correct, no matter how complex the calculations. (This is an example of how TMP can be used for early error detection.) One interesting aspect of this use of TMP is that fractional dimensional exponents can be supported. This requires that such fractions be reduced during compilation so that compilers can confirm, for example, that the unit time^(1/2) is the same as time^(4/8).
Optimizing matrix operations. Item 21 explains that some functions, including operator*, must return new objects, and Item 44 introduces the SquareMatrix class, so consider the following code:
```
typedef SquareMatrix<double, 10000> BigMatrix;
BigMatrix m1, m2, m3, m4, m5;               // create matrices and
...                                         // give them values
BigMatrix result = m1 * m2 * m3 * m4 * m5;  // compute their product
```
Calculating result in the "normal" way calls for the creation of four temporary matrices, one for the result of each call to operator*. Furthermore, the independent multiplications generate a sequence of four loops over the matrix elements. Using an advanced template technology related to TMP called expression templates, it's possible to eliminate the temporaries and merge the loops, all without changing the syntax of the client code above. The resulting software uses less memory and runs dramatically faster.
Generating custom design pattern implementations. Design patterns like Strategy (see Item 35), Observer, Visitor, etc. can be implemented in many ways. Using a TMP-based technology called policy-based design, it's possible to create templates representing independent design choices ("policies") that can be combined in arbitrary ways to yield pattern implementations with custom behavior. For example, this technique has been used to allow a few templates implementing smart pointer behavioral policies to generate (during compilation) any of hundreds of different smart pointer types. Generalized beyond the domain of programming artifacts like design patterns and smart pointers, this technology is a basis for what's known as generative programming.
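To make the first of these examples a little more concrete, here is a deliberately small sketch of compile-time dimensional checking. Everything in it is an assumption made for illustration: the Quantity template, its three integer exponents (mass, length, time), the typedefs, and the average_speed helper. Real unit libraries are far more elaborate, and this sketch does not attempt the fractional exponents mentioned above.

```cpp
// Hypothetical sketch: M, L and T are the integer exponents of
// mass, length and time carried in the type itself.
template<int M, int L, int T>
struct Quantity {
    double value;
    explicit Quantity(double v) : value(v) {}
};

// Addition is defined only for operands with identical dimensions;
// mixing dimensions finds no matching operator+ and fails to compile.
template<int M, int L, int T>
Quantity<M, L, T> operator+(Quantity<M, L, T> a, Quantity<M, L, T> b)
{
    return Quantity<M, L, T>(a.value + b.value);
}

// Division subtracts the exponents; the arithmetic on the exponents
// happens during compilation, as part of forming the return type.
template<int M1, int L1, int T1, int M2, int L2, int T2>
Quantity<M1 - M2, L1 - L2, T1 - T2>
operator/(Quantity<M1, L1, T1> a, Quantity<M2, L2, T2> b)
{
    return Quantity<M1 - M2, L1 - L2, T1 - T2>(a.value / b.value);
}

typedef Quantity<1, 0,  0> Mass;
typedef Quantity<0, 1,  0> Distance;
typedef Quantity<0, 0,  1> Time;
typedef Quantity<0, 1, -1> Velocity;

// Hypothetical helper: distance / time has exactly the Velocity type.
inline Velocity average_speed(Distance d, Time t)
{
    return d / t;
    // Velocity bad = Mass(2.0);   // would not compile: dimensions differ
}
```

The commented-out line shows the kind of mistake the compiler would reject: a Mass is a different type from a Velocity, and no conversion between them exists.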

TMP is not for everybody. The syntax is unintuitive, and tool support is weak. (Debuggers for template metaprograms? Ha!) Being an "accidental" language that was only relatively recently discovered, TMP programming conventions are still somewhat experimental. Nevertheless, the efficiency improvements afforded by shifting work from runtime to compile-time can be impressive, and the ability to express behavior that is difficult or impossible to implement at runtime is attractive, too.

TMP support is on the rise. It's likely that the next version of C++ will provide explicit support for it, and TR1 already does (see Item 54). Books are beginning to come out on the subject, and TMP information on the web just keeps getting richer. TMP will probably never be mainstream, but for some programmers, especially library developers, it's almost certain to be a staple.

Things to Remember

Template metaprogramming can shift work from runtime to compile-time, thus enabling earlier error detection and higher runtime performance.
TMP can be used to generate custom code based on combinations of policy choices, and it can also be used to avoid generating code inappropriate for particular types.