9.3 Parsing Templates | C++ Templates: The Complete Guide

Ru-Brd

Two fundamental activities of compilers for most programming languages are tokenization ”also called scanning or lexing ”and parsing. The tokenization process reads the source code as a sequence of characters and generates a sequence of tokens from it. For example, on seeing the sequence of characters int* p=0; , the "tokenizer" will generate token descriptions for a keyword int ,a symbol/operator * , an identifier p , a symbol/operator = , an integer literal , and a symbol/operator ; .

A parser will then find known patterns in the token sequence by recursively reducing tokens or previously found patterns into higher level constructs. For example, the token is a valid expression, the combination * followed by an identifier p is a valid declarator, and that declarator followed by " = " followed by the expression " " is also a valid declarator. Finally, the keyword int is a known type name , and, when followed by the declarator *p=0 , you get the initializating declaration of p .

9.3.1 Context Sensitivity in Nontemplates

As you may know or expect, tokenizing is easier than parsing. Fortunately, parsing is a subject for which a solid theory has been developed, and many useful languages are not hard to parse using this theory. However, the theory works best for so-called context-free language , and we have already noted that C++ is context sensitive. To handle this, a C++ compiler will couple a symbol table to the tokenizer and parser: When a declaration is parsed, it is entered in the symbol table. When the tokenizer finds an identifier, it looks it up and annotates the resulting token if it finds a type.

For example, if the C++ compiler sees

x*

the tokenizer looks up x . If it finds a type, the parser sees

 identifier, type, x  symbol, *

and concludes that a declaration has started. However, if x is not found to be a type, then the parser receives from the tokenizer

 identifier, nontype, x  symbol, *

and the construct can be parsed validly only as a multiplication. The details of these principles are dependent on the particular implementation strategy, but the gist should be there.

Another example of context sensitivity is illustrated in the following expression:

 X<1>(0)

If X is the name of a class template, then the previous expression casts the integer to the type X<1> generated from that template. If X is not a template, then the previous expression is equivalent to

 (X<1)>0

In other words, X is compared with 1 , and the result of that comparison ”true or false, implicitly converted to 1 or in this case ”is compared with . Although code like this is rarely used, it is valid C++ (and valid C, for that matter). A C++ parser will therefore look up names appearing before a < and treat the < as an angle bracket only if the name is that of a template; otherwise , the < is an ordinary "less than" operator.

This form of context sensitivity is an unfortunate consequence of having chosen angle brackets to delimit template argument lists. Here is another such consequence:

 template<bool B>  class Invert {    public:      static bool const result = !B;  };  void g()  {      bool test = B<(1>0)>::result;  // parentheses required!  }

If the parentheses in B<(1>0)> were omitted, the "larger than" symbol would be mistaken for the closing of the template argument list. This would make the code invalid because the compiler would read it to be equivalent to ((B<1>))0>::result . ^[3]

^[3] Note the double parentheses to avoid parsing (B<1>)0 as a cast operation ”yet another source of syntactic ambiguity.

The tokenizer isn't spared problems with the angle-bracket notation either. We have already cautioned (see Section 3.2 on page 27) to introduce whitespace when nesting template-ids, as in

 List<List<int> > a;  //  ^--  whitespace is not optional!

Indeed, the whitespace between the two closing angle brackets is not optional: Without this whitespace, the two > characters combine into a right shift token >> , and hence are never treated as two separate tokens. This is a consequence of the so-called maximum munch tokenization principle: A C++ implementation must collect as many consecutive characters as possible into a token.

This particular issue is a very common stumbling block for beginning template users. Several C++ compiler implementations have therefore been modified to recognize this situation and treat the >> as two separate > in this particular situation (and with a warning that it is not really valid C++). The C++ committee is also considering mandating this behavior in a revision of the standard (see Section 13.1 on page 205).

Another example of the maximum munch principle is the less known fact that the scope resolution operator ( :: ) must also be used carefully with angle brackets:

 class X {   };  List<::X> many_X;  // SYNTAX ERROR!

The problem in the previous example is that the sequence of characters <: is a so-called digraph ^[4] : an alternative representation for the symbol [ . Hence, the compiler really sees the equivalent of List[:X> many_X; , which makes no sense at all. Again, the solution is to add some whitespace:

^[4] Digraphs were added to the language to ease the input of C++ source with international keyboards that lack certain characters (such as # , [ , and ] ).

 List< ::X> many_X;  //  ^--  whitespace is not optional!

9.3.2 Dependent Names of Types

The problem with names in templates is that they cannot always be sufficiently classified . In particular, one template cannot look into another template because the contents of that other template can be made invalid by an explicit specialization (see Chapter 12 for details). The following contrived example illustrates this:

 template<typename T>  class Trap {    public:      enum{x};  // (1)  x  is not a type here  };  template<typename T>  class Victim {    public:      int y;      void poof() {          Trap<T>::x*y;  // (2) declaration or multiplication?  }  };  template<>  class Trap<void> {  // evil specialization!  public:      typedef int x;  // (3)  x  is a type here  };  void boom(Trap<void>& bomb)  {      bomb.poof();  }

As the compiler is parsing line (2), it must decide whether it is seeing a declaration or a multiplication. This decision in turn depends on whether the dependent qualified name Trap<T>::x is a type name. It may be tempting to look in the template Trap at this point and find that, according to line (1), Trap<T>::x is not a type, which would leave us to believe that line (2) is a multiplication. However, a little later the source corrupts this idea by overriding the generic X<T>::x for the case where T is void . In this case, Trap<T>::x is in fact type int .

The language definition resolves this problem by specifying that in general a dependent qualified name does not denote a type unless that name is prefixed with the keyword typename . If it turns out, after substituting template arguments, that the name is not the name of a type, the program is invalid and your C++ compiler should complain at instantiation time. Note that this use of typename is different from the use to denote template type parameters. Unlike type parameters, you cannot equivalently replace typename with class . The typename prefix to a name is required when the name

Appears in a template
Is qualified
Is not used as in a list of base class specifications or in a list of member initializers introducing a constructor definition
Is dependent on a template parameter

Furthermore, the typename prefix is not allowed unless at least the first three previous conditions hold. To illustrate this, consider the following erroneous example ^[5] :

^[5] From [VandevoordeSolutions], proving once and for all that C++ promotes code reuse.

 template<typename ₁ T>  struct S: typename ₂ X<T>::Base {      S(): typename ₃ X<T>::Base(typename ₄ X<T>::Base(0)) {}      typename ₅ X<T> f() {          typename ₆ X<T>::C * p;  // declaration of pointer  p          X<T>::D * q;  // multiplication!  }      typename ₇ X<int>::C * s;  };  struct U {      typename ₈ X<int>::C * pc;  };

Each occurrence of typename ”correct or not ”is numbered with a subscript for easy reference. The first, typename ₁ , indicates a template parameter. The previous rules do not apply to this first use. The second and third typename s are disallowed by the third item in the previous rules. Names of base classes in these two contexts cannot be preceded by typename . However, typename ₄ is required. Here, the name of the base class is not used to denote what is being initialized or derived from. Instead, the name is part of an expression to construct a temporary X<T>::Base from its argument (a sort of conversion, if you will). The fifth typename is prohibited because the name that follows it, X<T> , is not a qualified name. The sixth occurrence is required if this statement is to declare a pointer. The next line omits the typename keyword and is, therefore, interpreted by the compiler as a multiplication. The seventh typename is optional because it satisfies all the previous rules except the last. Finally, typename ₈ is prohibited because it is not used inside a template.

9.3.3 Dependent Names of Templates

A problem very similar to the one encountered in the previous section occurs when a name of a template is dependent. In general, a C++ compiler is required to treat a < following the name of a template as the beginning of a template argument list; otherwise, it is a "less than" operator. As is the case with type names, a compiler has to assume that a dependent name does not refer to a template unless the programmer provides extra information using the keyword template :

 template<typename T>  class Shell {    public:      template<int N>      class In {        public:          template<int M>          class Deep {              public:              virtual void f();          };      };  };  template<typename T, int N>  class Weird {    public:      void case1(Shell<T>::template In<N>::template Deep<N>* p) {          p->template Deep<N>::f();  // inhibit virtual call  }      void case2(Shell<T>::template In<T>::template Deep<T>& p) {          p.template Deep<N>::f();  // inhibit virtual call  }  };

This somewhat intricate example shows how all the operators that can qualify a name ( :: , -> , and . ) may need to be followed by the keyword template . Specifically, this is the case whenever the type of the name or expression preceding the qualifying operator is dependent on a template parameter, and the name that follows the operator is a template-id (in other words, a template name followed by template arguments in angle brackets). For example, in the expression

 p.template Deep<N>::f()

the type of p depends on the template parameter T . Consequently, a C++ compiler cannot look up Deep to see if it is a template, and we must explicitly indicate that Deep is the name of a template by inserting the prefix template . Without this prefix, p.Deep<N>::f() is parsed as ((p.Deep)<N)>f() . Note also that this may need to happen multiple times within a qualified name because qualifiers themselves may be qualified with a dependent qualifier. (This is illustrated by the declaration of the parameters of case1 and case2 in the previous example.)

If the keyword template is omitted in cases such as these, the opening and closing angle brackets are parsed as "less than" and "greater than" operators. However, if the keyword is not strictly needed, it is in fact not allowed at all. ^[6] You cannot "just sprinkle" template qualifiers throughout your code.

^[6] This is actually not totally clear from the text of the standard, but the people who worked on that part of the text seem to agree.

9.3.4 Dependent Names in Using-Declarations

Using-declarations can bring in names from two places: namespaces and classes. The namespace case is not relevant in this context because there are no such things as namespace templates . Using-declarations that bring in names from classes can, in fact, bring in names only from a base class to a derived class. Such using-declarations behave like "symbolic links" or "shortcuts" in the derived class to the base declaration, thereby allowing the members of the derived class to access the nominated name as if it were actually a member declared in that derived class. A short nontemplate example illustrates the idea better than mere words:

 class BX {    public:      void f(int);      void f(char const*);      void g();  };  class DX : private BX {    public:      using BX::f;  };

The previous using-declaration brings in the name f of the base class BX into the derived class DX . In this case, this name is associated with two different declarations, thus emphasizing that we are dealing with a mechanism for names and not individual declarations of such names. Note also that this kind of using-declaration can make accessible an otherwise inaccessible member. The base BX (and thus its members) are private to the class DX , except that the functions BX::f have been introduced in the public interface of DX and are therefore available to the clients of DX . Because using-declarations enable this, the earlier mechanism of access declarations is deprecated in C++ (meaning that future revisions of C++ may not contain the mechanism):

 class DX : private BX {    public:      BX::f;  // access declaration syntax is deprecated   // use  using BX::f  instead  };

By now you can probably perceive the problem when a using-declaration brings in a name from a dependent class. Although we know about the name, we don't know whether it's the name of a type, a template, or something else:

 template<typename T>  class BXT {    public:      typedef T Mystery;      template<typename U>      struct Magic;  };  template<typename T>  class DXTT : private BXT<T> {    public:      using typename BXT<T>::Mystery;      Mystery* p;  // would be a syntax error if not for the  typename  };

Again, if we want a dependent name to be brought in by a using-declaration to denote a type, we must explicitly say so by inserting the keyword typename . Strangely, the C++ standard does not provide for a similar mechanism to mark such dependent names as templates. The following snippet illustrates the problem:

 template<typename T>  class DXTM : private BXT<T> {    public:      using BXT<T>::template Magic;  // ERROR: not standard  Magic<T>* plink;  // SYNTAX ERROR:  Magic  is not a  };  //               known template

Most likely this is an oversight in the standard specifications and future revisions will probably make the previous construct valid.

9.3.5 ADL and Explicit Template Arguments

Consider the following example:

 namespace N {      class X {   };      template<int I> void select(X*);  }  void g (N::X* xp)  {      select<3>(xp);  // ERROR: no ADL!  }

In this example, we may expect that the template select() is found through ADL in the call select<3>(xp) . However, this is not the case because a compiler cannot decide that xp is a function call argument until it has decided that <3> is a template argument list. Conversely, we cannot decide that <3> is a template argument list until we have found select() to be a template. Because this chicken and egg problem cannot be resolved, the expression is parsed as (select<3)>(xp) , which makes no sense.

Ru-Brd