10.4 Implementation Schemes | C++ Templates: The Complete Guide

Ru-Brd

In this section we review some ways in which popular C++ implementations support the inclusion model. All these implementations rely on two classic components : a compiler and a linker . The compiler translates source code to object files, which contain machine code with symbolic annotations (cross-referencing other object files and libraries). The linker creates executable programs or libraries by combining the object files and resolving the symbolic cross-references they contain. In what follows , we assume such a model even though it is entirely possible (but not popular) to implement C++ in other ways. For example, you could imagine a C++ interpreter.

When a class template specialization is used in multiple translation units, a compiler will repeat the instantiation process in every translation unit. This poses very few problems because class definitions do not directly create low-level code. They are used only internally by a C++ implementation to verify and interpret various other expressions and declarations. In this regard, the multiple instantiations of a class definition are not materially different from the multiple inclusions of a class definitiontypically through header file inclusionin various translation units.

However, if you instantiate a (noninline) function template, the situation may be different. If you were to provide multiple definitions of an ordinary noninline function, you would violate the ODR. Assume, for example, that you compile and link a program consisting of the following two files:

  //   File   a.cpp   :  int main()  {  }  //   File   b.cpp   :  int main()  {  }

C++ compilers will compile each module separately without any problems because indeed they are valid C++ translation units. However, your linker will most likely protest if you try to link the two together. Duplicate definitions are not allowed.

In contrast, consider the template case:

  //   File   t.hpp   :   // common header (inclusion model)  template<typename T>  class S {    public:      void f();  };  template<typename T>  void S::f()  // member definition  {  }  void helper(S<int>*);  //   File   a.cpp   :  #include "t.hpp"  void helper(S<int>* s)  {      s->f();  // (1) first point of instantiation of  S::f  }  //   File   b.cpp   :  #include "t.hpp"  int main()  {      S<int> s;      helper(&s);      s.f();  // (2) second point of instantiation of  S::f  }

If the linker treats instantiated members of templates just like it does ordinary functions or member functions, the compiler needs to ensure that it generates code at only one of the two POIs: at points (1) or (2), but not both. To achieve this, a compiler has to carry information from one translation unit to the other, and this is something C++ compilers were never required to do prior to the introduction of templates. In what follows, we discuss the three broad classes of solutions that are en vogue among C++ implementers.

Note that the same problem occurs with all linkable entities produced by template instantiation: instantiated function templates and member function templates, as well as instantiated static data members.

10.4.1 Greedy Instantiation

The first C++ compilers that popularized greedy instantiation were produced by a company called Borland. It has grown to be the most commonly used technique among the various C++ systems, and in particular it is almost universally the mechanism of choice in development environments for Microsoft Windows-based personal computers.

Greedy instantiation assumes that the linker is aware that certain entitieslinkable template instantiations in particularmay in fact appear in duplicate across the various object files and libraries. The compiler will typically mark these entities in a special way. When the linker finds multiple instances, it keeps one and discards all the others. There is not much more to it than that.

In theory, greedy instantiation has some serious drawbacks:

The compiler may be wasting time on generating and optimizing N instantiations, of which only one will be kept.
Linkers typically do not check that two instantiations are identical because some insignificant differences in generated code can validly occur for multiple instances of one template specialization. These small differences should not cause the linker to fail. (These differences could result from tiny differences in the state of the compiler at the instantiation times.) However, this often also results in the linker not noticing more substantial differences, such as when one instantiation was compiled for maximum performance whereas the other was compiled for most convenient debugging.
The sum of all the object files could potentially be much larger than with alternatives because the same code may be duplicated many times.

In practice, these shortcomings do not seem to have caused major problems. Perhaps this is because greedy instantiation contrasts very favorably with the alternatives in one important aspect: The traditional source-object dependency is preserved. In particular, one translation unit generates but one object file, and each object file contains compiled code for all the linkable definitions in the corresponding source file (which includes the instantiated definitions).

Finally, it may be worth noting that the linker mechanism that allows duplicate definitions of linkable entities is also typically used to handle duplicate spilled inlined functions ^[9] and virtual function dispatch tables . ^[10] If this mechanism is not available, the alternative is usually to emit these items with internal linkage, at the expense of generating larger code.

^[9] When a compiler is unable to "inline" every call to a function that you marked with the keyword inline ,a separate copy of the function is emitted in the object file. This may happen in multiple object files.

^[10] Virtual function calls are usually implemented as indirect calls through a table of pointers to functions. See [LippmanObjMod] for a thorough study of such implementation aspects of C++.

10.4.2 Queried Instantiation

The most popular implementation in this category is provided by a company called Sun Microsystems , starting with release 4.0 of their C++ compiler. Queried instantiation is conceptually remarkably simple and elegant and yet it is chronologically the most recent class of instantiation schemes that we review here. In this scheme, a database shared by the compilations of all translation units participating in a program is maintained . This database keeps track of which specializations have been instantiated and on what source code they depend. The generated specializations themselves are typically stored with this information in the database. Whenever a point of instantiation for a linkable entity is encountered , one of three things can happen:

No specialization is available: In this case, instantiation occurs, and the resulting specialization is entered in the database.
A specialization is available but is out of date because source changes have occurred since it was generated. Here, too, instantiation occurs, but the resulting specialization replaces the one previously stored in the database.
An up-to-date specialization is available in the database. Nothing needs to be done.

Although conceptually simple, this design presents a few implementation challenges:

It is not trivial to maintain correctly the dependencies of the database contents with respect to the state of the source code. Although it is not incorrect to mistake the third case for the second, doing so increases the amount of work done by the compiler (and hence overall build time).
It is quite common to compile multiple source files concurrently. Hence, an industrial-strength implementation needs to provide the appropriate amount of concurrency control in the database.

Despite these challenges, the scheme can be implemented quite efficiently . Furthermore, there are no obvious pathological cases that would make this solution scale poorly, in contrast, for example, with greedy instantiation, which may lead to a lot of wasted work.

The use of a database may also present some problems to the programmer, unfortunately . The origin of most of these problems lies in that fact that the traditional compilation model inherited from most C compilers no longer applies: A single translation unit no longer produces a single stand-alone object file. Assume, for example, that you wish to link your final program. This link operation needs not only the contents of each of the object files associated with your various translation units, but also the object files stored in the database. Similarly, if you create a binary library, you need to ensure that the tool that creates that librarytypically a linker or an archiveris aware of the contents of the database. More generally , any tool that operates on object files may need to be made aware of the database contents. Many of these problems can be alleviated by not storing the instantiations in the database, but instead by emitting the object code in the object file that caused the instantiation in the first place.

Libraries present yet another challenge. A number of generated specializations may be packaged in a library. When the library is added to another project, that project's database may need to be made aware of the instantiations that are already available. If not, and if the project creates some of its own points of instantiation for the specializations present in the library, duplicate instantiation may occur. A possible strategy to deal with such situations is to use the same linker technology that enables greedy instantiation: Make the linker aware of generated specializations and have it weed out duplicates (which should nonetheless occur much less frequently than with greedy instantiation). Various other subtle arrangements of sources, object files, and libraries can lead to frustrating problems such as missing instantiations because the object code containing the required instantiation was not linked in the final executable program. Such problems should not be construed as shortcomings of the queried instantiation approach but rather should be taken as a solid argument against complex and subtle software build environments.

10.4.3 Iterated Instantiation

The first compiler to support C++ templates was Cfront 3.0a direct descendant of the compiler that Bjarne Stroustrup wrote to develop the language. ^[11] An inflexible constraint on Cfront was that it had to be very portable from platform to platform, and this meant that it (1) used the C language as a common target representation across all target platforms and (b) used the local target linker. In particular, this implied that the linker was not aware of templates. In fact, Cfront emitted template instantiations as ordinary C functions, and therefore it had to avoid duplicate instantiations. Although the Cfront source model was different from the standard inclusion and separation models, its instantiation strategy can be adapted to fit the inclusion model. As such, it also merits recognition as the first incarnation of iterated instantiation. The Cfront iteration can be described as follows:

^[11] Do not let this phrase mislead you into thinking that Cfront was an abstract prototype: It was used in industrial contexts, and formed the basis of many commercial C++ compiler offerings. Release 3.0 appeared in 1991 but was plagued with bugs . Version 3.0.1 followed soon thereafter and made templates usable.

Compile the sources without instantiating any required linkable specializations.
Link the object files using a prelinker .
The prelinker invokes the linker and parses its error messages to determine whether any are the result of missing instantiations. If so, the prelinker invokes the compiler on sources that contain the needed template definitions, with options to generate the missing instantiations.
Repeat step 3 if any definitions are generated.

The need to iterate step 3 is prompted by the observation that the instantiation of one linkable entity may lead to the need for another such entity that was not yet instantiated. Eventually the iteration will "converge," and the linker will succeed in building a complete program.

The drawbacks of the original Cfront scheme are quite severe:

The perceived time to link is augmented not only by the prelinker overhead but also by the cost of every required recompilation and relinking. Some users of Cfront-based systems reported link times of "a few days" compared with "about an hour " with the alternative schemes reported earlier.
Diagnostics (errors, warnings) are delayed until link time. This is especially painful when linking becomes expensive and the developer must wait hours just to find out about a typo in a template definition.
Special care must be taken to remember where the source containing a particular definition is located (step 1). Cfront in particular used a central repository, which had to deal with some of the challenges of the central database in the queried instantiation approach. In particular, the original Cfront implementation was not engineered to support concurrent compilations.

Despite these shortcomings, the iteration principle was refined for the two compilation systems that would later pioneer the more advanced C++ template features ^[12] : the Edison Design Group's (EDG) implementation and HP's aC++. ^[13] In what follows, we expand on the technique developed by EDG to demonstrate its C++ front-end technology. ^[14]

^[12] We are not unbiased . However, the first publically available implementations of such things as member templates, partial specialization, modern name lookup in templates, and the template separation model came out of these companies.

^[13] HP's aC++ was grown out of technology from a company called Taligent (later absorbed by International Business Machines, or IBM). HP also added greedy instantiation to aC++ and made that the default mechanism.

^[14] EDG does not sell C++ implementations to end users. Instead, they provide an essential but portable component of such an implementation to other software vendors who can then integrate this into a complete platform-specific solution. Some of EDG's customers choose to keep their portable instantiation iteration, but they can just as easily adapt it to a greedy instantiation environment (which is not portable because it depends on special linker capabilities).

EDG's iteration enables two-way communication between the prelinker and the various compilation steps: The prelinker can direct instantiations performed for a particular translation unit through an instantiation request file, and the compiler can notify the prelinker about possible points of instantiation either by embedding information in the object files or by producing separate template information files. The instantiation request files and the template information files have names that correspond to the name of the file being compiled, but with suffixes .ii and .ti respectively. The iteration works as follows:

While compiling the source of a translation unit, the EDG compiler reads the corresponding .ii file if one exists and creates the instantiations directed therein. At the same time, it writes which points of instantiation it could have honored to the object file resulting from this compilation or to a separate .ti file. It also writes how this file is compiled.
The link step is intercepted by the prelinker, which examines the object files and corresponding .ti files that participate in the link step. For each instantiation that has not yet been generated, the required directive is added to a .ii file corresponding to a translation unit that can honor the directive.
If any .ii files are modified, the prelinker reinvokes the compiler (step 1) for the corresponding sources files, and the prelinker iteration repeats.
When closure is been achieved, a single actual link step is performed.

This scheme addresses the issue of concurrent builds by maintaining global information on a pertranslation-unit basis. The perceived link time can still be significantly higher than with greedy and queried instantiation, but because no actual linking is performed, the growth is much less catastrophic. More important, because the prelinker maintains global consistency among the .ii files, these files can be reused in the next build cycle. Specifically, after having made some changes to the source, the programmer restarts a build of the files affected by the modifications. Each resulting compilation immediately instantiates the specializations requested by the .ii files that lingered from the previous compilation of that file and chances are good that the prelinker will not need to trigger additional recompiles at link time.

In practice, EDG's scheme works quite well, and, although a build "from scratch" is typically more time-consuming than the alternative schemes, subsequent build times are quite competitive.

Ru-Brd