22.1 Direct, Indirect, and Inline Calls | C++ Templates: The Complete Guide


Typically, when a C or C++ compiler encounters the definition of a noninline function, it generates and stores machine code for that function in an object file. It also creates a name associated with the machine code; in C, this name is typically the function name itself, but in C++ the name is usually extended with an encoding of the parameter types to allow for unique names even when a function is overloaded (the resulting name is usually called a mangled name, although the term decorated name is also used). Similarly, when the compiler encounters a call site like

 f();

it generates machine code for a call to a function of that type. For most machine languages, the call instruction itself requires the starting address of the routine. This address can be part of the instruction (in which case the instruction is called a direct call), or it may reside somewhere in memory or in a machine register (an indirect call). Almost all modern computer architectures provide both kinds of call instructions, but (for reasons that are beyond the scope of this book) direct calls are executed more efficiently than indirect calls. In fact, as computer architectures get more sophisticated, it appears that the performance gap between direct calls and indirect calls increases. Hence, compilers generally attempt to generate a direct call instruction when possible.

In general, a compiler does not know at which address a function is located (the function could, for example, be in another translation unit). However, if the compiler knows the name of the function, it generates a direct call instruction with a dummy address. In addition, it generates an entry in the generated object file directing the linker to update that instruction to point to the address of a function with the given name. Because the linker sees the object files created from all the translation units, it knows the call sites as well as the definition sites and hence is able to patch up all the direct call sites. ^[1]

^[1] The linker performs a similar role for accesses to namespace scope variables, for example.
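The name-based matching described above can be made concrete with a small sketch. The functions `area` and `c_area` are hypothetical and exist only for illustration. On toolchains that follow the Itanium C++ ABI (for example, GCC and Clang on Linux), the two overloads would typically appear in the object file as `_Z4areai` and `_Z4areaii`, whereas the `extern "C"` function keeps a name without any parameter encoding; other platforms use different schemes.

```cpp
// Two overloads share the source-level name "area", so the compiler must
// give each one a distinct linker-visible (mangled) name.
int area(int side) { return side * side; }
int area(int width, int height) { return width * height; }

// extern "C" requests C linkage: the symbol name carries no encoding of the
// parameter types, which is also why such a function cannot be overloaded.
extern "C" int c_area(int side) { return side * side; }
```

A tool such as `nm` can usually be used to inspect the names that end up in the object file.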

Unfortunately, when the name of the function is not available, an indirect call must be used. This is usually the case for calls through pointers to functions:

 void foo (void (*pf)())
 {
     pf();  // indirect call through pointer to function pf
 }

In this example it is, in general, not possible for a compiler to know to which function the parameter pf points (after all, it is most likely different for a different invocation of foo()). Hence, the technique of having the linker match names does not work. The call destination is not known until the code is actually executed.
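A minimal sketch of this situation follows. The names `increment`, `decrement`, `apply`, and `demo` are hypothetical. Seen in isolation, `apply()` has only a pointer value to work with, so the call inside it cannot be resolved by name; an optimizer that happens to inline `apply()` into `demo()` may still recover direct calls, but that is not guaranteed.

```cpp
// Hypothetical callbacks; any function with this signature could be passed.
void increment(int& x) { ++x; }
void decrement(int& x) { --x; }

// Inside apply(), the callee is known only through the pointer pf, so the
// call is, in general, emitted as an indirect call.
void apply(void (*pf)(int&), int& x)
{
    pf(x);
}

int demo()
{
    int value = 10;
    apply(increment, value);  // pf points to increment
    apply(increment, value);
    apply(decrement, value);  // same call site in apply(), different target
    return value;             // 10 + 1 + 1 - 1 == 11
}
```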

Although a modern computer can often execute a direct call instruction about as quickly as other common instructions (for example, an instruction to add two integers), function calls can still be a serious performance impediment. The following example shows this:

 int f1(int const & r)
 {
     return ++(int&)r;  // not reasonable, but legal
 }

 int f2(int const & r)
 {
     return r;
 }

 int f3()
 {
     return 42;
 }

 int foo()
 {
     int param = 0;
     int answer = 0;
     answer = f1(param);
     f2(param);
     f3();
     return answer + param;
 }

Function f1() takes a const int reference argument. Ordinarily, this means that the function does not modify the object that is passed by reference. However, if the object passed in is itself modifiable (that is, not declared const), a C++ program can legally cast away the const property and change the value of the object anyway. (You could argue that this is not reasonable; however, it is standard C++.) Function f1() does exactly this. Because of this possibility, a compiler that optimizes generated code on a per-function basis (and most compilers do) has to assume that every function that takes references or pointers to objects may modify those objects. Note that in general a compiler sees only the declaration of a function because the definition (the implementation) is in another translation unit.

In the code example, most compilers therefore assume that f2() can modify param too (even though it does not). In fact, the compiler cannot even assume that f3() does not modify the local variable param. Indeed, the functions f1() and f2() had an opportunity to store the address of param in a globally accessible pointer. From the limited perspective of the compiler, it is therefore not impossible for f3() to use such a globally accessible pointer to modify param. The net effect is that ordinary function calls confuse most compilers regarding what happened to various objects, often forcing them to store intermediate values in main memory instead of keeping them in fast registers, and preventing many optimizations that involve the movement of machine code (the function call often forms a barrier for code motion).
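The scenario the compiler has to guard against can be sketched as follows. The global `g_escaped` and the functions `f1_leaky`, `f3_sneaky`, and `foo_observed` are hypothetical variants invented for this illustration; they are not the functions of the example above. The code is legal because the object that `param` names is not itself const.

```cpp
// Hypothetical globally accessible pointer through which an address escapes.
int* g_escaped = nullptr;

// Takes its argument by reference to const, yet remembers the address.
int f1_leaky(int const& r)
{
    g_escaped = const_cast<int*>(&r);
    return r;
}

// Takes no arguments at all, yet can modify the caller's local variable.
int f3_sneaky()
{
    if (g_escaped != nullptr) {
        *g_escaped = 42;
    }
    return 42;
}

int foo_observed()
{
    int param = 0;
    f1_leaky(param);
    f3_sneaky();            // changes param behind the caller's back
    int result = param;     // 42, not 0
    g_escaped = nullptr;    // avoid leaving a dangling pointer behind
    return result;
}
```

If only the declarations of `f1_leaky()` and `f3_sneaky()` were visible while compiling `foo_observed()`, the compiler would have to reload `param` from memory after each call.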

Advanced C++ compilation systems exist that are capable of tracking many instances of such potential aliasing (in the scope of f1(), the expression r is an alias for the object named param in the scope of foo()). However, this ability comes at a price: compilation speed, resource usage, and code reliability. Projects that otherwise build in minutes sometimes take hours or even days to be compiled (provided the necessary gigabytes of memory are available to the compiler). Furthermore, such compilation systems are typically much more complex and are therefore more prone to generating wrong code. Even when a superoptimizing compiler generates correct code, the source code may contain unintended violations of subtle C and C++ aliasing rules. ^[2] Some of these violations are fairly harmless with ordinary optimizers, but superoptimizers may turn them into true bugs.

^[2] For example, accessing an int object through a pointer to float is such an error. (Accessing it through a pointer to the corresponding unsigned type, by contrast, is permitted by the aliasing rules.)

However, ordinary optimizers can be helped tremendously by the process of inlining. Suppose f1(), f2(), and f3() are declared inline. The compiler can then transform the code of foo() to something essentially equivalent to

 int foo'()
 {
     int param = 0;
     int answer = 0;
     answer = ++(int&)param;
     return answer + param;
 }

which a very ordinary optimizer can turn into

 int foo''()
 {
     return 2;
 }

This illustrates that the benefit of inlining lies not only in the avoidance of executing machine code for a calling sequence but also (and often more important) in making visible to an optimizer what happens to the variables passed to the function.
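For reference, here is a compilable sketch of the inline variant discussed above, with `const_cast` used in place of the C-style cast. Note that `inline` only permits the transformation: whether a particular compiler actually reduces `foo()` to a single constant depends on the optimizer and its settings, which is an assumption here and not something the language guarantees.

```cpp
// With the definitions visible and declared inline, an optimizer can see
// that only f1() touches param and that f2() and f3() have no side effects.
inline int f1(int const& r)
{
    return ++const_cast<int&>(r);  // legal because the referenced object is not const
}

inline int f2(int const& r)
{
    return r;
}

inline int f3()
{
    return 42;
}

int foo()
{
    int param = 0;
    int answer = 0;
    answer = f1(param);     // param becomes 1, answer becomes 1
    f2(param);              // result unused
    f3();                   // result unused
    return answer + param;  // 1 + 1 == 2
}
```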

What does this have to do with templates? Well, as we see later, it is sometimes possible using template-based callbacks to generate code that involves direct or even inline calls when more traditional callbacks would result in indirect calls. The savings in running time can be considerable.
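As a rough preview, the contrast might be sketched like this. All names (`increment`, `Doubler`, `apply_indirect`, `apply_direct`, `apply_functor`, `demo_callbacks`) are hypothetical, and the sketch is not necessarily the formulation developed later in the book. It only shows the idea that a callback supplied as a template argument is known at compile time.

```cpp
void increment(int& x) { ++x; }

// Traditional callback: the target is a run-time pointer value, so the call
// is, in general, indirect.
void apply_indirect(void (*pf)(int&), int& x)
{
    pf(x);
}

// Template-based callback: the function is a nontype template argument, so
// each instantiation names its target and the call can be direct or inlined.
template <void (*PF)(int&)>
void apply_direct(int& x)
{
    PF(x);
}

// Function-object variant: the call operator is selected by the type F.
struct Doubler {
    void operator()(int& x) const { x *= 2; }
};

template <typename F>
void apply_functor(F f, int& x)
{
    f(x);
}

int demo_callbacks()
{
    int v = 1;
    apply_indirect(increment, v);  // v == 2
    apply_direct<increment>(v);    // v == 3
    apply_functor(Doubler{}, v);   // v == 6
    return v;
}
```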
