In Chapters 7 and 10 you had opportunities to observe the amount of overhead involved with function and procedure calls. The Itanium calling standards are actually quite modest in their impact compared to some earlier architectures and programming environments. Nevertheless, the system software for high-performance architectures commonly provides options for reducing call overhead. We have previously alluded to moving functions inline, i.e., copying the body of the function or procedure right into the instruction stream of the caller rather than setting up a call. The same function can be replicated several times, once at each call site. While doing that does increase the overall size of the machine-language program, virtual paging by the operating system can readily handle that aspect. Importantly, the total number of executed instructions decreases by the amount of calling and returning overhead that is avoided.
Some compilers provide the ability to consider all the components of a program holistically and to move routines inline when that is advantageous. An even wider view would involve the linker in order to consider bringing certain library functions inline as well. We have prepared the very simple C program INLINE (Figure 11-2) in order to compare the effect of having an internal function square be placed inline or not. If the compiler front end can consider both the main program and the function together, its optimizer can then analyze opportunities to eliminate call overhead by moving portions of code inline. Table 11-9 compares the assembly language output from the full-featured cc compiler for HP-UX at optimization levels +O2 and +O3. We again abridge the output where the two results were similar, in order to highlight the differences. As expected (Table 11-3), the compiler pulls the small function square inline at optimization level +O3.
Here, with such a very small function, the overall code length in the body of main is virtually the same whether the function is inline or not, but code expansion would occur for a longer function.

Figure 11-2 INLINE: Program to illustrate bringing a function inline

    // This C program shows inline optimization.
    #include <stdio.h>
    int main ()
    {
        long long r2,x,y,z;
        long long square( long long );
        printf("Enter 3 integers: ");
        scanf("%lld" "%lld" "%lld",&x,&y,&z);
        r2 = square(x)+square(y)+square(z);
        printf("%lld\n",r2);
        return 0;
    }
    long long square( long long n )
    {
        return n*n;
    }

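To make the transformation concrete at the source level, the following minimal sketch places the three-call expression from INLINE next to a hand-inlined equivalent. The wrapper names sum_squares_call and sum_squares_inline are our own, introduced only for illustration; they do not appear in INLINE. The sketch shows only the idea of substituting the body of square at each call site, and it does not reproduce what the compiler actually emits at +O3.

```c
/* The function from INLINE: each use normally costs a call and a return. */
long long square(long long n)
{
    return n * n;
}

/* Hypothetical wrapper: the expression from INLINE, written with three calls. */
long long sum_squares_call(long long x, long long y, long long z)
{
    return square(x) + square(y) + square(z);
}

/* Hypothetical wrapper: the same expression with the body of square
 * substituted by hand at each of the three call sites. */
long long sum_squares_inline(long long x, long long y, long long z)
{
    return x * x + y * y + z * z;
}
```

Both wrappers compute the same value for the same inputs. The second simply has no calls left to set up, which is the saving described in the text.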
Many factors would need to be assessed in deciding whether bringing a function inline can be beneficial. These include: whether br.call or br.ret may cause pipeline bubbles, whether there could be thrashing in I-cache behavior if the function is at a distant address, whether the inline version would require significantly more registers, whether the inline function code can be interwoven with mainline code in bundles, and whether preparing arguments for the function call is time-consuming (perhaps some of them do not change from call to call).
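The last of these factors, arguments that do not change from call to call, has a loose source-level analogue that can be sketched in C. The function weighted and the two loop routines below are hypothetical, invented for this illustration. We assume a callee whose work depends partly on arguments a and b that stay fixed across a loop. Once the body is inline, the work that depends only on a and b is visible in the caller and can be done once instead of on every iteration. This is a sketch of the idea under those assumptions and says nothing about the decisions of any particular compiler.

```c
#include <stddef.h>

/* Hypothetical callee: the value k depends only on a and b. */
long long weighted(long long v, long long a, long long b)
{
    long long k = a * a + b;   /* recomputed on every call */
    return v * k;
}

/* Call version: a and b are passed again, and k is recomputed, for every element. */
long long total_call(const long long *v, size_t n, long long a, long long b)
{
    long long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += weighted(v[i], a, b);
    return sum;
}

/* Hand-inlined version: the part that depends only on a and b
 * is computed once, before the loop. */
long long total_inline(const long long *v, size_t n, long long a, long long b)
{
    long long k = a * a + b;
    long long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i] * k;
    return sum;
}
```

The two loop routines return the same result. The difference lies only in how much work is repeated per element.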