GCC Optimizer


The job of the optimizer is essentially to do one of three potentially conflicting tasks. It can optimize the code to make it both faster and smaller, it can optimize the code to make it faster but potentially larger, or it can simply reduce the size of the code but potentially make it slower. Luckily, we have control over the optimizer and can instruct it on what we really want.

Note  

While the GCC optimizer does a good job of code optimization, it can sometimes produce larger or slower images (the opposite of what you may be after). It's important to test your image to ensure that you're getting what you expect. When you don't get what you expect, changing the options you provide to the optimizer can usually remedy the situation.

In this section, we'll look at the various mechanisms to optimize code using GCC.

Table 4.4: Optimization Settings and Descriptions

Optimization Level   Description
-O0                  No optimization (the default level).
-O, -O1              Tries to reduce code size and execution time without greatly increasing compilation time.
-O2                  More optimizations than -O1, but only those that don't trade size for speed (or vice versa).
-Os                  Optimize for resulting image size (all of -O2, except for those optimizations that increase size).
-O3                  Even more optimizations (-O2, plus a couple more).

In its simplest form, GCC provides a number of optimization levels that can be enabled. The -O (the letter "oh") option permits the specification of five different optimization levels, listed in Table 4.4.

Enabling the optimizer simply entails specifying the given optimization level on the GCC command line. For example, in the following command line, we instruct the optimizer to focus on reducing the size of the resulting image:

 $ gcc -Os test.c -o test 
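One way to confirm which optimization mode a source file was actually built with is to check GCC's predefined macros. The following is a minimal sketch, assuming the documented GCC behavior that __OPTIMIZE__ is defined whenever optimization is enabled and that __OPTIMIZE_SIZE__ is also defined when optimizing for size. The function name optimization_mode is hypothetical and chosen only for illustration.

```c
/* Hypothetical helper: reports how this source file was compiled.
 * __OPTIMIZE_SIZE__ is tested first because __OPTIMIZE__ is defined
 * for every optimizing build, including -Os. */
const char *optimization_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";   /* built with -Os */
#elif defined(__OPTIMIZE__)
    return "speed";  /* built with -O1, -O2, or -O3 */
#else
    return "none";   /* built with -O0 or with no -O option */
#endif
}
```

Compiling the same file with -O0, -O2, and -Os in turn and printing the returned string is a quick check that a build system passes the intended level to each file.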

Note that it is possible to specify a different optimization level for each file that makes up an image. Certain optimizations (not contained within the optimization levels) require that all files be compiled with the option if any one of them is compiled with it. We'll not address any of those here.

Let's now dig into the optimization levels and see what each does and also identify the individual optimizations that are provided.

-O0 Optimization

With -O0 optimization (or with no optimization level specified at all), the compiler simply generates code that provides the expected results and is easily debuggable within a source code debugger (such as the GNU Debugger, gdb). The compiler is also much faster when not optimizing, as the optimizer is not invoked at all.

-O1 Optimization (-O)

In the first level of optimization, the optimizer's goal is to reduce the resulting code size and execution time without performing optimizations that take a great deal of compilation time. Compilation may take more time with -O1 (over -O0), but depending upon the source being compiled, this is usually not noticeable.

Table 4.5: Optimizations Available in -O1

Optimization               Description
defer-pop                  Defer popping function args from stack until necessary.
thread-jumps               Perform jump threading optimizations (to avoid jumps to jumps).
branch-probabilities       Use branch profiling data to optimize branches.
cprop-registers            Perform a register copy-propagation optimization pass.
guess-branch-probability   Guess branch probabilities when no profile data is available.
omit-frame-pointer         Do not keep a frame pointer in a register for functions that don't need one.

The individual optimizations in -O1 are shown in Table 4.5.

The -O1 optimization is usually a safe level if you still want to debug the resulting image.

Note  

When specifying optimizations explicitly, the -f option is used to identify them. For example, to enable the defer-pop optimization, we would specify -fdefer-pop. If the option is enabled via an optimization level and you want it turned off, simply use the negative form, -fno-defer-pop.

-O2 Optimization

The second optimization level provides even more optimizations (while including those in -O1) but does not include any optimizations that will trade speed for space (or vice versa). The optimizations that are present in -O2 are listed in Table 4.6.

Note that Table 4.6 lists only those optimizations that are unique to -O2; it doesn't repeat the -O1 optimizations. It should be assumed that -O2 is the collection of optimizations shown in Tables 4.5 and 4.6.

-Os Optimization

The -Os optimization level simply disables some -O2 optimizations that would otherwise increase the size of the resulting image. Those optimizations that are disabled for -Os (that do appear in -O2) are -falign-loops, -falign-jumps, -falign-labels, and -falign-functions. Each of these has the potential to increase the size of the resulting image, and therefore they are disabled to help build a smaller executable.

Table 4.6: Optimizations Available in -O2

Optimization                 Description
align-loops                  Align the start of loops.
align-jumps                  Align the labels that are only reachable by jumps.
align-labels                 Align all labels.
align-functions              Align the beginning of functions.
optimize-sibling-calls       Optimize sibling and tail recursive calls.
cse-follow-jumps             When performing common subexpression elimination (CSE), follow jumps to their targets.
cse-skip-blocks              When performing CSE, follow conditional jumps that skip over blocks.
gcse                         Perform global common subexpression elimination.
expensive-optimizations      Perform a set of expensive optimizations.
strength-reduce              Perform strength reduction optimizations.
rerun-cse-after-loop         Rerun CSE after loop optimizations.
rerun-loop-opt               Run the loop optimizer twice.
caller-saves                 Enable register saving around function calls.
force-mem                    Copy memory operands into registers before using.
peephole2                    Enable an RTL peephole pass before sched2.
regmove                      Enable register move optimizations.
strict-aliasing              Assume that strict aliasing rules apply.
delete-null-pointer-checks   Delete useless null pointer checks.
reorder-blocks               Reorder basic blocks to improve code placement.
schedule-insns               Reschedule instructions before register allocation.
schedule-insns2              Reschedule instructions after register allocation.
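Of the optimizations in Table 4.6, strict-aliasing is the one most likely to change the behavior of code that appeared to work at lower levels, because it lets the compiler assume that pointers to different types never refer to the same object. The following is a minimal sketch of an aliasing-safe way to read the bit pattern of a float. The function name float_bits is hypothetical, and the code assumes that float and uint32_t are both 32 bits wide on the target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the raw bit pattern of a float.
 * Assumes sizeof(float) == sizeof(uint32_t) on the target.
 * Reading the value through a cast such as *(uint32_t *)&value would
 * break the rules that -fstrict-aliasing depends on. */
uint32_t float_bits(float value)
{
    uint32_t bits;

    /* memcpy copies the object representation byte by byte, so no
     * pointer of the wrong type is ever dereferenced. */
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

If existing code relies on pointer casts of this kind and cannot be changed, the negative form -fno-strict-aliasing turns the assumption off while keeping the rest of -O2.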

-O3 Optimization

The -O3 optimization level is the highest level of optimization provided by GCC. In addition to those optimizations provided in -O2, this level also includes those shown in Table 4.7.

Table 4.7: Optimizations Enabled in -O3 (Above -O2)

Optimization         Description
-finline-functions   Inline simple functions into the calling function.
-frename-registers   Optimize register allocation for architectures with large numbers of registers (makes debugging difficult).
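To make the inlining entry in Table 4.7 concrete, the following sketch shows the kind of small function that -finline-functions allows the compiler to consider for inlining even though it is not declared inline. The function names square and sum_of_squares are hypothetical, and whether the call is actually inlined depends on the GCC version and its heuristics.

```c
/* Small static helper: a typical candidate for inlining. */
static int square(int x)
{
    return x * x;
}

/* Returns 1*1 + 2*2 + ... + n*n (0 when n is less than 1). */
int sum_of_squares(int n)
{
    int total = 0;
    int i;

    for (i = 1; i <= n; i++) {
        /* With inlining, this call can be replaced by the
         * multiplication itself, removing the call overhead. */
        total += square(i);
    }
    return total;
}
```

Compiling this file with the -S option at -O0 and again at -O3 and comparing the generated assembly is one way to see whether the call to square remains.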

Table 4.8: Architectures (CPUs) Supported for x86

Target CPU                  -mcpu=
i386 DX/SX/CX/EX/SO         i386
i486 DX/SX/DX2/SL/SX2/DX4   i486
487                         i486
Pentium                     pentium
Pentium MMX                 pentium-mmx
Pentium Pro                 pentiumpro
Pentium II                  pentium2
Celeron                     pentium2
Pentium III                 pentium3
Pentium IV                  pentium4
Via C3                      c3
Winchip 2                   winchip2
Winchip C6-2                winchip-c6
AMD K5                      i586
AMD K6                      k6
AMD K6 II                   k6-2
AMD K6 III                  k6-3
AMD Athlon                  athlon
AMD Athlon 4                athlon
AMD Athlon XP/MP            athlon
AMD Duron                   athlon
AMD Tbird                   athlon-tbird

Architectural Optimizations

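While standard optimization levels can provide meaningful improvements in software performance and code size, specifying the target architecture can also be very useful. The -mcpu option tells the compiler to generate instructions for the CPU type as specified. (GCC 3.4 and later treat -mcpu on x86 as a deprecated synonym for -mtune, which only tunes the generated code; in those releases, -march is the option that selects the instruction set.) For the standard x86 target, Table 4.8 lists some of the options.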

So if we were compiling specifically for the Intel Celeron architecture, we'd use the following command line:

 $ gcc -mcpu=pentium2 test.c -o test 

Of course, combining the -mcpu option with an optimization level can lead to additional performance benefits. One very important point to note is that once we compile an image for a given CPU, it may not run on another. Therefore, if we're more interested in an image running on a variety of CPUs, allowing the compiler to use the default (i386) will produce code that supports any of the x86 architectures.
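When a single image must run on a variety of CPUs but still wants to use newer instructions where they exist, one option is to build for the default target and test the CPU at run time. The following is a minimal sketch, assuming a GCC release that provides the x86 built-ins __builtin_cpu_init and __builtin_cpu_supports (GCC 4.8 or later). The function name cpu_has_sse2 is hypothetical, and on other compilers or architectures the sketch simply reports that it cannot tell.

```c
/* Hypothetical helper: returns 1 if the running CPU supports SSE2,
 * 0 if it does not, and -1 if the check is unavailable (a non-x86
 * target or a compiler without the GCC x86 built-ins). */
int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Initializes the CPU feature data used by the check below. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return -1;
#endif
}
```

A program could call such a helper once at startup and choose between a generic code path and one built in a separate file for the newer CPU.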




GNU/Linux Application Programming (Programming Series)
ISBN: 1584505680
Year: 2006
Pages: 203
Authors: M. Tim Jones
