5.3 Compile-Time Versus Run-Time Issues | Applied C++: Practical Techniques for Building Better Software

If you have ever recompiled your code with a newer version of a compiler or on a different platform, you have probably discovered that new errors and warnings are produced. These messages are a result of the level of compile-time checking your compiler is performing. You will particularly notice them with templates, because compilers sometimes have problems deciding what to do when presented with conflicting specifications.

Unfortunately, compilers cannot perform all checks at compile time. Some must be deferred until run time, such as casting references with dynamic_cast<> or exception specifications (see page 142). The design of the application is also important. For example, virtual function calls require one or more run-time lookups to determine which function to call.

Your designs will be more robust if you can shift as much burden as possible to your compiler. You need to pay careful attention to any warnings that are issued because they can identify design weaknesses or potential problems. Many developers dismiss all warnings as extraneous messages. This practice, however, lets real mistakes slip into production code.

5.3.1 Compiler Issues

When you build your application to run across many platforms, you will certainly encounter issues with the different compilers. Each compiler handles the code differently according to its features and quirks. Even if your application is only intended for a single platform, you still have to consider compiler issues, because compiler enhancements, patches, and other upgrades are frequently required, thus changing the compilation environment.

COMPILER WARNINGS

Let's look at an example of some potentially dangerous code and the warnings it produces using a variety of compilers:

  1:  #define SIZE 5000
  2:  #define N  500000
  3:
  4:  int main()
  5:  {
  6:    int i = SIZE * N;       // Overflows an integer
  7:    long l = 0;
  8:    unsigned long ul = 0;
  9:
 10:    if (l > ul) {}          // signed/unsigned mismatch
 11:    return 0;
 12:  }

This sample code is provided only as a test for the compilers; we don't use it anywhere in our image framework. The issues it raises, however, are ones that everyone encounters at some point during development. For example, there are two real problems evident in lines 6 and 10.

In line 6:

 int i = SIZE * N;

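seems fine until you look at the values of SIZE and N. Their product, 2,500,000,000, exceeds the largest 32-bit signed value (2,147,483,647), so for 32-bit integers SIZE*N does not fit. To correct this problem you can use an unsigned int (appropriate in this example), or you can use a larger signed quantity. Either way, at least one operand must also have the wider type (for example, 5000u), because the multiplication is otherwise still performed in int. This, however, will not fix the problem on many embedded platforms where integers are only 16 bits. In line 10: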

 if (l > ul)

there are problems when the value of l is negative. The compiler converts the signed quantity to unsigned and then makes the comparison. For example, if we rewrite this example as:

 long l = -1;
 unsigned long ul = 0;
 if (l > ul)
   std::cout << l << " is greater than " << ul << std::endl;

you will see:

 -1 is greater than 0

displayed on the console.
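
Both problems can be avoided with a little explicit code. The sketch below is one possible approach, not code from the image framework: the helper names wideProduct() and greaterThan() are made up for illustration, and it assumes a compiler that provides long long (at least 64 bits, standard since C++11).

```cpp
#include <cassert>

// Hypothetical helper: widen before multiplying so the product of two
// int values is computed in a 64-bit type.
long long wideProduct(int a, int b)
{
    return static_cast<long long>(a) * b;   // b is converted to long long too
}

// Hypothetical helper: compare by mathematical value rather than relying
// on the implicit signed-to-unsigned conversion.
bool greaterThan(long l, unsigned long ul)
{
    if (l < 0)
        return false;                        // a negative value is never greater
    return static_cast<unsigned long>(l) > ul;
}

// Usage: the two cases discussed above.
void checkHelpers()
{
    assert(wideProduct(5000, 500000) == 2500000000LL);
    assert(!greaterThan(-1, 0));             // -1 is not greater than 0
    assert(greaterThan(1, 0));
}
```

The explicit test for a negative value is what the implicit conversion skips; once l is known to be non-negative, the cast to unsigned long cannot change its value.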

There are also two smaller issues in our example. In line 6, the variable i is set but never referenced. Most often this occurs when other code that used the variable is removed during the course of development. This condition can also indicate that the function is unfinished. The second issue is at line 12; the file has no newline character at the end of it. Some compilers will generate a warning if the last line of the file is an actual line of code.

So, what happens when this code is compiled? Let's take a look at the output of a few different compilers.

GNU GCC

The GNU compiler, gcc, was tested on various platforms (Solaris, FreeBSD, and AIX), and its behavior was identical on each one. With version 2.95.3 of gcc, no warnings are reported. On version 3.0.3, gcc reports the following:

 main.cpp:49:1: warning: no newline at end of file

But that is using the compiler with no arguments (g++ main.cpp). If we use "g++ -Wall main.cpp" to enable all warnings, we see:

 main.cpp: In function 'int main()':
 main.cpp:39: warning: comparison between signed and unsigned integer
    expressions
 main.cpp:35: warning: unused variable 'int i'
 main.cpp:49:1: warning: no newline at end of file

SUN NATIVE COMPILER

With Sun's native compiler, there were no warnings.

SGI IRIX

The SGI native Irix compiler reports the following:

 cc-1061 cc: WARNING File = compilertest.cpp, Line = 52
   The integer operation result is out of range.

   int i = SIZE * N;       // Overflows an integer
                ^

 cc-1001 cc: WARNING File = compilertest.cpp, Line = 66
   The source file does not end with a new-line character.

   }
    ^

 cc-1174 cc: WARNING File = compilertest.cpp, Line = 52
   The variable "i" was declared but never referenced.

   int i = SIZE * N;       // Overflows an integer
       ^

MICROSOFT VISUAL C++

Microsoft Visual C++ (versions 6 and 7) produces the following:

 main.cpp(48): warning C4307: '*' : integral constant overflow
 main.cpp(52): warning C4018: '>' : signed/unsigned mismatch

All of this tells us what we already know: different compilers produce varying amounts of output. Most compilers have a myriad of command-line flags adding even more dimensions to this problem. We have a rule for standardizing the level of code quality across platforms.

All production code should compile cleanly, using the highest level of detection. For gcc, use the -Wall command-line flag; for Microsoft Visual C++, use /W4 (warning level 4).

Let's review each of the warning messages, with a few comments about each type of warning. For line 6:

 int i = SIZE * N;

some compilers classified this as a warning. Other compilers, such as the native Sun compiler, didn't list any warnings. Even though we are responsible for avoiding overflow errors, in our opinion, all compilers should have flagged this problem as an error, so that it must be fixed before the code will compile. For example, if i had been defined as a short or char, the odds of a problem like this occurring would be much greater.

For line 10:

 if (l > ul)

some compilers generated a warning. If l must remain a signed quantity and ul an unsigned quantity, then you need to cast one of the quantities so that they are of the same type. Often this type of problem arises innocently because the signed variable is used only as a counter, as shown in the following brief example:

 for (int i=0; i<N; i++) {
   ...
   if (i > ul) ...
 }

Our signed quantity, i, cannot be negative, so the comparison is always valid. It is easy to write code like this, because the alternative:

 for (unsigned int i=0; i<N; i++) ...

makes the line more complicated. Besides, the loop may perform both signed and unsigned comparisons, making this a moot point. However, our recommendation is that you should modify your code to make the intent clear when you see warnings like these during compilation.

The other warnings that were generated were obvious: adding a newline at the end of the file and getting rid of the variable that was never referenced. Some compilers tend to be very picky, with warnings that seem unimportant. But, even if they are unimportant on a relative scale, they do clutter the output, making it more difficult to spot real problems. So, even if the warning seems trivial, such as a missing newline at the end of a file, you should make the change so you will never have to see that warning again. Besides, if you follow our rule for production code, all warnings have to be fixed before the code is considered for release.

COMPILER ERRORS

Most errors found by the compiler are coding errors that must be fixed. The error messages themselves, however, are sometimes non-specific messages like "syntax error." If the source of the error is not obvious, you can start by following these simple steps.

Make sure your brackets are balanced. Indenting your code makes this easier to diagnose.
Make sure all statements end with a semicolon (;).
Comment out offending lines and see if the error message goes away. Sometimes the mistake is actually on a different line than that indicated by the error message.
Study any macro definitions carefully for mistakes.

In rare cases, some errors are caused by the compiler itself. Perhaps the best example of this is how compilers handle templates. When templates were first introduced, they were used in very well-defined ways. As compilers began to support features such as function templates, inconsistencies started to appear. Let's look at a problem we encountered during prototyping of the image framework:

 template<class T1, class T2, class T3>
 void add2 (const T1& s1, const T2& s2, T3& d1)
 { d1 = s1 + s2;}

 template<class T> class apRGBTmpl
 {
 public:
   T red;
   T green;
   T blue;
   apRGBTmpl () : red(0), green(0), blue(0) {}
 };

 template<class T1, class T2, class T3>
 void add2 (const apRGBTmpl<T1>& s1, const apRGBTmpl<T2>& s2,
            apRGBTmpl<T3>& d1)
 {
   d1.red   = s1.red   + s2.red;
   d1.green = s1.green + s2.green;
   d1.blue  = s1.blue  + s2.blue;
 }

 int main()
 {
   apRGBTmpl<unsigned char> rgb1, rgb2;
   apRGBTmpl<long> rgb3;
   add2 (rgb1, rgb2, rgb3);  // Fails to compile on win32 MSVC7
   return 0;
 }

The first definition of the template function add2() appears as if it is nothing more than a wrapper around a trivial line of code. However, the use of function templates like this allows us to handle the often ignored image processing issue of overflow. This definition of the add2() function template can be used in lots of ways, such as:

 long l1, l2;
 short s1, s2, s3;
 add2 (s1, s2, s3);
 add2 (s1, s2, l1);
 add2 (l1, l2, s1); // Possible overflows here
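
To make the overflow point concrete, here is a minimal sketch of how a function template of this shape could guard against overflow. It is an illustration only, not the framework's implementation: add2Clamped() is a hypothetical name, and the sketch assumes integer types whose values, and whose sums, all fit in a long long.

```cpp
#include <cassert>
#include <limits>

// Hypothetical variant of add2(): the sum is formed in long long and then
// clamped to the range of the destination type T3.
template<class T1, class T2, class T3>
void add2Clamped(const T1& s1, const T2& s2, T3& d1)
{
    const long long sum = static_cast<long long>(s1) + static_cast<long long>(s2);
    const long long lo  = std::numeric_limits<T3>::min();
    const long long hi  = std::numeric_limits<T3>::max();
    d1 = static_cast<T3>(sum < lo ? lo : (sum > hi ? hi : sum));
}

// Usage: results that would overflow are pinned to the destination's limit.
void checkAdd2Clamped()
{
    short a = 30000, b = 30000, d = 0;
    add2Clamped(a, b, d);                    // 60000 does not fit in a short
    assert(d == std::numeric_limits<short>::max());

    unsigned char p = 200, q = 100, r = 0;
    add2Clamped(p, q, r);                    // 300 does not fit in unsigned char
    assert(r == std::numeric_limits<unsigned char>::max());
}
```

Clamping is only one policy; a design could just as well report the overflow or choose a wider destination type.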

Later in our example, we show a trivial implementation of an RGB pixel type, apRGBTmpl<>. This object is more complex than the simple RGB structure we used in earlier prototypes because we provide numerous arithmetic operations as part of the design. We define a version of add2() that takes arbitrary apRGBTmpl<> objects, so that we can write statements like the one defined in main():

 apRGBTmpl<unsigned char> rgb1, rgb2;
 apRGBTmpl<long> rgb3;
 add2 (rgb1, rgb2, rgb3);

Because we have two different template function definitions for add2(), the compiler must decide which one to use. As our comment indicates, Microsoft Visual C++ generates the following errors:

 error C2667: 'add2' : none of 2 overloads have a best conversion
 error C2668: 'add2' : ambiguous call to overloaded function

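The compiler cannot decide which template should be used. This is not the behavior the C++ standard calls for: under the standard's partial-ordering rules for function templates, the apRGBTmpl<> version of add2() is more specialized than the generic version and should be selected. The big problem with errors like this is that there is little you can do once you get them. In fact, we had to apply some workarounds to address this problem, as you will see in Chapter 6.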
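
For comparison, a compiler that implements partial ordering of function templates resolves the same call without complaint. The sketch below is a variation on the example, not the original code: the two add2() overloads return an int tag (an assumption made purely so the choice can be observed), and checkOverloadChoice() is a hypothetical test function.

```cpp
#include <cassert>

template<class T> class apRGBTmpl
{
public:
    T red;
    T green;
    T blue;
    apRGBTmpl() : red(0), green(0), blue(0) {}
};

// Generic overload: returns 1 so the caller can tell it was selected.
template<class T1, class T2, class T3>
int add2(const T1& s1, const T2& s2, T3& d1)
{
    d1 = s1 + s2;
    return 1;
}

// Overload for RGB pixels: more specialized than the one above. Returns 2.
template<class T1, class T2, class T3>
int add2(const apRGBTmpl<T1>& s1, const apRGBTmpl<T2>& s2, apRGBTmpl<T3>& d1)
{
    d1.red   = s1.red   + s2.red;
    d1.green = s1.green + s2.green;
    d1.blue  = s1.blue  + s2.blue;
    return 2;
}

void checkOverloadChoice()
{
    apRGBTmpl<unsigned char> rgb1, rgb2;
    apRGBTmpl<long> rgb3;
    rgb1.red = 200;
    rgb2.red = 100;
    const int chosen = add2(rgb1, rgb2, rgb3);
    assert(chosen == 2);          // the apRGBTmpl<> overload was selected
    assert(rgb3.red == 300);      // the sum was formed without 8-bit overflow
}
```

Both overloads match the call exactly, so the tie is broken only by which template is more specialized; a compiler without that rule has no way to choose.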

We have also found that many compilers are quite lax regarding syntax for templates. If you experiment with a particular compiler, you may find it accepts certain syntax that violates the ISO C++ standard. Partial template specialization is one area where support is missing from many compilers. This is an example of the issues you will uncover during prototyping. Remember, you are not coding to the standard as much as you are coding to the conformance of the compilers on your target platforms. If your development includes multiple platforms, this means you are effectively designing for the least common denominator.

5.3.2 Run-Time Issues

You have the ability to directly control run-time issues through the design of your application. In this section, we look at some compile-time constructs that also have a run-time component. We also look at performance issues to see how some commonly used C++ techniques can greatly affect execution time.

COMPILE-TIME CONSTRUCTS

The compiler can't detect all errors at compile time. Some errors are detected only at run time, by checks in the generated code, and are reported by throwing an exception. Some constructs, such as exception specifications (see Section 5.2 on page 135), require both compile-time and run-time checks. If a function throws an exception that its specification does not list, std::unexpected() is called and, by default, the application terminates (the proper behavior). To identify that this error condition occurred, you must install your own handler with std::set_unexpected(). Even if you are careful about defining exception specifications, there is nothing to prevent another developer (or even you) from writing software that violates this specification. And, you must do extensive testing to make sure your specifications are correct. These complexities are just some of the reasons we recommend you do not use this construct. (Dynamic exception specifications were later deprecated in C++11 and removed from the language in C++17.)

Another run-time issue arises when casting is performed using dynamic_cast<>. This construct is useful for doing a downcast; that is, converting a pointer or reference from a base class type to that of a derived class. Consider this simple example:

 class apImage
 {
 public:
   apImage () {}
   virtual void f () {}
 };

 class apColorImage: public apImage
 {
 public:
   apColorImage () {}
   virtual void f () {}
 };

 class apGrayImage: public apImage
 {
 public:
   apGrayImage () {}
   virtual void f () {}
 };

In this example, we define a base class, apImage, and derived classes, apColorImage and apGrayImage. When dynamic_cast<> is used correctly, it converts a pointer or reference to a base class into a pointer or reference to a derived class, and it looks something like this:

 apImage imageInstance;
 apImage& imageRef = imageInstance;

 apColorImage colorInstance;
 apColorImage& colorRef = colorInstance;

 apImage& upcast = colorRef;
 apColorImage& downcast1 = dynamic_cast<apColorImage&> (upcast); // ok

This cast converts a reference from an apImage& to an apColorImage&. But when this example is changed to try to convert an object from one derived type to another, or from a base class instance to a derived type, we run into problems, as follows:

 apColorImage& downcast2 = dynamic_cast<apColorImage&> (imageRef);
 apGrayImage& downcast3  = dynamic_cast<apGrayImage&>  (imageRef);

In both of these cases, a run-time check determines that the conversion cannot be made. The exception std::bad_cast is thrown when this error is detected; use a try block to catch it. (When dynamic_cast<> is applied to a pointer rather than a reference, a failed conversion yields a null pointer instead of an exception.) On some compilers, like Microsoft Visual C++, run-time type identification must be enabled for this check to work. If it is not enabled, any attempt to use dynamic_cast<> at run time will result in an error. You are also free to use static_cast<> instead of dynamic_cast<>, but no run-time checks are made, so an invalid downcast goes undetected. In general, we do not recommend using static_cast<> for performing a downcast, because the run-time penalty of using dynamic_cast<> is small.
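
A minimal sketch of catching the failure follows. The class hierarchy mirrors the example above with a virtual destructor added; the function names isColorByReference(), isColorByPointer(), and checkCasts() are hypothetical, and run-time type identification is assumed to be enabled.

```cpp
#include <cassert>
#include <typeinfo>   // std::bad_cast

class apImage
{
public:
    virtual ~apImage() {}
    virtual void f() {}
};

class apColorImage : public apImage
{
public:
    virtual void f() {}
};

class apGrayImage : public apImage
{
public:
    virtual void f() {}
};

// Reference form: a failed cast throws std::bad_cast.
bool isColorByReference(apImage& image)
{
    try {
        apColorImage& color = dynamic_cast<apColorImage&>(image);
        color.f();
        return true;
    }
    catch (const std::bad_cast&) {
        return false;
    }
}

// Pointer form: a failed cast yields a null pointer instead of throwing.
bool isColorByPointer(apImage* image)
{
    return dynamic_cast<apColorImage*>(image) != 0;
}

void checkCasts()
{
    apImage base;
    apColorImage color;
    apGrayImage gray;
    assert(isColorByReference(color));
    assert(!isColorByReference(base));   // base instance is not a color image
    assert(!isColorByReference(gray));   // neither is a sibling type
    assert(isColorByPointer(&color));
    assert(!isColorByPointer(&gray));
}
```

The pointer form is convenient when a failed cast is an expected outcome; the reference form suits cases where failure really is exceptional.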

RUN-TIME PERFORMANCE ISSUES

When you design your application, you need to think about how certain constructs affect performance at run time. For example, the use of inheritance can help turn an incredibly complex problem into a number of smaller, easier problems. Let's take a look at virtual functions.

When a compiler handles virtual functions, it creates virtual tables that are used at run time to determine which function pointer gets used. This additional level of indirection incurs a small run-time penalty each time a virtual function is called. If your design is such that virtual functions are called frequently, then this overhead adds up to a measurable quantity.

Once the first virtual function is added to a class (and hence a virtual table is created), other virtual functions incur only a very small size penalty. This is true because the virtual table only needs to grow by the number of virtual functions added. Note that there is only one virtual table for each type of object. However, as the number of virtual function calls increases, so does the number of table lookups. For many applications, this overhead can be ignored. For example, a drawing application where users manipulate objects by means of a graphical user interface (GUI) is constrained by how often the user generates events, and is not likely to be affected by virtual functions.
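
The size effect can be observed with sizeof. The sketch below uses three hypothetical classes and reflects how typical implementations store one hidden table pointer per object; the standard does not guarantee these results, so treat the checks as assumptions about mainstream compilers.

```cpp
#include <cassert>

// Three hypothetical classes that differ only in how many virtual
// functions they declare.
class apPlain
{
public:
    void f() {}
    int value_;
};

class apOneVirtual
{
public:
    virtual void f() {}
    int value_;
};

class apTwoVirtual
{
public:
    virtual void f() {}
    virtual void g() {}
    int value_;
};

void checkObjectSizes()
{
    // The first virtual function adds a hidden table pointer to each object.
    assert(sizeof(apOneVirtual) > sizeof(apPlain));
    // A second virtual function enlarges the per-class table, not the object.
    assert(sizeof(apTwoVirtual) == sizeof(apOneVirtual));
}
```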

At the other end of the spectrum are real-time embedded systems. We learned a valuable lesson from one of our early large-scale C++ efforts. Our design ignored the effects of virtual function overhead, and we used virtual functions liberally. After all, processors were fast and this was such a small effect, or so we thought. As a result, one part of the system was written completely in C++, with a very rich framework. When the first benchmark test was run, the product team was stunned. What used to take one millisecond in an older product now took 50 milliseconds to run. It turns out that about 48 milliseconds of this time was wasted in an overly complex design, thanks in part to too many virtual functions.

We can observe the effect of virtual functions by looking at an example.

 class apSimple
 {
 public:
   apSimple () : sum_ (0) {}
   void sum (int value) { sum_+=value;}
   int value () const { return sum_;}
 private:
   int sum_;
 };

apSimple is a non-virtual class that sums the value of all integers passed to it. We will use apSimple as the baseline for our measurements as we break this into a more complex design.

 class apVirtualBase
 {
 public:
   apVirtualBase () : sum_ (0) {}
   virtual void sum (int value) { sum_+=value;}
   int value () const { return sum_;}
 protected:
   int sum_;
 };

 class apDerivedBase : public apVirtualBase
 {
 public:
   virtual void sum (int value) { sum_+=value;}
 };

Our base class, apVirtualBase, is almost identical to apSimple, except that our sum() method is now virtual. apDerivedBase derives from apVirtualBase and provides an identical definition of sum(). The compiler won't know this, so we can accurately measure any differences between calling apDerivedBase::sum() or apVirtualBase::sum().

Our baseline is to measure how long it takes this snippet of code to run.

 apSimple simple;
 apSimple* sp = &simple;
 for (int i=0; i<1000000; i++)
   sp->sum (i);

Our measurements were done using Microsoft Visual C++, and we used the QueryPerformanceCounter() function to obtain access to the Windows high-resolution counter. We used a pointer to call sum() so that the baseline would match our virtual function test:

 apDerivedBase derivedbase;
 apVirtualBase* vb = &derivedbase;
 for (int i=0; i<1000000; i++)
   vb->sum (i);

This piece of code will return the same result as the previous one, except that each call to sum() is done by way of the virtual table. A million calls to this function may seem excessive, but look at a hypothetical image processing function to compute the sum of pixels in an image.

 int sum = 0;
 for (int y=0; y<image->height(); y++)
   for (int x=0; x<image->width(); x++)
     sum += image->getPixel (x, y);

If image is a pointer to an apImage-derived object, our calls to getPixel() will accumulate virtual function call overhead, just like our example. For a 1024 by 1024 image, we make 1,048,576 such calls, just over one million, so our benchmark is a meaningful one.

Our test platform is an Intel Pentium 4 microprocessor-based machine, running at 2.0 GHz. Our baseline loop took 1.1 milliseconds to execute, while our virtual function loop took 8.6 milliseconds. A difference of 7.5 milliseconds may not seem like much, but it can represent the difference between your application running properly or not. If times like this are too small for you to worry about, then just ignore this section. Otherwise, you need to understand the ramifications of making any function virtual, especially when the function is involved in time-critical code.
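
To repeat the measurement on another platform, a portable timer can stand in for QueryPerformanceCounter(). The following is a sketch under several assumptions: it requires C++11 for std::chrono and lambdas, the names elapsedMs(), runDirect(), runVirtual(), and checkLoops() are hypothetical, and the accumulator is widened to long long because the sum of 0 through 999,999 (499,999,500,000) does not fit in a 32-bit int. An optimizing compiler may devirtualize or simplify these loops, so the timings it produces will not necessarily resemble the figures above.

```cpp
#include <cassert>
#include <chrono>

class apSimple
{
public:
    apSimple() : sum_(0) {}
    void sum(int value) { sum_ += value; }
    long long value() const { return sum_; }
private:
    long long sum_;
};

class apVirtualBase
{
public:
    apVirtualBase() : sum_(0) {}
    virtual ~apVirtualBase() {}
    virtual void sum(int value) { sum_ += value; }
    long long value() const { return sum_; }
protected:
    long long sum_;
};

class apDerivedBase : public apVirtualBase
{
public:
    virtual void sum(int value) { sum_ += value; }
};

// Hypothetical helper: run f once and return the elapsed milliseconds.
template<class F>
double elapsedMs(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Baseline: non-virtual calls through a pointer.
long long runDirect(int n)
{
    apSimple simple;
    apSimple* sp = &simple;
    for (int i = 0; i < n; i++)
        sp->sum(i);
    return sp->value();
}

// Same loop, but each call goes through the base-class pointer.
long long runVirtual(int n)
{
    apDerivedBase derived;
    apVirtualBase* vb = &derived;
    for (int i = 0; i < n; i++)
        vb->sum(i);
    return vb->value();
}

void checkLoops()
{
    long long direct = 0, viaTable = 0;
    const double t1 = elapsedMs([&] { direct = runDirect(1000000); });
    const double t2 = elapsedMs([&] { viaTable = runVirtual(1000000); });
    assert(direct == viaTable);          // both loops compute the same sum
    assert(t1 >= 0.0 && t2 >= 0.0);      // steady_clock never runs backward
}
```

Returning the sum from each loop gives the compiler a reason to keep the work, and comparing the two results confirms that the loops being timed are equivalent.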

To see one possible workaround, let's look at another timing loop.

 apDerivedBase derivedbase;
 for (int i=0; i<1000000; i++)
   derivedbase.sum (i);

The difference from our previous loop is that we are calling the method directly in the derived class, rather than by means of a pointer. In this case, there is no ambiguity about what function should be called, and the compiler can avoid the virtual table. If we time this, we see that this function takes the same amount of time as our baseline example. If we apply this concept to our image processing example, we can add a new member function, sum() , to compute the sum.

 int apImage::sum ()
 {
   int sum = 0;
   for (int y=0; y<height(); y++)
     for (int x=0; x<width(); x++)
       sum += getPixel (x, y);
   return sum;
 }

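Inside sum(), getPixel() is called on the object itself rather than through an external pointer. Provided getPixel() is non-virtual (or the call can otherwise be resolved at compile time), it no longer uses the virtual table and we completely eliminate the per-pixel overhead. When the user calls image->sum(), we incur a single virtual table lookup.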
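
One way to arrange this is sketched below. The class names apImageBase and apByteImage are hypothetical and the storage is a std::vector for brevity; the point is only that sum() is the single virtual entry point while getPixel() is non-virtual.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical base class: sum() is the only virtual function.
class apImageBase
{
public:
    virtual ~apImageBase() {}
    virtual long sum() const = 0;
};

// Hypothetical 8-bit image. Assumes w and h are positive.
class apByteImage : public apImageBase
{
public:
    apByteImage(int w, int h, unsigned char fill)
        : width_(w), height_(h), data_(static_cast<std::size_t>(w) * h, fill)
    {
        assert(w > 0 && h > 0);
    }

    int width() const { return width_; }
    int height() const { return height_; }

    // Non-virtual, so calls from sum() are resolved at compile time
    // and can be inlined.
    unsigned char getPixel(int x, int y) const
    {
        return data_[static_cast<std::size_t>(y) * width_ + x];
    }

    // One virtual dispatch per image instead of one per pixel.
    virtual long sum() const
    {
        long total = 0;
        for (int y = 0; y < height(); y++)
            for (int x = 0; x < width(); x++)
                total += getPixel(x, y);
        return total;
    }

private:
    int width_, height_;
    std::vector<unsigned char> data_;
};
```

A caller holding only an apImageBase pointer pays for the table lookup once, when it calls sum().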

5.3.3 Template Specialization

Template specialization is one of the run-time and compile-time issues you should be aware of when dealing with templates. Using specialization, you can define specific implementations that differ from the default one. By taking into account the data type of the template argument, a specialization can be written to be more efficient. Sometimes this can result in huge performance gains, but this gain is only for a specific data type. It is easy to forget that templates do not behave like regular classes, because one template instance can have a much different performance than another template instance. Let's start by looking at a simple template design for an image class.

 #include <cstring>   // memset

 template<class T> class apImage
 {
 public:
   // Initializers are listed in declaration order (data_ first)
   apImage (int w, int h) : data_ (0), width_ (w), height_ (h)
   {
     data_ = new T [width() * height()];
     memset (data_, 1, width() * height() * sizeof(T));
   }
   ~apImage () { delete [] data_;}

   int width  () const { return width_;}
   int height () const { return height_;}

   T* getAddr (int x, int y) { return data_ + y*width()+x;}
   T getPixel (int x, int y) { return *getAddr(x, y);}

   int sum ();

 private:
   T* data_;
   int width_, height_;
 };

 template<class T> int apImage<T>::sum ()
 {
   int sum = 0;
   for (int y=0; y<height(); y++)
     for (int x=0; x<width(); x++)
       sum += getPixel (x, y);
   return sum;
 }

This example does not do anything we haven't seen before. When you construct an apImage<> object, it will allocate memory for its storage and set all the bytes to 1. We offer both getAddr(), to fetch the address of a pixel, and getPixel(), which fetches the pixel value itself. sum() computes the sum of all pixels in the image and does so by calling getPixel() on every pixel. We wrote sum() to take no advantage of the memory layout of the image, to keep the example simple. This would make more sense if apImage<> were actually two objects. If the base class defines sum() and the derived class allocates the memory, sum() could not make any assumptions about how to fetch pixel data, other than to call getPixel().

Now let's implement a specialization for the data type unsigned char, which is a very common pixel type.

 template<> int apImage<unsigned char>::sum ()
 {
   int sum = 0;
   unsigned char* p = getAddr (0, 0);
   for (int y=0; y<height(); y++)
     for (int x=0; x<width(); x++)
       sum += *p++;
   return sum;
 }

In this example, we do make assumptions about how memory is stored. Instead of using getPixel() to fetch every pixel, we call getAddr() once to get a pointer to the first pixel in the image. We then simply increment this pointer every time we want to access the next pixel.

There is a large performance difference between these two versions of sum() . For a 1024 by 1024 image, our generic template function took 105 milliseconds to run. Our specialized version for unsigned char took 2 milliseconds. This is an excellent use of specialization to improve the performance of commonly used types.

We caution you to document very clearly where improvements have been made to enhance performance. In a large image processing system, it is doubtful that every team member will have a good understanding of all aspects of the system. What will happen when a team member needs a different template type, and writes:

 apImage<char> image(1024, 1024);
 ...
 image.sum();

We already know what will happen. This will work fine, but will take 105 milliseconds to run because our specialization is only valid for unsigned char. The compiler will not remind you to make sure this is what you intended. Nothing will happen at run time either, unless your code monitors the execution time. In fact, if your unit tests or release tests do not find this discrepancy, it might very well be that one of your customers will find this issue. To help keep this from happening, we suggest you do the following:

Provide well-written documentation regarding performance improvements made to specific data types. This can be placed in the code itself, but it should be duplicated in any developer documentation.
Add performance measurements to unit tests to make sure these optimized functions execute as expected. If the measured performance does not fit within a desired operating range (adjusted for processor speed), the test should fail.
During release testing, generate a list of all template arguments used by the application. Compare this list against the existing documentation and prepare a list of possible discrepancies. The development team should review this list to see if any new specializations are needed.
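
The second suggestion can be sketched as a small test helper. Everything here is an assumption made for illustration: apTestImage<> is a simplified stand-in for the image template in this section, sumMeetsBudget() is a hypothetical helper, and the time budget passed to it would have to be chosen (and adjusted for processor speed) by the team. The sketch requires C++11 for std::chrono.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Simplified stand-in for the image template; every pixel is set to 1.
// Assumes w and h are positive.
template<class T>
class apTestImage
{
public:
    apTestImage(int w, int h)
        : width_(w), height_(h), data_(static_cast<std::size_t>(w) * h, T(1)) {}
    int width() const { return width_; }
    int height() const { return height_; }
    const T* getAddr(int x, int y) const
    { return data_.data() + static_cast<std::size_t>(y) * width_ + x; }
    T getPixel(int x, int y) const { return *getAddr(x, y); }
    int sum() const;
private:
    int width_, height_;
    std::vector<T> data_;
};

// Generic version: one getPixel() call per pixel.
template<class T>
int apTestImage<T>::sum() const
{
    int sum = 0;
    for (int y = 0; y < height(); y++)
        for (int x = 0; x < width(); x++)
            sum += getPixel(x, y);
    return sum;
}

// Specialization for unsigned char: walk the contiguous storage directly.
template<>
int apTestImage<unsigned char>::sum() const
{
    int sum = 0;
    const unsigned char* p = getAddr(0, 0);
    for (int i = 0; i < width() * height(); i++)
        sum += *p++;
    return sum;
}

// Hypothetical test helper: true when sum() returns the expected value
// and finishes within budgetMs milliseconds.
template<class T>
bool sumMeetsBudget(const apTestImage<T>& image, int expected, double budgetMs)
{
    const auto start = std::chrono::steady_clock::now();
    const int result = image.sum();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return result == expected && elapsed.count() <= budgetMs;
}
```

A unit test would call sumMeetsBudget() once for each pixel type the application uses, with a budget that reflects whether a specialization is expected to exist for that type.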