8.1 Memory Issues


As we have seen, memory management is extremely important when dealing with images. Even in cases where image buffers are already allocated, inefficient use of memory can cause heap fragmentation and, eventually, allocation failures. This is especially true for embedded systems, where restarting an application or rebooting the system is not an option. In this section, we discuss numerous memory issues and look at ways to improve performance and reliability.

8.1.1 Copy on Write

The apAlloc<> class we introduced in Chapter 3 uses reference counting to allow multiple objects to share the same underlying memory. This class has a duplicate() method that forces the object to have its own copy of the data. duplicate() does nothing if the buffer is not shared (i.e., the reference count is 1). Consider the following example:

apAlloc<unsigned char> buffer1 (100000);
...
apAlloc<unsigned char> buffer2 = buffer1;
buffer2[0] = 0;

Objects buffer1 and buffer2 share the same underlying memory. What if we want to change buffer2 without affecting buffer1? We can manually do this by changing the last lines of the previous example to be as follows:

buffer2.duplicate ();
buffer2[0] = 0;

This becomes tedious and error-prone if we have to manually duplicate memory every time we plan on changing the contents of a buffer. This is where copy-on-write semantics can help. We can extend our apAlloc<> class to add this capability, but we will also be faced with some limitations. This topic is discussed fully in More Effective C++. See [Meyers96].

Our apAlloc<> class defines operator[] as:

template<class T, class A>
T& apAlloc<T, A>::operator[] (long index)
{
  if (index < 0 || index >= size())
    throw std::range_error ("Index out of range");
  return *(data() + index);
}

There is also a corresponding const version that is identical. At first glance, it may appear that we can just call duplicate() inside the non-const version to duplicate the buffer if it is shared. However, there is a problem with this, as shown in the following example:

apAlloc<unsigned char> buffer (100);
cout << buffer[0] << endl;
buffer[0] = 5;

In both cases, the non-const version of operator[] is called. The decision about whether the const or non-const version is called does not depend on whether operator[] appears on the left-hand side (lvalue) or right-hand side (rvalue). This means that we need to find another solution so that duplicate() will be called only when necessary.

You can rewrite the example to make sure that the const and non-const versions are called at the appropriate times. We use a const apAlloc<> object whenever we only read the contents of the object, as follows:

apAlloc<unsigned char> buffer (100);
const apAlloc<unsigned char> buffer2 = buffer;
cout << buffer2[0] << endl;
buffer[0] = 5;

This is not a practical way to design commercial software. It puts too much burden on the developer to ensure that the const or non-const version of a function is called at the right time. Worse, a single mistake in the implementation can mean that unnecessary memory duplications are made, slowing down the application.

In More Effective C++ [Meyers96], Meyers discusses a technique of using proxy classes to decide how operator[] is being used (either as an lvalue or rvalue). This technique requires that we return a class object that, based on operator overloading, determines which form is being called. While this solves the problem, it does add some additional complexity.

Leaving out copy-on-write semantics from our apAlloc<> class was done on purpose. Providing the compiler with enough information to always do the right thing makes the apAlloc<> class much more complicated. The last thing we want to do is write code that may become difficult to maintain because of its complexity. We believe it is much better to specify what apAlloc<> does and what it does not do.

A Practical Solution

If you really must have copy-on-write in apAlloc<>, here is an alternative way to do it. It does require an additional step, but fortunately, the compiler will always complain if you forget. The const methods that access data pose no problem:

const T* data () const;
const T& operator[] (long index) const;

These functions will never be used as an lvalue. If you look at apAlloc<>, the non-const versions look almost identical to the const versions. In our design, we handle things differently. Instead of defining a non-const version of data(), we define dataRW() as:

 T* dataRW (); 

The RW suffix indicates that the pointer is for reading and writing. By moving the decision of what kind of pointer is needed from the compiler to the developer, we remove the confusion about what is happening, since we no longer offer two versions of a data() function.

operator[] also looks different because it takes an apIndex object as an argument, rather than a simple numerical index as shown:

 T& operator[] (const apIndex& index); 

The definition of apIndex is as follows:

class apIndex
{
public:
  explicit apIndex (unsigned int index) : index_ (index) {}
  operator unsigned int () const { return index_;}

private:
  unsigned int index_;
};

apIndex is really just a proxy for an unsigned int. It is critical that we use the explicit keyword in the constructor, to prevent the compiler from implicitly using this constructor to convert an integer into an apIndex object. If this were to ever happen, the non-const version of operator[] might incorrectly be called.

Our code now changes very little, as shown:

template<class T, class A>
T& apAlloc<T, A>::operator[] (const apIndex& apindex)
{
  unsigned int index = apindex;
  if (index >= size())
    throw std::range_error ("Index out of range");
  return *(data() + index);
}

Although we do not show it here, it is now possible to modify operator[] to duplicate the memory if the reference count is greater than one. You would add a flag in the object that enables and disables the copy-on-write behavior, instead of doing it unconditionally.

Our sample code is now slightly different, with references to buffer[0] being replaced by buffer[apIndex(0)] , as shown:

apAlloc<unsigned char> buffer (100);
cout << buffer[0] << endl;
buffer[apIndex(0)] = 5;

If we rewrote the last line of this example as:

 buffer[0] = 5; 

the compiler would issue an error, because a non-const reference can only be returned when an apIndex object is used. Using the apIndex type as an array index lets the compiler know that you intend to modify the array. This solution, although not perfect, lets us minimize the complexity of our apAlloc<> object.

8.1.2 Caching Issues

When dealing with image processing operations, it is very useful to minimize the number of intermediate images that must be computed. Although we believe caching to be a useful capability, we decided not to include it in our image framework, as it increased the difficulty of managing objects and placed more responsibility on the developer. That said, we still believe it to be a useful technique, and we present a strategy here if you should wish to extend the framework to include caching.

EXAMPLE

Let's take an example of an image processing function and look at how caching makes it more efficient. The following function computes a histogram of all pixels in the image. The histogram always has 256 entries and represents the number of times each pixel value occurs. Our function converts the image to an 8-bit monochrome image before the histogram is computed. The histogram() function is as follows:

template<class T>
unsigned int* histogram (const apImage<T> image)
{
  static unsigned int counts[256];
  for (unsigned int index=0; index<256; index++)
    counts[index] = 0;

  apImage<Pel8> mono;
  copy (image, mono);

  apImage<Pel8>::row_iterator i;
  for (i=mono.row_begin(); i!=mono.row_end(); i++) {
    Pel8* p = i->p;
    for (unsigned int x=0; x<mono.width(); x++)
      counts[*p++]++;
  }
  return counts;
}

This simple function is deficient in a few ways (for example, it duplicates the image even if it already is of type Pel8 ), but this is not relevant for our discussion. Let's look at how we might use such a function:

int main ()
{
  apRect boundary (0, 0, 1024, 1024);
  apImage<apRGB> src (boundary);
  src.set (apRGB(1,2,3));

  unsigned int* h = histogram (src);
  for (unsigned int i=0; i<256; i++)
    std::cout << i << ": " << h[i] << std::endl;

  return 0;
}

On the surface, histogram() computes an array of values given an image. The fact that it computes a temporary image is hidden. However, computing this temporary image can consume quite a bit of time. The way we have defined histogram() , it always computes an apImage<Pel8> image. One obvious performance enhancement is to test if the image is already an image of type Pel8 . If so, there is no need to copy this image. For other image types, however, a copy is always necessary.

histogram() discards the temporary image when it is no longer needed. However, what happens if another call to histogram() is made, or another image processing step needs an image of the same type? The answer is that this image is computed every time it is needed. What we really want is a framework that can store temporary images so that they can be reused as needed.

A General Purpose Caching Framework

Our need to cache intermediate copies of images can be generalized into a generic framework. It should have the following features:

  • Before computing a derived object, consult a list of cached objects to see if it is available.

  • If the object is available, return the cached copy.

  • If the object is not available, compute this quantity, then save it in the cache.

After we discuss the general framework, we apply it to an image processing example, where we attach a cache object to each instance of an image to hold the list of available images.

CACHING OBJECT

apCache keeps track of one or more cached quantities. Each cached quantity is maintained as an object derived from apCacheEntryBase, as shown.

class apCacheEntryBase;  // Forward declaration

class apCache
{
public:
  enum eDeletion { eAll, eStateChange, eDataChange};
  // Each entry can decide when it will get deleted from the cache

  typedef std::map<std::string, apCacheEntryBase*> Cache;
  // Our map of related images, by name

  apCache ();
  ~apCache ();

  void add (const std::string& key, apCacheEntryBase* cacheEntry);
  // Add an item in the cache, or replace an existing item

  apCacheEntryBase* find (const std::string& key);
  // Retrieve a cache item, or 0 if not found

  void clear (eDeletion deletion = eAll);
  // Clear items in our cache. By default, they all are cleared

protected:
  Cache cache_;    // List of cached items
};

ADDING CACHED OBJECTS

apCache uses a std::map object to map a string to an apCacheEntryBase object. Every object that is added to the cache has a unique string. This string is used by other objects to query and fetch a cached object, as shown:

void apCache::add (const std::string& key,
                   apCacheEntryBase* cacheEntry)
{ cache_[key] = cacheEntry;}

apCacheEntryBase* apCache::find (const std::string& key)
{
  Cache::iterator i = cache_.find (key);
  if (i == cache_.end())
    return 0;
  return (i->second);
}

DELETING CACHED OBJECTS

It is equally important to decide when information should be deleted from the cache. For example, if a function changes the original image that was used to compute a derived quantity, the cached image is no longer valid and must be recomputed. apCache has a limited understanding of what is contained in its cache. The clear() method is included to delete items from the cache, as shown.

void apCache::clear (eDeletion deletion)
{
  // Delete some or all of our objects. This loop is different
  // because we can't delete the iterator we're on.
  // See Meyers, Effective STL, Item 9
  Cache::iterator i;
  for (i=cache_.begin(); i!=cache_.end();) {
    apCacheEntryBase* entry = i->second;
    if (deletion == eAll || entry->deletion() == eAll ||
        entry->deletion() == deletion) {
      delete (i->second);
      cache_.erase (i++);
    }
    else
      ++i;
  }
}

You decide what items are deleted by using the state of the cached object and the value of deletion, which can be any of the following:

  • eAll - All items should be deleted from the cache.

  • eStateChange - The state of the object has changed. If this cache contained images, it means that the pixel data has not changed, but other features, such as its origin point, have changed.

  • eDataChange - The object's data has changed. If this cache contained images, it means that the actual pixels of the parent image have been altered.

Note that, as the comment indicates, the loop must be written this particular way, since it is possible to delete the current item we are examining in cache_ .


Placing references in your code or indicating ways to get more information is very useful to anyone using your code.


BASE CLASS FOR CACHED OBJECTS

Now we can look at the objects that actually get cached. We have seen that apCache keeps track of zero or more apCacheEntryBase* objects. The main purpose of apCacheEntryBase is to act as the base class for any cache object, as shown here.

class apCacheEntryBase
{
public:
  apCacheEntryBase (apCache::eDeletion deletion);
  virtual ~apCacheEntryBase ();

  apCache::eDeletion deletion () const { return deletion_;}

protected:
  apCache::eDeletion deletion_;
};

apCache maintains pointers to objects derived from apCacheEntryBase in cache_. We use pointers to avoid copies, since these cached objects may be very large or expensive to copy. apCache::clear() calls delete on these pointers in order to free them.

Templates once again allow us to be very flexible about what is contained in a cache entry, as shown here.

template<class T>
class apCacheEntry : public apCacheEntryBase
{
public:
  apCacheEntry (T& object,
                apCache::eDeletion deletion = apCache::eAll)
    : apCacheEntryBase (deletion), object_ (object) {}
  virtual ~apCacheEntry () {}

  T object () { return object_;}

protected:
  T object_;
};

As you can see, one copy of the cached item, object, is made during construction when it gets stored in object_. The deletion criteria can also be specified, with the default behavior that any changes to the parent object cause the entry to be deleted. Cached elements are returned as base class objects. They must be cast in order to retrieve the specific information contained inside. This is not a problem, since the function that is trying to access this value is also capable of storing it, so the function must understand the details.

As you might expect, there are a number of important issues that you must understand when writing code that caches derived objects. Let's look at an example to clarify things.

EXAMPLE

Let's rewrite the histogram() function we used in our previous example. Instead of caching an apImage<> object, we will store an apImageStorage<> object because it contains all the necessary information, as shown.

apCache cache; // Stand-alone cache for testing

template<class T>
unsigned int* histogram (const apImage<T> image)
{
  static unsigned int counts[256];
  for (unsigned int index=0; index<256; index++)
    counts[index] = 0;

  apImage<Pel8> mono;
  std::string cacheKey ("Pel8");
  apCacheEntry<apImageStorage<Pel8> >* entry;

  // Retrieve or store our Pel8 image in the cache
  apCacheEntryBase* cached = cache.find (cacheKey);
  if (!cached) {
    // This item is not in the cache. Compute it and cache it
    copy (image, mono);
    apImageStorage<Pel8> storage = mono.storage ();
    entry = new apCacheEntry<apImageStorage<Pel8> > (storage);
    cache.add (cacheKey, entry);
  }
  else {
    entry = static_cast<apCacheEntry<apImageStorage<Pel8> >*> (cached);
    mono = apImage<Pel8>(entry->object());
  }

  apImage<Pel8>::row_iterator i;
  for (i=mono.row_begin(); i!=mono.row_end(); i++) {
    Pel8* p = i->p;
    for (unsigned int x=0; x<mono.width(); x++)
      counts[*p++]++;
  }
  return counts;
}

This technique makes the histogram() function more complicated, but also more efficient. cacheKey is the string name we use to see if an image is already in the cache. We use apCache::find() to see if the item is already in the cache.

If the item is not in the cache (i.e., cached is zero), we compute the monochrome image and save it in the cache. The template parameter for our apCacheEntry is apImageStorage<Pel8> , as shown.

copy (image, mono);
apImageStorage<Pel8> storage = mono.storage ();
entry = new apCacheEntry<apImageStorage<Pel8> > (storage);
cache.add (cacheKey, entry);

(Note: If we were really adding this to the image framework, we would have used something more complicated than just Pel8 as the name of the key. We would have combined the typeid() and any special storage alignment of the image.)

If the item was in the cache, we make mono point to the storage object we obtain from the cache, as shown.

entry = static_cast<apCacheEntry<apImageStorage<Pel8> >*> (cached);
mono = apImage<Pel8>(entry->object());

The rest of the histogram() function remains the same. While caching is clearly a useful performance and efficiency enhancement, it is also obvious that it makes the design more complex. These trade-offs have to be considered when approaching the design of your application.

Adding a Cache to the Image Object

To add caching to our image object, apImage<> , we need to make every instance of apImage<> contain an apCache instance. We must also modify every image processing function that uses an intermediate image, to first check if the image is already available in the cache before computing it.

And, we still have to worry about making sure the cache is cleaned up when the image is changed. Let's take a look at how we might add caching to our image object.

We start by adding a number of new methods to apImage<> to manage the apCache object, cache_ , as shown:

void clearCache (apCache::eDeletion deletion = apCache::eAll)
{ cache_.clear (deletion);}

apCacheEntryBase* readCache (const std::string& key);
void writeCache (const std::string& key, apCacheEntryBase* object);

Now we need to modify every function that can change the image data or state. For example, our assignment operator must add cache_.clear() to make sure the cache removes all items, as shown:

template<class T, class S>
apImage<T,S>& apImage<T,S>::operator= (const apImage& src)
{
  apImageLocker<T,S> srcLocking (src, false); // Lock state

  // Make sure we don't copy ourself!
  if (storage_ != src.storage_) {
    apImageLocker<T,S> locking (*this, false); // Lock our state
    storage_ = src.storage_;
    cache_.clear ();                    // Clear our cache
  }
  return *this;
}

Statements like this must be carefully placed in all functions that can modify the image. A single mistake can keep an old version of the image in the cache.

If your application computes the same types of images many times, then you should consider adding caching at the application level of your design. In this case, it would be a mistake to add this functionality within the image object. Making apImage<> cache every image it computes would mean that many of the cached images would never be used. The point of caching is to reduce the number of times we compute temporary images. Caching unused images would defeat the whole purpose.

If you are still not convinced, consider objects with very long lifetimes. These objects are used for many operations and persist in the system for a long time. If these objects cache quantities that are only used once, the application can find itself exhausting memory, without the addition of yet another component to make sure that cached entries do not stay around too long.

In this section, we have highlighted only some of the changes required to add caching to the image framework by means of the apImage<> object. Instead of adding caching to apImage<> , you should consider caching certain images in your application, such as:

  • Monochrome images derived from color images or vice-versa. Since many image processing operations are written to only work on one type of image, caching a copy of it is a good idea if you expect to perform numerous computations on it.

  • Images with a specified image alignment. We have said many times that some algorithms run faster when their storage is aligned in memory. If you take the time to compute an image with a specific alignment, you should consider caching a copy in your code to use it again. Our Intel IPP interface is a perfect example. If you pass an improperly aligned image to an IPPFilter<T> object, a copy of this object is made. If possible, you should consider caching if you expect to call many apIPP methods using the same image.



Applied C++
Applied C++: Practical Techniques for Building Better Software
ISBN: 0321108949
Year: 2003
Pages: 59
