Item 50: Understand when it makes sense to replace new and delete

Let's return to fundamentals for a moment. Why would anybody want to replace the compiler-provided versions of operator new or operator delete in the first place? These are three of the most common reasons:

  • To detect usage errors. Failure to delete memory conjured up by new leads to memory leaks. Using more than one delete on newed memory yields undefined behavior. If operator new keeps a list of allocated addresses and operator delete removes addresses from the list, it's easy to detect such usage errors. Similarly, a variety of programming mistakes can lead to data overruns (writing beyond the end of an allocated block) and underruns (writing prior to the beginning of an allocated block). Custom operator news can overallocate blocks so there's room to put known byte patterns ("signatures") before and after the memory made available to clients. operator deletes can check to see if the signatures are still intact. If they're not, an overrun or underrun occurred sometime during the life of the allocated block, and operator delete can log that fact, along with the value of the offending pointer.

  • To improve efficiency. The versions of operator new and operator delete that ship with compilers are designed for general-purpose use. They have to be acceptable for long-running programs (e.g., web servers), but they also have to be acceptable for programs that execute for less than a second. They have to handle series of requests for large blocks of memory, small blocks, and mixtures of the two. They have to accommodate allocation patterns ranging from the dynamic allocation of a few blocks that exist for the duration of the program to constant allocation and deallocation of a large number of short-lived objects. They have to worry about heap fragmentation, a process that, if unchecked, eventually leads to the inability to satisfy requests for large blocks of memory, even when ample free memory is distributed across many small blocks.

    Given the demands made on memory managers, it's no surprise that the operator news and operator deletes that ship with compilers take a middle-of-the-road strategy. They work reasonably well for everybody, but optimally for nobody. If you have a good understanding of your program's dynamic memory usage patterns, you'll often find that custom versions of operator new and operator delete outperform the default ones. By "outperform," I mean they run faster (sometimes orders of magnitude faster), and they require less memory (up to 50% less). For some (though by no means all) applications, replacing the stock new and delete with custom versions is an easy way to pick up significant performance improvements.

  • To collect usage statistics. Before heading down the path of writing custom news and deletes, it's prudent to gather information about how your software uses its dynamic memory. What is the distribution of allocated block sizes? What is the distribution of their lifetimes? Do they tend to be allocated and deallocated in FIFO ("first in, first out") order, LIFO ("last in, first out") order, or something closer to random order? Do the usage patterns change over time, e.g., does your software have different allocation/deallocation patterns in different stages of execution? What is the maximum amount of dynamically allocated memory in use at any one time (i.e., its "high water mark")? Custom versions of operator new and operator delete make it easy to collect this kind of information.
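To make the first and third bullets a bit more concrete, here is a minimal sketch of the bookkeeping involved. It is my illustration rather than code from this Item; the class name Widget, the choice of counters, and the decision to abort on a detected error are arbitrary, and, like the example that follows, it ignores the conventions discussed in Item 51. A class-specific operator new/delete pair records every live allocation, so a double delete can be caught, and it maintains the number of bytes currently in use along with the high water mark.

#include <cstdlib>
#include <map>
#include <new>

class Widget {
public:
  static void* operator new(std::size_t size)
  {
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();

    liveBlocks()[p] = size;                        // remember address and size
    bytesInUse += size;                            // update the statistics
    if (bytesInUse > highWaterMark) highWaterMark = bytesInUse;
    return p;
  }

  static void operator delete(void* p)
  {
    if (!p) return;                                // deleting null is always OK

    std::map<void*, std::size_t>::iterator it = liveBlocks().find(p);
    if (it == liveBlocks().end()) {                // never allocated here, or
      std::abort();                                // already deleted: a usage error
    }

    bytesInUse -= it->second;                      // update the statistics,
    liveBlocks().erase(it);                        // forget the block, and
    std::free(p);                                  // release the memory
  }

  static std::size_t bytesInUse;                   // dynamic memory in use now
  static std::size_t highWaterMark;                // most ever in use at once

private:
  static std::map<void*, std::size_t>& liveBlocks()
  {
    static std::map<void*, std::size_t> blocks;    // addresses and sizes of
    return blocks;                                 // all live allocations
  }
};

std::size_t Widget::bytesInUse = 0;
std::size_t Widget::highWaterMark = 0;

Doing the same thing for the global operator new takes more care, because the std::map used as the registry would itself allocate memory through the very function being instrumented.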

In concept, writing a custom operator new is pretty easy. For example, here's a quick first pass at a global operator new that facilitates the detection of under- and overruns. There are a lot of little things wrong with it, but we'll worry about those in a moment.

static const int signature = 0xDEADBEEF;

typedef unsigned char Byte;

// this code has several flaws; see below
void* operator new(std::size_t size) throw(std::bad_alloc)
{
  using namespace std;

  size_t realSize = size + 2 * sizeof(int);    // increase size of request so 2
                                               // signatures will also fit inside

  void *pMem = malloc(realSize);               // call malloc to get the actual
  if (!pMem) throw bad_alloc();                // memory

  // write signature into first and last parts of the memory
  *(static_cast<int*>(pMem)) = signature;
  *(reinterpret_cast<int*>(static_cast<Byte*>(pMem) + realSize - sizeof(int))) =
    signature;

  // return a pointer to the memory just past the first signature
  return static_cast<Byte*>(pMem) + sizeof(int);
}

Most of the shortcomings of this operator new have to do with its failure to adhere to the C++ conventions for functions of that name. For example, Item 51 explains that all operator news should contain a loop calling a new-handling function, but this one doesn't. However, Item 51 is devoted to such conventions, so I'll ignore them here. I want to focus on a more subtle issue now: alignment.

Many computer architectures require that data of particular types be placed in memory at particular kinds of addresses. For example, an architecture might require that pointers occur at addresses that are a multiple of four (i.e., be four-byte aligned) or that doubles must occur at addresses that are a multiple of eight (i.e., be eight-byte aligned). Failure to follow such constraints could lead to hardware exceptions at runtime. Other architectures are more forgiving, though they may offer better performance if alignment preferences are satisfied. For example, doubles may be aligned on any byte boundary on the Intel x86 architecture, but access to them is a lot faster if they are eight-byte aligned.

Alignment is relevant here, because C++ requires that all operator news return pointers that are suitably aligned for any data type. malloc labors under the same requirement, so having operator new return a pointer it gets from malloc is safe. However, in operator new above, we're not returning a pointer we got from malloc, we're returning a pointer we got from malloc offset by the size of an int. There is no guarantee that this is safe! If the client called operator new to get enough memory for a double (or, if we were writing operator new[], an array of doubles) and we were running on a machine where ints were four bytes in size but doubles were required to be eight-byte aligned, we'd probably return a pointer with improper alignment. That might cause the program to crash. Or it might just cause it to run more slowly. Either way, it's probably not what we had in mind.
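For what it's worth, one way to repair this particular flaw is to make the hidden area in front of the client's memory a multiple of the strictest alignment on the platform. The sketch below is my illustration, not code from this Item: it leans on C++11's std::max_align_t, which postdates this book (TR1's support for discovering alignment requirements, mentioned later in this Item, addresses the same need), and it would still need a matching operator delete that undoes the offset, as well as everything Item 51 asks for.

#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

namespace {
  const std::size_t maxAlign = alignof(std::max_align_t);   // strictest fundamental alignment

  // smallest multiple of maxAlign with room for the leading signature
  const std::size_t headerSize =
    ((sizeof(unsigned int) + maxAlign - 1) / maxAlign) * maxAlign;

  const unsigned int signature = 0xDEADBEEF;
}

void* operator new(std::size_t size)                 // still ignores the
{                                                    // conventions of Item 51
  std::size_t realSize = size + headerSize + sizeof(signature);

  void* pMem = std::malloc(realSize);                // malloc's result is suitably
  if (!pMem) throw std::bad_alloc();                 // aligned for any type

  unsigned char* base = static_cast<unsigned char*>(pMem);

  // memcpy sidesteps misaligned int writes at the trailing position
  std::memcpy(base, &signature, sizeof(signature));
  std::memcpy(base + realSize - sizeof(signature), &signature, sizeof(signature));

  // because headerSize is a multiple of maxAlign, this pointer is as
  // suitably aligned as the one malloc handed us
  return base + headerSize;
}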

Details like alignment are the kinds of things that distinguish professional-quality memory managers from ones thrown together by programmers distracted by the need to get on to other tasks. Writing a custom memory manager that almost works is pretty easy. Writing one that works well is a lot harder. As a general rule, I suggest you not attempt it unless you have to.

In many cases, you don't have to. Some compilers have switches that enable debugging and logging functionality in their memory management functions. A quick glance through your compilers' documentation may eliminate your need to consider writing new and delete. On many platforms, commercial products can replace the memory management functions that ship with compilers. To avail yourself of their enhanced functionality and (presumably) improved performance, all you need do is relink. (Well, you also have to buy them.)

Another option is open source memory managers. They're available for many platforms, so you can download and try those. One such open source allocator is the Pool library from Boost (see Item 55). The Pool library offers allocators tuned for one of the most common situations in which custom memory management is helpful: allocation of a large number of small objects. Many C++ books, including earlier editions of this one, show the code for a high-performance small-object allocator, but they usually omit such pesky details as portability and alignment considerations, thread safety, etc. Real libraries tend to have code that's a lot more robust. Even if you decide to write your own news and deletes, looking at open source versions is likely to give you insights into the easy-to-overlook details that separate almost working from really working. (Given that alignment is one such detail, it's worth noting that TR1 (see Item 54) includes support for discovering type-specific alignment requirements.)

The topic of this Item is knowing when it can make sense to replace the default versions of new and delete, either globally or on a per-class basis. We're now in a position to summarize when doing so makes sense in more detail than we did before.

  • To detect usage errors (as above).

  • To collect statistics about the use of dynamically allocated memory (also as above).

  • To increase the speed of allocation and deallocation. General-purpose allocators are often (though not always) a lot slower than custom versions, especially if the custom versions are designed for objects of a particular type. Class-specific allocators are an example application of fixed-size allocators such as those offered by Boost's Pool library (a minimal sketch of the idea appears after this list). If your application is single-threaded, but your compilers' default memory management routines are thread-safe, you may be able to win measurable speed improvements by writing thread-unsafe allocators. Of course, before jumping to the conclusion that operator new and operator delete are worth speeding up, be sure to profile your program to confirm that these functions are truly a bottleneck.

  • To reduce the space overhead of default memory management. General-purpose memory managers are often (though not always) not just slower than custom versions, they often use more memory, too. That's because they often incur some overhead for each allocated block. Allocators tuned for small objects (such as those in Boost's Pool library) essentially eliminate such overhead.

  • To compensate for suboptimal alignment in the default allocator. As I mentioned earlier, it's fastest to access doubles on the x86 architecture when they are eight-byte aligned. Alas, the operator news that ship with some compilers don't guarantee eight-byte alignment for dynamic allocations of doubles. In such cases, replacing the default operator new with one that guarantees eight-byte alignment could yield big increases in program performance.

  • To cluster related objects near one another. If you know that particular data structures are generally used together and you'd like to minimize the frequency of page faults when working on the data, it can make sense to create a separate heap for the data structures so they are clustered together on as few pages as possible. Placement versions of new and delete (see Item 52) can make it possible to achieve such clustering.

  • To obtain unconventional behavior. Sometimes you want operators new and delete to do something that the compiler-provided versions don't offer. For example, you might want to allocate and deallocate blocks in shared memory, but have only a C API through which to manage that memory. Writing custom versions of new and delete (probably placement versions again, see Item 52) would allow you to drape the C API in C++ clothing. As another example, you might write a custom operator delete that overwrites deallocated memory with zeros in order to increase the security of application data.
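As for the speed and space bullets above, here is a deliberately simple, thread-unsafe sketch of a fixed-size allocator of the kind Boost's Pool library provides in robust form. It is my illustration, not the book's code and not Boost's implementation; the class name Airplane, its data members, and the chunk size are arbitrary. Blocks are carved out of large chunks, so there is no per-allocation header, and memory placed on the free list is never returned to the system.

#include <cstddef>
#include <new>

class Airplane {
public:
  static void* operator new(std::size_t size)
  {
    if (size != sizeof(Airplane))                  // "wrong" size (e.g., a derived
      return ::operator new(size);                 // class): use the default allocator

    if (!freeList) {                               // free list empty? carve a new
      const std::size_t chunkSize = 512;           // chunk into fixed-size blocks
      char* chunk =
        static_cast<char*>(::operator new(chunkSize * sizeof(Airplane)));
      for (std::size_t i = 0; i < chunkSize; ++i) {
        FreeNode* node = new (chunk + i * sizeof(Airplane)) FreeNode;
        node->next = freeList;
        freeList = node;
      }
    }

    FreeNode* block = freeList;                    // pop a block off the free
    freeList = block->next;                        // list; this is the whole
    return block;                                  // fast path
  }

  static void operator delete(void* p, std::size_t size)
  {
    if (!p) return;
    if (size != sizeof(Airplane)) { ::operator delete(p); return; }

    FreeNode* node = static_cast<FreeNode*>(p);    // push the block back onto
    node->next = freeList;                         // the free list; no header,
    freeList = node;                               // no call into the system
  }

private:
  struct FreeNode { FreeNode* next; };             // overlays an unused block

  double mileage;                                  // illustrative data members;
  char   registration[16];                         // they keep sizeof(Airplane)
                                                   // at least as big as a FreeNode

  static FreeNode* freeList;                       // singly linked list of free blocks
};

Airplane::FreeNode* Airplane::freeList = 0;

Because there is no lock around the free list, this is also the kind of allocator in which deliberately giving up thread safety buys speed in a single-threaded program. A production version would additionally honor the conventions of Item 51 and eventually return memory to the system.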

Things to Remember

  • There are many valid reasons for writing custom versions of new and delete, including improving performance, debugging heap usage errors, and collecting heap usage information.



