Some Actual Numbers

I l @ ve RuBoard

The difference between "necessary" and "indefensible" overhead can be profound. Here's an example from Item 16's and Appendix B's test results (the entire test harness source code can be downloaded via a link on this book's Web page at http://www.gotw.ca/ publications /mxc++.htm, if you'd like to try it on your system). The test I'm showing here makes copies of 50-char strings in a tight loop. One-third of the copies are never modified (the classic case for COW optimization, in which the deep copy is entirely avoided); the rest of the copied strings are modified exactly once each. I'm showing the results for six flavors of the String class:

" Plain " is a plain non-COW string.
" Plain_FastAlloc " is the same as Plain, but uses an optimized memory allocator instead of the default operator new() allocator (note, however, that the latter was very good in its own right on the compiler platform I tested with).
" COW_Unsafe " is an efficient but thread-unsafe COW string.
" COW_AtomicInt " is an efficient thread-safe COW string that uses atomic integer operations for serializing access to the reference count.
" COW_CritSec " is the same as AtomicInt, but uses Win32 critical section locks instead of atomic integer operations.
" COW_Mutex " is the same as AtomicInt, but uses Win32 mutex locks instead of atomic integer operations.

Here are illustrative results on one compiler and Windows platform I happened to have handy. Recognize that results will vary on other compilers and platforms; see Appendix B for additional result numbers, and for links to source code that you can compile and try out on your own system.

String Implementation Strategy	Elapsed Time (smaller numbers are better)
`Plain`	`1530ms`
`Plain_FastAlloc`	`390ms`
`COW_Unsafe`	`1380ms`
`COW_AtomicInt`	`1750ms`
`COW_AtomicInt2`	`1490ms`
`COW_CritSec`	`7800ms`
`COW_Mutex`	`23010ms`

This is just one scenario, of course. It may or may not be typical of the way your program uses strings. My main purpose in sketching these results is to illustrate the following:

Thread-safe COW is necessarily less efficient than thread-unsafe COW. In this test case, the thread-safety overhead made the difference between COW's being an optimization and its being a pessimization.
Poorly implemented thread-safe COW ( specifically COW_CritSec , which is implemented the same way as a certain popular string library) incurs a severe performance penalty. Note: If your platform lacks atomic integer operations, you have no choice but to use a more heavyweight solution. Then COW becomes a severe pessimization for nearly all cases.
It's not wrong to want to optimize, but a far better (and simpler) optimization is to optimize the memory allocation, not the deep copying. Note also that optimizing the allocator has no thread safety implications. Moral: Don't guess. Measure. Only optimize after measuring where your bottlenecks really are. (Of course, you could always choose to perform both optimizations. But development time is never infinite. Which one gives you more bang for the buck here?)
Again, this test harness was single-threaded. You pay for the overhead whenever you use a library built for possible multithreaded use, whether you need it or not.

As discussed in Item 16, there are other interesting results. One is that when testing with larger strings on compilers with reasonably efficient default allocators , COW's advantage is not so much that it avoids the cost of the memory allocations , but that it avoids the cost of copying the characters in the string ”a result you might find surprising.

I l @ ve RuBoard