Mild versus Severe Inefficiencies

I l @ ve RuBoard

So a thread-safe COW implementation always has to incur serialization overhead, but there's still a vast difference between "necessary" and "indefensible" overhead.

To cut a (very) long story short, it turns out that a thread-safe COW implementation can be written in such a way that the only data to which access needs to be serialized is the reference count itself. (For an exhaustive explanation of this and other aspects of COW and alternative thread-safe implementations, see Items 14 to 16. Item 16 includes a test harness with sample well-optimized source code for no fewer than seven implementations of a limited String class ”five COW implementations and two non-COW implementations. This discussion summarizes and expands on some of the results.)

So here's the thing: Because the only shared data that needs serialization is the reference count ”typically, a single integer ”then you can be a lot more efficient than merely using a heavyweight general-purpose thread safety solution. Instead of using a mutex, which is unavoidable when you have to serialize access to larger structures, you can get away with using the following atomic integer operations if they are available on your system: atomic increment, atomic decrement-and-test, and atomic comparison. These atomic integer operations still incur overhead because they're necessarily slower than plain integer operations, but they're still much faster than a more sophisticated construct such as a mutex. They can often be implemented in a single native assembler instruction or in a short series of instructions.

To get a feel for the difference, consider alternative implementations for one member function of a C++-based String class, the non- const version of operator[]() , which returns a reference to a character at a given offset in the string. [5] For a plain (non-COW) implementation, it would look something like this, where buf_ is a pointer to the String object's internal buffer, where it stores its representation:

[5] COW advocates may complain that the difference between COW and non-COW is most noticeable with possibly-mutating functions such as the non- const version of operator[] . That's true, but this is the easiest function for showing a quick and clear example of the implementation differences between non-COW and various flavors of COW.

 // Plain non-COW operator[] (non-const version). // char& String::operator[]( size_t n ) {   return buf_[n]; } 

A COW implementation looks pretty much the same, except that it must first ensure that the representation is unshared (in operator[]() 's case, there are also reasons why the function should return a reference to non- const and why the representation should be marked unshareable, but that's not directly relevant here so I'll omit it; for more of those details, turn to Item 16):

 // COW operator[] (non-const version). // const char& String::operator[] ( size_t n ) {   EnsureUnique();   return data_->buf[n]; } 

The key to all this is what has to be done inside EnsureUnique() for the different flavors of COW. Here are simplified implementations, which ignore "unshareable" flags and other more-sophisticated operations. Here data_ points to the internal (and possibly shared) representation, which includes the reference count ( refs ):

 // Thread-unsafe COW: no locks. // void String::EnsureUnique() const {   if( data_->refs > 1 )   {     StringBuf* newdata = new StringBuf( *data_ );     --data_->refs;   // now all the real work is     data_ = newdata; //  done, so take ownership   } } // Thread-safe COW: atomic integer operations. // // Uses atomic integer calls to serialize access // to data_->refs. Note that the IntAtomic* calls // are not necessarily function calls, but can be // as efficient as a native assembler instruction. // They still introduce overhead, however, because // EnsureUnique must be called by every possibly- // mutating String function, and the IntAtomic* // operations are slower than normal integer // operations. // void String::EnsureUnique() const {   if( IntAtomicCompare( data_->refs, 1 ) > 0 )   {     StringBuf* newdata = new StringBuf( *data_ );     if( IntAtomicDecrement( data_->refs ) < 1 )     {       delete newdata;  // just in case two threads       data_->refs = 1; //  are trying this at once     }     else     {                  // now all the real work is       data_ = newdata; //  done, so take ownership     }   } } // Thread-safe COW: mutexes. // // Each data_ buffer contains a mutex object for // serializing access to data_->refs. // // This method is needlessly inefficient, but it is // used in some popular commercial string libraries. // EnsureUnique still is called by every possibly- // mutating String function, but the overhead is // worse than with IntAtomic* functions. // void String::EnsureUnique() const {   Lock<Mutex> l(data_->m); //---------   if( data_->refs > 1 )   {     StringBuf* newdata = new StringBuf( *data_ );     --data_->refs;     data_ = newdata;   }   l.Unlock(); //--------------------------------- } 

COW will always add overhead, especially if it's to be made thread-safe. But this should show why there is no reason to use something as indefensibly inefficient as a mutex when atomic integer operations alone will do.

I l @ ve RuBoard


More Exceptional C++
More Exceptional C++: 40 New Engineering Puzzles, Programming Problems, and Solutions
ISBN: 020170434X
EAN: 2147483647
Year: 2001
Pages: 118
Authors: Herb Sutter

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net