Raw Measurements

I l @ ve RuBoard

Testing Const Copying and Destruction: The Target Case of Cow

Notes:

  • COW_AtomicInt always took more than twice as long to create and destroy a const copy as did plain thread-unsafe COW.

  • For every copy of a string that was later modified, COW_AtomicInt added constant unrecoverable overhead (400ms per 1,000,000), not counting the overhead on other operations.

  • Most of COW's primary advantage for small strings could be gained without COW by using a more efficient allocator, or the small-string optimization (for more about the small-string optimization, turn to Appendix A). Of course, you could also implement multiple optimizations ”use COW along with an efficient allocator and/or the small-string optimization.

  • COW's primary advantage for large strings lay, not in avoiding the allocations , but in avoiding the char copying.

Running 1,000,000 iterations with strings of length 10:

 Plain_FastAlloc    495ms  copies: 1000000  allocs: 1000003           Plain   1368ms  copies: 1000000  allocs: 1000003      COW_Unsafe    160ms  copies: 1000000  allocs:       3   COW_AtomicInt    393ms  copies: 1000000  allocs:       3  COW_AtomicInt2    433ms  copies: 1000000  allocs:       4     COW_CritSec    428ms  copies: 1000000  allocs:       3       COW_Mutex  14369ms  copies: 1000000  allocs:       3 

Running 1,000,000 iterations with strings of length 50:

 Plain_FastAlloc    558ms  copies: 1000000  allocs: 1000007           Plain   1598ms  copies: 1000000  allocs: 1000007      COW_Unsafe    165ms  copies: 1000000  allocs:       7   COW_AtomicInt    394ms  copies: 1000000  allocs:       7  COW_AtomicInt2    412ms  copies: 1000000  allocs:       8     COW_CritSec    433ms  copies: 1000000  allocs:       7       COW_Mutex  14130ms  copies: 1000000  allocs:       7 

Running 1,000,000 iterations with strings of length 100:

 Plain_FastAlloc    708ms  copies: 1000000  allocs: 1000008           Plain   1884ms  copies: 1000000  allocs: 1000008      COW_Unsafe    171ms  copies: 1000000  allocs:       8   COW_AtomicInt    391ms  copies: 1000000  allocs:       8  COW_AtomicInt2    412ms  copies: 1000000  allocs:       9     COW_CritSec    439ms  copies: 1000000  allocs:       8       COW_Mutex  14129ms  copies: 1000000  allocs:       8 

Running 1,000,000 iterations with strings of length 250:

 Plain_FastAlloc   1164ms  copies: 1000000  allocs: 1000011           Plain   5721ms  copies: 1000000  allocs: 1000011 [*]      COW_Unsafe    176ms  copies: 1000000  allocs:      11   COW_AtomicInt    393ms  copies: 1000000  allocs:      11  COW_AtomicInt2    419ms  copies: 1000000  allocs:      12     COW_CritSec    443ms  copies: 1000000  allocs:      11       COW_Mutex  14118ms  copies: 1000000  allocs:      11 

Running 1,000,000 iterations with strings of length 1,000:

 Plain_FastAlloc   2865ms  copies: 1000000  allocs: 1000014           Plain   4945ms  copies: 1000000  allocs: 1000014      COW_Unsafe    173ms  copies: 1000000  allocs:      14   COW_AtomicInt    390ms  copies: 1000000  allocs:      14  COW_AtomicInt2    415ms  copies: 1000000  allocs:      15     COW_CritSec    439ms  copies: 1000000  allocs:      14       COW_Mutex  14059ms  copies: 1000000  allocs:      14 

Running 1,000,000 iterations with strings of length 2,500:

 Plain_FastAlloc   6244ms  copies: 1000000  allocs: 1000016           Plain   8343ms  copies: 1000000  allocs: 1000016      COW_Unsafe    174ms  copies: 1000000  allocs:      16   COW_AtomicInt    397ms  copies: 1000000  allocs:      16  COW_AtomicInt2    413ms  copies: 1000000  allocs:      17     COW_CritSec    446ms  copies: 1000000  allocs:      16       COW_Mutex  14070ms  copies: 1000000  allocs:      16 

Testing Append() : An Always-Mutating Periodically-Reallocating Operation

Notes:

  • Plain always outperformed COW.

  • The overhead of COW_AtomicInt compared to Plain did not vary greatly with string lengths. It was fairly constant at about 235ms per 1,000,000 operations.

  • The overhead of COW_AtomicInt compared to COW_Unsafe did not vary greatly with string lengths. It was fairly constant at about 110ms per 1,000,000 operations.

  • The overall ever-better performance for longer strings was due to the allocation strategy (see Item 13), not COW versus Plain issues.

Running 1,000,000 iterations with strings of length 10:

 Plain_FastAlloc    302ms  copies:       0  allocs:  272730           Plain    565ms  copies:       0  allocs:  272730      COW_Unsafe    683ms  copies:       0  allocs:  272730   COW_AtomicInt    804ms  copies:       0  allocs:  272730  COW_AtomicInt2    844ms  copies:       0  allocs:  363640     COW_CritSec    825ms  copies:       0  allocs:  272730       COW_Mutex   8419ms  copies:       0  allocs:  272730 

Running 1,000,000 iterations with strings of length 50:

 Plain_FastAlloc    218ms  copies:       0  allocs:  137262           Plain    354ms  copies:       0  allocs:  137262      COW_Unsafe    474ms  copies:       0  allocs:  137262   COW_AtomicInt    588ms  copies:       0  allocs:  137262  COW_AtomicInt2    536ms  copies:       0  allocs:  156871     COW_CritSec    607ms  copies:       0  allocs:  137262       COW_Mutex   7614ms  copies:       0  allocs:  137262 

Running 1,000,000 iterations with strings of length 100:

 Plain_FastAlloc    182ms  copies:       0  allocs:   79216           Plain    257ms  copies:       0  allocs:   79216      COW_Unsafe    382ms  copies:       0  allocs:   79216   COW_AtomicInt    492ms  copies:       0  allocs:   79216  COW_AtomicInt2    420ms  copies:       0  allocs:   89118     COW_CritSec    535ms  copies:       0  allocs:   79216       COW_Mutex   7810ms  copies:       0  allocs:   79216 

Running 1,000,000 iterations with strings of length 250:

 Plain_FastAlloc    152ms  copies:       0  allocs:   43839           Plain    210ms  copies:       0  allocs:   43839      COW_Unsafe    331ms  copies:       0  allocs:   43839   COW_AtomicInt    438ms  copies:       0  allocs:   43839  COW_AtomicInt2    366ms  copies:       0  allocs:   47825     COW_CritSec    485ms  copies:       0  allocs:   43839       COW_Mutex   7358ms  copies:       0  allocs:   43839 

Running 1,000,000 iterations with strings of length 1,000:

 Plain_FastAlloc    123ms  copies:       0  allocs:   14000           Plain    149ms  copies:       0  allocs:   14000      COW_Unsafe    275ms  copies:       0  allocs:   14000   COW_AtomicInt    384ms  copies:       0  allocs:   14000  COW_AtomicInt2    299ms  copies:       0  allocs:   15000     COW_CritSec    421ms  copies:       0  allocs:   14000       COW_Mutex   7265ms  copies:       0  allocs:   14000 

Running 1,000,000 iterations with strings of length 2,500:

 Plain_FastAlloc    122ms  copies:       0  allocs:    6416           Plain    148ms  copies:       0  allocs:    6416      COW_Unsafe    279ms  copies:       0  allocs:    6416   COW_AtomicInt    380ms  copies:       0  allocs:    6416  COW_AtomicInt2    304ms  copies:       0  allocs:    6817     COW_CritSec    405ms  copies:       0  allocs:    6416       COW_Mutex   7281ms  copies:       0  allocs:    6416 

Testing Operator[]() : A Possibly-Mutating Operation, Never Does Mutate

Notes:

  • Plain always vastly outperformed COW.

  • Results were independent of string lengths

  • The overhead of COW_AtomicInt compared to Plain was constant at about 280ms per 1,000,000 operations.

  • COW_AtomicInt2 fared better in this test case, but COW_AtomicInt did better overall. I am therefore focusing on comparing that with Plain .

 [10x iterations] Running 10000000 iterations with strings of length 10:   Plain_FastAlloc      3ms  copies:       0  allocs:       3 [*]             Plain      2ms  copies:       0  allocs:       3 [*]        COW_Unsafe   1698ms  copies:       0  allocs:       3     COW_AtomicInt   2833ms  copies:       0  allocs:       3    COW_AtomicInt2   2112ms  copies:       0  allocs:       4       COW_CritSec   3587ms  copies:       0  allocs:       3         COW_Mutex  71787ms  copies:       0  allocs:       3 

[*] Results were within the measurement error margin. Both varied from 0ms to 9ms.

Testing Various Integer Increment/Decrement Operations

Test Summary:

  • " plain " performs the operations on normal non- volatile int s

  • " volatile " is the only case to use volatile int s

  • " atomic " uses the Win32 InterlockedXxx() operations

  • " atomic_asm " uses inline x86 assembler locked integer operations

Notes:

  • ++atomic took only three times as long as either ++volatile and unoptimized ++plain

  • ++atomic does not incur function-call overhead

 [100x iterations] Running 100000000 iterations for integer operations:           ++plain   2404ms, counter=100000000           --plain   2399ms, counter=0        ++volatile   2400ms, counter=100000000        --volatile   2405ms, counter=0          ++atomic   7480ms, counter=100000000          --atomic   9340ms, counter=0        ++atomic_asm   8881ms, counter=100000000        --atomic_asm  10964ms, counter=0 

Here are a few extra notes on the relative timings of various flavors of x86 assembler implementations of IntAtomicIncrement() . The timings were taken under the same conditions as above and can be compared directly (this is the one actually used above):

 Instructions                    Timing -----------------------------------------  --asm mov       eax, 1  --asm lock xadd i, eax  --asm mov       result, eax     ~11000ms  --asm mov       eax, 1  --asm lock xadd i, eax          ~10400ms  --asm lock inc i                 ~8900ms 

Note that the non-atomic versions are much better and map directly onto the " plain " timings.

 --asm inc i                      ~2400ms 

Conclusion: There is, indeed, overhead introduced by the x86 LOCK instruction, even on a single-CPU machine. This is natural and to be expected, but I point it out because some people have in the past claimed there was no difference.

Finally, note that the Win32 atomic integer functions clearly do not incur function-call overhead. Never assume ”measure.

I l @ ve RuBoard


More Exceptional C++
More Exceptional C++: 40 New Engineering Puzzles, Programming Problems, and Solutions
ISBN: 020170434X
EAN: 2147483647
Year: 2001
Pages: 118
Authors: Herb Sutter

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net