The Horrors of Boxing


The .NET runtime and the C# language have changed the way people write code. It has opened up the world of programming to new developers and expanded the productivity of the ones who have been around for a while. It handles so many things for you that it's quite easy to forget how much some operations cost. Look at this real-world example.

The Billboard sample that shipped with the DirectX software development kit (SDK) in 2002 and 2003 (it was removed for the DirectX SDK Summer 2004 release) had multiple versions: an unmanaged version written in C++ and two managed versions, one written in C# and the other in VB .NET. Because each of the DirectX SDK samples includes the frame rate while you are running, you can easily tell which of the applications is running "faster" and by approximately how much. Comparing the C++ Billboard sample with the C# Billboard sample, you'd notice the C# version runs at approximately 60% the speed of the C++ sample, and that was about the best it could do.

Considering that the other samples run at similar speeds between the managed and unmanaged variations, there must be something special about this particular sample that causes this slowdown. Naturally, there is, and the culprit is boxing and unboxing.

The term boxing refers to what the .NET runtime does to convert a value type (such as a structure) into an object. The reverse of this process is obviously called unboxing. To perform this operation, the .NET runtime allocates a portion of the heap large enough to hold your value type and then copies the data from the stack (where the value type resides) to the newly allocated heap space.

When looking at the Billboard managed sample, you might notice that each tree drawn has a corresponding structure that maintains the information needed to draw the tree. Because the trees are alpha blended and need to be drawn in a certain order (from back to front), each tree is sorted every frame. This code was in the sample to do this:

 trees.Sort(new TreeSortClass()); 

This class implemented IComparer, which is needed for comparison. If you look at the implementation of this class, you see quickly that the comparison method takes two objects as parameters, whereas we are using a structure. This structure must be boxed before it can be passed on to the compare method. Then, as soon as the compare method is started, the object is unboxed to be able to manipulate the data.

On average, the sort method was called approximately 4,300 times per frame. Each of these calls performs a total of two boxing operations, followed immediately by two unboxing operations. The structure itself was defined as such:

 public struct Tree {         public CustomVertex.PositionColoredTextured v0, v1, v2, v3;         public Vector3 position;         public int treeTextureIndex;         public int offsetIndex;  }; 

If you calculate the structure's size, you see that it is quite large, 116 bytes. So calculate the data that is being allocated and copied during a single sort operation. Multiply the number of bytes that need to be allocated per object (116) by the number of objects (2) by the number of calls per frame (4,300), and you come up with a whopping 997,600 bytes that need to be allocated for every frame. This number covers just the allocation for the boxing operation; it doesn't even consider the copy operation to get the data into the newly allocated object.

Even after the copy operation takes place, as soon as the method is entered, the first thing that happens is that it takes all the data that has just been boxed and unboxes it. The exact same allocation and copy need to be performed a second time, with the exception being that this allocation is performed on the stack.

In reality, for every frame the Billboard sample is running, 1,995,200 bytes on average are allocated between the stack and heap and then copied back and forth between them. This conclusion doesn't even consider the fact that this large amount of tiny allocations (because each allocation is 116 bytes) will cause the garbage collector to kick in quite a few times for a generation zero collection. No wonder that sample ran with such poor performance.

The point of this exercise is that many developers using managed languages don't understand the costs behind the code they are writing. The .NET runtime gives you enormous power and flexibility, but given the "newness" of the Application Programming Interface (API), it's still common to see people take advantage of these features without fully understanding the costs associated with them. I'm quite sure the average developer wouldn't realize that the simple sorting algorithm in the Billboard sample would be allocating and copying close to 2MB of data per frame.



Beginning 3D Game Programming
Beginning 3D Game Programming
ISBN: 0672326612
EAN: 2147483647
Year: 2003
Pages: 191
Authors: Tom Miller

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net