Using Value Types as Objects

One of the biggest questions people ask about writing managed code for things like games is "Is it fast enough?" Speed is very important to an industry that prides itself on producing high-quality, near photo-realistic games that run in essentially real time. The basic answer is yes, so long as you know how to write high-performance code. Of course, that's the same answer that was given when the switch from assembly to C was made, and that turned out all right.

The .NET Runtime and the C# language have changed the way people write code. They have opened up the world of programming to new developers and made experienced developers more productive. They handle so many things for you that it's quite easy to forget how much some operations may cost. Consider the following real-world example.

The Billboard sample that ships with the DirectX SDK has multiple versions: an unmanaged version written in C++, along with two managed versions, one written in C# and the other in VB.NET. Since each of the DirectX SDK samples displays its frame rate while it runs, you can easily tell which of the applications is running faster, and by approximately how much. Comparing the C++ Billboard sample with the C# Billboard sample, you'll notice the C# version runs at approximately 60% of the speed of the C++ sample.

Considering that the other samples run at similar speeds between the managed and unmanaged variations, there must be something special about this particular sample that causes this slowdown. Naturally, there is, and the culprit is boxing and unboxing.

The term "boxing" refers to what the .NET runtime does to convert a value type (for example, a structure) into an object. The reverse process is called "unboxing." To box a value, the .NET runtime allocates a block of heap memory large enough to hold your value type (plus a small object header), and then copies the data from wherever the value currently lives (typically the stack, for a local variable) into the newly allocated heap space.
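The following is a minimal C# sketch of both conversions. It uses a small hypothetical struct named Sample rather than anything from the SDK, and shows that boxing produces an independent heap copy: changing the local afterward does not affect the value read back out of the box, and boxing the same value twice yields two separate objects.

```csharp
using System;

// Hypothetical value type used only for this illustration.
public struct Sample
{
    public float X, Y, Z;
}

public static class BoxingDemo
{
    // Boxes a local struct, changes the local, then unboxes the boxed copy.
    public static float BoxThenMutate()
    {
        Sample local = new Sample { X = 1f, Y = 2f, Z = 3f };

        object boxed = local;           // boxing: new heap object, fields copied into it
        local.X = 100f;                 // changes only the local copy

        Sample unboxed = (Sample)boxed; // unboxing: type check, then copy back out
        return unboxed.X;               // 1, because the box kept its own copy
    }

    // Each boxing conversion creates a separate object, even for the same value.
    public static bool TwoBoxesAreDistinct()
    {
        Sample local = new Sample { X = 1f };
        object first = local;
        object second = local;
        return !ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(BoxThenMutate());       // prints 1
        Console.WriteLine(TwoBoxesAreDistinct()); // prints True
    }
}
```

Because every box holds its own copy, each boxing conversion pays for a fresh allocation plus a copy, which is exactly the cost the rest of this section adds up.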

In the managed Billboard sample, each tree to be drawn has a corresponding structure that holds the information needed to draw it. Because the trees are alpha blended and must be drawn in order from back to front, the list of trees is sorted every frame. The sample does this with the following call:

    trees.Sort(new TreeSortClass());

TreeSortClass implements IComparer, the interface the Sort method uses to compare two elements. If you look at the implementation of this class, though, you'll quickly see that its comparison method takes two parameters of type object, while the elements being sorted are structures. Each structure must therefore be boxed before it can be passed to the comparison method, and as soon as that method starts, each object is unboxed again so its data can be read.
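To make the pattern concrete, here is a sketch of a comparer written in the same style. It is not the SDK's actual TreeSortClass: the Tree struct here is a hypothetical stand-in holding only a precomputed depth, and the sort uses Array.Sort(Array, IComparer) on a Tree[] instead of the sample's collection. Because Compare receives object parameters, each struct reaches it boxed and must be unboxed before its depth can be read.

```csharp
using System;
using System.Collections;

// Hypothetical, simplified stand-in for the sample's Tree struct.
public struct Tree
{
    public float Depth; // distance from the camera, assumed precomputed each frame
}

// A non-generic comparer in the style described above. Both parameters are typed
// object, so each Tree arrives boxed and must be unboxed before it can be read.
public class TreeSortClass : IComparer
{
    public int CallCount; // how many times Compare ran (for illustration only)

    public int Compare(object left, object right)
    {
        CallCount++;
        Tree l = (Tree)left;   // unbox: copy the struct out of the heap object
        Tree r = (Tree)right;  // unbox the second argument the same way
        return r.Depth.CompareTo(l.Depth); // back to front: larger depth sorts first
    }
}

public static class SortDemo
{
    public static Tree[] SortBackToFront(float[] depths, TreeSortClass comparer)
    {
        var trees = new Tree[depths.Length];
        for (int i = 0; i < depths.Length; i++)
            trees[i].Depth = depths[i];

        // The non-generic overload hands each element to Compare as object,
        // so the value-type elements are boxed on their way in.
        Array.Sort(trees, (IComparer)comparer);
        return trees;
    }

    public static void Main()
    {
        var comparer = new TreeSortClass();
        Tree[] sorted = SortBackToFront(new float[] { 5f, 20f, 1f, 12f }, comparer);
        foreach (Tree t in sorted)
            Console.WriteLine(t.Depth);                 // 20, 12, 5, 1
        Console.WriteLine("Compare calls: " + comparer.CallCount);
    }
}
```

CallCount is there only to show how often Compare runs. A comparison sort over N items makes on the order of N log N calls, and every one of them unboxes both of its arguments.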

The comparison method will be called approximately 4,300 times per frame. Each of these calls performs two boxing operations, followed immediately by two unboxing operations. The structure itself is defined as follows:

    public struct Tree
    {
        public CustomVertex.PositionColoredTextured v0, v1, v2, v3; // four 24-byte vertices
        public Vector3 position;                                    // 12 bytes
        public int treeTextureIndex;                                // 4 bytes
        public int offsetIndex;                                     // 4 bytes
    }
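A quick way to check the structure's size is to ask the marshaler for the size of a struct with the same layout. So that the sketch compiles without the DirectX assemblies, it assumes hypothetical stand-ins for the DirectX types: Vec3 for Vector3 (three floats) and VertexPCT for CustomVertex.PositionColoredTextured (a position, a packed 32-bit color, and two texture coordinates).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for Vector3: three floats (12 bytes).
public struct Vec3 { public float X, Y, Z; }

// Hypothetical stand-in for CustomVertex.PositionColoredTextured (24 bytes):
// a position, a packed 32-bit color, and two texture coordinates.
public struct VertexPCT
{
    public float X, Y, Z;
    public int Color;
    public float Tu, Tv;
}

// Same field layout as the Tree structure above (C# structs are sequential by default).
public struct TreeLayout
{
    public VertexPCT v0, v1, v2, v3;  // 4 x 24 = 96 bytes
    public Vec3 position;             // 12 bytes
    public int treeTextureIndex;      // 4 bytes
    public int offsetIndex;           // 4 bytes
}

public static class SizeDemo
{
    public static int TreeSize() => Marshal.SizeOf(typeof(TreeLayout));

    public static void Main() => Console.WriteLine(TreeSize()); // prints 116
}
```

Every field is a 4-byte float or int, so no padding is inserted and the total is 96 + 12 + 4 + 4 = 116 bytes. A boxed Tree occupies somewhat more than that on the heap, because every heap object also carries a small runtime header.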

If you calculate the structure's size, you'll see that it is quite large: 116 bytes. Now calculate the amount of data allocated and copied during a single sort operation (one per frame). Multiply the number of bytes allocated per boxed object (116) by the number of objects per call (2) and by the number of calls per frame (4,300), and you come up with a whopping 997,600 bytes that must be allocated every frame. That covers only the allocations for the boxing operations; it doesn't even account for copying the data into the newly allocated objects.
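The per-frame arithmetic can be written out directly. This sketch simply encodes the three figures quoted in the text; the 60 frames-per-second rate used in the second method is an assumption for illustration, not a measurement taken from the sample.

```csharp
using System;

public static class FrameCost
{
    // Figures quoted in the text.
    public const long TreeSizeBytes = 116;          // size of one Tree struct
    public const long BoxesPerCall = 2;             // one box for each Compare parameter
    public const long CompareCallsPerFrame = 4300;  // approximate calls per frame

    // Struct bytes copied into newly allocated heap objects per frame
    // (object headers are not included, matching the text's estimate).
    public static long BoxedBytesPerFrame() =>
        TreeSizeBytes * BoxesPerCall * CompareCallsPerFrame;

    // The same figure scaled to one second at an assumed frame rate.
    public static long BoxedBytesPerSecond(int framesPerSecond) =>
        BoxedBytesPerFrame() * framesPerSecond;

    public static void Main()
    {
        Console.WriteLine(BoxedBytesPerFrame());      // prints 997600
        Console.WriteLine(BoxedBytesPerSecond(60));   // prints 59856000
    }
}
```

At the assumed 60 frames per second, that is roughly 60 MB of short-lived boxed objects every second.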

Even after that copy has taken place, the very first thing the comparison method does on entry is take all the data that has just been boxed and unbox it. This means the same amount of data must be allocated and copied a second time, the only difference being that this time the space is on the stack.
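One way to observe the difference between the two conversions is to count the heap bytes allocated on the current thread around each loop. This sketch assumes a modern .NET runtime, since GC.GetAllocatedBytesForCurrentThread is a much newer API than the .NET Framework this book was written against. Vertex24 and Tree116 are hypothetical stand-ins sized like the sample's structures.

```csharp
using System;

// Hypothetical stand-ins sized like the sample's structures:
// Vertex24 is 24 bytes, Tree116 is 4 x 24 + 12 + 4 + 4 = 116 bytes.
public struct Vertex24 { public float X, Y, Z; public int Color; public float Tu, Tv; }
public struct Tree116
{
    public Vertex24 v0, v1, v2, v3;
    public float Px, Py, Pz;
    public int TextureIndex, OffsetIndex;
}

public static class UnboxDemo
{
    // Returns the heap bytes this thread allocated while boxing 'count' values,
    // and while unboxing them again.
    public static (long BoxBytes, long UnboxBytes) Measure(int count)
    {
        var value = new Tree116 { TextureIndex = 1 };
        var boxes = new object[count];               // allocated before measuring starts

        long start = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < count; i++)
            boxes[i] = value;                        // boxing: new heap object + 116-byte copy
        long afterBoxing = GC.GetAllocatedBytesForCurrentThread();

        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            Tree116 local = (Tree116)boxes[i];       // unboxing: copy into a local variable
            sum += local.TextureIndex;
        }
        long afterUnboxing = GC.GetAllocatedBytesForCurrentThread();

        if (sum != count) throw new InvalidOperationException("unexpected data");
        return (afterBoxing - start, afterUnboxing - afterBoxing);
    }

    public static void Main()
    {
        var (boxBytes, unboxBytes) = Measure(1000);
        Console.WriteLine("boxing:   " + boxBytes + " bytes");
        Console.WriteLine("unboxing: " + unboxBytes + " bytes");
    }
}
```

Expect the boxing figure to be at least 116 bytes per element (the object header makes each box a little larger), while the unboxing loop should report little or nothing, because unboxing copies into the method's local variable space rather than allocating a new object.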

So, in reality, for every frame the Billboard sample runs, on average about 1,995,200 bytes are allocated between the stack and the heap and copied back and forth between them. And that still doesn't account for the fact that such a large number of small (roughly 116-byte) allocations will trigger frequent generation-zero garbage collections. Given this data, it is easy to understand why this sample lags behind the C++ version.
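The garbage-collection side of the cost can be observed by simulating frames and reading GC.CollectionCount(0) before and after. In this sketch CompareBoxed, TreeData, and the frame loop are hypothetical. CompareBoxed is marked NoInlining so the JIT cannot fold the box and unbox pair away, which a modern JIT may otherwise do for such a small method.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical 116-byte stand-ins (4 x 24 + 12 + 4 + 4 bytes).
public struct Vtx { public float X, Y, Z; public int Color; public float Tu, Tv; }
public struct TreeData
{
    public Vtx v0, v1, v2, v3;
    public float Px, Py, Pz;
    public int TextureIndex, OffsetIndex;
}

public static class FrameSimulation
{
    // Bytes copied per frame as counted in the text: 116-byte boxes, two per call,
    // 4,300 calls, then the same amount again for the unboxing copies.
    public static long BytesMovedPerFrame() => 2L * 116 * 2 * 4300;

    // Object-typed comparison, the way a non-generic IComparer receives its arguments.
    // NoInlining keeps the JIT from removing the boxing at the call site.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int CompareBoxed(object left, object right)
    {
        TreeData l = (TreeData)left;   // unbox first argument
        TreeData r = (TreeData)right;  // unbox second argument
        return r.Pz.CompareTo(l.Pz);
    }

    // Runs simulated frames and returns how many generation-0 collections occurred.
    public static int GenZeroCollections(int frames, int callsPerFrame)
    {
        var a = new TreeData { Pz = 1f };
        var b = new TreeData { Pz = 2f };
        int before = GC.CollectionCount(0);
        for (int f = 0; f < frames; f++)
            for (int i = 0; i < callsPerFrame; i++)
                CompareBoxed(a, b);   // a and b are boxed again on every call
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine("Bytes moved per frame: " + BytesMovedPerFrame());
        Console.WriteLine("Gen 0 collections over 60 frames: " + GenZeroCollections(60, 4300));
    }
}
```

The exact number of collections depends on the runtime version and GC configuration, so treat that output as an observation rather than a fixed result; the byte total, by contrast, is just the text's arithmetic.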

The point of this exercise is that many developers using managed languages don't understand the costs behind the code they write. The .NET Runtime gives you enormous power and flexibility, but given the "newness" of the API, it's still far too common to see people use these features without fully understanding their costs. I'm quite sure the "average" developer wouldn't realize that the simple sorting code in the Billboard sample allocates and copies close to two megabytes of data per frame.



Managed DirectX 9 Graphics and Game Programming, Kick Start
ISBN: B003D7JUW6
Year: 2002
Pages: 180
Authors: Tom Miller
