Performance Tips

This is where we get on to what I suspect you've been waiting for most of this chapter - a list of specific things that you can do to improve performance in your managed applications.

First, some general points: for developers concerned about optimizing managed code, much of the advice remains the same as it was before the days of .NET:

  • Turn on compiler optimizations when you have finished debugging and are close to shipping. (And of course - very important - make sure you do a final debug of your code in the optimized configuration to make sure those Debug preprocessor symbols and conditionally compiled function calls didn't have some side effect that you hadn't noticed. Although it is unlikely to affect managed code, there are also certain rare situations in which bugs in your code, such as certain buffer overruns, can be masked in debug builds and only appear when the code is optimized.)

  • Profile your code to identify for certain which routines are the ones eating processor time and if necessary rewrite the code just for those routines. (We'll cover profiling in the next chapter.) Without profiling, it's easy to guess which method you think is using the time, guess wrong, and then waste development time optimizing some code that's not actually going to make any difference to your application.

  • There's no point trying to perform low-level optimizations manually (such as inlining a small method or moving a variable declaration from inside to outside a loop). Virtually all compilers these days are pretty smart, and will do all that stuff quite happily themselves without any help from you. The optimizations you should be thinking about are high-level ones (not drawing unnecessarily, or using an appropriate algorithm for the task at hand).

There is one additional step you may be able to take for managed code - which is to pass your assemblies through a commercial obfuscator. An obfuscator's real purpose is to make assemblies harder to decompile by - among other things - mangling private and internal type and variable names. However, some obfuscators can also optimize memory layout and reduce name lengths, which makes the assemblies smaller. This is unlikely to affect execution time significantly, but may be important if your assemblies are intended to be downloaded on an intranet or the Internet and download time is a significant factor in performance.

We'll now conclude the chapter by looking at some miscellaneous tips for performance that are worth bearing in mind when writing your source code. Since not all tips are relevant to all languages, we indicate in each section title which of the three main high-level Microsoft languages the tip mostly applies to.

Don't Waste Time on Optimizing (All Languages)

I know I've said this already, but do remember that your development time and code robustness are important - often more important than performance. Users tend not to be impressed by applications that run like lightning, presenting the results they wanted before they have even had a chance to blink, immediately followed by an Unhandled Exception dialog box.

Remember too that the biggest killers of performance are usually:

  • Network calls

  • Page swapping

  • Inappropriate algorithms and data structures

In most cases, the detailed low-level structure of your code has a relatively minor effect compared to these factors, so there really is no point in going through your code with a fine-tooth comb trying to pick out places where you can improve performance by changing your control flow, etc. All you're probably doing is wasting your time (as well as creating code that looks more complex and is therefore less maintainable). You can genuinely improve performance by minimizing network calls, keeping your working set down to reasonable levels, and using appropriate algorithms (for example, don't use a bubble sort to sort 50,000 words). To a lesser extent, it also helps to minimize calls that stay within a process but cross managed-unmanaged or application domain boundaries.

Also, I can't emphasize enough that if your code is not performing satisfactorily and the work on optimizing is going to take a while, you should first do the research to find out where the problem is, so you can concentrate on optimizing the correct code. (Obviously, you should be sensible about this: if a method has a potential performance problem that stands a good chance of being significant and which is only going to take half an hour to fix, then it's probably worth fixing anyway.)
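To make the algorithm point concrete, here is a small C# sketch (the helper names are mine, not from the text) contrasting an O(n²) bubble sort with the library's Array.Sort(), which is the right tool once the data gets large:

```csharp
using System;

static class SortDemo
{
    // O(n^2) bubble sort - fine for a handful of items, hopeless for 50,000.
    public static void BubbleSort(int[] a)
    {
        for (int i = 0; i < a.Length - 1; i++)
            for (int j = 0; j < a.Length - 1 - i; j++)
                if (a[j] > a[j + 1])
                {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
    }

    static void Main()
    {
        int[] a = { 5, 1, 4, 2, 3 };
        BubbleSort(a);
        Console.WriteLine(string.Join(",", a));  // 1,2,3,4,5

        int[] b = { 5, 1, 4, 2, 3 };
        Array.Sort(b);  // the library sort: O(n log n), the right choice at scale
        Console.WriteLine(string.Join(",", b));  // 1,2,3,4,5
    }
}
```

Both loops produce the same answer; the difference only shows up as the array grows.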

Use StringBuilder Sensibly (All Languages)

You are no doubt aware of the existence of the StringBuilder class, and of the recommendation that using StringBuilder will give higher performance than using straight String instances if you're concatenating strings together. This advice is true most of the time, but not necessarily all the time. It does depend on what manipulations, and how many concatenations, you are doing. You have to be doing at least several string concatenations before the benefits outweigh the initial overhead of instantiating the StringBuilder in the first place. On occasion I've seen people use a StringBuilder just to join two strings. That's madness - in that case using a StringBuilder will actually reduce performance! On the other hand, if you have a large number of concatenations (or other operations that won't affect a substantial number of characters at the beginning of the string), then using a StringBuilder will give a vast improvement.
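As a C# sketch of the two cases (the method names are mine): for two strings a plain + is the right choice, while a loop over many strings is where StringBuilder earns its keep:

```csharp
using System;
using System.Text;

static class ConcatDemo
{
    // Two strings: a single concatenation is cheapest - no StringBuilder needed.
    public static string JoinTwo(string a, string b)
    {
        return a + b;   // compiles to one String.Concat(a, b) call
    }

    // Many strings in a loop: StringBuilder avoids creating a new
    // intermediate string on every iteration.
    public static string JoinMany(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string s in parts)
            sb.Append(s);
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(JoinTwo("Hello ", "World"));                       // Hello World
        Console.WriteLine(JoinMany(new string[] { "a", "b", "c", "d" }));    // abcd
    }
}
```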

As a rough guide, if you can perform your operation using one String.Concat() method call, then use that. There's no way a StringBuilder can do that faster. And bear in mind that there are String.Concat() overloads that take two, three, or four strings as parameters, and the high-level compilers are fairly good at calling them. This C# code will compile into a single Concat() call:

 result = s1 + s2 + s3 + s4; 

If you can arrange to do all the concatenations in a single high-level language statement, then that's by far the best and most efficient solution. But of course, if it's impossible in a given situation to do that, you have to consider whether to use StringBuilder. At two String.Concat() calls, StringBuilder is probably marginally more efficient, though it depends to some extent on the relative sizes of the strings. Thereafter, StringBuilder rapidly gets the edge. But you will have to balance any performance gains against the reduced readability of code that involves StringBuilders. (For example, you can't use the + operator with a StringBuilder in C# or VB. In C++ this is less of a factor, since C++ doesn't have any shorthand syntax or operator overloads for System.String, which can make C++ System.String manipulations look pretty inscrutable anyway.) There's also the fact that StringBuilder implements only a fraction of the string-manipulation functions offered by String. Personally, if my code is in a performance-important area (for example, in a loop), I'll probably use StringBuilder if I'd otherwise need three or more calls to String.Concat(). If it's not a piece of code that's important for performance, it'll take a few more calls before I use StringBuilder, because I like code that's easy to read (and therefore to maintain). But different developers will draw the boundary at different places. If it's an absolutely performance-critical piece of code, I'll start doing detailed calculations of how many characters get copied and how many objects get instantiated with each approach.

For the remainder of this section, I'm going to present an algebraic calculation of numbers of characters copied. It's algebraic because I've taken a very general case, but if you want to avoid algebra you can just substitute in the actual numbers that will be used in your code instead of the symbols I'm going to use. The calculation will give you a graphic example of how much better StringBuilder is when there are a lot of calculations and will also show you the principles of comparing the efficiency of different approaches.

For our purposes, let's suppose that you have N strings and you need to join them together. And let us suppose that on average each of the strings contains L characters - so the final string will contain something like N times L characters (or, as we write it in algebra, NL). In fact, to keep it very simple I'm going to assume that every string contains exactly L characters. To get an idea of performance, we'll work out the total number of characters that will be copied without and with a StringBuilder, because this is the factor that will have most impact on performance.

Suppose we have code like the following (we'll assume here that there is some other intermediate code not shown that will prevent the compiler emitting IL code that calls the three- and four-parameter overloads of String.Concat()):

 string result = string1;
 result += string2;
 result += string3;
 // etc.
 result += stringN;

Roughly how many characters will need to be copied if we use strings to perform the concatenation? We know that each concatenation creates a new string, which means that all the characters in the two strings to be joined will get copied across. That means that for the first concatenation we have two times L - that is, 2L characters. For the second concatenation, we have to join a string of length 2L to a string of length L - so there are 3L characters to copy. For the third concatenation, it's 4L characters, and so on. This means the total number of characters copied is 2L + 3L + ... + NL. Using some algebra, that sum turns out to equal NL(N+1)/2 - L (if you want to keep things very approximate, you can say that's roughly half of L times N squared). Those are scary figures. If you have, say, 10 strings of average length 20 characters that you wish to join to make a single 200-character string, doing it that way will involve the equivalent of copying 1,080 characters! You can see how the performance loss would add up.
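If you'd rather check the arithmetic than trust the algebra, this C# sketch (the names are mine) counts the characters copied by repeated concatenation and compares the total against the closed form NL(N+1)/2 - L:

```csharp
using System;

static class CopyCountDemo
{
    // Total characters copied when N strings of length L are joined with
    // repeated string concatenation: 2L + 3L + ... + NL.
    public static int CharsCopiedWithString(int n, int l)
    {
        int total = 0;
        for (int i = 2; i <= n; i++)
            total += i * l;
        return total;
    }

    // The closed form derived in the text: NL(N+1)/2 - L.
    public static int ClosedForm(int n, int l)
    {
        return n * l * (n + 1) / 2 - l;
    }

    static void Main()
    {
        Console.WriteLine(CharsCopiedWithString(10, 20));  // 1080
        Console.WriteLine(ClosedForm(10, 20));             // 1080
    }
}
```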

Now let's try using StringBuilder. We assume you know in advance roughly how long the final result will be, and so can allocate a StringBuilder with big enough capacity in the first place. So you are looking at code rather like this:

 StringBuilder sb = new StringBuilder(x);   // x initialized to combined
                                            // length of all strings
 sb.Append(string1);
 sb.Append(string2);
 // etc.
 sb.Append(stringN);
 result = sb.ToString();

To start with, the StringBuilder will be allocated. It's a fair guess that this is a relatively quick operation. Then all that happens is that each string in turn needs to be copied into the appropriate area of memory in the StringBuilder. That means a straight N times L characters will be copied across, so the final answer is NL. Ten strings with an average length of 20 characters gives you 200 characters copied, less than a fifth of what we worked out without using StringBuilder. Note that the StringBuilder.ToString() method does not copy any of the string - it simply returns a string instance that is initialized to point to the string inside the StringBuilder, so there is no overhead here. (The StringBuilder will be flagged so that if any more modifications are made to the data in it, the string will be copied first, so we don't corrupt the String instance. This means that any further modification of the StringBuilder will cause another NL characters to get copied.)

One other point to remember: the performance benefits of using StringBuilder only apply for concatenating strings within the existing capacity of the StringBuilder instance. Go beyond the capacity, or perform an operation inside the string, such as deleting a couple of characters (which will force all subsequent characters to move position), and you'll still end up having to copy strings around, although in most cases not as often as you would do with String.
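A quick C# sketch of what "going beyond the capacity" looks like (the initial capacity of 16 is arbitrary): once an Append() overflows the buffer, the StringBuilder silently reallocates and copies the existing characters:

```csharp
using System;
using System.Text;

static class CapacityDemo
{
    static void Main()
    {
        // Preallocate room for 16 characters.
        StringBuilder sb = new StringBuilder(16);
        int before = sb.Capacity;

        // Appending past the capacity forces the StringBuilder to allocate
        // a bigger buffer and copy the existing characters into it.
        sb.Append(new string('x', 40));
        int after = sb.Capacity;

        Console.WriteLine(before);        // 16
        Console.WriteLine(after >= 40);   // True
    }
}
```

This is why the text recommends sizing the StringBuilder up front when you know roughly how long the result will be.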

Be Careful About Foreach Loops on Arrays (C#, VB)

You'll probably be aware that standard advice is to use for loops instead of foreach loops where possible, because foreach will hurt performance. In fact, although this advice is often correct, the difference between the two loops is not nearly as great as you might imagine, and in many cases using foreach to iterate through the elements of an array will make little or no difference.

The particular question is this: does it make any difference whether you write something like this (in C#, or the equivalent in VB):

 // arr is an array - we assume array of int for the sake of argument, but
 // the same principles apply, whatever the underlying type
 for (int i=0; i<arr.Length; i++)
 {
    // Do processing on element i
 }

Or this:

 foreach(int elem in arr)
 {
    // Do processing on elem
 }

The traditional thinking is based on the fact that internally a foreach loop in principle causes an enumerator to be instantiated over the collection - and this would presumably cause a significant performance loss if the collection were actually an array that could easily be iterated over without an enumerator, just by using a simple for loop instead. However, it turns out that the two pieces of code above have virtually identical performance in both C# and VB. The reason is to do with compiler optimization. Both the C# and the VB compilers know that the two pieces of code are identical in their effect, and they will therefore always compile a foreach loop over an array into the IL equivalent of a for loop - as you can very easily verify by compiling a simple loop and using ildasm.exe to inspect the emitted IL code. Generally, I've noticed the code isn't quite identical: in many cases a foreach loop will generate slightly more verbose IL code, but not in a way that is likely to have a dramatic effect on performance. (However, if performance is absolutely critical, then you will probably not want to risk even a small performance deficit.)
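For reference, the two loop styles being compared might look like this in C# - identical results, and, over an array, near-identical IL (the method names are mine):

```csharp
using System;

static class LoopDemo
{
    public static int SumFor(int[] arr)
    {
        int total = 0;
        for (int i = 0; i < arr.Length; i++)
            total += arr[i];
        return total;
    }

    public static int SumForeach(int[] arr)
    {
        // Over an array, the compiler turns this into the equivalent of the
        // indexed for loop above - no enumerator object is created.
        int total = 0;
        foreach (int elem in arr)
            total += elem;
        return total;
    }

    static void Main()
    {
        int[] arr = { 1, 2, 3, 4, 5 };
        Console.WriteLine(SumFor(arr));      // 15
        Console.WriteLine(SumForeach(arr));  // 15
    }
}
```

Compiling this and running ildasm.exe over the result is an easy way to see the equivalence for yourself.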

I should also mention one case in which using for might be a lot more efficient: if you know at compile time how big the array is going to be, that does shift the balance in favor of the for loop. The reason is that this code:

 foreach (int x in arr)   // arr is an array of ints
 {

Will be treated by the compiler as:

 for(int i=0; i<arr.Length; i++) { 

Clearly you are going to give the JIT compiler more scope for optimizations if you are actually able to write something like:

 for (int i=0; i<20; i++) { 

Bear in mind also that, if you are using some high-level language other than VB or C#, but which supports a foreach construct, then you should be careful about using foreach unless you know that your compiler will optimize it away. Just because C# and VB do, it doesn't mean that a third-party compiler will too.

Use Appropriate String Types (C++)

In managed C++, if you append an S in front of literal strings in your source code, the strings will be treated as System.String instances instead of C-style strings. Buried in the C++ documentation for this feature is the suggestion that you should use this for performance. It may be a small thing, but it really can make a difference. Take a look at what happens when you compile this C++ code, which uses the S-syntax:

 Console::WriteLine(S"Hello World"); 

This compiles on my machine to this IL:

 IL_0000:  ldstr      "Hello World"
 IL_0005:  call       void [mscorlib]System.Console::WriteLine(string)

That's about as good as you get for performance.

Now look what happens if you change that S to an L, the normal way that you would traditionally represent Unicode strings in C++:

 Console::WriteLine(L"Hello World"); 

L"" means Unicode characters - the kind of characters that .NET expects. So you might not expect this small change to make much difference. But when I tried compiling it I got this:

 IL_0000:  ldsflda    valuetype $ArrayType$0x76a8870b
                      ??_C@_1BI@JBHAIODP@?$AAH?$AAe?$AAl?$AAl?$AAo?$AA?5?$AAW?$AAo?$AAr?$AAl?$AAd?$AA?$AA@
 IL_0005:  newobj     instance void [mscorlib]System.String::.ctor(char*)
 IL_000a:  call       void [mscorlib]System.Console::WriteLine(string)

What's happened is that the C++ compiler has assumed from your code that you want an array of wchar_t - after all, that's what L"" normally means. So it's given you exactly that, and used the trick we described at the end of Chapter 2 to embed the unmanaged string literal in the metadata. Of course, Console.WriteLine() needs a String instance, so the code has to create one - which it does by using the fact that String has a constructor that can take a pointer to a zero-terminated character array. It works, but for your pains you've got a slightly bigger assembly (both because of the extra opcodes and because of the extra type with the funny name) and slower code.

Of course, there are times when an unmanaged char array is what you want, in particular if you are passing strings to unmanaged functions that expect ASCII strings. In this case, you would be much better off keeping the strings as char* (that's the C++ 8-bit char, not the managed 16-bit character type), because you'll save having to convert between string formats during marshaling.

Be Careful About Crossing Boundaries (All Languages)

By boundaries, I mean anything where the environment changes and work has to be done. That can mean:

  • Network calls

  • Calling unmanaged code from managed code and vice versa

  • Making calls across application domains or - even worse - processes

Obviously, of all these 'boundaries' making network calls is going to hit your performance hardest. If you do need to make network calls, it might be worth doing the calls asynchronously or using multithreaded techniques so you can do background work while waiting for the call to return.

Increasingly for network calls, bandwidth itself is not a problem. For example, many intranets have very fast connections between computers. In this scenario, provided you're not transmitting huge amounts of data, it might not be the amount of data going over the network that hits you so much as the number of separate network calls, with a fixed delay for each one. (Incidentally, that's one reason why it's now considered acceptable to use XML so widely, even though XML is a relatively verbose format, and why some .NET classes, such as the DataSet, have been designed to transmit large amounts of data in one single call.) For the Internet, you may have to be more careful, especially if an application will be run on machines with 56K dialup connections. What all this means of course is that you should consider carefully what your application is doing and where data will be used, and you may find that it makes more sense to make fewer network calls even if that means transmitting more data overall.

For calls across application domains or to unmanaged code, similar principles hold. However, in this case the factors to watch are the number of individual calls, and the total amount of data that needs to be marshaled - or, even worse, converted from one format to another, such as Unicode to ANSI strings.

For calls from managed to unmanaged code, Microsoft has suggested the following figures for the time to make the transition:

  • Using P/Invoke: typically thirty x86 instructions; under absolutely ideal conditions, perhaps ten instructions

  • Using COM interop: typically sixty x86 instructions

COM interop takes longer than P/Invoke because more processing has to be done to go through the COM layer. There are effectively two boundaries to go through: managed to unmanaged, then through the COM runtime.

If you are using P/Invoke, try to keep marshaling down to a minimum. Although Microsoft has suggested 10-30 x86 instructions per call, that's an average figure. If you pass in types that don't need to be marshaled (which includes all primitive types) then it'll be faster.
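As a sketch of the difference (the class name is mine; GetTickCount and MessageBox are real kernel32/user32 entry points): the first declaration involves only blittable types, so the only cost is the transition itself, while the second forces string marshaling on every call. Note that DllImport binds lazily, so these declarations compile even where the DLLs are absent - only an actual call would fail:

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeMethods
{
    // Blittable signature: a uint return and no parameters, so no marshaling
    // work is needed beyond the managed-to-unmanaged transition itself.
    [DllImport("kernel32.dll")]
    public static extern uint GetTickCount();

    // String parameters must be marshaled (and possibly converted between
    // Unicode and ANSI) on every single call.
    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    public static extern int MessageBox(IntPtr hWnd, string text, string caption, uint type);

    static void Main()
    {
        // Just confirm the declarations exist; we deliberately don't call them.
        Console.WriteLine(typeof(NativeMethods).GetMethod("GetTickCount") != null);  // True
    }
}
```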

C++ developers have the additional option of using the IJW mechanism. This can be marginally faster than using DllImport, since the CLR needs to make slightly fewer checks during the transition (moreover, you can use IJW within the same assembly, so in some cases you get a saving from that too). C++ also gives you better ability to design data types that can be used with minimal marshaling across the unmanaged-managed boundary. However, even with C++, calling unmanaged code is not free. The CLR will still have to carry out various admin tasks, such as marking your thread so the garbage collector knows it is executing unmanaged code, before it can hand over to unmanaged code. One suggestion I've seen is to allow perhaps eight x86 instructions under ideal conditions.

Use Value Types Sensibly (All Languages)

Using value types can make a real difference. This is both because such types can be instantiated and initialized faster, and because each instance takes up less space in memory. The difference on a 32-bit machine is currently three words (value types don't need the reference, method table index, or sync block index), so if you have a large number of objects of a certain type, the memory saving may be significant from the point of view of your working set. On the other hand, you will be aware of the performance penalty if value types need to be passed as objects to methods due to boxing. If you're writing in C++, you can circumvent this performance loss to some extent because C++ is more explicit about requiring you to perform individual boxing and unboxing operations, instead of the compiler guessing what you want.
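A short C# sketch of the boxing cost just mentioned: each implicit conversion of an int to object allocates a fresh box, so two conversions of the same value yield two distinct heap objects:

```csharp
using System;

static class BoxingDemo
{
    static void Main()
    {
        int i = 42;

        // Each cast to object boxes the int into a new heap object.
        object a = i;
        object b = i;

        Console.WriteLine(ReferenceEquals(a, b));  // False - two separate boxes
        Console.WriteLine((int)a + (int)b);        // 84 - each cast unboxes
    }
}
```

In a tight loop, those hidden allocations (and the garbage collection work they eventually cause) are exactly the penalty the text warns about.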

When deciding whether to declare a type as a value type or a reference type, you will need to consider carefully how the type is going to be used. In some cases you might even decide it makes more sense to define two types - one value, one reference - which have the same purpose.

There is one catch you need to be aware of. You'll be aware that classes should normally be specified as auto layout, since this allows the CLR to arrange fields however it sees fit, optimizing for performance and class size and taking account of byte-alignment issues. For classes that you declare in high-level languages this is the default, but for value types (struct in C#, Structure in VB, and __value class or __value struct in C++), the default is sequential layout. In other words, this C# code:

 public struct MyStruct { 

Compiles to this IL:

 .class public sequential ansi sealed beforefieldinit MyStruct        extends [mscorlib]System.ValueType { 

Sequential layout is good for structs that are going to be passed via P/Invoke to unmanaged code, since it corresponds to the layout in unmanaged code, and is therefore likely to cause less marshaling work. However, it may be less efficient inside your managed code. So if you are declaring structs purely for performance reasons, and not intending to pass them to unmanaged code, consider declaring them like this:

 [StructLayout(LayoutKind.Auto)] public struct MyStruct { 

This will compile to this IL:

 .class public auto ansi sealed beforefieldinit MyStruct        extends [mscorlib]System.ValueType { 

Don't Use Late Binding (VB)

It's always been good programming advice to use the most specific type possible, and this is just as true in .NET programming. It is possible in any language to cause performance problems by using less specific types than necessary - for example, declaring a variable as type object when you actually want an integer. Now C# and C++ developers are extremely unlikely to do this, because the culture in C-style languages for many years has been one of type safety. Generally speaking, to someone with a C++ background transferring to C#, it would simply make no sense to write:

 object x; 

in place of:

 int x; 

even though the former syntax is quite correct in C#.

However, the situation in VB, before the days of .NET, was different. In VB6, the Variant class was commonly used as a general-purpose type. There was some performance loss associated with this, but this loss was often relatively small, and many VB6 developers felt that the flexibility offered by Variant more than made up for that. Besides, VB syntax does favor using Variant. Typing in:

 Dim x 

involves less fingerwork than:

 Dim x As Integer 

If you are inclined to do that in VB.NET, then I have one word for you: don't. In VB6, saying Dim x meant you got a variant, which would hurt performance a little bit. In VB.NET, saying Dim x means you get a System.Object instance, which will hurt performance a lot. Object is a reference type. Treat an int as an object, and you get boxing. And that's not all; in fact, you probably won't believe just how bad it gets. Let's do some IL-investigating. We'll have a little bit of VB code that adds two numbers together:

 Sub Main()
    Dim o1 As Integer = 23
    Dim o2 As Integer = 34
    Dim o5 As Integer = o1 + o2
    Console.WriteLine(o5)
 End Sub

Simple enough. If you type this in and compile it, the emitted IL is pretty simple too:

 .method public static void Main() cil managed
 {
   .entrypoint
   .custom instance void [mscorlib]System.STAThreadAttribute::.ctor() = ( 01 00 00 00 )
   // Code size       17 (0x11)
   .maxstack  2
   .locals init (int32 V_0,
                 int32 V_1,
                 int32 V_2)
   IL_0000:  ldc.i4.s   23
   IL_0002:  stloc.0
   IL_0003:  ldc.i4.s   34
   IL_0005:  stloc.1
   IL_0006:  ldloc.0
   IL_0007:  ldloc.1
   IL_0008:  add.ovf
   IL_0009:  stloc.2
   IL_000a:  ldloc.2
   IL_000b:  call       void [mscorlib]System.Console::WriteLine(int32)
   IL_0010:  ret
 }

This code loads the constant 23 and stores it in local variable 0. Then it stores 34 in local 1, loads the two variables, adds them, stores the result, and prints it out. Notice especially that add.ovf instruction - one simple IL instruction to perform an add operation, which will almost certainly be JIT-compiled into one executable instruction, followed by an overflow check.

Now let's make one change to the VB code. We'll forget to explicitly declare one of the Integers as an Integer but leave it as unspecified type - in other words, System.Object:

 Sub Main()
    Dim o1 As Integer = 23
    Dim o2 = 34
    Dim o5 As Integer = o1 + o2
    Console.WriteLine(o5)
 End Sub

Let's see what that does to the IL. In this code, I've highlighted everything that's changed:

 .method public static void Main() cil managed
 {
   .entrypoint
   .custom instance void [mscorlib]System.STAThreadAttribute::.ctor() = ( 01 00 00 00 )
   // Code size       36 (0x24)
   .maxstack  2
   .locals init (int32 V_0,
                 object V_1,
                 int32 V_2)
   IL_0000:  ldc.i4.s   23
   IL_0002:  stloc.0
   IL_0003:  ldc.i4.s   34
   IL_0005:  box        [mscorlib]System.Int32
   IL_000a:  stloc.1
   IL_000b:  ldloc.0
   IL_000c:  box        [mscorlib]System.Int32
   IL_0011:  ldloc.1
   IL_0012:  call       object [Microsoft.VisualBasic]Microsoft.VisualBasic.
                          CompilerServices.ObjectType::AddObj(object, object)
   IL_0017:  call       int32 [Microsoft.VisualBasic]Microsoft.VisualBasic.
                          CompilerServices.IntegerType::FromObject(object)
   IL_001c:  stloc.2
   IL_001d:  ldloc.2
   IL_001e:  call       void [mscorlib]System.Console::WriteLine(int32)
   IL_0023:  ret
 }

The first constant, 23, gets loaded OK. But the second constant, the 34, has to be boxed into an object before it can be stored in local 1. Now we get set up for the addition. We load that 23 back onto the evaluation stack, but, because there's no way we can add an integer to an object, we have to box that integer too! So we get two objects. IL won't let us add two objects, but there's a method that will. It's buried away in the Microsoft.VisualBasic.CompilerServices namespace - a static method called ObjectType.AddObj(). Of course, since we are calling this method across an assembly boundary, it is less likely to be inlined at JIT-compile time. Internally, this method will have to unbox those objects, work out from their types what type of addition is required, add them, then box the result so it can return it as an object. Then we go and call another method from the VB Compiler Services library to convert that object into an integer.

But you get the point. That missing As Integer in the VB source file has cost us two direct boxing operations, two calls into a library (in place of the add.ovf command), and almost certainly further boxing and unboxing operations inside those library calls. Have I done enough to convince you that using late binding in VB.NET sucks? (And it's not often I use language like that.)

If you're wondering if the situation is as bad in C#, the answer is that C# has stricter type-safety requirements, and won't let you write code analogous to the VB code I've just presented. In C#, you can declare a variable of type object and set it to an int value, but you have to explicitly cast it to an int before you can use it in arithmetic operations. That explicit cast gives the C# compiler the information it needs to generate more efficient code (though not as efficient as it would have been if all variables were properly defined as int in the first place).
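In C# that looks like this (a minimal sketch): the arithmetic line only compiles once the object is explicitly cast back to int, which is the unboxing operation the compiler needs:

```csharp
using System;

static class CastDemo
{
    static void Main()
    {
        object o1 = 23;          // boxed - mirrors the VB "Dim o2 = 34" case
        int i = 34;

        // int sum = o1 + i;     // will not compile: C# refuses arithmetic on object

        // The explicit cast unboxes, letting the compiler emit a plain add:
        int sum = (int)o1 + i;
        Console.WriteLine(sum);  // 57
    }
}
```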

There is another reason for bringing up this issue. Even if you are careful always to write VB code that uses early binding (in other words, declaring variables of the correct type), you may encounter issues with code that has been imported from VB6 using VS.NET's automatic conversion process. The VS.NET converter will do the best it can, but if you've declared something without an explicit type in your VB6 code, the best the converter can do is to declare it as Object in your VB.NET code. So something that was perhaps a little dodgy, but considered acceptable programming practice for some purposes, has been auto-converted into VB.NET code that is highly sub-optimal.

Of course, it's not just this. The Visual Basic Compiler Services library is full of classes and methods that implement features that are only there for backwards compatibility, but where the preferred, and almost certainly more efficient, solution in .NET is different. Although this section is specifically about late binding, if you are moving from VB6 to VB.NET, you should be very wary of carrying over methodologies from the VB6 programming model. And if you have used the VB.NET wizard to transfer code across, look at that code carefully for cases where it ought to be changed to the .NET way of doing things. Late binding is the most obvious case in point. File handling is another example: you might consider changing all those Print # commands to StreamWriter.WriteLine() calls. It's not clear from the documentation whether you'll get better performance in this particular case, but you will almost certainly gain greater flexibility and a more powerful, object-oriented programming model with more maintainable code. (Of course, if the newly generated code works, that's a big plus, and you might have other priorities than fiddling with it to improve performance or your object model - like debugging some other code that doesn't work yet! But since you've got this far through a chapter about performance, I have to assume that you do have time to play with your code for performance reasons!)

Some of these problems disappear if you use Option Strict - this will prevent you from declaring variables without supplying a type, and from performing arithmetic operations on Object, as well as a few other dodgy practices. It's really intended to make your code more robust, but it's quite useful in improving performance by preventing late binding. If we'd declared Option Strict On in the above code, we'd have got a compilation error, alerting us to the problem of the late-bound variable.

Don't Use VB6 Constructs (VB)

In the last section, we investigated VB late binding, and pointed out that late binding is really just one of a number of VB6 constructs that were suitable for VB6, but are not really suitable for VB.NET. They'll work - VB.NET syntax allows them for backwards compatibility - but they won't perform as well as the VB.NET equivalents. The following table lists some of the more important performance issues related to this point:

  • Don't use CType() (or similar cast keywords) on reference types; use DirectCast() instead.

  • Don't use Mid() assignment in strings; use the StringBuilder methods instead.

  • Don't use ReDim Preserve; use ArrayList instances for dynamic-length arrays instead. This makes a huge performance difference, since ReDim Preserve always copies the entire existing array, whereas ArrayList doesn't, provided it has sufficient existing capacity.

  • Don't use On Error; use exceptions, with Try, Catch, and Finally blocks, instead. This not only gives higher performance than On Error, but makes your code more readable too.
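The following sketch illustrates two rows of the table; the variable names are purely illustrative. DirectCast performs only a plain CLR cast, whereas CType may fall back to the Visual Basic runtime's conversion helpers, and ArrayList grows by enlarging its internal buffer in chunks, so most Add() calls copy nothing at all:

```vb
Imports System.Collections

Module Vb6ConstructsDemo
    Sub Main()
        ' DirectCast instead of CType for a reference-type cast:
        Dim o As Object = "hello"
        Dim s As String = DirectCast(o, String)
        Console.WriteLine(s.Length)      ' 5

        ' ArrayList instead of ReDim Preserve for a growing array.
        ' ReDim Preserve would copy the whole array on every
        ' resize; ArrayList only reallocates when its capacity is
        ' exhausted.
        Dim list As New ArrayList()
        Dim i As Integer
        For i = 0 To 999
            list.Add(i)
        Next
        Console.WriteLine(list.Count)    ' 1000
    End Sub
End Module
```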

Use C++ If Possible

We saw in Chapter 2 when we examined the IL code produced by different compilers that the C++ compiler will tend to perform more optimizations than compilers in other languages. The C# and VB compilers perform relatively few optimizations of their own, relying on the JIT instead - which is in many ways limited in the optimizations it can perform. Moreover, the C# compiler is a very new application, so you perhaps wouldn't expect it to be particularly sophisticated. The C++ compiler, on the other hand, is the latest version of a compiler that has been around for quite a few years and has been specifically designed to generate high-performance applications. This means that if you code in managed C++, you get the same JIT compiler optimizations as for C# and VB, as well as the C++ compiler optimizations.

This is significant because the JIT compiler and the C++ compiler are designed for different circumstances, and therefore will perform very different types of optimizations. We've already seen earlier in this chapter examples of the optimizations the JIT compiler performs. In general, it is geared for a speedy compile. That means it can't perform optimizations that require extensive analysis of the code. On the other hand, it can perform optimizations specific to the hardware the program is running on. The C++ compiler knows nothing about the end processor, but it has a lot of time to perform its compilation. Remember that it is designed for a market consisting of developers who generally don't care if the compiler takes a few extra seconds to compile, as long as the generated code runs quickly. This means that the C++ compiler can look at the program as a whole, and optimize based on a fairly sophisticated analysis of the program. Unfortunately, Microsoft hasn't documented what optimizations it does, beyond the fact that it will perform whole-program optimizations.

This benefit is of course in addition to the ability to mix managed and unmanaged code, use the IJW mechanism, and the potential to speed up your code through sophisticated use of pointers at a source-code level.

On the other hand, using C++ leads to more complicated source code, demands a higher skill set from developers, and can only be used if you know your software will run in an environment in which your users are happy to grant it full trust, because the code will be unverifiable (though this may change in future versions of .NET).

If you do code in C++, there is one other point to watch: be very careful about using C++ function pointers to call methods. When managed code calls another managed function through a function pointer, you get a transition from managed to unmanaged code and then another from unmanaged back to managed, which is slow. If you feel you need to do this, consider using delegates instead. The same issue also applies to managed virtual methods in __nogc types.
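As a minimal sketch - in the Managed Extensions for C++ syntax current at the time of writing, with names invented for illustration - here is the delegate alternative. A delegate invocation stays entirely in managed code, so it avoids the managed-to-unmanaged round trip that a function pointer call incurs:

```cpp
// Sketch only: old Managed Extensions for C++ dialect.
// A delegate call avoids the managed -> unmanaged -> managed
// transitions that calling through a C++ function pointer causes.
#using <mscorlib.dll>
using namespace System;

__delegate void Worker(int n);   // managed delegate type

__gc class Processor
{
public:
    void DoWork(int n) { Console::WriteLine(n); }
};

int main()
{
    Processor* p = new Processor();
    // Bind the delegate to an instance method:
    Worker* w = new Worker(p, &Processor::DoWork);
    w->Invoke(42);   // stays in managed code throughout
    return 0;
}
```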

Keep Your Virtual Memory Requirements Small (All Languages)

I mention this really because of the point I made earlier about page swapping being an important factor behind performance loss. Unfortunately, if you are using a large number of libraries, keeping your virtual memory requirements small may be largely beyond your control. The main things you can do to help in this area are:

  • Calling Dispose() as soon as possible on objects that provide this method.

  • Making sure you do not keep references to objects once you no longer need them, since live references prevent the garbage collector from removing those objects from your program's memory.

  • Moving rarely used types in your assemblies into separate modules that can be loaded on demand. The same applies to large resources: keep them in separate files rather than embedding them in the main assembly file.

  • If you are using a large assembly just for one or two useful methods contained in it, you might consider implementing your own version of those methods to save having to load the assembly.

  • C++ developers who are invoking unmanaged DLLs could also consider using LoadLibrary() to load those DLLs on demand rather than linking to them statically. This advice doesn't apply to managed assemblies, which are loaded on demand anyway. It's unfortunate that at present the .NET Framework does not provide any way for the developer to request dynamic unloading of an individual assembly (though if necessary you can achieve this effect by loading assemblies into a separate application domain, and then unloading that application domain). You may be able to reduce the number of assemblies loaded by moving code that invokes an assembly so that it is only called if it is really needed - one example might be deferring some initialization until the user has selected the relevant menu option (though this may impact your program's responsiveness).
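The first two bullets can be sketched as follows. FileStream is used purely as an example of a type that holds an unmanaged resource; at the time of writing, streams expose their disposal through Close(), which releases the underlying handle immediately rather than waiting for the garbage collector:

```vb
' Sketch: release resources promptly and drop references you no
' longer need. The file name is illustrative only.
Imports System.IO

Module CleanupDemo
    Sub Main()
        Dim fs As FileStream = New FileStream("data.bin", FileMode.Create)
        Try
            fs.WriteByte(1)
        Finally
            fs.Close()     ' disposes the stream now, freeing the
                           ' unmanaged file handle immediately
        End Try
        fs = Nothing       ' no live reference remains, so the GC is
                           ' free to reclaim the object's memory
    End Sub
End Module
```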



Advanced .NET Programming
ISBN: 1861006292
Year: 2002
Pages: 124