CLR-Level Optimization | Special Edition Using ASP.Net

The architects of the CLR realized that performance would be a huge issue when the new .NET platform was released, and that the platform would be dead on arrival unless its performance was up to the level that developers and users expect. Given the number of problems they had to solve to provide a flexible runtime, achieving high performance has not been easy.

To a large degree, performance issues have really hurt Java over the last five years. In my opinion, this is one of the reasons why Java has not done better. I think the CLR architects realized that poor performance could be a death sentence for the new framework they were developing. In this section we'll cover the following:

Performance Monitors
Profiling
Runtime Internals
Finalization
Transitions
Value Types and Strings

Performance Monitors

For many years, the NT operating system has had a performance monitor. It has been a very powerful tool for NT network administrators who needed to identify performance bottlenecks on servers and fine-tune the servers so that these bottlenecks could be reduced or eliminated. Windows 2000 has the same performance monitor that NT 4.0 had, with several additions. And now the .NET Framework adds even more performance counters that you can use to find out how your .NET application is performing and whether there are any bottlenecks that need to be addressed.

To run the Windows 2000 performance monitor, go to the Start menu, select Programs, and then select Administrative Tools, and finally select Performance. The performance monitor snap-in appears, as shown in Figure 21.1.

Figure 21.1. The performance monitor snap-in.

When the performance monitor snap-in appears, notice the button that contains a + character (shown in Figure 21.1). This button brings up a list of the performance counters that are available on your system. The list includes items such as processor utilization, memory utilization, and many others. Of most interest here are the counters that help in .NET common language runtime development. Eight categories are added when you install the .NET common language runtime, and they are listed under .NET. You can see these eight categories in Figure 21.2: exceptions, interop, JIT, loading, locks and threads, memory, remoting, and security.

Figure 21.2. The eight categories added when the Common Language Runtime is installed.

Each performance counter category contains specific counters that can help you in your development and refinement. For example, the memory category contains 21 different counters that show you things such as the amount of memory in each of the managed heaps, the number of objects that have not been finalized, and many other values having to do with memory.

As a test of the performance counters, I ran a simple ASP.NET application and added to the performance monitor one of the counters in the just-in-time compilation (JIT) category. I added a counter that showed me the number of methods JITted (JITted meaning just-in-time compiled), and then I ran my simple application and watched how the execution was reflected in the performance monitor. When I ran my ASP.NET application, the number of methods JITted jumped up by 12. Of course, a number of the methods that get JITted are ones you are not directly concerned with; nevertheless, they are there working behind the scenes.

Performance counters are a valuable tool in your development. They should be your first line of defense when trying to find where your application needs to be tuned. They are always available, and you don't need to make any code changes to use them. You can even use them in a production environment.

The performance counters are also available programmatically, so you may decide to use them this way instead of opening the performance monitor itself. There is a PerformanceCounter class in the System.Diagnostics namespace. It is the class that is used to create PerformanceCounter components and interact with their values. The following C# code shows you how to create a PerformanceCounter object, set the CategoryName, the CounterName, and the InstanceName, and then get the counter's next value.

    // Requires: using System.Diagnostics;
    PerformanceCounter pc = new PerformanceCounter();
    pc.CategoryName = ".NET CLR Exceptions";
    pc.CounterName = "# of Exceps Thrown";
    pc.InstanceName = "_Global_";
    float fValue = pc.NextValue();

There is also a simplified way to do this. You can pass these values into the PerformanceCounter's constructor instead of setting the CategoryName, CounterName, and InstanceName properties one at a time. The following code shows you how to set the CategoryName, CounterName, and InstanceName in the PerformanceCounter's constructor; the final argument, true, opens the counter in read-only mode:

    PerformanceCounter pc = new PerformanceCounter(
        ".NET CLR Exceptions", "# of Exceps Thrown", "_Global_", true );

Profiling

The .NET CLR provides a profiling API. This API makes it possible to determine the performance behavior of practically every aspect of your application. Although Visual Studio .NET and the .NET compilers do not provide their own profiling capabilities, it is possible to take advantage of the profiling APIs to create profiling tools.

Two profiling utilities are available right now. One is Rational's Quantify product. You can download a demo that actually works at Rational's Web site. Just go to www.rational.com and you can follow links to the Quantify demo download.

NuMega also has a profiling tool, named TrueTime. Just like Rational's Quantify, TrueTime is a profiling tool that enables you to run your application and get detailed reports on how much time the CPU is spending in each method. This helps you identify where your application is spending most of its time, and enables you to concentrate your optimization in the areas where you will get the most payback. You can get a demo of NuMega's TrueTime at their Web site. Just go to www.NuMega.com.
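
Before reaching for a full profiler, you can get a rough idea of where time goes by timing a suspect method yourself. The following is only a minimal sketch of that idea, not a use of the CLR profiling API: it relies on the Stopwatch class from System.Diagnostics, and the names TimingSketch, SumTo, and MeasureMilliseconds are hypothetical names invented for this illustration. It measures wall-clock time for one call, which is far coarser than the per-method reports a profiler gives you.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical workload standing in for a method you suspect is slow.
    public static long SumTo(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += i;
        }
        return total;
    }

    // Runs the action once and returns the wall-clock time it took, in ms.
    public static long MeasureMilliseconds(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        long elapsed = MeasureMilliseconds(delegate { SumTo(10000000); });
        Console.WriteLine("SumTo took about " + elapsed + " ms");
    }
}
```

A single measurement like this is noisy, and the first call also includes JIT compilation time, so treat the number as a hint about where to point a real profiler rather than as a result.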

Runtime Internals

Two aspects of the runtime internals are important to know when optimizing applications. The first is the way the garbage collector (GC) works and how you can best write code to take advantage of it. The second is just-in-time (JIT) compiling and the impact it has on performance.

The Garbage Collector

Memory allocation in the CLR starts off by walking up the heap and simply finding the first available block of unallocated memory that is large enough to satisfy the allocation request. Subsequent requests for memory allocation just move up the heap to the next available block that can satisfy the current request. This is a very fast way of allocating memory because the overhead in doing so is very small. The problem with this method of memory allocation, though, is that eventually all the memory will be used up. Of course, as memory is de-allocated, sections within the heap become available again.

When the heap has unallocated sections of memory of varying sizes, allocation gets more complicated. The allocation process then involves walking up the heap, checking an unallocated block of memory to see whether it is large enough to satisfy the request and, if it isn't, moving up the heap until a large enough block is found. A heap broken up in this way is said to suffer from memory fragmentation, and fragmentation can significantly affect the performance of memory allocation.

What is needed is to compact the heap periodically so that the fragmentation is reduced. The CLR's garbage collector does this when it deems this action necessary and appropriate.

An object's life cycle can be described in a series of steps. First, an object is created with the new operator. After the object has been created, the code can reach it. While the object is in use, its memory can be moved around without the application knowing. This is possible because objects are not referred to with raw pointers, as is done in C++, but with object references that the runtime can update.

After the code finishes with the object and the object goes out of scope, the object is no longer in use and the code cannot reach it. The garbage collector will eventually notice this, as it periodically checks the object tree and matches objects that have been allocated to objects that still have references. If it finds an object that is no longer reachable and is not in use, it marks it for collection. This means that it will eventually get around to discarding the memory for any resources that are contained within. This process of discarding memory and releasing any resources that an object contains is called finalization , and when this happens an object is said to be finalized . Actually, an object is first finalized, then the object is collected. And by collected I mean that the memory is returned to the heap so that it can be allocated again.

What I have just described is an accurate summary of what goes on. But it is a very complex process, and there are many intricacies when you study the internal implementation. This summary, though, is important for your understanding of some of the concepts that are covered soon in this chapter.
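
The idea of reachability can be observed from code. The sketch below is an illustration under stated assumptions, not a description of the collector's internals: it uses the WeakReference class and the GC.Collect and GC.KeepAlive methods, and the names ReachabilitySketch, IsAliveWhileReferenced, and CreateUnreferenced are hypothetical. A weak reference lets you ask whether an object still exists without keeping it reachable yourself.

```csharp
using System;

public static class ReachabilitySketch
{
    // The object is still referenced by 'strong', so the collector must
    // treat it as reachable even after a forced collection.
    public static bool IsAliveWhileReferenced()
    {
        object strong = new object();
        WeakReference weak = new WeakReference(strong);
        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(strong); // keeps 'strong' reachable up to this point
        return alive;
    }

    // Creates an object and returns only a weak reference to it, so nothing
    // keeps the object reachable once this method returns.
    private static WeakReference CreateUnreferenced()
    {
        return new WeakReference(new object());
    }

    public static void Main()
    {
        Console.WriteLine("Alive while referenced: " + IsAliveWhileReferenced());

        WeakReference weak = CreateUnreferenced();
        GC.Collect();
        // The collector decides when unreachable objects are reclaimed,
        // so no particular value is guaranteed here.
        Console.WriteLine("Alive after last reference is gone: " + weak.IsAlive);
    }
}
```

Forcing a collection with GC.Collect is done here only to make the behavior visible; in a real application you would normally leave that decision to the garbage collector.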

Just-in-Time Compiling (JIT)

The fact that ASP.NET code runs as compiled native code on servers is a big win for performance. The performance of classic ASP suffered because ASP pages always had to be interpreted at runtime. But ASP.NET pages execute as native code, so their performance is much better than classic ASP.

The catch, however, is that ASP.NET pages must at some point be compiled. This process is known as just-in-time compiling (JIT), and it usually happens the very first time an ASP.NET page is requested. It can sometimes be noticed as a slight hesitation when a page is requested for the first time.

It would be extremely bad if you rebooted your server and a user hit a large ASP.NET application for the first time. The user might experience excessive delays while each requested page is compiled. This does not win you any points, especially if the first user to use the application after a reboot is your boss.

A few things, though, will come to your rescue in situations such as this. First, pages that are compiled at runtime persist to disk between reboots of the server. So, if you restart your server for some reason, your users won't suffer excessive performance delays when they request pages that have already been compiled at a previous request.

Another option enables you to pre-JIT your application. When you select this option, all the application files are compiled into native code when they are deployed. The compiled files store information that lets them handle versioning correctly. For example, if an application was pre-JITted and one part of that application changed, it would be important for ASP.NET to be able to track this change. For this reason, even pre-JITted binary files carry version information with them.

Finalization

Visual C++ developers are used to the concept of a destructor. This is a method that is called immediately when an object goes out of scope or is deleted. The CLR does not support destructors in this sense. Instead, it has something called a finalizer. The finalizer is similar to a destructor because it is called immediately before an object is de-allocated. There are some differences between a finalizer and a destructor, however, that are important to know.

As "The Garbage Collector" section described, finalization can happen at any time. It does not necessarily happen the instant an object goes out of scope. It relies on the judgment of the garbage collector, which decides when the object will be finalized and de-allocated. For this reason, holding on to resources such as database connections until finalization can be very detrimental to the performance and functioning of an application.

For example, if you have a pool of 25 database connections, and they are used within an object that releases the database connections at finalization, you might run out of database connections and have to wait around for the garbage collector to finalize these objects. Relying on finalization in this way is a very bad practice because finalization is not deterministic; it doesn't happen at any known time.

If your object contains references to resources that might be scarce or might have an impact on your server's performance, you should seriously think about implementing a Close method. The Close method could then release all resources that might be expensive or scarce. You could call your Close method as soon as you are finished with an object, and in this way you would not have to rely on finalization for the resources an object contains to be released. This also would enable you to suppress finalization (in C#, by calling GC.SuppressFinalize), which could add a performance gain for your application. Finalization requires the garbage collector to consume some overhead, and if your Close method does everything that finalization would have done otherwise, you don't need finalization.
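
The advice above can be sketched in a few lines of C#. This is a minimal illustration under stated assumptions rather than a definitive implementation: ScarceResource, ReleaseResource, and CloseSketch are hypothetical names, and the resource itself is left as a placeholder. GC.SuppressFinalize is the framework call that tells the garbage collector the finalizer no longer needs to run.

```csharp
using System;

// Hypothetical wrapper around a scarce resource such as a pooled connection.
public class ScarceResource
{
    private bool closed = false;

    public bool IsClosed
    {
        get { return closed; }
    }

    // Releases the resource right away instead of waiting for the collector.
    public void Close()
    {
        if (closed)
        {
            return; // calling Close twice is harmless
        }
        ReleaseResource();
        closed = true;
        // Everything the finalizer would do is done, so skip finalization.
        GC.SuppressFinalize(this);
    }

    // Finalizer: a safety net that does work only if Close was never called.
    ~ScarceResource()
    {
        if (!closed)
        {
            ReleaseResource();
        }
    }

    private void ReleaseResource()
    {
        // Placeholder: return the connection to the pool, close a handle, etc.
    }
}

public static class CloseSketch
{
    public static void Main()
    {
        ScarceResource resource = new ScarceResource();
        try
        {
            // Use the resource here.
        }
        finally
        {
            resource.Close(); // runs even if the code above throws
        }
        Console.WriteLine("Closed: " + resource.IsClosed);
    }
}
```

Putting the call to Close in a finally block means the resource is released even when an exception is thrown. In the .NET class library the same idea is formalized as the IDisposable interface and its Dispose method.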

It is also a good idea to null out object references after you are finished with them. Setting a reference to null when you are finished with an object can accelerate the garbage collector's final disposal of it. The following code gives an example of doing this:

    MyObject obj = new MyObject();
    // Use obj here.
    // Now we no longer need obj.
    obj = null;
    // Continue with your code.

Transitions

Transitions occur when managed code calls unmanaged code or unmanaged code calls managed code. They take place during platform invoke (P/Invoke) calls and COM interoperability (interop) calls.

Transitions often require data marshaling, and the cost of marshaling can range from 20 to 50 instructions, depending on the data type. Even transitions that don't involve data marshaling slow things down by at least 10 instructions.

One of the things that you hear when listening to talks about the .NET architecture and the CLR is the expression, "It just works." This applies to things such as XCOPY deployment and application configuration. There is a cost to performance, though, because almost all these "It just works" miracles are done through some sort of COM interop. And if these miracles require any data parameters, the cost of just working goes up even more.

There is an approach to interop that reduces the impact of these performance hits. If you make chunky as opposed to chatty interop calls, the performance reduction to your application will be less. By chunky I mean calls that each do a lot of work, so fewer of them are required. Chatty calls each do less work, so many more of them are required. Because every call pays the transition cost, making a lot of chatty calls increases the total overhead of the interop calls.
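
To make the difference concrete, the sketch below simulates a boundary and counts how many times it is crossed. It makes no real P/Invoke or COM interop calls: SimulatedBoundary, AddOne, AddMany, and InteropSketch are hypothetical names, and each method call on SimulatedBoundary merely stands in for one managed-to-unmanaged transition.

```csharp
using System;

// Hypothetical stand-in for an unmanaged component. Each public method call
// represents one managed-to-unmanaged transition and is counted.
public class SimulatedBoundary
{
    private int transitions = 0;
    private int total = 0;

    public int Transitions { get { return transitions; } }
    public int Total { get { return total; } }

    // Chatty style: one value per call.
    public void AddOne(int value)
    {
        transitions++;
        total += value;
    }

    // Chunky style: the whole batch in one call.
    public void AddMany(int[] values)
    {
        transitions++;
        foreach (int value in values)
        {
            total += value;
        }
    }
}

public static class InteropSketch
{
    // Crosses the boundary once per element.
    public static int CountChatty(int[] values)
    {
        SimulatedBoundary boundary = new SimulatedBoundary();
        foreach (int value in values)
        {
            boundary.AddOne(value);
        }
        return boundary.Transitions;
    }

    // Crosses the boundary once for the whole array.
    public static int CountChunky(int[] values)
    {
        SimulatedBoundary boundary = new SimulatedBoundary();
        boundary.AddMany(values);
        return boundary.Transitions;
    }

    public static void Main()
    {
        int[] values = new int[] { 1, 2, 3, 4, 5 };
        Console.WriteLine("Chatty transitions: " + CountChatty(values)); // prints 5
        Console.WriteLine("Chunky transitions: " + CountChunky(values)); // prints 1
    }
}
```

Both styles do the same work, but the chatty version pays the fixed transition cost once per element while the chunky version pays it once per batch.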

Value Types and Strings

Two types of objects can be used in .NET languages: heap-allocated objects and value types. Value types are special objects that are stack allocated rather than heap allocated. We have already talked a bit about objects in Chapter 2, "ASP.NET Languages." Objects are normally created with the new operator and live on the heap, but value types are very different. They are normally created on the local stack. This means that memory allocation for value types is different from memory allocation for objects.

Value types include things such as ints and longs, but extend to other more complex types such as structs. Using value types will help the performance of your application because you avoid a great portion of the memory allocation and de-allocation overhead that using objects incurs. (Operations on values on the stack are inherently faster than operations on heap objects.) In addition to saving the overhead of memory management, value types are almost always passed to other methods by value instead of by reference. They can be passed by reference, but by default and in most normal cases they are passed by value. Value types are good for small chunks of data. A point is a typical example, because a point structure would contain just an X and a Y integer. Thus value types normally contain a very small amount of data.
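
The difference between passing a value type by value and by reference can be sketched as follows. The Point struct mirrors the point structure described above; ValueTypeSketch, MoveByValue, and MoveByReference are hypothetical names chosen for this illustration.

```csharp
using System;

// A small value type, like the point structure mentioned above.
public struct Point
{
    public int X;
    public int Y;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
}

public static class ValueTypeSketch
{
    // Receives a copy; the caller's Point is not affected.
    public static void MoveByValue(Point p)
    {
        p.X += 10;
    }

    // Receives a reference to the caller's Point and changes it in place.
    public static void MoveByReference(ref Point p)
    {
        p.X += 10;
    }

    public static void Main()
    {
        Point origin = new Point(1, 2);

        MoveByValue(origin);
        Console.WriteLine(origin.X); // prints 1

        MoveByReference(ref origin);
        Console.WriteLine(origin.X); // prints 11
    }
}
```

Because a by-value call copies the whole struct, this approach stays cheap only while the struct is small, which is one more reason to keep value types to a handful of fields.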

Strings are immutable. That means that after a string is created it cannot be changed in any way. Of course, the .NET languages support things such as addition operations for strings, which makes it look as if you can add string fragments to existing string objects. But what really happens under the covers is that new strings are created to replace the old strings, and the old strings are discarded. All of this extra allocation turns out to be a performance loss.

Another thing to remember about strings is that performing a lot of manipulations on strings slows your application down more than you might realize. For this reason, you should use a StringBuilder object when you are building a string in several steps. A StringBuilder is a mutable, or changeable, object that does not have to be re-created every time you perform an operation. You can build your strings with a StringBuilder and then assign the result to a string object if need be.
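
As a minimal sketch of this advice, the following code builds the same comma-separated string two ways. StringBuilder comes from the System.Text namespace; StringBuildSketch, JoinWithBuilder, and JoinWithConcatenation are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

public static class StringBuildSketch
{
    // Joins the items with commas using one mutable buffer.
    public static string JoinWithBuilder(string[] items)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < items.Length; i++)
        {
            if (i > 0)
            {
                builder.Append(", ");
            }
            builder.Append(items[i]);
        }
        return builder.ToString(); // one final string is created here
    }

    // Same result, but each += creates a brand-new string.
    public static string JoinWithConcatenation(string[] items)
    {
        string result = "";
        for (int i = 0; i < items.Length; i++)
        {
            if (i > 0)
            {
                result += ", ";
            }
            result += items[i];
        }
        return result;
    }

    public static void Main()
    {
        string[] items = new string[] { "alpha", "beta", "gamma" };
        Console.WriteLine(JoinWithBuilder(items));       // prints alpha, beta, gamma
        Console.WriteLine(JoinWithConcatenation(items)); // prints alpha, beta, gamma
    }
}
```

For three short items the difference is negligible; the StringBuilder version tends to pay off as the number of append steps grows.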

Another thing to realize with strings is that using the string properties does not incur the kind of performance penalty that you might expect. For example, when I write C++ code and use the CString's GetLength method, I normally put the result into a temporary integer variable so that I don't have to call the GetLength method more than once. I do this because calling GetLength requires processor time. So, to avoid the extra processor time, I simply call it once, put the result into a local integer variable, and then use this integer variable in my loops, as shown in the following code:

    // C++: cache the length once, before the loop.
    int nLength = strText.GetLength();
    for( int i=0; i<nLength; i++ )
    {
        // Do stuff here.
    }

However, .NET languages can optimize the string properties. For example, rather than cache the length of the string in a local integer variable, you can simply use the Length property in your loop condition and let the compiler optimize this for you, as the following code shows:

    // C#: strText is a string; use its Length property directly.
    for( int i=0; i<strText.Length; i++ )
    {
        // Do stuff here.
    }

Now that you have an understanding of how to do CLR-level optimization, we'll move on to other important topics. In the next section, we'll cover memory leaks and deadlocks.