Building Blocks of the .NET Framework | Building Portals, Intranets, and Corporate Web Sites Using Microsoft Servers

The .NET Framework consists of several elements. Together they provide the integrated development environment we need for developing many types of applications.

Common Language Runtime

The Common Language Runtime (CLR) is the foundation of the .NET Framework. Developers can write .NET code in quite a few languages, because Microsoft has taken the approach of supporting many old and new languages. Visual Studio .NET ships with Visual Basic .NET, C#, and managed extensions for C++. In addition, many other languages are now available in .NET flavors: Java, FORTRAN, COBOL, and others. The CLR supports a high degree of language interoperability: Modules written in different languages can coexist within the same project, call each other's methods, and even inherit from classes written in a different language without any additional effort from the developer. This integration is achieved through the use of a common underlying language, Microsoft Intermediate Language (MSIL). All .NET-based code compiles to MSIL, which serves as a common denominator for all higher-level languages. Other important services performed by the CLR are memory management, error handling, security management, and type safety enforcement.

Compiled .NET Framework code is packaged differently from familiar unmanaged executable files. When your .NET source code is compiled, the compiler creates an assembly containing the compiled code and metadata. One of the differences is that each assembly can contain several compiled managed modules and several resource files. Besides the metadata used to describe classes and types, each assembly contains a manifest that describes the contents of the assembly itself. This approach allows developers to simplify deployment, versioning, and maintenance tasks by creating a single assembly instead of multiple DLL modules.

A simplified deployment model allows .NET-based applications to be distributed and installed by copying the application directory to the target computer. No additional registration is required, which is another important .NET benefit. This model is achieved by storing the metadata describing an assembly within the assembly itself, along with the MSIL instructions (code). The metadata included in an assembly makes the assembly self-describing. There is no need for additional registration information stored in the system Registry or any other data source. When the assembly is loaded into memory, its metadata provides all the information needed by calling code to instantiate and use objects stored within the assembly. Metadata contains the following information:

Full information about classes within the assembly, including class attributes and information about class methods and properties
Information about referenced assemblies
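
The metadata described above can be read at run time through reflection. The following C# sketch is only an illustration: the class name MetadataDemo and the helper ReferencedNames are hypothetical, and the output depends on how the program is compiled.

```csharp
using System;
using System.Reflection;

public static class MetadataDemo
{
    // Returns the simple names of the assemblies referenced by the given assembly.
    public static string[] ReferencedNames(Assembly assembly)
    {
        AssemblyName[] references = assembly.GetReferencedAssemblies();
        string[] names = new string[references.Length];
        for (int i = 0; i < references.Length; i++)
        {
            names[i] = references[i].Name;
        }
        return names;
    }

    public static void Main()
    {
        // The assembly that contains this demo class.
        Assembly assembly = typeof(MetadataDemo).Assembly;

        // Identity information recorded in the assembly manifest.
        Console.WriteLine("Assembly: " + assembly.GetName().Name);
        Console.WriteLine("Version:  " + assembly.GetName().Version);

        // Type metadata: each type defined in the assembly and its public static methods.
        BindingFlags flags = BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly;
        foreach (Type type in assembly.GetTypes())
        {
            Console.WriteLine("Type: " + type.FullName);
            foreach (MethodInfo method in type.GetMethods(flags))
            {
                Console.WriteLine("  Method: " + method.Name);
            }
        }

        // Referenced assemblies recorded in the manifest.
        foreach (string name in ReferencedNames(assembly))
        {
            Console.WriteLine("References: " + name);
        }
    }
}
```

No Registry lookup is involved at any point; everything printed comes from the assembly itself.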

Assemblies

Assemblies in .NET are basic units of execution, versioning, and deployment. Traditional languages, like Visual Basic or C++, compile their code to a native executable format, EXE or DLL, which can be directly loaded by the system loader. .NET compilers produce MSIL code that, along with the metadata information and assembly manifest, forms the assembly. Just like a regular EXE or DLL, each assembly has only one main entry point.

Assemblies resolve an important versioning problem: In a large application consisting of many separate executable modules, it is very hard to ensure the version consistency of all application parts. An assembly allows developers to treat such a large application as a single unit.

Each assembly contains type metadata that describes fully all objects contained within the assembly, resource files, the compiled code (in MSIL format), and assembly manifest. The assembly manifest describes the composition of the assembly.

An assembly is installed by simply copying all files constituting that assembly to a directory on a target computer. The code that uses assembly types will try to locate the assembly in the application directory, in a subdirectory of the application directory, or in the global assembly cache (GAC). The GAC provides a way to share a single assembly among multiple applications, just as a single registered COM component can be reused by multiple clients. The .NET Framework comes with a command-line tool, gacutil.exe, which can be used to copy an assembly to the GAC, to remove an assembly from the GAC, or to view the GAC contents.

Base Class Library

The .NET Framework class library, or Base Class Library (BCL), consists of a set of classes representing prepackaged functionality that is tightly integrated with the CLR. The framework classes greatly simplify and speed up many common development tasks and also allow developers to extend the library by inheriting from Base Class Library types.

The framework class library is organized in a number of namespaces, each of which contains a set of logically related classes. All namespaces in the BCL descend from the common root namespace called System. .NET uses a dot syntax convention to organize its types into a hierarchy. To denote the full type name, it concatenates the namespaces to which that type belongs with the type name. For example, the DataSet type (representing an in-memory cache of data) belongs to the Data namespace, which in turn belongs to the root System namespace. The full name of the DataSet type is therefore System.Data.DataSet.
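
The naming rule can be checked with reflection. This minimal C# sketch uses System.Text.StringBuilder rather than DataSet, only so that it compiles without a reference to the System.Data assembly. The class name TypeNameDemo and the helper Qualify are hypothetical.

```csharp
using System;
using System.Text;

public static class TypeNameDemo
{
    // Builds a full type name from a namespace and a simple type name.
    public static string Qualify(string namespaceName, string typeName)
    {
        return namespaceName + "." + typeName;
    }

    public static void Main()
    {
        Type type = typeof(StringBuilder);
        Console.WriteLine(type.Namespace);  // System.Text
        Console.WriteLine(type.Name);       // StringBuilder
        Console.WriteLine(type.FullName);   // System.Text.StringBuilder

        // The full name is the namespace path joined to the type name with a dot.
        Console.WriteLine(Qualify(type.Namespace, type.Name) == type.FullName); // True
    }
}
```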

Some of the most important namespaces are:

System.Web provides a rich programming framework for building web-based applications and services using .NET Active Server Pages (ASP.NET).
System.Windows.Forms contains functionality used to build client user interface applications.
System.Data contains ADO.NET classes used to connect applications to databases and other data sources.
System.Xml contains classes used when developers are working with XML data.
System.IO is used to perform stream-based input/output operations and also work with files and directories.
System.Security provides several mechanisms for controlling access to the resources and code. See Chapter 6 for an overview of various security mechanisms in the .NET Framework.

Figure 5.1 shows a conceptual overview of the .NET Framework architecture.

Figure 5.1. Overall .NET Framework Architecture

Running Managed Code

Developers can write code using any of the compilers that target the .NET runtime. After code is written, it is compiled to a set of MSIL instructions. MSIL is platform and CPU independent. Before the managed code is executed, it is converted from MSIL to a CPU- and platform-specific native code using the JIT compiler. .NET compilers produce MSIL together with metadata describing type information.

When MSIL code is compiled to the native code, it undergoes a type safety verification process. Type safety primarily ensures that the code is only accessing authorized memory locations. When an MSIL-compiled method is called during execution for the first time, it is JIT-compiled to the native code and then the new native code is executed. All subsequent execution requests will refer to the previously compiled code.

The operating system loader recognizes a managed code module by examining a special bit in the Common Object File Format (COFF) header. If a managed module is detected, the loader changes the entry point to the CLR entry point.

Garbage Collection and Automatic Memory Management

The garbage collector (GC) is the .NET mechanism responsible for the allocation and release of memory for all objects in managed code. The garbage collector uses the managed heap (a contiguous area of memory allocated for a managed process during its startup) to allocate memory for an object created using the new operator. The garbage collector contains a set of optimizing algorithms that keep track of the allocated memory and decide when a collection is performed. During garbage collection, unused objects are deallocated and destroyed.

The concept of garbage collection represents a significant paradigm shift for COM and C++ developers. COM developers are used to controlling the lifetime of an object using reference counting (through the IUnknown interface). Reference-counting bugs are notoriously tricky to find and are the source of many COM development hardships.

Memory management is one of the most important tasks of a C++ developer. In C++, every object created on the heap using the new operator must be explicitly deleted by using the delete operator. The tasks of writing proper destructors that release all dynamic memory and of keeping track of all heap-based objects are the source of many bugs in C++ applications.

Garbage collection allows developers to concentrate on the application logic and relegate mundane memory management tasks to the framework. When the first object is created using the new operator, it is allocated at the base of the managed heap. The next object created by the code is allocated in the address space immediately following the first object. This process continues while there is available address space in the managed heap.

The garbage collector keeps track of all allocated objects. It checks periodically to see whether there are any objects that are not being used by the application anymore. If unused objects are found, they are destroyed and the memory space they had occupied is returned to the managed heap.

The tracking process employs a set of application roots, which are reference type pointers. The application roots include:

Global and static pointers
Local reference type variables on a stack
CPU registers

The JIT compiler and the CLR together maintain the list of application roots. The garbage collector scans the set of roots to determine the set of objects that are reachable from the application roots. Any object that the GC cannot reach from a root is considered to be garbage. Each garbage object is destroyed and its memory is freed. When the garbage objects have been destroyed, the garbage collection process compacts the managed heap by moving the surviving objects to new locations and then updating the application roots to point to those locations.

The GC process contains a number of optimizations, the most important of which is the generations-based approach to the garbage collection. The main idea behind this approach is that newer objects tend to have shorter lifetimes and should be collected first.

All objects on the heap belong to generation 0, 1, or 2. Every time a new object is created, it is assigned to generation 0. When the address space belonging to generation 0 fills up, the garbage collection process is triggered. The GC algorithm tries to find garbage objects only in generation 0, thus optimizing performance. After the garbage in generation 0 is collected, its unused memory space is reclaimed and the generation is compacted. All generation 0 objects that survive this collection are considered to have longer lifetimes and are reassigned to generation 1. Various GC algorithms decide when to look at and compact generations 1 and 2 (for example, when insufficient memory has been freed from generation 0).
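
The generation of an object can be observed with the System.GC class. The C# sketch below is only an illustration, with the hypothetical names GenerationDemo and GenerationOf. It forces collections with GC.Collect purely for demonstration, which production code rarely needs to do. A small surviving object is typically promoted after each collection, but the exact numbers printed depend on the runtime and its settings.

```csharp
using System;

public static class GenerationDemo
{
    // Reports the generation an object currently belongs to.
    public static int GenerationOf(object instance)
    {
        return GC.GetGeneration(instance);
    }

    public static void Main()
    {
        object survivor = new object();

        Console.WriteLine("Highest generation:  " + GC.MaxGeneration);
        Console.WriteLine("After allocation:    " + GenerationOf(survivor));

        // Force a collection; the object is still referenced, so it survives.
        GC.Collect();
        Console.WriteLine("After 1 collection:  " + GenerationOf(survivor));

        GC.Collect();
        Console.WriteLine("After 2 collections: " + GenerationOf(survivor));

        // Keep the reference alive so the object stays reachable up to this point.
        GC.KeepAlive(survivor);
    }
}
```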

C++ and Visual Basic developers have traditionally freed object resources explicitly upon the object's destruction using class destructor methods in C++ and class terminate events in Visual Basic. In .NET, the exact moment when an object will be destroyed is not deterministic. Instead, the garbage collector calls a Finalize method on the object. This method performs an implicit cleanup; it is never called directly. To allow the consumer of an object to perform an explicit cleanup, you should implement the IDisposable interface. This interface exposes a Dispose method. Object consumers can call this method directly to clean up expensive resources without waiting for the CLR to perform a garbage collection.
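
The difference between implicit and explicit cleanup can be sketched in C# as follows. The ResourceHolder class is hypothetical and holds no real resource; it only records whether cleanup has run. In C#, a Finalize method is written using destructor syntax (~ResourceHolder), and the using statement calls Dispose automatically when the block exits.

```csharp
using System;

// Hypothetical resource wrapper used only for illustration.
public sealed class ResourceHolder : IDisposable
{
    private bool disposed;

    public bool IsDisposed
    {
        get { return disposed; }
    }

    // Explicit cleanup: consumers call this directly or through a using block.
    public void Dispose()
    {
        if (!disposed)
        {
            // Release expensive resources here (files, connections, handles).
            disposed = true;

            // Cleanup is done, so the finalizer no longer needs to run.
            GC.SuppressFinalize(this);
        }
    }

    // Implicit cleanup: the garbage collector calls this at an unspecified
    // time if Dispose was never called.
    ~ResourceHolder()
    {
        disposed = true;
    }
}

public static class DisposeDemo
{
    public static void Main()
    {
        ResourceHolder holder = new ResourceHolder();
        using (holder)
        {
            Console.WriteLine(holder.IsDisposed); // False
        }
        // Dispose has run by the time the using block is left.
        Console.WriteLine(holder.IsDisposed);     // True
    }
}
```

Calling GC.SuppressFinalize inside Dispose is a common convention: it tells the garbage collector that the object has already been cleaned up.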

.NET Common Type System

The Common Type System (CTS) plays a crucial role in providing type safety, cross-language integration, and cross-language inheritance features. The CTS defines all data types that a language targeting the .NET runtime can use. Each .NET language must implement at least a subset of the types specified by the CTS. The CTS resolves multiple cross-language integration issues, all of which are very familiar to anyone who has had to develop software systems incorporating, for example, COM components written using Active Template Library (ATL) C++ classes in conjunction with a Visual Basic front end.

CTS compliance means not only that the compiler implements CTS types but also that it adheres to the common set of rules dictating how these types are created and used by the code. The following sections describe the primary concepts and the most important features of the Common Type System.

Common Root for All Types

Each type in a language that targets the CLR must inherit from the System.Object type, which provides a common set of services, including support for the .NET garbage collector. This inheritance is implied; you do not have to explicitly derive your type from System.Object. The following two class definitions are identical:

Using Visual Basic .NET:

    Class C
    End Class

    Class C
        Inherits System.Object
    End Class

Using C#:

    class C { }

    class C : System.Object { }
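
The implied inheritance can be confirmed at run time. In this minimal C# sketch the two classes are given different (hypothetical) names, ImplicitBase and ExplicitBase, only so that both can live in the same program.

```csharp
using System;

public class ImplicitBase { }

public class ExplicitBase : System.Object { }

public static class RootTypeDemo
{
    // True when the type's immediate base class is System.Object.
    public static bool DerivesDirectlyFromObject(Type type)
    {
        return type.BaseType == typeof(object);
    }

    public static void Main()
    {
        Console.WriteLine(DerivesDirectlyFromObject(typeof(ImplicitBase))); // True
        Console.WriteLine(DerivesDirectlyFromObject(typeof(ExplicitBase))); // True

        // Members inherited from System.Object are available on every type.
        Console.WriteLine(new ImplicitBase().GetHashCode() == new ImplicitBase().GetHashCode());
    }
}
```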

Value Types and Reference Types

Every type in the CTS belongs to one of two broad categories: value types and reference types. Value types inherit from the System.ValueType class (which derives from System.Object). A value type variable contains the value itself, and value types declared as local variables are allocated on the stack. Most of the primitive types, like System.Int32, System.Boolean, and System.Single, are built-in value types. A structure in .NET is a value type as well, which might come as a surprise to C++ developers accustomed to treating classes (which are reference types in .NET) and structures similarly. Because such value types are not allocated on the managed heap, they are not garbage-collected. The lifetime of a value type is determined by its scope. For example, all value types declared within a function are destroyed when the function exits. When a value type object is created, it is zeroed by default. Copying a value type object copies its value.

Reference types resemble C++ pointers: A reference type variable contains a reference to the object rather than the object's value. Reference type objects are created on the managed heap and derive from the System.Object class (when a reference type is instantiated, the value itself is allocated on the heap and the reference to the value is placed on the stack). The default value of a reference type variable is null. Copying a reference type variable copies the reference only.
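
The two copy behaviors can be compared side by side. The following C# sketch assumes two hypothetical types, PointValue (a structure) and PointReference (a class), that differ only in their category.

```csharp
using System;

public struct PointValue
{
    public int X;
}

public class PointReference
{
    public int X;
}

public static class CopyDemo
{
    // Copying a value type copies the value, so the original is unaffected.
    public static int CopyValueThenModify()
    {
        PointValue original = new PointValue(); // fields are zeroed, so X is 0
        PointValue copy = original;             // independent copy of the value
        copy.X = 42;
        return original.X;
    }

    // Copying a reference type copies the reference, so both variables
    // refer to the same object on the managed heap.
    public static int CopyReferenceThenModify()
    {
        PointReference original = new PointReference();
        PointReference copy = original;         // second reference to the same object
        copy.X = 42;
        return original.X;
    }

    public static void Main()
    {
        Console.WriteLine(CopyValueThenModify());     // 0
        Console.WriteLine(CopyReferenceThenModify()); // 42
    }
}
```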

The following reference types are supported by the CTS:

Classes form the basis of the object-oriented features of a language. A class contains a set of fields holding the data that defines its state and a set of methods, properties, and events that define the class's behavior. An object is a running instance of a class. A class can inherit from other classes. Abstract classes are classes that cannot be instantiated and that can leave some or all of their members unimplemented. Abstract classes are primarily used in inheritance chains as base classes. A class can directly inherit from only one base class.
Interfaces resemble classes: They too can have methods, properties, and events (but not fields). All interface members must be abstract. Interfaces are used to declare the common functionality without providing the implementation. A class can implement one or many interfaces by inheriting from these interfaces and providing the implementation for each method of each interface.
Delegates build on the idea of C++ function pointers. While regular pointers point to the memory addresses where objects are located, function pointers point to the location of the function. Unlike C++ function pointers, delegates are actual objects implementing a lot of useful behavior. Delegates can reference both static and instance methods (while function pointers can only deal with static functions). Delegates support an invocation list, which allows for the execution of multiple methods when the single delegate is invoked. CLR supports the asynchronous execution of the delegates. Delegates inherit from the System.Delegate class and are widely used in .NET development to program callbacks and event handlers.
Arrays should be familiar to developers with both C++ and Visual Basic backgrounds. .NET arrays inherit from the System.Array type. .NET arrays are type safe: Each element in the array belongs to the same type.
Strings in .NET are reference type values and are immutable. After a string is created, it cannot be modified. When a string operation occurs, the original string object is left unchanged and a new string is created to hold the result. The BCL provides the System.Text.StringBuilder class to optimize string operations. This class contains various fast string-handling methods and is preferred when intensive string manipulations are required.
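
Two of the reference types in the list above, delegates and strings, can be illustrated with a short C# sketch. The delegate type Notify and the classes Recorder and DelegateAndStringDemo are hypothetical names introduced for this example.

```csharp
using System;
using System.Text;

// Hypothetical delegate type used only for illustration.
public delegate void Notify(string message);

public class Recorder
{
    public int Count;

    // Instance method used as a delegate target.
    public void Record(string message)
    {
        Count++;
    }
}

public static class DelegateAndStringDemo
{
    public static int StaticCalls;

    // Static method used as a delegate target.
    public static void CountStatic(string message)
    {
        StaticCalls++;
    }

    // Builds a delegate whose invocation list holds a static and an instance
    // method, invokes it once, and returns the length of the invocation list.
    public static int InvokeBoth(Recorder recorder)
    {
        Notify notify = CountStatic;   // static method
        notify += recorder.Record;     // instance method appended to the list
        notify("hello");               // both targets run, in order
        return notify.GetInvocationList().Length;
    }

    // StringBuilder appends in place instead of creating a new string per step.
    public static string JoinNumbers(int count)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append(i);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        Recorder recorder = new Recorder();
        Console.WriteLine(InvokeBoth(recorder)); // 2
        Console.WriteLine(recorder.Count);       // 1

        // String operations return a new string; the original is unchanged.
        string original = "abc";
        string upper = original.ToUpperInvariant();
        Console.WriteLine(original);             // abc
        Console.WriteLine(upper);                // ABC

        Console.WriteLine(JoinNumbers(5));       // 01234
    }
}
```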

Boxing and Unboxing

There are cases in which translation between value and reference types is required. The process of converting a value type to a reference type is called boxing. The process of converting a reference type back to a value type is called unboxing. Generally, value types are treated like the primitive types in C++: They reside on the stack, are not garbage-collected, and have predictable lifetimes. The ability to sometimes treat value types as if they were reference types allows for great flexibility in the code and provides a unified way to treat all variables without sacrificing the performance benefits of value types.

Following is a C# example of boxing and unboxing in action.

    // Boxing
    Int32 n = 10;     // create a value type variable and assign it a value
    Object obj = n;   // allocate memory on the heap to store the value of n,
                      // copy the value of n to the heap, and store the
                      // reference to that heap address in the memory
                      // allocated on the stack for obj

    // Unboxing
    Int32 m = (Int32)obj;     // take the value referenced by obj from the
                              // managed heap and copy it to the stack

Many CLR collection classes store Object types as collection members, which means that they expect reference type objects. You can easily store both value and reference types in .NET collections due to the automatic boxing/unboxing process: When a value type object is added to the collection, it is automatically converted to the reference type that could be stored as a collection member.
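
As a minimal sketch of this behavior, the following C# example stores integers in a System.Collections.ArrayList, whose Add method takes an Object. The class name BoxingDemo and its helper methods are hypothetical.

```csharp
using System;
using System.Collections;

public static class BoxingDemo
{
    // Adds each value to an ArrayList (boxing it) and sums the items
    // back out of the list (unboxing each one).
    public static int SumThroughArrayList(int[] values)
    {
        ArrayList list = new ArrayList();
        foreach (int value in values)
        {
            list.Add(value);      // boxed implicitly: Add takes an Object
        }

        int sum = 0;
        foreach (object item in list)
        {
            sum += (int)item;     // unboxed with an explicit cast
        }
        return sum;
    }

    // The box holds a copy of the value, so later changes to the
    // original variable do not affect it.
    public static int BoxIsACopy()
    {
        int n = 10;
        object boxed = n;
        n = 20;
        return (int)boxed;
    }

    public static void Main()
    {
        Console.WriteLine(SumThroughArrayList(new int[] { 1, 2, 3 })); // 6
        Console.WriteLine(BoxIsACopy());                               // 10
    }
}
```

Note that in C# the boxing step is implicit, while reading a value back out of an Object requires an explicit cast.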

Application Domains

Application isolation is one of the most crucial tasks of any computer system. The infamous instability of the old 16-bit version of Windows was due to the fact that the application isolation level was low: All processes shared a common memory space. Any misbehaving application could corrupt other running processes, including system processes.

In current versions of Windows, the unit of application isolation is the process. Each application running under Windows is loaded into its own process with its own independent memory space, protected from the memory used by other processes. The .NET Framework provides more control over the application isolation level by introducing the concept of application domains. In a nutshell, just as an operating system allows you to run multiple processes simultaneously, you can run several application domains within a single CLR process. The isolation level between different application domains within a process is similar to the isolation between different processes.

Just as with processes, code in one application domain cannot directly access code in a different application domain. To pass an object from one application domain to another, either the object is marshaled (a local copy of the object is created in the second application domain) or the object is accessed through a proxy using remoting technology.

Application domains have several important advantages:

When code crosses a process boundary, a context switch occurs. This resource-intensive operation is not required during cross-domain calls.
You can start and stop individual application domains without stopping the whole process. A misbehaving application in one application domain will not affect other application domains within the same process.
A server application can perform dynamic updates without restarting by monitoring for assembly updates. When a new assembly is detected, a server can unload a single application domain and restart it with the updated assembly.
Application domains provide a unit of security and configuration. Each application domain can have its own security policy and its own configuration file.

The System.AppDomain class is used to control application domains programmatically. It allows developers to create an application domain, load an assembly into the domain, and unload the domain. An individual assembly cannot be unloaded on its own; it is released when the domain that contains it is unloaded.
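
A minimal C# sketch of this API follows. It assumes the .NET Framework runtime described in this chapter; on later runtimes such as .NET Core, AppDomain.CreateDomain throws PlatformNotSupportedException, which the sketch catches. The class name AppDomainDemo and the domain name "Worker" are hypothetical.

```csharp
using System;

public static class AppDomainDemo
{
    // Name of the application domain in which the calling code runs.
    public static string CurrentDomainName()
    {
        return AppDomain.CurrentDomain.FriendlyName;
    }

    // Creates a second application domain, reads its name, and unloads it.
    // Returns null on runtimes that do not support multiple domains.
    public static string CreateAndUnload(string name)
    {
        try
        {
            AppDomain domain = AppDomain.CreateDomain(name);
            string friendlyName = domain.FriendlyName;

            // Unloading the domain releases every assembly loaded into it.
            AppDomain.Unload(domain);
            return friendlyName;
        }
        catch (PlatformNotSupportedException)
        {
            return null;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Current domain: " + CurrentDomainName());

        string created = CreateAndUnload("Worker");
        Console.WriteLine(created == null
            ? "This runtime does not support additional application domains."
            : "Created and unloaded domain: " + created);
    }
}
```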

Runtime Hosts

The .NET Framework allows you to create multiple types of applications, from console applications, web server applications, and web services to Windows GUI-based applications. Even though these applications might be packaged as familiar EXE and DLL files, they are not native Windows applications. Each .NET-based application must be hosted by a special Windows process called a runtime host. The task of the runtime host is to create a native Win32 process, load the CLR into this process, create .NET application domains, and finally load the code from the application's assembly into the application domains.

The following runtime hosts are included with the .NET Framework: ASP.NET, Microsoft Internet Explorer, and Windows Shell executables. The .NET Framework ships with an unmanaged API that allows developers to create custom Runtime Hosts.
