Understanding Assemblies | Microsoft SQL Server 2005: The Complete Reference: Full Coverage of all New and Improved Features

While namespaces can be understood as a logical grouping or encapsulation of classes, the assembly is a “physical” container for at least one built (compiled) executable, class file, module, or other resource, such as an icon. If the assembly is a class library, then the class or classes it harbors are referenced by the fully qualified namespace (FQNS) name described in the preceding section. If the assembly is an executable file (an application), you reference it by the name of the physical file, which needs an entry point to allow the operating system to initiate its execution.

Note

Assembly names and namespace names should not be confused. The two names, while often similar and sometimes identical, have very little to do with each other.

At the physical level an assembly is many things, and the organization of its contents (Microsoft Intermediate Language code, or MSIL, and metadata) is quite complex. While you don’t need to know the ins and outs of the contents of the assembly, you do need to understand what an assembly is and how to build it, name it, distribute it, and manage it in order to be effective in your development efforts. This section will help you achieve that so that you can navigate your software development results and the chapters of this book more easily. You will understand assemblies better if we separate them into the four types of units that the Visual Basic compiler can produce:

Console executable: This assembly is the standard, GUI-less console application that we have been compiling to so far in this chapter. Console assemblies have the .exe extension. Entry into the executable is through Main. Console executable code is not supported in SQL Server.
Windows executable: This assembly is the standard .NET Windows executable file. These assemblies are also given the .exe extension. Entry into a managed Windows executable is likewise through a Main method (WinMain is the entry point of native Windows programs). Windows executable code is not supported in SQL Server.
Class library: This assembly is your standard .NET class library, which can be dynamically linked. These assemblies are given the .dll extension. They can contain one class or many. A managed class library has no entry point of its own (DllMain belongs to native DLLs); the runtime loads it when one of its types is first needed.
Class module: This assembly is your standard class module, which is used as a container for compiled classes that still need to be linked into a project or into a formal class library before they can be used. These assemblies are given the .netmodule extension. No entry point is required because a module is reached only through the assembly it is linked into.

SQL Server’s use of the CLR needs to work only with class libraries and class modules that have been specifically tailored to SQL Server. SQL Server obviously does not need to run Windows GUI applications or console applications. It is good .NET programming practice to give an assembly a name that describes its purpose and hints at the types inside it and what those classes are for. The System.dll file that ships with the Framework is a good example. However, naming the assembly “System” as well tends to blur the distinction between the assembly name and the namespace name (System.Data, for example, is both a namespace and an assembly name). I think it’s better to give your assembly a name that does not “clash” with the root namespace name.

Before we discuss the four types of output files further and how they are produced, let’s take a closer look at how assemblies are located by the runtime, the actual makeup of an assembly, and the roles they play in .NET Framework software development.

Locating Assemblies, Anytime

Most of the time, the assemblies you create (executable applications, functionality, or resources) reside in a folder you create with an installation routine or utility. The default location when you are building assemblies is the Visual Studio project folder.

The assemblies can be stored in the root folder of your application, or in subfolders. You have a lot of flexibility in where you house your assemblies and how you get them to their folders.

The other location for your assemblies is the Global Assembly Cache, or GAC (pronunciation rhymes with wack). Assemblies placed into the GAC must be shared and given strong names (described later in this section), so these assemblies would typically be used by more than one application or user, even concurrently. The concept of “registering” with the GAC is similar to registering with the registry, just not as fragile a process or as hard to maintain. For SQL Server you might create a folder for assemblies inside the SQL Server folder hierarchy under Program Files.

There are ways of overriding the default methods for locating assemblies. You can also redirect the path to an assembly. Assemblies can also interoperate with the COM and COM+ world and are accessible from unmanaged clients, something you would not typically do for SQL Server assemblies.

Microsoft suggests keeping assemblies private and thus out of the GAC if they do not need to be shared, which is a good practice for SQL Server CLR code.

What’s in an Assembly

In the early days of developing for the Microsoft operating systems (usually one of the early shades of Windows), the compilers produced a file that was compliant with two standards, the Microsoft Portable Executable (PE) format and the Microsoft Common Object File Format (COFF). The two standards were created to enable the operating system to load and execute your applications, or link in the DLLs.

The formats specified how the compiled files were laid out, so that the OS found what it expected to find when it executed or loaded your files. The .NET assemblies adopt the PE/COFF combination to enable the runtime to process your files in the same fashion as the standard executable files you compile, and this is true on SQL Server CLR as well.
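The layout rule described above can be checked mechanically. The following Python sketch is an illustration only and is not part of the book’s toolset: it tests whether a byte buffer starts like a PE file by reading the "MZ" DOS signature, following the four-byte offset stored at position 0x3C, and looking for the "PE\0\0" signature at that offset. The function name looks_like_pe is hypothetical.

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Return True if data begins with a DOS stub that points at a PE signature."""
    # Every PE file starts with the two-byte DOS signature "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The little-endian DWORD at offset 0x3C holds the file offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # The PE header opens with the four-byte signature "PE\0\0";
    # the COFF file header follows immediately after it.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

To try it on a real file, pass in the bytes read from the file opened in binary mode. Managed and native images should both pass, because a .NET assembly is an ordinary PE file that carries CLR metadata in addition to the standard headers.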

Tip

You can’t ignore this section if you are in charge of deploying, packaging, or installing assemblies to SQL Server.

Metadata

Assemblies carry metadata so that they can describe themselves to the runtime environment (the CLR). The metadata describes code and class data, and other information like security. .NET assemblies are not compiled to machine code, like their native brethren, but rather to MSIL.

Metadata provides us with a simpler programming model than the one we have been accustomed to for so many decades. We no longer need to work with complex and finicky Interface Definition Language (IDL) files, dozens of cryptic header files that are so tedious and time-consuming to prepare, and external dependencies for code and components alike. This is why a .NET assembly is a no-brainer to run on SQL Server.

When a .NET (PE) file is executed or loaded, the CLR scans the assembly for the metadata manifest that will allow it to interpret, process, JIT-compile (down to machine code), and then run the file. The metadata is not only for the benefit of the CLR; it also identifies the assembly, allowing it to describe itself to the SQL Server .NET environment or Framework, even across process boundaries.

Figure 11–4 illustrates how the contents of the PE/COFF assembly are assembled, hence the term assembly, which is not a new term to computer language boffins. While it is convenient to keep calling the .exe files the compiler produces executables, they are not really executable without the presence of the Common Language Runtime on the computer, an issue that is likely to disappear within a few years. Remember how the issue of having the Java Virtual Machine became a non-issue.

Figure 11–4: An assembly comprises several layers

More about Metadata

When you build or compile your file into the PE format, metadata is inserted into one portion of the file while your code is compiled down to MSIL and inserted into another portion of the file. Everything in the file is described by the metadata that is packed into the assembly, including inheritance, class code, class members, access restrictions, and so on.

When you execute an application and a class is referenced, the CLR loads the metadata of the respective assembly and then studies this payload to learn everything it has to know to successfully accommodate the assembly, its resources, and the requests of the contents.

The metadata describes the following:

Description of the assembly: This metadata describes the identity of the assembly, such as name, version, culture, public key, and so on. It also holds references to types that are exported, the assembly’s dependencies, and security permissions.
Description of the assembly’s types: This metadata describes the types in the assembly. The description includes the name, the visibility of the class, the base class, and any interfaces implemented. It also describes class members, such as methods, data fields, properties, events, and type composition or nesting.
Description of attributes: This metadata describes the additional descriptive modifiers that alter types and their members.

The metadata just described provides a sophisticated mechanism for allowing assemblies to describe themselves to the CLR. In other words, the metadata includes everything the CLR needs to know about a module and its execution and interaction with other modules in the CLR. Since the assemblies do not require explicit registration with the operating system, application reliability improves considerably.

The metadata also facilitates language interoperability and allows component code to be accessed equally by any CLS-compliant language. You can inherit from classes written in other languages by virtue of the BCL, which is mostly written in C#.

The PE file is divided into a section for metadata and a section for the MSIL code. The metadata section references the MSIL sections via a collection of tables and heap structures, which point to tokens that are embedded in the MSIL code.

This also means that you cannot change the contents of the assemblies or “fix” the MSIL code without the assembly metadata knowing about it. This provides a consistent means of checking the integrity of the assembly contents, that is, of confirming that they have not been compromised.

The metadata token is a four-byte number that identifies what the token references in the MSIL (a method, a field, and so on).
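The commented hexadecimal values that ILDASM prints, such as /*06000001*/ in the manifest listing later in this section, are metadata tokens. In the CLI metadata format the top byte of a token selects a metadata table and the low three bytes give the row number within that table. The Python sketch below decodes tokens on that basis; it is an illustration only, the helper name decode_token is made up, and the table map lists just a handful of the defined table numbers.

```python
# Partial map of metadata table numbers (top byte of a token) to table names.
TABLES = {
    0x01: "TypeRef",
    0x02: "TypeDef",
    0x04: "Field",
    0x06: "MethodDef",
    0x0A: "MemberRef",
    0x0C: "CustomAttribute",
    0x23: "AssemblyRef",
}

def decode_token(token: int) -> tuple:
    """Split a four-byte metadata token into (table name, row number)."""
    table = token >> 24          # high byte: which metadata table
    row = token & 0x00FFFFFF     # low three bytes: 1-based row in that table
    return TABLES.get(table, f"table 0x{table:02X}"), row

# Tokens taken from the ILDASM listing shown later in this section.
for tok in (0x02000002, 0x06000001, 0x23000001):
    print(f"0x{tok:08X} ->", decode_token(tok))
```

Read this way, 06000001 is the first row of the method definition table, which in the listing is the Main method, and 23000001 is the first referenced assembly, mscorlib.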

The Nature of the Assembly

In addition to the logical types of assembly described earlier, assemblies can be either static or dynamic and private or shared:

Static assembly: This assembly is the .NET PE file you create whenever you compile and build a class library or some type of application. The namespaces we discussed earlier are typically partitioned across such assemblies. They can be in one assembly or partitioned across multiple assemblies.
Dynamic assembly: This assembly is a memory-resident module that gets loaded at runtime to provide specific runtime services. A good example of dynamic assemblies is the Reflection class collection, which allows you to reference and access runtime type information.
Private assembly: This assembly is a static assembly that can only be accessed by a specific application. This assembly is visible only to the application or other assemblies in its private folder or subfolder.
Shared assembly: This assembly is given a unique or strong name and public key data so that it can be uniquely identified by the CLR. It can be used by any application. A dynamic assembly can also be shared.

Let’s now take a closer look at the contents of an assembly and, among other things, its IL code. The quickest way to do that (besides reading this book) is to run the IL disassembler application that ships with the .NET Framework Software Development Kit (SDK). The file is called ILDASM. Double-click the file and the application will load.

Go to File | Open and aim the application at any assembly you might already have created. Let’s first check out the assembly manifest so that we know what we are looking at.

The Assembly Manifest

The manifest is the critical requirement of the assembly because it contains the assembly metadata. However, you can compile an assembly to MSIL without a manifest, to produce a netmodule (see the section on module assemblies later in this chapter). Assembly manifests can be stored in single-file assemblies or in multifile assemblies in stand-alone files.

The assembly manifest’s metadata satisfies the CLR’s version requirements and security identity requirements, the scope of the assembly, and resolution of resources and types.

The assembly manifest provides the following metadata:

Metadata that identifies the assembly, which includes the name, version number, culture (language and culture), public key, digital signature, and so on
Metadata that identifies all the files that compose the assembly, as a single file or as many files that form a logical unit
Metadata that provides for the resolution of the assembly’s types, their declarations, and implementations
Metadata that resolves dependencies (other assemblies on which this one depends)
Metadata that allows the assembly to describe itself to the runtime environment

The manifest code in the assembly is exposed as follows:

 .module SQLcr.dll
 // MVID: {}
 .imagebase 0x11000000
 .subsystem 0x00000002
 .file alignment 512
 .corflags 0x00000001
 // Image base: 0x03680000
 .namespace SQLcr.Ch11
 {
   .class /*02000002*/ private auto ansi sealed Welcome
          extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
   {
     .custom /*0C000001:0A000003*/ instance void [Microsoft.VisualBasic/* 23000002 */]Microsoft.VisualBasic.Globals/* 01000003 *//StandardModuleAttribute/* 01000004 */::.ctor() /* 0A000003 */ = ( 01 00 00 00 )
     .method /*06000001*/ public static void Main() cil managed
     // SIG: 00 00 01
     {
       // Method begins at RVA 0x2050
       // Code size       20 (0x14)
       .maxstack  8
       .language '{}', '{994B45C4-E6E9-11D2-903F-00C04FA302A1}', '{00000000-0000-0000-0000-00000000000}'

The Role of the Assembly

So now you have seen what goes into the assembly and what the manifest achieves. But what does the assembly do for you? Without getting lost in the minutiae of the Framework, let’s investigate the essential roles of an assembly. An assembly is:

A type boundary
A reference scope boundary
A unit of deployment
A unit of execution
A version boundary
A security boundary

Assemblies as Type Boundaries

On the file system the assembly looks like any other dynamic link library and, as discussed earlier, usually goes by the .dll extension, although it can also be a cabinet file (with the .cab extension).

First of all, you can build a class and make its source code available to any application. But you would mostly do that for your own use, and maybe for your development team members. However, I don’t suggest you provide “raw” classes to your team members either, because with access to the actual source code there’s no telling what problems can be introduced. You would only supply the raw source files if your user specifically requested or needed it, as do readers of this book, or your customers have opted to buy the source code of your components (usually as a safeguard against your going out of business).

The best examples of assemblies, as mentioned earlier, are the ones that contain the base class libraries that essentially encompass the .NET Framework. As mentioned earlier, SQL Server uses a subset of these. To compile a class to IL and package it up into an assembly is very straightforward. You simply build the class and specify to the compiler which assembly you want to put it in and under what namespace.

Classes (or types, as they are known when they have been reduced to IL) are separated by the assembly in which they reside, which is why the assembly is known as a type boundary. In other words, two types can be placed in the same namespace yet exist in separate assemblies. The problem arises when you try to reference the type in the IDE, because you can only import one fully qualified namespace. The IDE, by the way, will not let you add the same reference twice but will report that you have already made the reference.

Assemblies as Reference Scope Boundaries

The manifest metadata specifies the level of exposure a type and its resources have outside the assembly, the dependencies of the assembly (other assemblies on which it depends), and how types are resolved and resource requests satisfied.

If the assembly depends on other assemblies that are statically linked to it, then their names and metadata are included in the manifest. Data such as the referenced assembly’s name, version, and so on are stored in the manifest.

The reference scopes of the types in the assembly are also listed in the manifest. The types can be accessible outside the assembly, which is what lets you reference them by their FQNS, or they can be given Friend access, which means that they are hidden from the outside world and accessible only to the types within the same assembly in which the Friend type resides.

Assemblies as Units of Deployment

When you execute an application, the application assembly calls into any other assemblies that it depends on. These assemblies are either visible to the application assembly (the .exe file) in the same folder or in subfolders, or they are visible in the runtime environment because they have been installed in the GAC.

Assemblies installed in the GAC are shared, which exposes them to other assemblies that may need access to their internals. You might also ship utility classes, culture and localization classes, components, and so on, and these can be installed in the application’s installation folder or in the GAC. These assemblies let you build very thin application assemblies and keep successive deployments small, because you need to replace only the assembly that is outdated.

Also, versioning in .NET lets you or your users install new versions of your assemblies without breaking the assemblies from previous installations, and so without breaking applications that have already been installed on the system.

Assemblies as Units of Execution

The CLR lets all shared assemblies execute side by side or be accessed side by side. What that means is that as long as you create a shared assembly with a strong identity and a unique version number, and you register it in the GAC, the CLR will be able to execute the assembly alongside another assembly of the same name. The DLL conflicts of the past are thus abolished under the CLR because the version number and unique public key data allow the CLR to distinguish between the assemblies.

You will also likely avoid the problem of a new assembly overwriting an older one, thereby “breaking” the previous installation.

The CLR also has no problem referencing any dependent assemblies because all the information it needs to be sure it is executing or linking in the correct files is in each assembly’s manifest. This is known as side-by-side execution. The only difference between the two assemblies is the version number of each.
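A toy model may make the side-by-side idea concrete. The Python sketch below is not how the CLR or the GAC is implemented; it simply treats the shared store as a dictionary keyed by name, version, and public key token, so that two versions of the same assembly coexist and each caller gets exactly the version its manifest asks for. Culture is left out for brevity, and every name, path, and token value here is invented.

```python
from typing import Dict, NamedTuple

class AssemblyIdentity(NamedTuple):
    name: str
    version: str            # for example "1.0.0.0"
    public_key_token: str   # hex string; placeholder values below

# Toy shared store: identity -> file path. The paths are invented.
store: Dict[AssemblyIdentity, str] = {}

def install(identity: AssemblyIdentity, path: str) -> None:
    """Add an assembly without disturbing other versions of the same name."""
    if identity in store:
        raise ValueError(f"{identity} is already installed")
    store[identity] = path

def resolve(identity: AssemblyIdentity) -> str:
    """Return the file for exactly the identity a manifest references."""
    return store[identity]

v1 = AssemblyIdentity("SQLcr", "1.0.0.0", "0123456789abcdef")
v2 = AssemblyIdentity("SQLcr", "2.0.0.0", "0123456789abcdef")
install(v1, "store/SQLcr/1.0.0.0/SQLcr.dll")
install(v2, "store/SQLcr/2.0.0.0/SQLcr.dll")
```

Because the version is part of the key, installing version 2.0.0.0 cannot overwrite version 1.0.0.0, and a caller built against the older version still resolves to the older file.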

Assemblies as Version Boundaries

The assembly is the smallest versionable unit in the CLR, which means that the types and other resources it encapsulates are versioned with the assembly as a unit. A class cannot stand alone and be accessed outside of the assembly architecture because there is no way to reference it. The class or type can be either part of the application assembly or stand alone in its own assembly, which provides the version data for it.

The version number is encapsulated in the assembly manifest, as shown earlier. The CLR uses the version number and the assembly’s public key data to find the exact assembly it needs to execute and any assemblies that may be dependent on the specific version.

In addition, the CLR provides the infrastructure to allow you to enforce specific version rules.

Assemblies as Security Boundaries

The assembly is a security unit that facilitates access control to the data, type functionality, and resources it encapsulates. As a class provider, you can control access to your assembly’s objects because the CLR allows you to specify a collection of permissions on an assembly. The client process (rich clients, thin clients, Web forms, or otherwise) must have the permission you specify in order to access the object in the assembly.

This level of security is known as code access security. When an assembly is accessed, the CLR very quickly determines the level of code access allowed on the assembly. If you have authorization, you get code; if not, you’re history. The idea of controlling code access is fairly new and in line with the model of distributed functionality that is becoming so widespread. Code access security also employs a role-based security model, which specifies to the CLR what a client is allowed to do with the code it can access.

The security identifier of an assembly is its strong name, which is discussed in the next section.

Besides client access to assemblies, system resources also require protection from assemblies. The SQL Server CLR security secures access to system resources by comparing credentials and proxies of credentials to the Windows file system’s security architecture.

Strong Names

Assemblies can be given strong names, which will guarantee their uniqueness and provide security attributes. The strong name is made up of the assembly’s standard name (such as codetimes.sqlserver.system), its version number, culture, public key data, and digital signature. The strong name is generated from all this data, which is stored in the assembly manifest. If the CLR were to encounter two assemblies with the same strong name, it would know that the two files are 100 percent identical.
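The parts of a strong name are usually written together as a single display name, in which a short public key token stands in for the full public key. The Python sketch below shows one way to assemble such a string, on the understanding that the token is the last eight bytes of the SHA-1 hash of the public key, in reverse order. It is an illustration rather than a signing tool: the function names are made up, the key bytes are placeholders, and the digital signature itself is not modeled.

```python
import hashlib

def public_key_token(public_key: bytes) -> str:
    """Derive the short token conventionally shown in place of the full public key."""
    digest = hashlib.sha1(public_key).digest()
    # Token: the last eight bytes of the SHA-1 digest, in reverse order.
    return digest[-8:][::-1].hex()

def display_name(name: str, version: str, culture: str, public_key: bytes) -> str:
    """Format the identity parts as a .NET-style assembly display name."""
    return (f"{name}, Version={version}, Culture={culture}, "
            f"PublicKeyToken={public_key_token(public_key)}")

# Placeholder bytes stand in for a real key blob produced by a signing tool.
fake_key = bytes(range(160))
print(display_name("codetimes.sqlserver.system", "1.0.0.0", "neutral", fake_key))
```

Two assemblies signed with different keys get different tokens, so even if they share a name, version, and culture their display names differ.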

Strong names are issued by Visual Studio and by development tools that ship with the .NET SDK. The idea behind strong names is mainly to protect the version lineage of an assembly, because the guaranteed uniqueness ensures that no one else can substitute their assembly for yours, which otherwise would be a major security loophole. In other words, a strong name will ensure that no other assembly, possibly packed with a hostile payload, can masquerade as your assembly.

The strong name also protects your consumers and allows them to use the types and resources of your assemblies with the knowledge that your assemblies have not been tampered with. This is a built-in integrity check that will allow consumers to trust your code. Combined with supporting certificates, this offers you the ultimate security system for the protection of enterprise and distributed code.