The External View: Assembly Logical Structure

At this point in the chapter, we have largely completed our discussion of the internal structure of assembly files. We are ready to move on to examine assemblies on a higher level, and consider how to manipulate them. We'll kick off this part of the chapter with a brief review of the subject of assembly identity - in other words, what characteristics of an assembly distinguish it from the other assemblies on your system.

Assembly Identity

The identity of an assembly consists of four items:

  • The name

  • The version

  • The public key (which of course depends on a private key being available when the assembly is signed)

  • The culture

Assemblies that have not been signed do not of course have a public/private key combination. They do, however, still have a hash of the assembly contents, which can be used to verify that the assembly has not been corrupted. The hash means that you can detect if some random corruption has happened to the assembly or one of its files, though obviously it can't guard against the possibility of some malicious person replacing the entire assembly.

All four of the above items form an essential part of the identity of an assembly. Two assemblies whose identity differs in any one of these items will be considered as completely different assemblies by the CLR.
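As a rough illustration of these four items, the following sketch reads them back from an assembly that is already loaded, using System.Reflection.AssemblyName. The class name IdentityDemo and the method Describe() are hypothetical names chosen for this example, and the sketch assumes a runtime recent enough to expose AssemblyName.CultureName.

```csharp
using System;
using System.Reflection;

public static class IdentityDemo
{
    // Formats the four identity components of an assembly as a single line.
    public static string Describe(Assembly assembly)
    {
        AssemblyName name = assembly.GetName();

        // Unsigned assemblies have no public key, and therefore no token
        byte[] token = name.GetPublicKeyToken() ?? new byte[0];
        string tokenText = token.Length == 0
            ? "null"
            : BitConverter.ToString(token).Replace("-", "").ToLowerInvariant();

        // An empty culture name denotes the invariant (neutral) culture
        string culture = string.IsNullOrEmpty(name.CultureName) ? "neutral" : name.CultureName;

        return string.Format("Name={0}; Version={1}; Culture={2}; PublicKeyToken={3}",
            name.Name, name.Version, culture, tokenText);
    }

    public static void Main()
    {
        // Describe the assembly that defines System.Object
        Console.WriteLine(Describe(typeof(object).Assembly));
    }
}
```

The exact values printed depend on the runtime you execute this under, so no sample output is shown.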

Here we'll quickly review these items.

Name

You should think of the name as the filename for the file that contains the prime module, but without the .exe or .dll extension. Although the assembly name is stored separately inside the file, you should normally keep the assembly name the same as the name of the prime module file (minus extension).

Version

There are no surprises here. The version of the assembly is simply a set of four numbers, known respectively as major version, minor version, build, and revision. For example, version 2.1.345.0 should be interpreted as version 2.1, with build number 345. The version is normally indicated as part of the assembly manifest, and is indicated in this way in the IL source:

 .assembly MyAssembly
 {
    .ver 2:1:345:0
 }

Dependent assemblies will normally specify that they require a particular version of the referenced assembly:

 .assembly extern someLibrary
 {
    .ver 2:1:0:0
 }

How you interpret the four numbers in your code is up to you. One good way of working is to use the major and minor version numbers to indicate breaking changes - and changes in build and revision number should be non-breaking bug fixes. Typical use of the build and revision numbers might run something like this: the build number gets incremented each time a full build of the application is done, and the revision number will be modified if for some reason an unusual extra build is done. For example, if it's normal practice in your group to do a new build of some large application each day, but on one day due to an unexpected bug an extra build needs to be done, then this would be indicated by the revision number. However, these are only recommendations - in the end it's up to your organization how it treats these numbers.
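If you want to work with the four numbers in code, the System.Version class exposes them individually. The sketch below also encodes the convention suggested above, in which only a change of major or minor number counts as breaking. The helper IsBreakingChange() is a hypothetical name, and the rule it implements is just that recommendation, not something the CLR enforces.

```csharp
using System;

public static class VersionDemo
{
    // Treats a change in the major or minor number as breaking, following the
    // convention recommended in the text (a team policy, not a CLR rule).
    public static bool IsBreakingChange(Version oldVersion, Version newVersion)
    {
        return oldVersion.Major != newVersion.Major
            || oldVersion.Minor != newVersion.Minor;
    }

    public static void Main()
    {
        Version v = new Version("2.1.345.0");

        // Prints: 2.1 build 345 revision 0
        Console.WriteLine("{0}.{1} build {2} revision {3}",
            v.Major, v.Minor, v.Build, v.Revision);

        // Only the build number differs, so this prints False
        Console.WriteLine(IsBreakingChange(v, new Version(2, 1, 346, 0)));

        // The minor number differs, so this prints True
        Console.WriteLine(IsBreakingChange(v, new Version(2, 2, 0, 0)));
    }
}
```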

Public/Private Key

This is the one aspect of an assembly's identity that is optional. For private assemblies, you can choose whether or not to sign the assembly with a key, although signing is compulsory for assemblies that are to be placed in the Global Assembly Cache and made publicly available to any application. Assemblies that have been signed are said to have strong names. The unusual aspect of the key is that it is not stored in its entirety as separate data in the assembly manifest in the way that is done for other aspects of the assembly identity. Instead, only the public part of the key is stored. The private part of the key is instead used to encrypt the hash of the assembly contents using a technique known as public-key cryptography. We'll examine how this technique works in Chapter 13. For now, we'll simply say that the public and private keys have been computed in such a way that decrypting the hash using the public key (which is readily available) will only yield the correct decrypted hash if the hash was originally encrypted with the correct private key. Assuming that an organization keeps its private key secret, then that provides a guarantee that the assembly was produced by the correct company.
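The principle can be sketched with the general-purpose RSA class in System.Security.Cryptography. Be aware that this is only an illustration of signing with a private key and verifying with the public key: it is not the strong-name file format, the choice of SHA-256 is ours, and the class name SigningSketch is hypothetical. The sketch also assumes a reasonably modern .NET runtime in which RSA.Create() and these SignData()/VerifyData() overloads are available.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SigningSketch
{
    // Signs the contents with the private key; SignData hashes the data first.
    public static byte[] Sign(RSA keyPair, byte[] contents)
    {
        return keyPair.SignData(contents, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
    }

    // Verifies a signature using only the public half of the key pair.
    public static bool Verify(RSAParameters publicKey, byte[] contents, byte[] signature)
    {
        using (RSA rsa = RSA.Create())
        {
            rsa.ImportParameters(publicKey);
            return rsa.VerifyData(contents, signature,
                HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        }
    }

    public static void Main()
    {
        byte[] contents = Encoding.UTF8.GetBytes("stand-in for the assembly contents");

        using (RSA keyPair = RSA.Create())
        {
            byte[] signature = Sign(keyPair, contents);

            // Export the public part only; the private key never leaves keyPair
            RSAParameters publicOnly = keyPair.ExportParameters(false);

            Console.WriteLine(Verify(publicOnly, contents, signature));   // True

            // Alter one byte: the signature no longer matches
            contents[0] ^= 0xFF;
            Console.WriteLine(Verify(publicOnly, contents, signature));   // False
        }
    }
}
```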

Culture

Culture, roughly speaking, amounts to .NET's implementation of an old concept - that of indicating the intended language and area in the world that an assembly is intended to be used in. In pre-.NET days, the Windows locale ID provided this information. While the locale is still used as the way that Windows identifies the area, .NET has refined the concept into the culture, which not only identifies the language and the geographical region, but also is supported by a number of .NET classes in the System.Globalization namespace.

As far as assemblies are concerned, the main use for the culture is to identify resources to be loaded. The way this is normally implemented in .NET is that an assembly that contains code should have what is known as the invariant culture - a default culture that doesn't indicate any region or language in particular. Resources such as strings that will be displayed by the application are then stored separately in related assemblies known as satellite assemblies, each of which has an associated culture. An application therefore loads its main assembly containing the code, then, based on the culture it is running under (usually taken by the .NET Framework from the LCID that Windows thinks it's running under), identifies the appropriate satellite assembly from which to load resources - ensuring for example that text is displayed in Japanese if the application is running on a Japanese installation of Windows, or French if it's a French installation of Windows. This all means that the main assembly should be able to run satisfactorily anywhere in the world.

The culture can consist of two parts:

  • The language

  • The region

These items are each indicated by two- or three-character strings. The first two letters, which indicate the language, are normally lowercase, and are separated by a hyphen from the final letters, which indicate the geographical region and are normally uppercase. For example, en-GB indicates English as used in the United Kingdom, and en-US English as used in the USA. The format of the strings is an industry-wide standard, defined in Request For Comments (RFC) number 3066 at http://www.ietf.org/rfc/rfc3066.txt, while the language codes are defined by ISO standard 639, at http://lcweb.loc.gov/standards/iso639-2/langhome.html. However, for the full list of cultures recognized by the relevant .NET classes, you're best off looking in the MSDN documentation.

Although cultures can contain designations for both language and region, this is not essential. For example, if you prefer, you can supply a satellite assembly in straight English (en) without further localizing it to the country. Cultures such as en are referred to as neutral cultures, and those such as en-GB as specific cultures. The region itself, incidentally, is used independently of the language by the relevant .NET classes to determine such things as the formatting of numbers, dates, and currencies.
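The distinction between invariant, neutral, and specific cultures can be explored with System.Globalization.CultureInfo. The following is a minimal sketch; the class name CultureDemo and the method Classify() are hypothetical, and the sketch assumes that culture data for en and en-GB is available on the machine (some stripped-down runtime configurations ship without it).

```csharp
using System;
using System.Globalization;

public static class CultureDemo
{
    // Classifies a culture as invariant, neutral (language only),
    // or specific (language and region).
    public static string Classify(CultureInfo culture)
    {
        if (culture.Equals(CultureInfo.InvariantCulture))
            return "invariant";
        return culture.IsNeutralCulture ? "neutral" : "specific";
    }

    public static void Main()
    {
        // The empty string names the invariant culture
        foreach (string name in new string[] { "", "en", "en-GB" })
        {
            CultureInfo culture = new CultureInfo(name);

            // Parent moves one step up: specific -> neutral -> invariant
            Console.WriteLine("'{0}' is {1}; its parent is '{2}'",
                culture.Name, Classify(culture), culture.Parent.Name);
        }
    }
}
```

The Parent chain shown here is what resource lookup follows when it falls back from a specific culture to a neutral one, and finally to the invariant resources in the main assembly.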

Referencing Other Assemblies

Assemblies are normally referenced from other assemblies that are dependent on them - for example, an executable will contain references to the libraries that it uses. These references are known as AssemblyRefs. An AssemblyRef is introduced into the IL source by the .assembly extern directive, as we saw in Chapter 1.

 .assembly extern MyLibrary
 {
    .ver 1:0:0:0
 }

We'll simply remark here that you won't necessarily include the entire identity of the referenced assembly in your IL or high-level language source code, although you can do so if you wish. Provided you specify the name, high-level language compilers are generally quite capable of extracting the remaining information from the referenced assembly as they compile, and inserting it into the emitted dependent assembly. You will need to make sure that the version of the assembly located and checked by the compiler is the same as the version that will be loaded at run time (we'll examine how the compiler and CLR locate assemblies soon). You'll also find that, even in the emitted assembly, the full public keys of referenced assemblies are not stored. That's because they are quite large, so storing the full keys for all referenced assemblies would bloat an assembly (remember that an assembly might typically reference many other assemblies). Instead, a public key token is stored - this contains 8 bytes of a hash of the key, and is sufficient to verify with near certainty whether the referenced assembly that is loaded does in fact have the correct public key. Of course, each assembly does indeed store its own public key in its entirety.
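To make the idea of a token concrete, here is a sketch that derives one from a full public key and compares it with the token the runtime reports. It relies on the commonly documented rule that the token is the last 8 bytes of the SHA-1 hash of the public key, in reverse order; the text above only says "8 bytes of a hash", so treat the precise rule as an assumption of this example. TokenDemo, ComputeToken(), and ToHex() are hypothetical names.

```csharp
using System;
using System.Reflection;
using System.Security.Cryptography;

public static class TokenDemo
{
    // Derives an 8-byte token from a public key: the last 8 bytes of the
    // SHA-1 hash of the key, taken in reverse order (assumed rule).
    public static byte[] ComputeToken(byte[] publicKey)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(publicKey);
            byte[] token = new byte[8];
            for (int i = 0; i < token.Length; i++)
                token[i] = hash[hash.Length - 1 - i];
            return token;
        }
    }

    // Renders bytes as lowercase hex, the way tokens are normally displayed.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }

    public static void Main()
    {
        AssemblyName name = typeof(object).Assembly.GetName();
        byte[] publicKey = name.GetPublicKey();

        if (publicKey == null || publicKey.Length == 0)
        {
            Console.WriteLine("This assembly is not signed, so it has no token.");
            return;
        }

        Console.WriteLine("Computed token: " + ToHex(ComputeToken(publicKey)));
        Console.WriteLine("Reported token: " + ToHex(name.GetPublicKeyToken()));
    }
}
```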

Reading Assembly Contents

There are generally two situations in which you will want to read the contents of an assembly to see what the assembly contains - for example, what types are defined in it, what methods are implemented, etc. You may want to do this at development time, when you want to know more about the libraries your code might reference (or simply if you're exploring to find out more about the .NET Framework), or your code may need to do this at run time, if its operation depends on reflection. If you are working at development time, you'll want to use one of the utilities Microsoft has written for exploring assemblies and PE files, while at run time you'll need a programmatic API (which is what the utilities will use internally anyway).

In this section we'll quickly review some of the available Microsoft tools and APIs. You will also find there may be third-party tools available on the Internet, but here we'll confine ourselves to the ones that come with VS.NET or with the .NET Framework.

ildasm

ildasm.exe is probably the tool you've used most - indeed we've already been routinely showing ildasm screenshots where appropriate in earlier chapters in this book. It is excellent for providing a relatively high-level, logical view of an assembly, including IL code and metadata.

There are a couple of subtleties to be aware of when using ildasm. One is that by default ildasm starts up in basic mode, which means you don't get quite all the available options to view the file. To start it in advanced mode you should use the /adv option:

 ildasm /adv 

This gives you a couple of extra options on the view menu to look at the statistics for the file (how many bytes each part of it occupies, etc.) and to look at the raw header information.

Another issue with ildasm is that it is unable to directly view assemblies that are in the assembly cache. Why Microsoft put in this restriction is frankly beyond me. If they meant it as a security precaution to protect the code in shared assemblies from being viewed then it's so easy to circumvent as to be virtually worthless, and it does serve as a minor irritant if you want to examine such code.

If you want to use ildasm.exe to examine the DLLs in the assembly cache, then you have a number of options:

  1. Use the command prompt. If you know where the assembly you want is located then you can navigate into the assembly cache with the command prompt and copy the file out. Unfortunately, because of the strange folder names, it's not often that you'll know the location of the file you want.

  2. Copy the assemblies out en masse. It takes a few minutes to write a short program in C# or VB.NET that recursively searches through all the folders in the assembly cache and copies out every .dll or related file into some other folder of your choice.

  3. Use the copies in the CLR system folder. This is the folder in which the .NET Framework is installed - in version 1.0, it's %windir%\Microsoft.NET\Framework\v1.0.3705 - obviously for future versions this version number will change. You'll find this folder contains all the unmanaged DLLs that implement the CLR (mscorjit.dll, mscorwks.dll, etc.) as well as many of the compilers and tools you are used to using (csc.exe, ilasm.exe, gacutil.exe). The folder also contains copies of every .NET Framework Class Library DLL (as we'll see soon, copies are needed here for compilers to look up when they resolve references in your code). You'll also incidentally find the only IL copy of mscorlib.dll here. This DLL is so fundamental to the operation of managed code that it is kept in the CLR's install folder, and always loaded from there. The assembly cache contains only the ngen'd native version of mscorlib.dll.
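The second option above might be sketched along the following lines. This is only an outline under a few assumptions: the class name CopyAssemblies and its CopyAll() method are hypothetical, both folder paths are supplied on the command line rather than hard-coded, only .dll files are copied, and a running counter is prefixed to each copy so that different versions of the same file do not overwrite one another. Folders the current user cannot read will cause an exception.

```csharp
using System;
using System.IO;

public static class CopyAssemblies
{
    // Copies every .dll found anywhere beneath sourceRoot into targetFolder,
    // and returns the number of files copied.
    public static int CopyAll(string sourceRoot, string targetFolder)
    {
        Directory.CreateDirectory(targetFolder);
        int copied = 0;

        // AllDirectories makes the search recursive
        foreach (string file in Directory.GetFiles(sourceRoot, "*.dll", SearchOption.AllDirectories))
        {
            // Prefix a counter so side-by-side versions get distinct names
            string target = Path.Combine(targetFolder, copied + "_" + Path.GetFileName(file));
            File.Copy(file, target, true);
            copied++;
        }
        return copied;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: CopyAssemblies <sourceRoot> <targetFolder>");
            return;
        }
        Console.WriteLine(CopyAll(args[0], args[1]) + " file(s) copied");
    }
}
```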

DumpBin

DumpBin.exe is a useful utility supplied by Microsoft that displays the contents of PE files, as well as providing some degree of interpretation of them. It works at a lower level than ildasm - for example, if you want, it will display the raw binary data. Unlike ildasm.exe, DumpBin is designed for all PE and COFF files, not specifically for managed assemblies. This has the advantage that it lets you see all the PE header information that is skipped by ildasm (if you want to see that stuff of course), and the disadvantage that it is able to do very little CLR-based interpretation of metadata, etc. Microsoft has added a /CLR option to DumpBin that lets you view the CLR header, but the information you get from that is pretty limited. There's also the disadvantage that DumpBin is a command-line, not a GUI-based, tool. Still, as an overall tool, DumpBin is very useful for displaying the generic contents of a PE file. If you want to use this utility, simply type dumpbin <filename> at the command prompt.

Reflection

The .NET System.Reflection classes allow you to programmatically examine assemblies or types, as well as to instantiate instances of types and invoke methods on them. If you want to use reflection to examine an assembly, then your starting point is likely to be the System.Reflection.Assembly class, and in particular one of the static methods Assembly.GetExecutingAssembly(), Assembly.LoadFrom() or Assembly.Load():

 Assembly thisAssembly = Assembly.GetExecutingAssembly(); 

GetExecutingAssembly(), as the name suggests, returns an Assembly reference that can be used to find out about the assembly that is currently being executed, while LoadFrom() and Load() load an assembly given respectively its filename or assembly name. As is suggested by their names, they actually load the assembly into the current process if it's not already loaded. Analyzing the data in an assembly with this managed API requires the whole assembly to be loaded. You'll need to be aware of this, as it could cause your working set to rise considerably if you are loading and analyzing a large number of assemblies - and at present .NET does not support unloading of individual assemblies. You can avoid this problem by loading assemblies into a separate application domain and unloading the application domain.

We won't go into the Reflection classes in detail - they are adequately documented in MSDN and in many other .NET books. However, we'll present a very quick example to show you how to get started analyzing an assembly.

The example is called ReflectionDemo. It is a simple C# console application that uses Assembly.LoadFrom() to load up the System.Drawing.dll assembly from the folder that contains copies of shared assemblies for VS.NET's use, and then displays a list of the types defined in that assembly. The code for the Main() method for this sample looks like this:

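 static void Main()
 {
    // Locate the Windows folder; the version 1.0 CLR system folder lies beneath it
    string windir = Environment.GetEnvironmentVariable("windir");
    Assembly assembly = Assembly.LoadFrom(windir +
                  @"\Microsoft.NET\Framework\v1.0.3705\System.Drawing.dll");

    // List every type defined in the assembly
    foreach (Type type in assembly.GetTypes())
       Console.WriteLine(type.ToString());
 }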

A quick sample of some of the output from ReflectionDemo (a small part of the output - the list of types goes on for several pages!) looks like this:

 ThisAssembly
 AssemblyRef
 System.Drawing.SRDescriptionAttribute
 System.Drawing.SRCategoryAttribute
 System.Drawing.SR
 System.ExternDll
 System.Drawing.Image
 System.Drawing.Image+GetThumbnailImageAbort
 System.Drawing.Image+ImageTypeEnum
 System.Drawing.Bitmap
 System.Drawing.Brush
 System.Drawing.Brushes
 System.Drawing.Imaging.CachedBitmap
 System.Drawing.Color

Once you have a Type reference you can go on to manipulate instances of the type, as detailed in the documentation for System.Reflection.

The System.Reflection classes are designed to support working with the various managed types defined in an assembly, and as such do not include much support for examining the assembly at a lower level, for example examining its file structure or the CLR headers. As an example, there is no way to use reflection to find out whether an executable assembly is a console or Windows application. To obtain that kind of information you'll need to manually examine the Subsystem field in the PE header. You can also gain some more low-level information using the unmanaged reflection API, which we'll examine next.
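As a sketch of what manually examining the Subsystem field involves, the following code reads it straight from the file. It relies on the standard PE layout: the 4-byte offset of the PE signature is stored at position 0x3C, a 20-byte COFF header follows the signature, and Subsystem sits 68 bytes into the optional header, where the value 2 denotes a Windows GUI application and 3 a console application. The class name SubsystemReader and its method are hypothetical, and error handling is minimal.

```csharp
using System;
using System.IO;

public static class SubsystemReader
{
    // Returns the Subsystem value from the PE optional header
    // (2 = Windows GUI, 3 = Windows console).
    public static ushort ReadSubsystem(Stream stream)
    {
        BinaryReader reader = new BinaryReader(stream);

        stream.Seek(0, SeekOrigin.Begin);
        if (reader.ReadUInt16() != 0x5A4D)            // "MZ"
            throw new InvalidDataException("Missing MZ header");

        stream.Seek(0x3C, SeekOrigin.Begin);          // e_lfanew: offset of the PE signature
        uint peOffset = reader.ReadUInt32();

        stream.Seek(peOffset, SeekOrigin.Begin);
        if (reader.ReadUInt32() != 0x00004550)        // "PE\0\0"
            throw new InvalidDataException("Missing PE signature");

        // Skip the signature (4 bytes) and COFF header (20 bytes), then move
        // 68 bytes into the optional header to reach the Subsystem field
        stream.Seek(peOffset + 4 + 20 + 68, SeekOrigin.Begin);
        return reader.ReadUInt16();
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: SubsystemReader <path to PE file>");
            return;
        }

        using (FileStream file = File.OpenRead(args[0]))
        {
            ushort subsystem = ReadSubsystem(file);
            string description = subsystem == 2 ? "Windows GUI"
                               : subsystem == 3 ? "Windows console"
                               : "other";
            Console.WriteLine("Subsystem = {0} ({1})", subsystem, description);
        }
    }
}
```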

The Unmanaged Reflection API

The unmanaged reflection API consists of a small number of COM components that are able to examine and extract information from an assembly. It is not nearly as sophisticated as its managed equivalent - for example, it does not support instantiation of objects. However, it does allow more access to the metadata and header information in an assembly - which means you can use the unmanaged reflection API to access information not available using the System.Reflection classes. It is the unmanaged reflection API, and not the System.Reflection classes, which is used internally by ildasm.exe.

The API is not really documented in MSDN, but you can find a Word document, Metadata Unmanaged API.doc, that defines the components available as well as an example (the metainfo example) in the .NET Framework SDK, under the Tool Developers Guide folder. If the lack of documentation wasn't enough to dissuade you from using the unmanaged reflection API, the COM components it contains do not have an associated type library, which means that if you want to invoke methods in this API from managed code, you can't use tlbimp.exe or any of the built-in support in VS.NET for COM interop to help you. However, there's nothing to stop you from using IJW to access these methods from managed C++ code, or from writing some managed C++ wrappers around the components.

If you do want to use the unmanaged reflection API, then you'll find the starting point is to instantiate the COM object called the CorMetaDataDispenser using a call to CoCreateInstance() - you can then manipulate this object to extract metadata information and related objects from an assembly.

Exploring the Assembly Cache

The assembly cache contains the shared assemblies (in the GAC), as well as native images of assemblies that have been ngen'd, and copies of assemblies that have been downloaded from remote machines to be executed, for example from inside Internet Explorer.

The ShFusion View of the Cache

You'll no doubt be well aware that the assembly cache appears to Windows Explorer as a structure that contains assemblies but whose internal details are hidden. This is thanks to a shell extension, shfusion.dll, which is installed with .NET and which takes control of the user interface Windows Explorer presents for the files in this folder. Thus, opening Windows Explorer and navigating to the assembly cache gives us something like this:

[Screenshot: the assembly cache as presented in Windows Explorer by the shfusion.dll shell extension]

The shell extension has of course been written for your own protection - it stops you fiddling with the Global Assembly Cache and breaking it! Beyond looking to see what files are there, there really is very little the shell extension will let you do with the cache. If you right-click on an assembly you get the option to either view the assembly's properties (basically the same information that is already in the list view) or delete the assembly - something which I wouldn't recommend doing for any of the Microsoft-supplied assemblies!

The view supplied by the shell extension tells you the name and version of each assembly. You also get an indication if the assembly is actually a native image, that is to say, an ngen'd assembly. All ngen'd assemblies are here, whether shared or private. However, any such private assemblies located here can still only be accessed by the applications for which they were intended, since the native image has to be loaded in conjunction with the original assembly, which - if private - won't be in the assembly cache. The culture of each assembly and its public key token are also listed. Obviously, you need the public key token to be able to use the assembly. In the above screenshot, most of the assemblies are the Microsoft framework base class libraries, which therefore have the same Microsoft public key. mscorlib, however, has a different key - this is because mscorlib contains ECMA standard libraries and is therefore signed with the ECMA private key. In the screenshot there are also a couple of assemblies of my own in the cache, and notice that two of the Microsoft assemblies are present as ngen'd files.

The shell extension also displays a "folder" called Download. You won't be surprised to learn that this is where assemblies downloaded from the Internet or intranets are placed. I say "folder" in quotes because (as we'll see soon) this isn't a real folder on the file system at all - it's rather a logical folder within the cache. In fact, the Download area maps to a user-specific location so that you don't see files downloaded by other users. If we examine this area, we'll find the information displayed is rather different from that for the shared assemblies:

[Screenshot: the Download area of the assembly cache, showing a CodeBase column]

Instead of the public key token we see a field termed CodeBase. This is simply the URI from which the file was downloaded. This URI is important as it forms a crucial part of the evidence that the CLR's security system uses to assess how far the application should be trusted and therefore which permissions it should be granted, as we'll see in Chapter 12. The assemblies displayed here are the ones that I've installed and then downloaded and executed from the Internet - there are multiple versions of them because I had been testing and recompiling the assembly.

The Actual Assembly Cache Structure

While the Global Assembly Cache shell extension might be great for stopping people who don't know what they're doing from hacking into the cache, it's not so great if you want to find out what the structure of the assembly cache is and how it works. If you want to do that, then you have a number of options:

  • Windows Explorer - you can disable the shell extension for the assembly folder. The shell extension is in a file, shfusion.dll, which is located in the CLR system folder (as noted earlier, in version 1.0 of the framework that's %windir%\Microsoft.NET\Framework\v1.0.3705). This file hosts a standard COM component, which means it can easily be unregistered by removing its COM-related entries from the Registry. To do this you'll need to navigate to this folder, and from the command prompt type regsvr32 -u shfusion.dll.

    When you've finished browsing the GAC, you should reinstall the shell extension by typing regsvr32 shfusion.dll.

  • Command Prompt - you can use the command prompt, which is unaffected by the shell extension, but obviously you don't get such a convenient user interface.

  • Custom Explorer - this is my favored technique. It's very simple to write a Windows Forms application with a simplified Windows Explorer-style user interface, specifically for the purpose of navigating around the assembly cache. An example of this type of application is included with the code download for this chapter. The example uses the System.IO classes to enumerate subfolders, and is therefore impervious to the Windows Explorer shell extension. I'm not going to present any of the code for this utility in the chapter, since it's the results rather than the code that is important here. But if you wish to use it, it's called GACExplore. The example gives less information than Windows Explorer, but it saves you from having to keep unregistering the shell extension.

  • You can open Windows Explorer directly within the assembly cache (though without the treeview pane), by clicking on the Start Menu, selecting Run, and typing in the path to the assembly cache - usually c:\Windows\assembly\gac.
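The custom explorer approach rests on nothing more than enumerating folders with the System.IO classes, which see the real file system rather than the shell extension's view. The following console sketch shows the idea. It is not the GACExplore sample from the code download; the class name FolderTree and its methods are hypothetical, and the folder to walk is passed on the command line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FolderTree
{
    // Returns one indented line for each folder and file beneath root.
    public static List<string> Describe(string root)
    {
        List<string> lines = new List<string>();
        Walk(new DirectoryInfo(root), 0, lines);
        return lines;
    }

    private static void Walk(DirectoryInfo folder, int depth, List<string> lines)
    {
        // Folders are shown in brackets, indented by their depth
        lines.Add(new string(' ', depth * 2) + "[" + folder.Name + "]");

        foreach (FileInfo file in folder.GetFiles())
            lines.Add(new string(' ', depth * 2 + 2) + file.Name);

        // Recurse into each subfolder
        foreach (DirectoryInfo child in folder.GetDirectories())
            Walk(child, depth + 1, lines);
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: FolderTree <folder>");
            return;
        }
        foreach (string line in Describe(args[0]))
            Console.WriteLine(line);
    }
}
```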

If we do disable the shell extension, then this is what Windows Explorer shows us is in the assembly cache:

[Screenshot: the assembly folder in Windows Explorer with the shell extension disabled]

We see that, besides a couple of temporary folders, the assembly folder contains two subfolders, which hold respectively public assemblies that have been installed to the GAC, and native images (both private and shared). Drilling down into the GAC reveals this kind of structure:

[Screenshot: the folder structure beneath the GAC folder]

This structure is not too hard to figure out and a lot less scary than you might have expected. Below the GAC folder is a folder for each assembly name - for example, the adodb.dll assembly is located somewhere under the folder ADODB. But there is a second folder level: under the ADODB folder is a folder whose name reflects the version number of the assembly and its public key. In other words, all we have is an elaborate scheme for organizing assemblies according to their identity (the only reason there are no subfolders for culture in this screenshot is that all the assemblies here have neutral culture). The file structure of the Global Assembly Cache with the current version of .NET is nothing more than a simple hierarchy that allows assemblies with different versions and cultures to coexist side-by-side. And when you use gacutil.exe to place an assembly in the cache, gacutil.exe is not doing any magic behind the scenes - it's just reading the name, public key, version, and culture of the assembly, creating the appropriate folders, and copying the file across.

There is one other file packaged with each assembly: all assemblies have an associated file called __AssemblyInfo__.ini. This is just a short file that conveniently stores information about a couple of the properties of the assembly, to make it slightly easier for the CLR to grab the information. The __AssemblyInfo__.ini for the adodb.dll file shown in the above screenshot looks like this:

 [AssemblyInfo]
 MVID=4b18ed8eb3823d41a6633f1d2232e919
 DisplayName=ADODB, Version=7.0.3300.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
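The DisplayName value is a complete assembly identity, and System.Reflection.AssemblyName can parse it. The sketch below uses the parsed identity to build the kind of two-level folder path described earlier. Note that the exact folder-name format used here, version, culture, and public key token joined by underscores with the culture left empty for neutral assemblies, is an assumption made for illustration; the text above only says that the name reflects the version and public key. GacPathSketch and RelativeFolder() are hypothetical names.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class GacPathSketch
{
    // Builds "<Name>/<Version>_<Culture>_<PublicKeyToken>" from a display name.
    // The folder-name format is an assumption for illustration only.
    public static string RelativeFolder(string displayName)
    {
        AssemblyName name = new AssemblyName(displayName);

        byte[] token = name.GetPublicKeyToken() ?? new byte[0];
        string tokenText = BitConverter.ToString(token).Replace("-", "").ToLowerInvariant();

        // Culture-neutral assemblies have an empty culture name
        string culture = name.CultureName ?? "";

        return Path.Combine(name.Name, name.Version + "_" + culture + "_" + tokenText);
    }

    public static void Main()
    {
        string displayName =
            "ADODB, Version=7.0.3300.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a";
        Console.WriteLine(RelativeFolder(displayName));
    }
}
```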

Although I've not shown the structure for assemblies in the native images cache, you can probably gather that the folder structure for these assemblies is quite similar.



Advanced .NET Programming
ISBN: 1861006292
EAN: 2147483647
Year: 2002
Pages: 124
