9 Metadata | The Common Language Infrastructure Annotated Standard (Microsoft .NET Development Series)

This section and its subsections contain only informative text, with the exception of the CLS rules introduced here and repeated in Partition I, section 11. The metadata format is specified in Partition II.

New types (value types and reference types) are introduced into the CTS via type declarations expressed in metadata. In addition, metadata is a structured way to represent all information that the CLI uses to locate and load classes, lay out instances in memory, resolve method invocations, translate CIL to native code, enforce security, and set up runtime context boundaries. Every CLI PE/COFF module (see Partition II) carries a compact metadata binary that is emitted into the module by the CLI-enabled development tool or compiler.

Each CLI-enabled language will expose a language-appropriate syntax for declaring types and members and for annotating them with attributes that express which services they require of the infrastructure. Type imports are also handled in a language-appropriate way, and it is the development tool or compiler that consumes the metadata to expose the types that the developer sees.

Note that the typical component or application developer will not need to be aware of the rules for emitting and consuming CLI metadata. While it may help a developer to understand the structure of metadata, the rules outlined in this section are primarily of interest to tool builders and compiler writers.

9.1 Components and Assemblies

Each CLI component carries the metadata for declarations, implementations, and references specific to that component. Therefore, the component-specific metadata is referred to as component metadata, and the resulting component is said to be self-describing. In object models such as COM or CORBA, this information is represented by a combination of typelibs, IDL files, DLLRegisterServer, and a myriad of custom files in disparate formats and separate from the actual executable file. In contrast, the metadata is a fundamental part of a CLI component.

Collections of CLI components and other files are packaged together for deployment into assemblies, discussed in more detail in a later section [Partition II, section 6]. An assembly is a logical unit of functionality that serves as the primary unit of reuse in the CLI. Assemblies establish a name scope for types.

Types declared and implemented in individual components are exported for use by other implementations via the assembly in which the component participates. All references to a type are scoped by the identity of the assembly in whose context the type is being used. The CLI provides services to locate a referenced assembly and request resolution of the type reference. It is this mechanism that provides an isolation scope for applications: the assembly alone controls its composition.

9.2 Accessing Metadata

Metadata is emitted into and read from a CLI module using either direct access to the file format as described in Partition II or through the Reflection Library. It is possible to create a tool that verifies a CLI module, including the metadata, during development, based on the specifications supplied in Partition III and Partition II.

When a class is loaded at runtime, the CLI loader imports the metadata into its own in-memory data structures, which can be browsed via the CLI Reflection services. The Reflection services should be considered as similar to a compiler; they automatically walk the inheritance hierarchy to obtain information about inherited methods and fields, they have rules about hiding by name or name-and-signature, rules about inheritance of methods and properties, and so forth.

ANNOTATION

Reflection is mentioned in many places in this part of the standard but is not discussed at length. That is because it is a set of services implemented in the Base Class Library (see the .NET Framework Standard Library Annotated Reference), rather than a part of the CTS or VES, which is the focus of this book, and of the standard, except for the XML description of the Base Class Library in Partition IV. The importance of Reflection, however, as described in the preceding paragraph, cannot be overstated.
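The hierarchy walk and hiding rules mentioned in section 9.2 can be sketched roughly as follows. This is a toy model in Python, not the CLI Reflection library: the `TypeInfo` class, the `visible_methods` function, and the `hide_by_sig` switch are hypothetical names chosen for illustration, and signatures are plain strings.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TypeInfo:
    """Hypothetical stand-in for a loaded type's metadata."""
    name: str
    parent: Optional["TypeInfo"] = None
    # Each method is a (name, signature) pair, e.g. ("Draw", "(int32)").
    methods: list = field(default_factory=list)


def visible_methods(t, hide_by_sig):
    """Walk from the most derived type to the root, collecting methods.

    hide_by_sig=True : a derived method hides only inherited methods with
                       the same name and signature.
    hide_by_sig=False: a derived method hides every inherited method of
                       that name.
    """
    seen_keys = set()
    result = []
    current = t
    while current is not None:
        level_keys = set()
        for name, sig in current.methods:
            key = (name, sig) if hide_by_sig else name
            if key not in seen_keys:
                result.append((current.name, name, sig))
            level_keys.add(key)
        # Only members of more derived types hide; overloads declared
        # side by side in one type do not hide each other.
        seen_keys |= level_keys
        current = current.parent
    return result


base = TypeInfo("Shape", methods=[("Draw", "()"), ("Draw", "(int32)")])
derived = TypeInfo("Circle", parent=base, methods=[("Draw", "()")])
```

With hide-by-name-and-signature, `Circle` still sees the inherited `Draw(int32)`; with hide-by-name, the single `Draw` declared on `Circle` hides both inherited overloads.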

9.2.1 Metadata Tokens

A metadata token is an implementation-dependent encoding mechanism. Partition II describes the manner in which metadata tokens are embedded in various sections of a CLI PE/COFF module. Metadata tokens are embedded in CIL and native code to encode method invocations and field accesses at call sites; the token is used by various infrastructure services to retrieve information from metadata about the reference and the type on which it was scoped in order to resolve the reference.

A metadata token is a typed identifier of a metadata object (type declaration, member declaration, etc.). Given a token, its type can be determined and it is possible to retrieve the specific metadata attributes for that metadata object. However, a metadata token is not a persistent identifier. Rather it is scoped to a specific metadata binary. A metadata token is represented as an index into a metadata data structure, so access is fast and direct.
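As one concrete illustration of a typed, index-based token: Partition II describes a 4-byte token whose most significant byte selects a metadata table and whose low 3 bytes hold a 1-based row index. The sketch below decodes that layout. The function name `decode_token` and the `TABLE_NAMES` map are illustrative, and only a handful of tables are listed.

```python
# A few table numbers from Partition II; the map is deliberately incomplete.
TABLE_NAMES = {
    0x01: "TypeRef",
    0x02: "TypeDef",
    0x04: "Field",
    0x06: "MethodDef",
    0x0A: "MemberRef",
}


def decode_token(token):
    """Split a 32-bit metadata token into (table name, row index)."""
    if not 0 <= token <= 0xFFFFFFFF:
        raise ValueError("token must fit in 32 bits")
    table = token >> 24          # top byte: which table (the token's type)
    row = token & 0x00FFFFFF     # low 3 bytes: 1-based row in that table
    return TABLE_NAMES.get(table, f"table 0x{table:02X}"), row


print(decode_token(0x06000001))  # ('MethodDef', 1)
```

Because the row index is only meaningful relative to one module's tables, the same numeric token in another module may refer to an unrelated item, which is why a token is not a persistent identifier.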

9.2.2 Member Signatures in Metadata

Every location (including fields, parameters, method return values, and properties) has a type, and a specification for its type is carried in metadata.

A value type describes values that are represented as a sequence of bits. A reference type describes values that are represented as the location of a sequence of bits. The CLI provides an explicit set of built-in types, each of which has a default runtime form as either a value type or a reference type. The metadata APIs may be used to declare additional types, and part of the type specification of a variable encodes the identity of the type as well as which form (value or reference) the type is to take at runtime.

Metadata tokens representing encoded types are passed to CIL instructions that accept a type (newobj, newarr, ldtoken). See the CIL instruction set specification in Partition III.

These encoded type metadata tokens are also embedded in member signatures. To optimize runtime binding of field accesses and method invocations, the type and location signatures associated with fields and methods are encoded into member signatures in metadata. A member signature embodies all of the contract information that is used to decide whether a reference to a member succeeds or fails.
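To make the idea of an encoded member signature more concrete, the sketch below decodes a very restricted form of the method signature blob described in Partition II: one calling-convention byte, a parameter count, the return type, then the parameter types. It is a simplification under stated assumptions: only a few built-in element types are recognized, the parameter count must fit in one byte, and custom modifiers, byrefs, and class tokens are not handled. The function name `decode_simple_method_sig` is hypothetical.

```python
# Element type codes for a few built-in types (Partition II); not exhaustive.
ELEMENT_TYPES = {
    0x01: "void",
    0x02: "bool",
    0x03: "char",
    0x08: "int32",
    0x0A: "int64",
    0x0C: "float32",
    0x0D: "float64",
    0x0E: "string",
}

HASTHIS = 0x20  # calling-convention bit: method takes an instance pointer


def decode_simple_method_sig(blob):
    """Decode a method signature made only of the built-in types above.

    Assumes fewer than 128 parameters, so the count fits in one byte.
    """
    calling_convention, param_count = blob[0], blob[1]
    if param_count >= 0x80:
        raise ValueError("multi-byte parameter counts are not handled here")
    types = [ELEMENT_TYPES[b] for b in blob[2:]]
    if len(types) != param_count + 1:
        raise ValueError("signature length does not match parameter count")
    return {
        "instance": bool(calling_convention & HASTHIS),
        "returns": types[0],
        "params": types[1:],
    }


# A static method taking two int32 values and returning int32.
sig = decode_simple_method_sig(bytes([0x00, 0x02, 0x08, 0x08, 0x08]))
```

Two references bind to the same member only if these decoded pieces agree, which is the sense in which the signature carries the contract.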

9.3 Unmanaged Code

It is possible to pass data from CLI managed code to unmanaged code. This always involves a transition from managed to unmanaged code, which has some runtime cost, but data can often be transferred without copying. When data must be reformatted the VES provides a reasonable specification of default behavior, but it is possible to use metadata to explicitly require other forms of marshalling (i.e., reformatted copying). The metadata also allows access to unmanaged methods through implementation-specific pre-existing mechanisms.

ANNOTATION

One of the major design features of the CLI is to make it possible to describe pre-existing native code data structures, and enable calling to and from native code. There is very little description of this in the standard because the standard focuses on producing and running managed code.

However, an important part of the standard is making provision for dealing with unmanaged code. Metadata is one of those places. Metadata can describe data structures for native code, and methods that are implemented in native code. For example, metadata can describe what appears to be a managed method, the actual implementation of which is unmanaged. The marshalling information is carried in the metadata, to tell you how to marshal to and from that method. The mechanism used to call unmanaged code is PInvoke (Partition II, section 14.5.2).

Although native interoperation services are part of the standard, operating systems are likely to implement extensions that allow CLI implementations access to platform-specific data types. For example, Microsoft has in its Common Language Runtime implementation of the CLI a number of extensions for a pre-existing wider set of data types, and a full COM interoperation implementation.

9.4 Method Implementation Metadata

For each method for which an implementation is supplied in the current CLI module, the tool or compiler will emit information used by the CIL-to-native-code compilers, the CLI loader, and other infrastructure services. This information includes:

Whether the code is managed or unmanaged.
Whether the implementation is in native code or CIL (note that all CIL code is managed).
The location of the method body in the current module, as an address relative to the start of the module file in which it is located (a Relative Virtual Address, or RVA). Or, alternatively, the RVA is encoded as 0 and other metadata is used to tell the infrastructure where the method implementation will be found, including:
- An implementation to be located via the CLI Interoperability Services.
- Forwarding calls through an imported global static method.
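The items above can be sketched as a small decoder. The flag values follow the method implementation attributes in Partition II (a two-bit code-type field plus an unmanaged bit); the function name `describe_method_impl` and the shape of the returned dictionary are assumptions made for this illustration.

```python
CODE_TYPE_MASK = 0x0003
CODE_TYPES = {0x0000: "IL", 0x0001: "Native", 0x0002: "OPTIL", 0x0003: "Runtime"}
UNMANAGED = 0x0004  # set: unmanaged code; clear: managed code


def describe_method_impl(impl_flags, rva):
    """Summarize the implementation information for one method."""
    code_type = CODE_TYPES[impl_flags & CODE_TYPE_MASK]
    managed = not (impl_flags & UNMANAGED)
    if code_type == "IL" and not managed:
        # The text notes that all CIL code is managed.
        raise ValueError("CIL code cannot be marked unmanaged")
    return {
        "code_type": code_type,
        "managed": managed,
        # An RVA of 0 means other metadata says where the body lives.
        "body_in_module": rva != 0,
    }


print(describe_method_impl(0x0000, 0x2050))
```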

ANNOTATION

For more information on the RVA, see Chapter 4 of this book, and Partition II, section 21.

9.5 Class Layout

In the general case, the CLI loader is free to lay out the instances of a class in any way it chooses, consistent with the rules of the CTS. However, there are times when a tool or compiler needs more control over the layout. In the metadata, a class is marked with an attribute indicating whether its layout rule is:

autolayout: A class marked "autolayout" indicates that the loader is free to lay out the class in any way it sees fit; any layout information that may have been specified is ignored. This is the default.
layoutsequential: A class marked "layoutsequential" guides the loader to preserve field order as emitted, but otherwise the specific offsets are calculated based on the CLI type of the field; these may be shifted by explicit offset, padding, and/or alignment information.
explicitlayout: A class marked "explicitlayout" causes the loader to ignore field sequence and to use the explicit layout rules provided, in the form of field offsets and/or overall class size or alignment. There are restrictions on legal layouts, specified in Partition II.

It is also possible to specify an overall size for a class. This enables a tool or compiler to emit a value type specification where only the size of the type is supplied. This is useful in declaring CLI built-in types (such as 32-bit integer). It is also useful in situations where the data type of a member of a structured value type does not have a representation in CLI metadata (e.g., C++ bit fields). In the latter case, as long as the tool or compiler controls the layout, and CLI doesn't need to know the details or play a role in the layout, this is sufficient. Note that this means that the VES can move bits around but can't marshal across machines; the emitting tool or compiler will need to handle the marshalling.

Optionally, a developer may specify a packing size for a class. This is layout information that is not often used, but it allows a developer to control the alignment of the fields. It is not an alignment specification, per se, but rather serves as a modifier that places a ceiling on all alignments. Typical values are 1, 2, 4, 8, or 16.
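The interaction between sequential layout and packing size can be illustrated with a small calculation. This is a sketch under simplifying assumptions, not the loader's actual algorithm: each field's natural alignment is taken to equal its size, which holds for the primitive sizes 1, 2, 4, and 8, and the function name `sequential_layout` is hypothetical. The packing size only caps each alignment; it never raises one.

```python
def sequential_layout(fields, pack=None):
    """Compute (offsets, total size) for fields laid out in declaration order.

    fields: list of (name, size_in_bytes) pairs.
    pack:   optional packing size; it places a ceiling on every alignment.
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size in fields:
        align = size if pack is None else min(size, pack)
        offset = (offset + align - 1) // align * align  # round up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    # Pad the tail so that arrays of this type keep every element aligned.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


fields = [("flag", 1), ("count", 4), ("id", 8)]
print(sequential_layout(fields))          # ({'flag': 0, 'count': 4, 'id': 8}, 16)
print(sequential_layout(fields, pack=1))  # ({'flag': 0, 'count': 1, 'id': 5}, 13)
```

With a packing size of 1 all padding disappears, while a packing size of 4 or more leaves this particular layout unchanged because no field needs an alignment that is reduced enough to move it.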

For the full specification of class layout attributes, see the classes in System.Runtime.InteropServices in the .NET Framework Standard Library Annotated Reference.

ANNOTATION

The published standard refers to descriptions of the standardized framework as a reference to Partition IV. This is because the International Standard contains the complete standardized descriptions of the framework in XML as a part of Partition IV. This book refers to the .NET Framework Standard Library Annotated Reference in these cases.

9.6 Assemblies: Name Scopes for Types

An assembly is a collection of resources that are built to work together to deliver a cohesive set of functionality. An assembly carries all of the rules necessary to ensure that cohesion. It is the unit of access to resources in the CLI.

Externally, an assembly is a collection of exported resources, including types. Resources are exported by name. Internally, an assembly is a collection of public (exported) and private (internal to the assembly) resources. It is the assembly that determines which resources are to be exposed outside of the assembly and which resources are accessible only within the current assembly scope. It is the assembly that controls how a reference to a resource, public or private, is mapped onto the bits that implement the resource. For types in particular, the assembly may also supply runtime configuration information. A CLI module can be thought of as a packaging of type declarations and implementations, where the packaging decisions may change under the covers without affecting clients of the assembly.

The identity of a type is its assembly scope and its declared name. A type defined identically in two different assemblies is considered two different types.

ANNOTATION

The previous paragraph is accurate, but only for defined types. The identity of types that are created by the VES from a reference, such as arrays and pointers, is tied to the assembly in which the underlying type (array of <type>, or pointer to <type>) is defined, not the assembly in which they are referenced.

Although giving every type a unique identity in itself would be very cumbersome, we can get close by tying the type's identity to the identity of the assembly. The assembly's identity consists of the name of the assembly, the public key used to sign the assembly, the version number of the assembly, and the culture for which that assembly was specialized. That is generally enough to uniquely identify the assembly. Once we agree that the assembly is uniquely identified, we have uniquely identified the types within it, because the CLI specifies that there cannot be two types of the same name within one assembly. For example, a subtype cannot have the same name as its parent type within the same assembly.
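The identity rule can be modeled with two small value classes. This is an illustrative sketch only: the class names, field names, and sample values are hypothetical, and the empty string standing for a culture-neutral assembly is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblyIdentity:
    name: str
    version: tuple      # e.g. (1, 0, 0, 0)
    public_key: bytes
    culture: str        # "" means culture-neutral in this sketch


@dataclass(frozen=True)
class TypeIdentity:
    assembly: AssemblyIdentity
    full_name: str


lib_v1 = AssemblyIdentity("Geometry", (1, 0, 0, 0), b"\x01\x02", "")
lib_v2 = AssemblyIdentity("Geometry", (2, 0, 0, 0), b"\x01\x02", "")

circle_v1 = TypeIdentity(lib_v1, "Geometry.Circle")
circle_v2 = TypeIdentity(lib_v2, "Geometry.Circle")

# Same declared name, different assembly scope: two different types.
print(circle_v1 == circle_v2)  # False
```

Equality here compares every component of the assembly identity together with the declared name, so a type defined identically in two assemblies still yields two distinct identities.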

Assembly Dependencies: An assembly may depend on other assemblies. This happens when implementations in the scope of one assembly reference resources that are scoped in or owned by another assembly.

All references to other assemblies are resolved under the control of the current assembly scope. This gives an assembly an opportunity to control how a reference to another assembly is mapped onto a particular version (or other characteristic) of that referenced assembly (although that target assembly has sole control over how the referenced resource is resolved to an implementation).
It is always possible to determine which assembly scope a particular implementation is running in. All requests originating from that assembly scope are resolved relative to that scope.

From a deployment perspective, an assembly may be deployed by itself, with the assumption that any other referenced assemblies will be available in the deployed environment. Or, it may be deployed with its dependent assemblies.

Manifests: Every assembly has a manifest that declares what files make up the assembly, what types are exported, and what other assemblies are required to resolve type references within the assembly. Just as CLI components are self-describing via metadata in the CLI component, so are assemblies self-describing via their manifests. When a single file makes up an assembly, it contains both the metadata describing the types defined in the assembly and the metadata describing the assembly itself. When an assembly contains more than one file with metadata, each of the files describes the types defined in the file, if any, and one of these files also contains the metadata describing the assembly (including the names of the other files, their cryptographic hashes, and the types they export outside of the assembly).
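A manifest's role can be sketched as a plain data structure that lists files, their hashes, exported types, and required assemblies. Everything below is hypothetical naming for illustration (`Manifest`, `FileEntry`, the sample file names), and SHA-1 is used only as an assumed hash choice; the real manifest is stored in metadata tables described in Partition II.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    name: str
    sha1_hex: str
    exported_types: list


@dataclass
class Manifest:
    assembly_name: str
    files: list = field(default_factory=list)
    referenced_assemblies: list = field(default_factory=list)

    def add_file(self, name, content, exported_types=()):
        """Record a file, its hash, and the types it exports."""
        digest = hashlib.sha1(content).hexdigest()
        self.files.append(FileEntry(name, digest, list(exported_types)))

    def verify(self, name, content):
        """Return True if `content` matches the hash recorded for `name`."""
        for entry in self.files:
            if entry.name == name:
                return entry.sha1_hex == hashlib.sha1(content).hexdigest()
        return False

    def all_exported_types(self):
        """List every type exported outside the assembly, across all files."""
        return [t for entry in self.files for t in entry.exported_types]


manifest = Manifest("Geometry", referenced_assemblies=["mscorlib"])
manifest.add_file("shapes.dll", b"module bytes", ["Geometry.Circle"])
manifest.add_file("helpers.netmodule", b"other bytes")
```

Recording a hash per file lets a loader notice when one file of a multi-file assembly has been replaced independently of the others.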

Applications: Assemblies introduce isolation semantics for applications. An application is simply an assembly that has an external entry point that triggers (or causes a hosting environment such as a browser to trigger) the creation of a new Application Domain. This entry point is effectively the root of a tree of request invocations and resolutions. Some applications are a single, self-contained assembly. Others require the availability of other assemblies to provide needed resources. In either case, when a request is resolved to a module to load, the module is loaded into the same Application Domain from which the request originated. It is possible to monitor or stop an application via the Application Domain.

References: A reference to a type always qualifies a type name with the assembly scope within which the reference is to be resolved; that is, an assembly establishes the name scope of available resources. However, rather than establishing relationships between individual modules and referenced assemblies, every reference is resolved through the current assembly. This allows each assembly to have absolute control over how references are resolved. See Partition II.
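The principle that every reference is resolved through the current assembly can be sketched as follows. This is a toy model, not the CLI's binding mechanism: the `Assembly` class, the `version_policy` mapping, and the `resolve` function are all hypothetical names introduced for illustration.

```python
class Assembly:
    def __init__(self, name, version, types=(), version_policy=None):
        self.name = name
        self.version = version
        self.types = set(types)
        # Maps a referenced assembly name to the version this assembly
        # wants to bind to, overriding the version named in the reference.
        self.version_policy = dict(version_policy or {})


def resolve(current, ref_assembly, requested_version, type_name, available):
    """Resolve a type reference made by code running in `current`.

    `available` maps (assembly name, version) to Assembly objects.
    """
    # The current assembly decides which version the reference maps to.
    version = current.version_policy.get(ref_assembly, requested_version)
    target = available.get((ref_assembly, version))
    if target is None:
        raise LookupError(f"assembly {ref_assembly} {version} not found")
    # The target assembly decides whether it exposes the requested type.
    if type_name not in target.types:
        raise LookupError(f"{type_name} not exported by {ref_assembly}")
    return (target.name, target.version, type_name)


available = {
    ("Geometry", "1.0"): Assembly("Geometry", "1.0", ["Circle"]),
    ("Geometry", "2.0"): Assembly("Geometry", "2.0", ["Circle", "Ellipse"]),
}
app = Assembly("App", "1.0", version_policy={"Geometry": "2.0"})
print(resolve(app, "Geometry", "1.0", "Circle", available))
```

Here the reference was compiled against version 1.0, but the referencing assembly's own policy redirects it to 2.0, while the target assembly alone determines what `Circle` resolves to.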

9.7 Metadata Extensibility

CLI metadata is extensible. There are three reasons this is important:

The Common Language Specification (CLS) is a specification for conventions that languages and tools agree to support in a uniform way for better language integration. The CLS constrains parts of the CTS model, and the CLS introduces higher-level abstractions that are layered over the CTS. It is important that the metadata be able to capture these sorts of development-time abstractions that are used by tools even though they are not recognized or supported explicitly by the CLI.
It should be possible to represent language-specific abstractions in metadata that are neither CLI nor CLS language abstractions. For example, it should be possible, over time, to enable languages like C++ to not require separate header files or IDL files in order to use types, methods, and data members exported by compiled modules.
It should be possible, in member signatures, to encode types and type modifiers that are used in language-specific overloading; for example, to allow C++ to distinguish int from long even on 32-bit machines where both map to the underlying type int32.

This extensibility comes in the following forms:

Every metadata object can carry custom attributes, and the metadata APIs provide a way to declare, enumerate, and retrieve custom attributes. Custom attributes may be identified by a simple name, where the value encoding is opaque and known only to the specific tool, language, or service that defined it. Or, custom attributes may be identified by a type reference, where the structure of the attribute is self-describing (via data members declared on the type) and any tool, including the CLI Reflection services, may browse the value encoding.

CLS Rule 34: The CLS only allows a subset of the encodings of custom attributes. The only types that shall appear in these encodings are (see the .NET Framework Standard Library Annotated Reference): System.Type, System.String, System.Char, System.Boolean, System.Byte, System.Int16, System.Int32, System.Int64, System.Single, System.Double, and any enumeration type based on a CLS-compliant base integer type.
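A tool could check Rule 34 along the following lines. This is a minimal sketch: the type list is taken from the rule itself, the set of CLS-compliant integer base types (Byte, Int16, Int32, Int64) reflects the CLS integer types, and the function name and the way enumerations are described (a mapping from enum name to base type) are assumptions made for the example.

```python
CLS_ATTRIBUTE_TYPES = frozenset({
    "System.Type", "System.String", "System.Char", "System.Boolean",
    "System.Byte", "System.Int16", "System.Int32", "System.Int64",
    "System.Single", "System.Double",
})

# Integer types an enumeration may be based on and remain CLS-compliant.
CLS_ENUM_BASE_TYPES = frozenset({
    "System.Byte", "System.Int16", "System.Int32", "System.Int64",
})


def is_cls_attribute_encoding(arg_types, enum_base_types=None):
    """Return True if every type in the encoding is allowed by Rule 34.

    arg_types:       type names appearing in the attribute's encoding.
    enum_base_types: optional map from enum type name to its base type.
    """
    enum_base_types = enum_base_types or {}
    for type_name in arg_types:
        if type_name in CLS_ATTRIBUTE_TYPES:
            continue
        if enum_base_types.get(type_name) in CLS_ENUM_BASE_TYPES:
            continue
        return False
    return True


print(is_cls_attribute_encoding(["System.String", "System.Int32"]))  # True
print(is_cls_attribute_encoding(["System.UInt32"]))                  # False
```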

NOTE

CLS (consumer): Shall be able to read attributes encoded using the restricted scheme.

CLS (extender): Must meet all requirements for CLS consumer and be able to author new classes and new attributes. Shall be able to attach attributes based on existing attribute classes to any metadata that is emitted. Shall implement the rules for the System.AttributeUsageAttribute (see the .NET Framework Standard Library Annotated Reference).

CLS (framework): Shall externally expose only attributes that are encoded within the CLS rules and following the conventions specified for System.AttributeUsageAttribute.

In addition to CTS type extensibility, it is possible to emit custom modifiers into member signatures (see Types in Partition II, section 7.1). The CLI will honor these modifiers for purposes of method overloading and hiding, as well as for binding, but will not enforce any of the language-specific semantics. These modifiers can reference the return type or any parameter of a method, or the type of a field. They come in two kinds: required modifiers that anyone using the member must understand in order to correctly use it, and optional modifiers that may be ignored if the modifier is not understood.

CLS Rule 35: The CLS does not allow publicly visible required modifiers (modreq; see Partition II, section 7.1.1), but does allow optional modifiers (modopt; see Partition II, section 7.1.1) that it does not understand.

NOTE

CLS (consumer): Shall be able to read metadata containing optional modifiers and correctly copy signatures that include them. May ignore these modifiers in type matching and overload resolution. May ignore types that become ambiguous when the optional modifiers are ignored, or that use required modifiers.

CLS (extender): Shall be able to author overrides for inherited methods with signatures that include optional modifiers. Consequently, an extender must be able to copy such modifiers from metadata that it imports. There is no requirement to support required modifiers, nor to author new methods that have any kind of modifier in their signature.

CLS (framework): Shall not use required modifiers in externally visible signatures unless they are marked as not CLS-compliant. Shall not expose two members on a class that differ only by the use of optional modifiers in their signature unless only one is marked CLS-compliant.
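The consumer and framework rules above hinge on what happens when optional modifiers are ignored. The sketch below models a signature as a tuple of (type, modifiers) pairs and reports members that become indistinguishable once optional modifiers are dropped. The representation, the function names, and the modifier name `IsLong` are illustrative assumptions, not the metadata encoding.

```python
def strip_optional(signature):
    """Drop optional modifiers from a signature, keeping required ones.

    A signature is a tuple of (type_name, modifiers) pairs, and each
    modifier is a (kind, name) pair with kind "opt" or "req".
    """
    return tuple(
        (type_name, tuple(m for m in mods if m[0] == "req"))
        for type_name, mods in signature
    )


def ambiguous_when_ignoring_modopt(members):
    """Return names of members that collide once modopts are ignored."""
    groups = {}
    for name, signature in members:
        groups.setdefault((name, strip_optional(signature)), []).append(signature)
    return [key[0] for key, sigs in groups.items() if len(sigs) > 1]


# Two overloads that differ only by an optional modifier on int32, as a
# C++ compiler might emit to tell long from int.
members = [
    ("F", (("int32", ()),)),
    ("F", (("int32", (("opt", "IsLong"),)),)),
    ("G", (("string", ()),)),
]
print(ambiguous_when_ignoring_modopt(members))  # ['F']
```

A consumer that ignores optional modifiers may skip `F` as ambiguous, which is why a framework may mark at most one of the two overloads as CLS-compliant.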

ANNOTATION

It would be good to briefly describe the difference between attributes and modifiers. Metadata is partly represented as a series of tables (in the database sense) that describe different parts of the program: defined types, referenced types, and type members. Custom attributes can be attached to any of these. Custom attributes have an encoding that lets the programmer define an object, and the custom attribute looks to the user like a call to a constructor for that object.

Another part of the metadata is an area concerned with space-efficient encoding of a lot of data in a few bytes. These are called "signatures." Signatures are used to describe the types of arguments or the types of fields. You might want to put attributes on the information in signatures as well, but there is not enough space. Instead, there is a special compacted form called "modifier" (CLI-specified modifiers are called constraints). Modifiers have a much simpler structure than attributes, and they are not objects. Usually a modifier is just a type name.

For more information on custom attributes, see Partition II, section 20 and its subsections.

9.8 Globals, Imports, and Exports

The CTS does not have the notion of global statics: all statics are associated with a particular class. Nonetheless, the metadata is designed to support languages that rely on static data that is stored directly in a PE/COFF file and accessed by its relative virtual address. In addition, while access to managed data and managed functions is mediated entirely through the metadata itself, the metadata provides a mechanism for accessing unmanaged data and unmanaged code.

ANNOTATION

Even though the CTS does not have the notion of global statics, the CLI supports languages that support global statics by creating a special class named <module>, into which it puts what are defined in a language as global static fields and global static methods. There are special rules for how that module is treated, described in Partition II, section 9.8.

CLS Rule 36: Global static fields and methods are not CLS-compliant.

NOTE

CLS (consumer): Need not support global static fields or methods.

CLS (extender): Need not author global static fields or methods.

CLS (framework): Shall not define global static fields or methods.

9.9 Scoped Statics

The CTS does not include a model for file- or function-scoped static functions or data members. However, there are times when a compiler needs a metadata token to emit into CIL for a scoped function or data member. The metadata allows members to be marked so that they are never visible/accessible outside of the PE/COFF file in which they are declared and for which the compiler guarantees to enforce all access rules.

ANNOTATION

The accessibility referred to in the final sentence of the previous paragraph is compiler-controlled. For more information, see Partition I, section 8.5.3.2.

End informative text