What Is Metadata? | Inside Microsoft .NET IL Assembler

What Is Metadata?

Metadata is, by definition, data that describes data. Like any general definition, however, this one is hardly informative. In the context of the common language runtime, metadata means a system of descriptors of all items that are declared or referenced in a module. Because the common language runtime programming model is inherently object-oriented, the items represented in metadata are classes and their members, with their accompanying attributes, properties, and relationships.

From a pragmatic point of view, the role played by metadata is similar to that played by type libraries in the COM world. At this general level, however, the similarities end and the differences begin. Metadata, which describes the structural aspect of a module or an assembly in minute detail, is vastly richer than the data provided by type libraries, which carry only information regarding the COM interfaces exposed by the module. The important difference, of course, is that metadata is embedded in a managed module, which allows each managed module to carry a complete formal description of its logical structure.

Structurally, metadata is a normalized relational database. This means that metadata is organized as a set of cross-referencing rectangular tables—as opposed to, for example, a hierarchical database that has a tree structure. Each column of each row of a metadata table contains either data or a reference to a row of another table. Metadata does not contain any duplicate data fields; each category of data resides in only one table of the metadata database. If another table needs to employ the same data, it references the table that holds the data.

For example, as Chapter 1, “Simple Sample,” explained, a class definition carries certain binary attributes (flags). Because the behavior and features of member methods of this class are affected by the class’s flags, it would be tempting to duplicate some of the class attributes, including flags, in a metadata record describing one of the methods. But data duplication leads not only to increased database size but also to the problem of keeping all the duplications synchronized.

Instead, a method descriptor contains a reference to the descriptor of the method’s parent class. Such referencing does require resolving additional levels of indirection, which costs extra processor cycles. But for massively distributed systems (and Microsoft .NET-based applications obviously target such systems), processor speed is not the problem; communication bandwidth and data integrity are.
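To make the trade-off concrete, here is a minimal sketch of the reference-instead-of-duplicate idea. The record layouts, the names ClassRecord, MethodRecord, and class_flags_of, and the sample flag values are all hypothetical illustrations and do not reproduce the actual metadata table schema. The flags live only in the class table, and a method reaches them through one extra lookup.

```python
from dataclasses import dataclass


@dataclass
class ClassRecord:
    name: str
    flags: int  # stored once, in the class table only


@dataclass
class MethodRecord:
    name: str
    parent: int  # 1-based row number in the class table (hypothetical column)


# Hypothetical tables: one class with two methods.
CLASS_TABLE = [ClassRecord("Widget", 0x0001)]
METHOD_TABLE = [MethodRecord("Draw", 1), MethodRecord("Resize", 1)]


def class_flags_of(method_row: int) -> int:
    """Resolve the parent class's flags for a method (one extra indirection)."""
    method = METHOD_TABLE[method_row - 1]
    return CLASS_TABLE[method.parent - 1].flags
```

Because the flags are stored in a single place, changing CLASS_TABLE[0].flags is immediately visible through every method record; there are no copies to keep synchronized.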

But what do you do if, for instance, you need to find all the methods a certain class implements? Browse the entire method descriptor table to find the methods referring to this class’s descriptor? No, that would be no fun at all. Instead, the class descriptor (record) carries a reference to the record of the method table that represents the first method of this class. The end of the method records belonging to this class is defined by the beginning of the next class’s method records or (for the last class) by the end of the method table.
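The range rule just described can be sketched in a few lines. This is an illustration under simplifying assumptions, not the runtime's implementation: the class table is reduced to a hypothetical list of (name, first_method) pairs, row numbers are 1-based, and the function name methods_of is invented for the example.

```python
def methods_of(class_table, method_count, class_row):
    """Return the method-table rows owned by the class at class_row.

    class_table  -- list of (name, first_method) pairs; first_method is the
                    1-based row of the class's first method record
    method_count -- total number of rows in the method table
    class_row    -- 1-based row of the class in class_table
    """
    start = class_table[class_row - 1][1]
    if class_row < len(class_table):
        # The run ends where the next class's method records begin.
        end = class_table[class_row][1]
    else:
        # Last class: the run ends at the end of the method table.
        end = method_count + 1
    return list(range(start, end))


# Hypothetical data: A owns rows 1-2, B owns no methods, C owns rows 3-4.
classes = [("A", 1), ("B", 3), ("C", 3)]
```

With these tables, methods_of(classes, 4, 1) yields [1, 2], methods_of(classes, 4, 2) yields an empty list, and methods_of(classes, 4, 3) yields [3, 4]. A class without methods simply has the same starting row as its successor.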

Obviously, this technique requires that the records in the method table be ordered by their parent class. The same applies to other table-to-table relationships (class-to-field, method-to-parameter, and so on). If this requirement is met, the metadata is referred to as optimized, or compressed. Figure 4-1 shows an example of such metadata. The ILAsm compiler always emits optimized metadata.

Figure 4-1 An example of optimized metadata.

It is possible, however—perhaps as a result of sloppy metadata emission—to have the child tables interleaved with regard to their parent classes. For example, class record A might be emitted first, followed by class record B, the method records of class B, and then the method records of class A; or the sequence might be class record A, then some of the method records of class A, followed by class record B, the method records of class B, and then the rest of the method records of class A.
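One simple way to picture the ordering requirement is as a check over the parent class of each method record, taken in method-table order. The sketch below is an assumption-laden illustration: the function name is_ordered_by_parent and the list-of-parent-rows input are hypothetical, and class A is taken to be row 1 and class B row 2.

```python
def is_ordered_by_parent(parent_rows):
    """True if method records appear in non-decreasing order of parent class row.

    parent_rows -- for each method record, in method-table order, the 1-based
                   row of its parent class (hypothetical input format)
    """
    return all(a <= b for a, b in zip(parent_rows, parent_rows[1:]))


# Class A is row 1, class B is row 2.
ordered = [1, 1, 2, 2]        # A's methods, then B's methods
b_before_a = [2, 2, 1, 1]     # B's methods emitted before A's
a_split = [1, 2, 2, 1]        # some of A's, then B's, then the rest of A's
```

Only the first layout passes the check; the other two correspond to the interleaved emission orders described above.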

In such a case, additional intermediate metadata tables are engaged, providing noninterleaved and ordered lookup tables. Instead of referencing the method records, class records reference the records of an intermediate table (a pointer table), which in turn reference the method records, as diagrammed in Figure 4-2. Metadata that uses such intermediate lookup tables is referred to as unoptimized, or uncompressed.
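The extra level of indirection can be sketched as follows. The data layout is a simplification made up for illustration: class_first_ptr holds, for each class, the 1-based row of its first entry in the pointer table, and method_ptr holds, for each pointer-table row, the 1-based row of a method record. Neither name comes from the actual metadata format.

```python
def methods_via_pointer_table(class_first_ptr, method_ptr, class_row):
    """Return the method-table rows of a class, resolved through a pointer table.

    class_first_ptr -- per class, 1-based row of its first pointer-table entry
    method_ptr      -- per pointer-table row, 1-based row in the method table
    class_row       -- 1-based row of the class
    """
    start = class_first_ptr[class_row - 1]
    if class_row < len(class_first_ptr):
        end = class_first_ptr[class_row]      # next class's run begins here
    else:
        end = len(method_ptr) + 1             # last class: end of pointer table
    return [method_ptr[i - 1] for i in range(start, end)]


# Hypothetical layout: class A (row 1) and class B (row 2) were emitted first,
# then B's two methods (method rows 1-2), then A's two methods (rows 3-4).
class_first_ptr = [1, 3]
method_ptr = [3, 4, 1, 2]   # pointer table, ordered by parent class
```

Here the method table itself is out of order, but the pointer table is ordered by parent class, so the same range rule still works: class A resolves to method rows [3, 4] and class B to rows [1, 2].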


Uncompressed metadata structure is characteristic of an “edit-and-continue” scenario, in which metadata and the IL code of a module are modified while the module is loaded in memory.

Figure 4-2 An example of unoptimized metadata.