5 General Syntax | The Common Language Infrastructure Annotated Standard (Microsoft .NET Development Series)

This section describes aspects of the ilasm syntax that are common to many parts of the grammar. The term "ASCII" refers to the American Standard Code for Information Interchange, a standard 7-bit code that was proposed by ANSI in 1963, and finalized in 1968. The ASCII repertoire of Unicode is the set of 128 Unicode characters from U+0000 to U+007F.

5.1 General Syntax Notation

This document uses a modified form of the BNF syntax notation. The following is a brief summary of this notation.

Bold items are terminals. Items placed in angle brackets (e.g., <int64>) are names of syntax classes and shall be replaced by actual instances of the class. Items placed in square brackets (e.g., [<float>]) are optional, and any item followed by * can appear zero or more times. The character "|" means that the items on either side of it are acceptable. The options are sorted in alphabetical order (to be more specific: in ASCII order, ignoring "<" for syntax classes, and case-insensitive). If a rule starts with an optional term, the optional term is not considered for sorting purposes.

ilasm is a case-sensitive language. All terminals shall be used with the same case as specified in this reference.

Example (informative): A grammar such as

     <top> ::= <int32> | float <float> |
               floats [<float> [, <float>]*] | else <QSTRING>

would consider the following all to be legal:

     12
     float 3
     float  4.3e7
     floats
     floats 2.4
     floats 2.4, 3.7
     else "Something \t weird"

but all of the following to be illegal:

     else 3
     3, 4
     float 4.3, 2.4
     float else
     stuff
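To make the notation concrete, the sample grammar for <top> can be transcribed into a small recognizer. The following Python sketch is illustrative only: the token patterns INT32, FLOAT, and QSTRING and the helper is_top are simplified assumptions made for this example, not definitions taken from the standard.

```python
import re

# Assumed token shapes for this sketch only (hypothetical simplifications).
INT32 = r"(?:0x[0-9A-Fa-f]+|[0-9]+)"
FLOAT = r"[0-9]+(?:\.[0-9]*)?(?:[eE][+-]?[0-9]+)?"
QSTRING = r'"(?:[^"\\]|\\.)*"'

# One alternative per option of <top>, in the order the grammar lists them.
TOP = re.compile(
    "|".join([
        INT32,                                         # <int32>
        rf"float\s+{FLOAT}",                           # float <float>
        rf"floats(?:\s+{FLOAT}(?:\s*,\s*{FLOAT})*)?",  # floats [<float> [, <float>]*]
        rf"else\s+{QSTRING}",                          # else <QSTRING>
    ])
)


def is_top(text: str) -> bool:
    """Return True if the whole input is one instance of <top>."""
    return TOP.fullmatch(text.strip()) is not None
```

Under these assumptions, is_top accepts each of the legal inputs listed above and rejects each of the illegal ones. Note that the terminals float, floats, and else are matched case-sensitively.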

5.2 Terminals

The basic syntax classes used in the grammar describe syntactic constraints on the input. These constraints are intended to convey logical restrictions on the information encoded in the metadata.

The syntactic constraints described in this section are informative only. The semantic constraints (e.g., "shall be represented in 32 bits") are normative.

<int32> is either a decimal number or "0x" followed by a hexadecimal number, and shall be represented in 32 bits.

<int64> is either a decimal number or "0x" followed by a hexadecimal number, and shall be represented in 64 bits.

<hexbyte> is a 2-digit hexadecimal number that fits into one byte.
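The normative part of the <int32> and <int64> definitions is the size constraint. A minimal Python sketch of such a check follows. The function name parse_int is hypothetical, and two choices are assumptions made for this example rather than statements of the standard: an optional leading minus sign is accepted, and a value is taken to "fit" if it is representable as either a signed or an unsigned pattern of the given width.

```python
import re

# Optional sign, then either 0x + hex digits or plain decimal digits.
_INT = re.compile(r"(-?)(?:0x([0-9A-Fa-f]+)|([0-9]+))")


def parse_int(text: str, bits: int) -> int:
    """Parse a decimal or 0x-prefixed literal and check it fits in `bits` bits."""
    m = _INT.fullmatch(text)
    if m is None:
        raise ValueError(f"not an integer literal: {text!r}")
    sign, hex_digits, dec_digits = m.groups()
    value = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    if sign:
        value = -value
    # Assumption: signed OR unsigned n-bit patterns are both acceptable.
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError(f"{text!r} cannot be represented in {bits} bits")
    return value
```

For example, parse_int("0xFF", 32) yields 255, while parse_int("0x100000000", 32) is rejected and the same literal is accepted with bits set to 64.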

<realnumber> is any syntactic representation for a floating point number that is distinct from that for all other terminal nodes. In this document, a period (.) is used to separate the integer and fractional parts, and "e" or "E" separates the mantissa from the exponent. Either (but not both) may be omitted.

NOTE

A complete assembler may also provide syntax for infinities and NaNs.

<QSTRING> is a string surrounded by double quote (") marks. Within the quoted string the character "\" can be used as an escape character, with "\t" for a tab character, "\n" for a newline character, or followed by three octal digits in order to insert an arbitrary byte into the string. The "+" operator can be used to concatenate string literals. This way, a long string can be broken across multiple lines by using "+" and a new string on each line. An alternative is using "\" as the last character in a line, in which case the line break is not entered into the generated string. Any whitespace characters (space, line feed, carriage return, and tab) between the "\" and the first character on the next line are ignored. See also examples below.

NOTE

A complete assembler will need to deal with the full set of issues required to support Unicode encodings, see Partition I (especially CLS Rule 4, in section 8.5.1).

<SQSTRING> is similar to <QSTRING>, with the difference that it is surrounded by single quote (') marks instead of double quote marks.

<ID> is a contiguous string of characters which starts with either an alphabetic character or one of "_", "$", "@", or "?" and is followed by any number of alphanumeric characters or any of "_", "$", "@", or "?". An <ID> is used in only two ways:

As a label of a CIL instruction
As an <id> which can either be an <ID> or an <SQSTRING>, so that special characters can be included.

Example (informative): The following examples show breaking of strings:

     ldstr "Hello " + "World " +
     "from CIL!"

and

     ldstr "Hello World\
          \040from CIL!"

both become "Hello World from CIL!".
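The escape and continuation rules for <QSTRING> can be sketched as a small decoder. The Python below is a sketch under stated assumptions, not the assembler's implementation: decode_qstring_body and eval_string_expr are hypothetical names, an unrecognized escape simply keeps the character that follows the backslash, and the "+" operators between literals are not validated.

```python
import re

# A double-quoted literal; the group captures the text between the quotes.
_QSTRING = re.compile(r'"((?:[^"\\]|\\.)*)"', re.DOTALL)


def decode_qstring_body(body: str) -> str:
    """Apply the \\t, \\n, \\ooo and end-of-line backslash rules to a string body."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1 : i + 2]
        octal = body[i + 1 : i + 4]
        if nxt == "t":
            out.append("\t")
            i += 2
        elif nxt == "n":
            out.append("\n")
            i += 2
        elif len(octal) == 3 and all(c in "01234567" for c in octal):
            out.append(chr(int(octal, 8)))  # three octal digits
            i += 4
        elif nxt in ("\n", "\r"):
            # Backslash at end of line: drop the break and leading whitespace.
            i += 1
            while i < len(body) and body[i] in " \t\r\n":
                i += 1
        else:
            out.append(nxt)  # assumption: keep the escaped character as-is
            i += 2
    return "".join(out)


def eval_string_expr(source: str) -> str:
    """Concatenate every <QSTRING> in `source` (assumed to be joined by '+')."""
    return "".join(decode_qstring_body(m.group(1)) for m in _QSTRING.finditer(source))
```

Applied to the operand of either ldstr example above, eval_string_expr produces the single string "Hello World from CIL!".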

5.3 Identifiers

Identifiers are used to name entities. Simple identifiers are just equivalent to an <ID>. However, the ilasm syntax allows the use of any identifier that can be formed using the Unicode character set (see Partition I, section 10.1). To achieve this, an identifier is placed within single quotation marks. This is summarized in the following grammar.

`<id> ::=`
	`<ID>`
	`| <SQSTRING>`

Keywords may only be used as identifiers if they appear in single quotes (see Partition V for a list of all keywords).

Several <id>'s may be combined to form a larger <id>. The <id>'s are separated by a dot (.). An <id> formed in this way is called a <dottedname>.

`<dottedname> ::= <id> [. <id>]*`

RATIONALE

<dottedname> is provided for convenience, since "." can be included in an <id> using the <SQSTRING> syntax. <dottedname> is used in the grammar where "." is considered a common character (e.g., fully qualified type names).

ANNOTATION

Implementation-Specific (Microsoft): Names that end with $PST followed by a hexadecimal number have a special meaning. The assembler will automatically truncate the part starting with $PST, to support compiler-controlled accessibility, described in Partition I, section 8.5.3.2. Also the first release of the Microsoft CLR limits the length of identifiers; see Partition II, section 21 for details.

Examples (informative): The following shows some simple identifiers:

     A
     Test
     $Test
     @Foo?
     ?_X_

The following shows identifiers in single quotes:

     'Weird Identifier'
     'Odd\102Char'
     'Embedded\nReturn'

The following shows dotted names:

     System.Console
     A.B.C
     'My Project'.'My Component'.'My Name'
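The <ID>, <id>, and <dottedname> rules translate directly into regular expressions. The Python sketch below is illustrative: split_dottedname is a hypothetical helper, "alphabetic" and "alphanumeric" are assumed to mean ASCII letters and digits, and no whitespace is allowed around the dots.

```python
import re

_ID = r"[A-Za-z_$@?][A-Za-z0-9_$@?]*"        # <ID>
_SQSTRING = r"'(?:[^'\\]|\\.)*'"              # <SQSTRING>
_id = rf"(?:{_ID}|{_SQSTRING})"               # <id> ::= <ID> | <SQSTRING>
_DOTTED = re.compile(rf"{_id}(?:\.{_id})*")   # <dottedname> ::= <id> [. <id>]*
_COMPONENT = re.compile(_id)


def split_dottedname(text: str) -> list[str]:
    """Validate a <dottedname> and return its <id> components, quotes included."""
    if _DOTTED.fullmatch(text) is None:
        raise ValueError(f"not a dotted name: {text!r}")
    return _COMPONENT.findall(text)
```

With these assumptions, split_dottedname("System.Console") returns the two components System and Console, and a dot inside a single-quoted component is not treated as a separator.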

5.4 Labels and Lists of Labels

Labels are provided as a programming convenience; they represent a number that is encoded in the metadata. The value represented by a label is typically an offset in bytes from the beginning of the current method, although the precise encoding differs depending on where in the logical metadata structure or CIL stream the label occurs. For details of how labels are encoded in the metadata, see Partition II, sections 21–24; for their encoding in CIL instructions, see Partition III.

A simple label is a special name that represents an address. Syntactically, a label is equivalent to an <id>. Thus, labels may also be single-quoted and may contain Unicode characters.

A list of labels is comma separated, and can be any combination of these simple labels:

`<labeloroffset> ::= <id>`
`<labels> ::= <labeloroffset> [, <labeloroffset>]*`

RATIONALE

In a real assembler, the syntax for <labeloroffset> might allow the direct specification of a number rather than requiring symbolic labels.

ANNOTATION

Implementation-Specific (Microsoft): The following syntax is also supported, for round-tripping purposes:

 <labeloroffset> ::= <int32> | <label>

ilasm distinguishes between two kinds of labels: code labels and data labels. Code labels are followed by a colon (":") and represent the address of an instruction to be executed. Code labels appear before an instruction and they represent the address of the instruction that immediately follows the label. A particular code label name may not be declared more than once in a method.

In contrast to code labels, data labels specify the location of a piece of data and do not include the colon character. A data label may not be used as a code label, and a code label may not be used as a data label. A particular data label name may not be declared more than once in a module.

`<codeLabel> ::= <id> :`
`<dataLabel> ::= <id>`

Example (informative): The following defines a code label, ldstr_label, that represents the address of the ldstr instruction:

     ldstr_label: ldstr "A label"
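The two rules for code labels (a label names the instruction that immediately follows it, and a name may be declared only once per method) can be sketched as a single pass over a method body. This Python sketch makes several simplifying assumptions: collect_code_labels is a hypothetical helper, labels are restricted to the unquoted <ID> form, each non-empty line holds exactly one instruction, and an instruction index stands in for the byte offset a real assembler would compute.

```python
import re

# An unquoted label followed by a colon at the start of a line.
_CODE_LABEL = re.compile(r"\s*([A-Za-z_$@?][A-Za-z0-9_$@?]*)\s*:(.*)")


def collect_code_labels(lines: list[str]) -> dict[str, int]:
    """Map each code label to the index of the instruction that follows it."""
    labels: dict[str, int] = {}
    index = 0  # stand-in for the byte offset within the method
    for line in lines:
        rest = line
        m = _CODE_LABEL.match(line)
        if m is not None:
            name, rest = m.group(1), m.group(2)
            if name in labels:
                raise ValueError(f"code label {name!r} declared twice in method")
            labels[name] = index
        if rest.strip():
            index += 1  # the line carried an instruction
    return labels
```

A label on a line of its own refers to the next instruction, which is why the index is advanced only when the remainder of the line is non-empty.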

5.5 Lists of Hex Bytes

A list of bytes consists simply of one or more hex bytes. Hex bytes are pairs of characters 0–9, a–f, and A–F.

`<bytes> ::= <hexbyte> [<hexbyte>*]`
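As an illustration, a <bytes> list can be converted to raw bytes as follows. This is a Python sketch in which parse_bytes is a hypothetical name and the hex bytes are assumed to be separated by whitespace.

```python
import re

_HEXBYTE = re.compile(r"[0-9A-Fa-f]{2}")  # exactly two hex digits


def parse_bytes(text: str) -> bytes:
    """Convert a whitespace-separated list of one or more <hexbyte> items."""
    tokens = text.split()
    if not tokens:
        raise ValueError("at least one <hexbyte> is required")
    for token in tokens:
        if _HEXBYTE.fullmatch(token) is None:
            raise ValueError(f"not a <hexbyte>: {token!r}")
    return bytes(int(token, 16) for token in tokens)
```

Both upper-case and lower-case digits are accepted, so "01 AF fe" yields the three bytes 0x01, 0xAF, and 0xFE.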

5.6 Floating Point Numbers

There are two different ways to specify a floating point number:

Use the dot (".") for the decimal point and "e" or "E" in front of the exponent. Both the decimal point and the exponent are optional.
Indicate that the floating point value is derived from an integer using the keyword float32 or float64 and indicating the integer in parentheses.

`<float64> ::=`
	`float32 (` `<int32>` `)`
	`|` `float64 (` `<int64>` `)`
	`| <realnumber>`

Example (informative):

     5.5
     1.1e10
     float64(128)        // note: this converts the integer 128 to its fp value
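The two forms of <float64> can be distinguished with a pair of patterns. In the Python sketch below, parse_float64 is a hypothetical helper. Following the note in the example above, the integer form is treated as a numeric conversion; that reading is an assumption of this sketch, and an actual assembler may interpret the integer differently (for instance, as a bit pattern).

```python
import re

# <realnumber>: optional fraction and optional exponent (sign is an assumption).
_REAL = re.compile(r"[+-]?[0-9]+(?:\.[0-9]*)?(?:[eE][+-]?[0-9]+)?")
# float32(<int32>) or float64(<int64>)
_FROM_INT = re.compile(r"(float32|float64)\s*\(\s*(0x[0-9A-Fa-f]+|-?[0-9]+)\s*\)")


def parse_float64(text: str) -> float:
    """Parse either form of <float64> and return a Python float."""
    text = text.strip()
    m = _FROM_INT.fullmatch(text)
    if m is not None:
        literal = m.group(2)
        integer = int(literal, 16) if literal.lower().startswith("0x") else int(literal)
        # Assumption (from the example's note): convert the integer numerically.
        return float(integer)
    if _REAL.fullmatch(text) is not None:
        return float(text)
    raise ValueError(f"not a floating point literal: {text!r}")
```

Under this reading, parse_float64("float64(128)") and parse_float64("128.0") produce the same value.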

5.7 Source Line Information

The metadata does not encode information about the lexical scope of variables or the mapping from source line numbers to CIL instructions. Nonetheless, it is useful to specify an assembler syntax for providing this information for use in creating alternate encodings of the information.

ANNOTATION

See Partition I, section 9.7 for more information on attributes and modifiers.

Implementation-Specific (Microsoft): Source line information is stored in the PDB (Portable Debug) file associated with each module.

.line takes a line number, an optional column number (preceded by a colon), and an optional single-quoted string that specifies the name of the file the line number is referring to:

`<externSourceDecl> ::=` `.line` `<int32> [ : <int32> ] [<SQSTRING>]`
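A directive of this shape can be picked apart with one regular expression. The Python sketch below is illustrative only: parse_line_directive is a hypothetical name, the numbers are restricted to decimal digits, and escape sequences inside the file name are left undecoded.

```python
import re
from typing import Optional, Tuple

# .line <int32> [ : <int32> ] [<SQSTRING>]
_LINE = re.compile(
    r"\.line\s+([0-9]+)"           # line number
    r"(?:\s*:\s*([0-9]+))?"        # optional column, preceded by a colon
    r"(?:\s+'((?:[^'\\]|\\.)*)')?" # optional single-quoted file name
)


def parse_line_directive(text: str) -> Tuple[int, Optional[int], Optional[str]]:
    """Return (line, column, file name); missing optional parts are None."""
    m = _LINE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a .line directive: {text!r}")
    line, column, filename = m.groups()
    return int(line), (int(column) if column is not None else None), filename
```

For example, ".line 10:4 'a.cs'" yields line 10, column 4, and the file name a.cs, while a bare ".line 10" yields only the line number.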

ANNOTATION

Implementation-Specific (Microsoft): For compatibility reasons, ilasm allows the following:

 <externSourceDecl> ::= ... | #line <int32> <QSTRING>

Notice that this requires the file name and that it shall be double quoted, not single quoted as with .line.

5.8 File Names

Some grammar elements require that a file name be supplied. A file name is like any other name where "." is considered a normal constituent character. The specific syntax for file names follows the specifications of the underlying operating system.

`<filename> ::=`		Section in Partition II
	`<dottedname>`	5.3

5.9 Attributes and Metadata

Attributes of types and their members attach descriptive information to their definition. The most common attributes are predefined and have a specific encoding in the metadata associated with them (see Partition II, section 22). In addition, the metadata provides a way of attaching user-defined attributes to metadata, using several different encodings.

From a syntactic point of view, there are several ways for specifying attributes in ilasm:

Using special syntax built into ilasm. For example, the keyword private in a <classAttr> specifies that the visibility attribute on a type should be set to allow access only within the defining assembly.
Using a general-purpose syntax in ilasm. The non-terminal <customDecl> describes this grammar (see Partition II, section 20). For some attributes, called pseudo custom attributes, this grammar actually results in setting special encodings within the metadata (see Partition II, section 20.2.1).
Some attributes are required to be set based on the settings of other attributes or information within the metadata and are not visible from the syntax of ilasm at all. These attributes are called hidden attributes.
Security attributes are treated specially. There is special syntax in ilasm that allows the XML representing security attributes to be described directly (see Partition II, section 19). While all other attributes defined either in the standard library or by user-provided extension are encoded in the metadata using one common mechanism described in Partition II, section 21.10, security attributes (distinguished by the fact that they inherit, directly or indirectly, from System.Security.Permissions.SecurityAttribute; see the .NET Framework Standard Library Annotated Reference) shall be encoded as described in Partition II, section 21.11.

5.10 ilasm Source Files

An input to ilasm is a sequence of declarations, defined as follows:

`<ILFile> ::=`		Section in Partition II
	`<decl>*`	5.10

The complete grammar for a top-level declaration is shown below. The following sections will concentrate on the various parts of this grammar.

`<decl> ::=`		Section in Partition II
	`.assembly` `<dottedname>` `{` `<asmDecl>*` `}`		6.2
`|` `.assembly extern` `<dottedname>` `{` `<asmRefDecl>*` `}`		6.3
`|` `.class` `<classHead>` `{` `<classMember>*` `}`		9
`|` `.class extern` `<exportAttr> <dottedname>` `{` `<externClassDecl>*` `}`		6.7
`|` `.corflags` `<int32>`		6.2
`|` `.custom` `<customDecl>`		20
`|` `.data` `<datadecl>`		15.3.1
`|` `.field` `<fieldDecl>`		15
`|` `.file [nometadata] <filename> [.hash = ( <bytes> )] [.entrypoint ]`		6.2.3
`|` `.mresource [public | private] <dottedname> [( <QSTRING> )] { <manResDecl>* }`		6.2.2
`|` `.method` `<methodHead>` `{` `<methodBodyItem>*` `}`		14
`|` `.module` `[<filename>]`		6.4
`|` `.module extern` `<filename>`		6.5
`|` `.subsystem` `<int32>`		6.2
`|` `.vtfixup` `<vtfixupDecl>`		14.5.1
`| <externSourceDecl>`		5.7
`| <securityDecl>`		19

ANNOTATION

Implementation-Specific (Microsoft): The grammar for declarations also includes the following. These are described in a separate product specification.

`<decl> ::=`
	`.file alignment` `<int32>`
`| .imagebase <int64>`
`|` `.language` `<languageDecl>`
`|` `.namespace` `<id>`
`| ...`