This section describes aspects of the ilasm syntax that are common to many parts of the grammar. The term "ASCII" refers to the American Standard Code for Information Interchange, a standard 7-bit code that was proposed by ANSI in 1963, and finalized in 1968. The ASCII repertoire of Unicode is the set of 128 Unicode characters from U+0000 to U+007F.

5.1 General Syntax Notation

This document uses a modified form of the BNF syntax notation. The following is a brief summary of this notation.

Bold items are terminals. Items placed in angle brackets (e.g., <int64>) are names of syntax classes and shall be replaced by actual instances of the class. Items placed in square brackets (e.g., [<float>]) are optional, and any item followed by * can appear zero or more times. The character "|" means that the items on either side of it are acceptable. The options are sorted in alphabetical order (to be more specific: in ASCII order, ignoring "<" for syntax classes, and case-insensitive). If a rule starts with an optional term, the optional term is not considered for sorting purposes.

ilasm is a case-sensitive language. All terminals shall be used with the same case as specified in this reference.

Example (informative):

A grammar such as

    <top> ::= <int32> | float <float> | floats [<float> [, <float>]*] | else <QSTRING>

would consider the following all to be legal:

    12
    float 3
    float 4.3e7
    floats
    floats 2.4
    floats 2.4, 3.7
    else "Something \t weird"

but all of the following to be illegal:

    else 3
    3, 4
    float 4.3, 2.4
    float
    else stuff

5.2 Terminals

The basic syntax classes used in the grammar are used to describe syntactic constraints on the input intended to convey logical restrictions on the information encoded in the metadata.
<int32> is either a decimal number or "0x" followed by a hexadecimal number, and shall be represented in 32 bits.

<int64> is either a decimal number or "0x" followed by a hexadecimal number, and shall be represented in 64 bits.

<hexbyte> is a 2-digit hexadecimal number that fits into one byte.

<realnumber> is any syntactic representation for a floating point number that is distinct from that for all other terminal nodes. In this document, a period (.) is used to separate the integer and fractional parts, and "e" or "E" separates the mantissa from the exponent. Either (but not both) may be omitted.

NOTE A complete assembler may also provide syntax for infinities and NaNs.

<QSTRING> is a string surrounded by double quote (") marks. Within the quoted string the character "\" can be used as an escape character, with "\t" for a tab character, "\n" for a new line character, or followed by three octal digits in order to insert an arbitrary byte into the string. The "+" operator can be used to concatenate string literals. This way, a long string can be broken across multiple lines by using "+" and a new string on each line. An alternative is using "\" as the last character in a line, in which case the line break is not entered into the generated string. Any whitespace characters (space, line feed, carriage return, and tab) between the "\" and the first character on the next line are ignored. See also examples below.

NOTE A complete assembler will need to deal with the full set of issues required to support Unicode encodings, see Partition I (especially CLS Rule 4, in section 8.5.1).

<SQSTRING> is similar to <QSTRING>, with the difference that it is surrounded by single quote (') marks instead of double quote marks.

<ID> is a contiguous string of characters which starts with either an alphabetic character or one of "_", "$", "@", or "?" and is followed by any number of alphanumeric characters or any of "_", "$", "@", or "?". An <ID> is used in only two ways:
Example (informative):

The following examples show breaking of strings:

    ldstr "Hello " + "World " +
          "from CIL!"

and

    ldstr "Hello World\
          \040from CIL!"

both become "Hello World from CIL!".

5.3 Identifiers

Identifiers are used to name entities. Simple identifiers are just equivalent to an <ID>. However, the ilasm syntax allows the use of any identifier that can be formed using the Unicode character set (see Partition I, section 10.1). To achieve this, an identifier is placed within single quotation marks. This is summarized in the following grammar.
Keywords may only be used as identifiers if they appear in single quotes (see Partition V for a list of all keywords). Several <id>'s may be combined to form a larger <id>. The <id>'s are separated by a dot (.). An <id> formed in this way is called a <dottedname>.
RATIONALE <dottedname> is provided for convenience, since "." can be included in an <id> using the <SQSTRING> syntax. <dottedname> is used in the grammar where "." is considered a common character (e.g., fully qualified type names).
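As an informal illustration of the identifier rules above, the following Python sketch checks whether a string is a <dottedname> and splits it into its <id> components. It is not part of ilasm: the function name split_dotted_name is hypothetical, "alphabetic" and "alphanumeric" are assumed to mean ASCII letters and digits, and the check that unquoted identifiers are not keywords is left out.

```python
import re

# Assumption: "alphabetic"/"alphanumeric" are read as ASCII letters and digits.
_ID = r"[A-Za-z_$@?][A-Za-z0-9_$@?]*"
# Assumption: a single-quoted identifier is a run of ordinary characters or
# backslash escapes between single quotes; escapes are not decoded here.
_SQSTRING = r"'(?:[^'\\]|\\.)*'"

_PART = re.compile(rf"{_ID}|{_SQSTRING}")
_DOTTED = re.compile(rf"(?:{_ID}|{_SQSTRING})(?:\.(?:{_ID}|{_SQSTRING}))*")


def split_dotted_name(text):
    """Return the <id> parts of a dotted name, or None if text is not one.

    Quoted parts are returned with their quotes so the caller can tell
    them apart from simple identifiers.
    """
    if _DOTTED.fullmatch(text) is None:
        return None
    # A "." inside single quotes belongs to the quoted part and does not
    # act as a separator, because the quoted alternative consumes it whole.
    return _PART.findall(text)
```

Keeping the quotes on quoted parts is a design choice of this sketch; a real assembler would decode the escapes and store the resulting name.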
Examples (informative):

The following shows some simple identifiers:

    A
    Test
    $Test
    @Foo?
    ?_X_

The following shows identifiers in single quotes:

    'Weird Identifier'
    'Odd\102Char'
    'Embedded\nReturn'

The following shows dotted names:

    System.Console
    A.B.C
    'My Project'.'My Component'.'My Name'

5.4 Labels and Lists of Labels

Labels are provided as a programming convenience; they represent a number that is encoded in the metadata. The value represented by a label is typically an offset in bytes from the beginning of the current method, although the precise encoding differs depending on where in the logical metadata structure or CIL stream the label occurs. For details of how labels are encoded in the metadata, see Partition II, sections 21-24; for their encoding in CIL instructions, see Partition III.

A simple label is a special name that represents an address. Syntactically, a label is equivalent to an <id>. Thus, labels may also be single quoted and may contain Unicode characters.

A list of labels is comma separated, and can be any combination of these simple labels:
RATIONALE In a real assembler, the syntax for <labeloroffset> might allow the direct specification of a number rather than requiring symbolic labels.
ilasm distinguishes between two kinds of labels: code labels and data labels. Code labels are followed by a colon (":") and represent the address of an instruction to be executed. Code labels appear before an instruction and they represent the address of the instruction that immediately follows the label. A particular code label name may not be declared more than once in a method.

In contrast to code labels, data labels specify the location of a piece of data and do not include the colon character. The data label may not be used as a code label, and a code label may not be used as a data label. A particular data label name may not be declared more than once in a module.
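The rule that a code label may be declared only once per method can be illustrated with a small Python sketch. It is a rough approximation rather than a description of how ilasm works: the function name duplicate_code_labels is hypothetical, each instruction is assumed to sit on its own line, and only simple ASCII labels (not single-quoted ones) are recognized.

```python
import re

# Assumption: a code label is a simple ASCII identifier at the start of a
# line, followed by a single ":" (a "::" member separator is not a label).
_CODE_LABEL = re.compile(r"\s*([A-Za-z_$@?][A-Za-z0-9_$@?]*)\s*:(?!:)")


def duplicate_code_labels(method_lines):
    """Return code label names declared more than once in one method body.

    Names are reported once each, in the order their first repeat appears.
    """
    seen = set()
    duplicates = []
    for line in method_lines:
        match = _CODE_LABEL.match(line)
        if match is None:
            continue  # the line carries no code label declaration
        name = match.group(1)
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates
```

For instance, a body that declares start: on two different instructions would be reported as containing the duplicate label start, while references such as br start are not counted as declarations.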
Example (informative):

The following defines a code label, ldstr_label, that represents the address of the ldstr instruction:

    ldstr_label: ldstr "A label"

5.5 Lists of Hex Bytes

A list of bytes consists simply of one or more hex bytes. Hex bytes are pairs of characters 0-9, a-f, and A-F.
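As a minimal sketch of the description above, the following Python function turns a list of hex bytes into raw bytes. The name parse_hex_bytes is hypothetical, and the sketch assumes that the hex bytes are separated by whitespace.

```python
import re

# A hex byte is exactly two characters drawn from 0-9, a-f, and A-F.
_HEXBYTE = re.compile(r"[0-9a-fA-F]{2}")


def parse_hex_bytes(text):
    """Convert a whitespace-separated list of hex bytes into a bytes object."""
    tokens = text.split()  # assumption: whitespace separates the hex bytes
    if not tokens:
        raise ValueError("a list of hex bytes needs at least one hex byte")
    for token in tokens:
        if _HEXBYTE.fullmatch(token) is None:
            raise ValueError(f"not a hex byte: {token!r}")
    return bytes(int(token, 16) for token in tokens)
```

Under these assumptions, parse_hex_bytes("01 AF ff") yields the three bytes 0x01, 0xAF, and 0xFF, while an empty list or a token such as "1" or "0x1F" is rejected.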
5.6 Floating Point Numbers

There are two different ways to specify a floating point number:
Example (informative):

    5.5
    1.1e10
    float64(128) // note: this converts the integer 128 to its fp value

5.7 Source Line Information

The metadata does not encode information about the lexical scope of variables or the mapping from source line numbers to CIL instructions. Nonetheless, it is useful to specify an assembler syntax for providing this information for use in creating alternate encodings of the information.
.line takes a line number, an optional column number (preceded by a colon), and a single-quoted string that specifies the name of the file the line number is referring to:
5.8 File Names

Some grammar elements require that a file name be supplied. A file name is like any other name where "." is considered a normal constituent character. The specific syntax for file names follows the specifications of the underlying operating system.
5.9 Attributes and Metadata

Attributes of types and their members attach descriptive information to their definition. The most common attributes are predefined and have a specific encoding in the metadata associated with them (see Partition II, section 22). In addition, the metadata provides a way of attaching user-defined attributes to metadata, using several different encodings. From a syntactic point of view, there are several ways for specifying attributes in ilasm:
5.10 ilasm Source Files

An input to ilasm is a sequence of declarations, defined as follows:
The complete grammar for a top-level declaration is shown below. The following sections will concentrate on the various parts of this grammar.