Lexical Structure

When the C# compiler receives a piece of source code to compile, it is faced with the seemingly daunting task of deciphering a long list of characters (more specifically Unicode characters, presented in Appendix E, "Unicode Character Set," which can be found at www.samspublishing.com) and turning them into matching MSIL with exactly the same meaning as the original source code. To make sense of this mass of source code, it must recognize the atomic elements of C#: the unbreakable pieces making up the C# source code. Examples of atomic elements are a brace ({), a parenthesis ((), and keywords like class and if. The task of differentiating opening and closing braces, keywords, parentheses, and so on is called lexical analysis. Essentially, the lexical issues dealt with by the compiler pertain to how source code characters can be translated into tokens that are comprehensible to the compiler.
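As a rough illustration of turning characters into tokens, here is a minimal sketch of a hand-written tokenizer. It is not how the real C# compiler works. The class name TinyLexer, the Tokenize method, and the deliberately short keyword list are all hypothetical, and the sketch recognizes only a handful of token kinds.

```csharp
using System;
using System.Collections.Generic;

class TinyLexer
{
    // A deliberately tiny keyword list (assumption: real C# has many more).
    static readonly string[] Keywords = { "class", "if", "int", "return" };

    // Splits source text into "kind text" strings, skipping whitespace.
    static List<string> Tokenize(string source)
    {
        List<string> tokens = new List<string>();
        int i = 0;
        while (i < source.Length)
        {
            char c = source[i];
            if (char.IsWhiteSpace(c))
            {
                i++;
                continue;
            }

            int start = i;
            if (char.IsLetter(c) || c == '_')
            {
                // A word: either a keyword or an identifier.
                while (i < source.Length && (char.IsLetterOrDigit(source[i]) || source[i] == '_'))
                {
                    i++;
                }
                string word = source.Substring(start, i - start);
                string kind = Array.IndexOf(Keywords, word) >= 0 ? "keyword" : "identifier";
                tokens.Add(kind + " " + word);
            }
            else if (char.IsDigit(c))
            {
                // A run of digits: an integer literal.
                while (i < source.Length && char.IsDigit(source[i]))
                {
                    i++;
                }
                tokens.Add("literal " + source.Substring(start, i - start));
            }
            else if ("{}();,.".IndexOf(c) >= 0)
            {
                i++;
                tokens.Add("separator " + c);
            }
            else
            {
                // Treat "==" as one operator token; any other symbol stands alone.
                i++;
                if (c == '=' && i < source.Length && source[i] == '=')
                {
                    i++;
                }
                tokens.Add("operator " + source.Substring(start, i - start));
            }
        }
        return tokens;
    }

    static void Main()
    {
        foreach (string token in Tokenize("if (number == 10) { number = number + 1; }"))
        {
            Console.WriteLine(token);
        }
    }
}
```

In this sketch, if is reported as a keyword, number as an identifier, and == as a single operator token rather than two = characters.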

C# programs are a collection of identifiers, keywords, whitespace, comments, literals, operators, and separators. You have already met many of these C# elements. The following sections provide a structured overview of them and, where relevant, introduce a few more aspects.

Identifiers and CaPitaLIzaTioN Styles

Identifiers are used to name classes, methods, and variables. We have already looked at the rules any identifier must follow to be valid and how well-chosen identifiers can enhance the clarity of source code and make it self-documenting. Now we will introduce another aspect related to identifiers, namely CaPitaLIzaTioN sTyLe.

Often programmers choose identifiers that are made up of several words to increase the clarity and the self-documentation of the source code. For example, the words could be child-births-per-year. However, because of the compiler's sensitivity to whitespace, any identifier broken up into words by means of whitespace will be misinterpreted. For example, a variable to represent the average speed per hour cannot be named average speed per hour. We need to discard whitespace to form one proper token, while maintaining a style that allows the reader of the source code to distinguish the individual words in the identifier. Programmers in some computer languages have agreed on the convention average_speed_per_hour, which separates the words with underscores. In C#, however, most programmers use an agreed-upon sequence of upper- and lowercase characters to distinguish between individual words in an identifier.

A couple of important capitalization styles are applied in the C# world:

Pascal casing: The first letter of each word in the name is capitalized, as in AverageSpeedPerHour.
Camel casing: Same as Pascal casing, except that the first word of the identifier is lowercase, as in averageSpeedPerHour.

Pascal casing is recommended when naming classes and methods, whereas Camel casing is used for variables.
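The following minimal sketch shows both styles side by side. The class, method, and variable names are hypothetical and chosen only to illustrate the convention.

```csharp
using System;

// Pascal casing for the class name.
class SpeedCalculator
{
    // Pascal casing for the method name; camel casing for its parameters.
    static double AverageSpeedPerHour(double distanceInKilometers, double timeInHours)
    {
        // Camel casing for the local variable. Because C# is case sensitive,
        // averageSpeedPerHour and AverageSpeedPerHour are different identifiers.
        double averageSpeedPerHour = distanceInKilometers / timeInHours;
        return averageSpeedPerHour;
    }

    static void Main()
    {
        Console.WriteLine(AverageSpeedPerHour(150.0, 2.0));
    }
}
```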

Tip

Not all computer languages are case sensitive. In these languages, AVERAGE and average are identical to the compiler. For compatibility with these languages, you should avoid using case as the distinguishing factor between public identifiers accessible from other languages.

Literals

Consider the following two lines of source code:

 int number;
 number = 10;

number is clearly a variable. In the first line, we declare number to be of type int. In the second line, we assign 10 to number. But what is 10? Well, 10 is incapable of changing its value and is named a literal. Literals are not confined to numbers. They can also be characters, such as B, $, and z, or text, such as "This is a literal." Literals can be stored in any variable with a type compatible with that of the literal.
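A short sketch of the three kinds of literals mentioned above, each stored in a variable of a compatible type. The class and variable names are hypothetical. Note that in C# source code a character literal is written in single quotes and a text (string) literal in double quotes.

```csharp
using System;

class LiteralDemo
{
    static void Main()
    {
        int number = 10;                      // 10 is an integer literal
        char letter = 'B';                    // 'B' is a character literal
        string text = "This is a literal.";   // a string literal

        // The variable can be given a new value; the literals themselves never change.
        number = 20;

        Console.WriteLine(number);
        Console.WriteLine(letter);
        Console.WriteLine(text);
    }
}
```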

Comments and Source Code Documentation

The main characteristic of comments is that the compiler ignores them completely. We have so far seen two ways of making comments: single-line comments with // and multi-line comments using /* */.

In fact, there is a third type that allows you to write the documentation as part of the source code, as shown in this chapter, but with the added ability to extract this documentation into separate Extensible Markup Language (XML) files. For now, you can appreciate a particularly useful end result of this feature; you just need to take a look at the .NET Framework class library documentation, which was created by extracting XML files from the comments and documentation sitting inside the original source code.
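As a minimal sketch, the three comment styles might look as follows. The class RocketCalculator and its Sum method are hypothetical. The /// lines use standard XML documentation tags (summary, param, returns), which the compiler can extract into an XML file when asked to, for example through the /doc option of the csc command-line compiler.

```csharp
using System;

/// <summary>
/// Hypothetical class used only to illustrate documentation comments.
/// </summary>
class RocketCalculator
{
    /// <summary>
    /// Adds two integers.
    /// </summary>
    /// <param name="a">The first number.</param>
    /// <param name="b">The second number.</param>
    /// <returns>The sum of a and b.</returns>
    public static int Sum(int a, int b)
    {
        // A single-line comment, ignored by the compiler.

        /* A multi-line comment,
           also ignored by the compiler. */
        return a + b;
    }

    static void Main()
    {
        Console.WriteLine(Sum(2, 3));
    }
}
```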

Separators

Separators are used to separate various elements in C# from each other. You have already met many of them. An example is the commonly used semicolon (;), which is required to terminate a statement. Table 5.1 summarizes the separators we have presented so far.

Table 5.1. Important Separators in C#
Name	Symbol	Purpose
Braces	`{ }`	Used to confine a block of code for classes, methods, and the yet-to-be-presented branching and looping statements.
Parentheses	`( )`	Contain lists of formal parameters in method headers and lists of arguments in method invocation statements. Also required to contain the Boolean expression of an `if` statement and of other yet-to-be-presented branching and looping statements.
Semicolon	`;`	Terminates a statement.
Comma	`,`	Separates formal parameters inside the parentheses of a method header and separates arguments in a method invocation statement.
Period (dot operator)	`.`	Used to reference namespaces contained inside other namespaces and to specify classes inside namespaces and methods (if accessible) inside classes and objects. It can also be used to specify instance variables inside classes and objects (if accessible), but this practice should be avoided.
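
The separators in Table 5.1 can be seen together in a small program. This is only a sketch; the namespace Demo, the class SeparatorDemo, and the method Sum are hypothetical names.

```csharp
using System;                                // semicolon terminates the directive

namespace Demo
{                                            // braces confine the namespace block
    class SeparatorDemo
    {                                        // braces confine the class block
        // Parentheses contain the formal parameters; a comma separates them.
        static int Sum(int a, int b)
        {
            return a + b;                    // semicolon terminates the statement
        }

        static void Main()
        {
            // Parentheses contain the arguments; a comma separates them.
            int total = Sum(3, 4);

            // Parentheses contain the Boolean expression of the if statement.
            if (total == 7)
            {
                // Periods specify a class inside a namespace and a method inside a class.
                System.Console.WriteLine("Seven");
            }
        }
    }
}
```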

Operators

Operators are represented by symbols such as +, =, ==, and *. Operators act on operands, which are found next to the operator. For example,

 sumTotal + 10

contains the + operator surrounded by the two operands sumTotal and 10. In this context, the + operator combines two operands to produce a result and so it is classified as a binary operator. Some operators act on only one operand; they are termed unary operators.

Operators, together with their operands, form expressions. A literal or a variable by itself is also an expression, as are combinations of literals and variables with operators. Consequently, expressions can be used as operands as long as the rules that apply to each individual operator are adhered to, as shown in the following example:

 mySum = (a + 5) * (10 + d);

a, 5, 10, and d are all expressions acted on by the + operator. However, (a + 5) and (10 + d) are also expressions, acted on by the * operator. Finally, (a + 5) * (10 + d) can be regarded as one expression. The assignment operator = acts on this latter expression and the expression mySum. Expressions are often nested inside each other to form hierarchies of expressions, as in the previous example.
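
To make the nesting concrete, the following sketch evaluates the two inner expressions separately and then compares the result with the combined expression. The starting values of a and d, and the helper variables left and right, are assumptions made for illustration.

```csharp
using System;

class ExpressionDemo
{
    static void Main()
    {
        int a = 2;
        int d = 3;
        int mySum;

        int left = a + 5;     // inner expression (a + 5)
        int right = 10 + d;   // inner expression (10 + d)

        // The two inner expressions serve as operands of the * operator,
        // and the whole product is the right operand of the = operator.
        mySum = (a + 5) * (10 + d);

        // * is evaluated before ==, so this compares the product with mySum.
        Console.WriteLine(left * right == mySum);
        Console.WriteLine(mySum);
    }
}
```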

Operators can be divided into the following categories: assignment, arithmetic, unary, equality, relational, logical, conditional, shift, and primary operators.

We will spend more time on operators in later chapters, but the following is a quick summary of the operators you have encountered so far:

Assignment operator (=): Causes the operand on its left side to have its value changed to the value of the expression on the right side of the assignment operator, as in
```
 sumTotal = a + b;
```
where a + b can be regarded as being one operand.
Binary arithmetic operators (+ and *): The following example
```
 a * b 
```
multiplies a and b without changing their values.
Concatenation operator (+): Concatenates two strings into one string.
Equality operator (==): Compares two expressions to test whether they are equal. For example,
```
 leftExpression == rightExpression 
```
will only be true if the two expressions are equal; otherwise, it is false.
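
The operators summarized above can be tried out in a short program. This is a minimal sketch; the class name and the variable names other than sumTotal, a, and b are hypothetical.

```csharp
using System;

class OperatorDemo
{
    static void Main()
    {
        int a = 4;
        int b = 5;
        int sumTotal;

        // Assignment: sumTotal receives the value of the expression a + b.
        sumTotal = a + b;

        // Binary arithmetic: a and b keep their values after the multiplication.
        int product = a * b;

        // Concatenation: + joins two strings into one.
        string greeting = "Hello, " + "world";

        // Equality: true only if both sides have the same value.
        bool isEqual = (sumTotal == a + b);

        Console.WriteLine(sumTotal);
        Console.WriteLine(product);
        Console.WriteLine(greeting);
        Console.WriteLine(isEqual);
    }
}
```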

Keywords

Appendix C at www.samspublishing.com lists all 77 different keywords of C#. We have so far met the keywords if, class, public, static, void, string, int, and return. The syntax (language rules) of the operators and separators, combined with the keywords, forms the definition of the C# language.