The char Type

   



The char type (see Table 6.2 of Chapter 6) is used to represent Unicode characters, each taking up 16 bits of memory. Unicode is an international, standardized character set enabling computers to represent the characters found in most human languages. For more information about Unicode, see Appendix E, "Unicode Character Set."

A char literal can be

  • A standard letter represented with single quotes, such as the lowercase 'a' or the uppercase 'E'.

    You could accordingly assign the character 'T' to a variable, as shown in the following:

     char myChar; myChar = 'T'; 
  • A single digit, such as '4'. Notice that to the C# compiler, a single digit character is not a number but just another character, so it cannot stand in for the number 4 in arithmetic calculations without prior conversion.

  • A special character, such as '@', '$' or '&'.

  • A Unicode escape, written as \u (backslash u) followed by the four-digit hexadecimal code of the character, as listed in Appendix E (see samspublishing.com). For example, if you look up the hexadecimal code for 'T' you will find 0x54. Thus 'T' can be represented as '\u0054'. Consequently, instead of the two previous source code lines, 'T' can be assigned to myChar by writing

     char myChar; myChar = '\u0054'; 
  • An escape character, which is written with a backslash followed by a character. '\n' is an example of an escape character. The sequence of characters forming an escape character (\ and n in \n) is called an escape sequence.
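The literal forms listed above can be tried side by side. The following is a minimal sketch, not code from the book; the class name CharLiteralForms and the helper UnicodeT are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical illustration class; not part of the original text.
public class CharLiteralForms
{
    // Returns 'T' written as a Unicode escape (hexadecimal code 0x54).
    public static char UnicodeT()
    {
        return '\u0054';
    }

    public static void Main()
    {
        char letter = 'a';           // standard letter
        char digit = '4';            // digit character, not the number 4
        char special = '@';          // special character
        char unicodeT = UnicodeT();  // same character as 'T'
        char newLine = '\n';         // escape character

        Console.WriteLine(letter);
        Console.WriteLine(digit);
        Console.WriteLine(special);
        Console.WriteLine(unicodeT == 'T');  // True
        Console.WriteLine((int)newLine);     // 10
    }
}
```

The comparison in the second-to-last statement shows that '\u0054' and 'T' are two spellings of the same char value.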

We need to explore the escape character concept a little further to realize its usefulness. Suppose that you wanted to print a string containing quotation marks as part of the text. An example could be


[Figure: sample text that contains a quotation enclosed in double quotes]

Our usual method of printing to the console


[Figure: a statement printing the sample text, with the quotation marks left unescaped inside the string literal]

is invalid and generates a compiler error. The problem is that the compiler views the second quotation mark as a termination point for the string and thus stops "reading" at this point. So the remaining text:

 To err is human, but to really stuff up requires a computer"" 

merely becomes incomprehensible garbage. The compiler follows very strict syntax when interpreting your string, and does not comprehend that the quotation mark is inserted to signify a quotation. Fortunately, you can make the computer understand your intentions by using the escape character \" (backslash double quote) as in the following:


[Figure: the same statement with each embedded quotation mark written as \"]

Creating a valid statement.
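As a rough sketch of the technique, the program below prints a sentence that contains quotation marks by escaping each one as \". The class name QuotedText, the helper BuildQuote, and the lead-in words "He said:" are assumptions made for illustration and may differ from the statement shown in the original figure.

```csharp
using System;

// Hypothetical illustration class; not part of the original text.
public class QuotedText
{
    // Builds a string containing double quotes by escaping each one as \".
    // The "He said:" prefix is an assumed example, not taken from the book.
    public static string BuildQuote()
    {
        return "He said: \"To err is human, but to really stuff up requires a computer\"";
    }

    public static void Main()
    {
        Console.WriteLine(BuildQuote());
    }
}
```

Each \" contributes a single quotation mark character to the string; the backslash itself is not stored or printed.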

Note


The word escape in escape sequence originates from the escape character's typical ability to "escape" from the conventional meaning of a character.


The full list of escape sequences is displayed in Table 7.5. Each of the escape sequences has a special meaning. For example, \n will move the cursor to the next line when inside a string. You can also represent an escape sequence by its Unicode value displayed in the right column of the table.

Table 7.5. Escape Sequences

Escape sequence    Meaning            Unicode value
\'                 Single quote       \u0027
\"                 Double quote       \u0022
\\                 Backslash          \u005C
\0                 Null               \u0000
\a                 Alert (beep)       \u0007
\b                 Backspace          \u0008
\f                 Form feed          \u000C
\n                 New line           \u000A
\r                 Carriage return    \u000D
\t                 Horizontal tab     \u0009
\v                 Vertical tab       \u000B
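The correspondence between the two notations in Table 7.5 can be checked with a few comparisons. This is a small sketch under the assumption that a simple equality test is a fair way to illustrate the table; the class name EscapeValues and the method EscapesMatchUnicode are hypothetical.

```csharp
using System;

// Hypothetical illustration class; not part of the original text.
public class EscapeValues
{
    // True when every escape sequence equals the Unicode form listed beside it.
    public static bool EscapesMatchUnicode()
    {
        return '\'' == '\u0027'
            && '\"' == '\u0022'
            && '\\' == '\u005C'
            && '\0' == '\u0000'
            && '\a' == '\u0007'
            && '\b' == '\u0008'
            && '\f' == '\u000C'
            && '\n' == '\u000A'
            && '\r' == '\u000D'
            && '\t' == '\u0009'
            && '\v' == '\u000B';
    }

    public static void Main()
    {
        Console.WriteLine(EscapesMatchUnicode());  // True

        // \t inserts a horizontal tab and \n starts a new line inside a string.
        Console.WriteLine("Column one\tColumn two\nSecond line");
    }
}
```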

In the third row of Table 7.5, you will find the escape sequence \\. You might wonder about its purpose. Suppose that you needed to write the path C:\MyFiles\temp\BliposClock.cs of your Windows file system as a string literal. You might try to write the following:


[Figure: the path written as a string literal with single backslashes]

However, the compiler would treat every backslash as the start of an escape sequence. In this case, it would see \M, \t, and \B, of which only \t is valid. This would cause a compiler error reporting an unrecognized escape sequence. By employing the \\ escape sequence, you can ask the compiler to include a genuine backslash character inside the string. Thus, the correct way to write the file path is

 "C:\\MyFiles\\temp\\BliposClock.cs" 

In "The string Type" section later in this chapter, we will see an alternative way to handle this problem by using verbatim strings.
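A brief sketch of both approaches follows, using the path from the text. The class name FilePaths and its two helper methods are hypothetical, and the verbatim form (a string literal prefixed with @) is shown only as a preview of the later section.

```csharp
using System;

// Hypothetical illustration class; not part of the original text.
public class FilePaths
{
    // Regular string literal: every backslash is doubled.
    public static string EscapedPath()
    {
        return "C:\\MyFiles\\temp\\BliposClock.cs";
    }

    // Verbatim string literal: backslashes are taken literally.
    public static string VerbatimPath()
    {
        return @"C:\MyFiles\temp\BliposClock.cs";
    }

    public static void Main()
    {
        Console.WriteLine(EscapedPath());                    // C:\MyFiles\temp\BliposClock.cs
        Console.WriteLine(EscapedPath() == VerbatimPath());  // True
    }
}
```

Both literals produce the same string; the doubled backslashes exist only in the source code, not in the stored text.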

The Double Life of char

The char type lives a bit of a double life. It represents characters, but it is considered part of the integer family, and its underlying value is an unsigned 16-bit integer in the range 0 to 65535. It can take part in arithmetic calculations and be implicitly converted to the types int, long, ushort, uint, and ulong, as shown previously in Figure 6.17 of Chapter 6. Note, though, that it is the underlying integer value that will be used for these calculations and conversions, not the digits, letters, and so on that it represents. Listing 7.6 demonstrates the double life of the two characters '4' and '9', which have the underlying integer values 52 and 57. By adding these two characters together, we do not add together the numbers 4 and 9, but instead 52 and 57. The result, 109, is the underlying value for the character 'm'.

Listing 7.6 Code of CharacterArithmetic.cs

01: using System;
02:
03: /*
04:  *  This class demonstrates how char variables can be added
05:  *  together as if they were 'normal' integers.
06:  */
07:
08: public class Characters
09: {
10:     public static void Main()
11:     {
12:         char firstSymbol;
13:         char secondSymbol;
14:         int intFirstSymbol;
15:         int intSecondSymbol;
16:         int result;
17:
18:         firstSymbol = '\u0034';
19:         secondSymbol = '\u0039';
20:         Console.WriteLine("firstSymbol as character: " + firstSymbol);
21:         Console.WriteLine("secondSymbol as character: " + secondSymbol);
22:         intFirstSymbol = firstSymbol;
23:         intSecondSymbol = secondSymbol;
24:         Console.WriteLine("firstSymbol as int: " + intFirstSymbol);
25:         Console.WriteLine("secondSymbol as int: " + intSecondSymbol);
26:         result = firstSymbol + secondSymbol;
27:         Console.WriteLine("Result as int: " + result);
28:         Console.WriteLine("Result as character: " + (char)result);
29:     }
30: }

Output:

firstSymbol as character: 4
secondSymbol as character: 9
firstSymbol as int: 52
secondSymbol as int: 57
Result as int: 109
Result as character: m

The literal '\u0034' in line 18 is the Unicode value for the character '4'. '\u0039' in line 19 represents the character '9'.

Line 20 prints the firstSymbol variable, which is treated as a char and therefore appears in the output as the character 4.

In lines 22 and 23, firstSymbol and secondSymbol are implicitly converted to type int by assigning them to the int variables intFirstSymbol and intSecondSymbol.

intFirstSymbol and intSecondSymbol are printed in lines 24 and 25. The resulting output shows the base 10 equivalents of 0x34 and 0x39, which are 52 and 57, respectively.

In line 26, firstSymbol and secondSymbol are implicitly converted to type int and added together. The result is assigned to the variable result. Note that the characters '4' and '9' are not converted to the numbers 4 and 9 but, instead, 52 is added to 57, resulting in 109 which is stored in result.

By adding the cast operator (char) in front of result in line 28, we perform an explicit conversion and tell the compiler to treat result as a char. In this case, the program converts 109 in base 10 to the equivalent Unicode character, which is 'm'.
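If the goal is to add the numbers that the digit characters depict rather than their underlying codes, one common approach is to subtract the character '0' (underlying value 48) from each digit first. The sketch below is a companion to Listing 7.6 rather than part of it; the class name DigitValues and the helper DigitToNumber are hypothetical, and the helper assumes its argument is one of the characters '0' through '9'.

```csharp
using System;

// Hypothetical illustration class; not part of the original text.
public class DigitValues
{
    // Converts a digit character such as '4' to the number it depicts
    // by subtracting the underlying value of '0' (48).
    // Assumes the argument lies between '0' and '9'.
    public static int DigitToNumber(char digit)
    {
        return digit - '0';
    }

    public static void Main()
    {
        int sumOfCodes = '4' + '9';                                 // 52 + 57
        int sumOfDigits = DigitToNumber('4') + DigitToNumber('9');  // 4 + 9

        Console.WriteLine(sumOfCodes);          // 109
        Console.WriteLine(sumOfDigits);         // 13
        Console.WriteLine((int)char.MaxValue);  // 65535
    }
}
```

The last statement prints the largest underlying value a char can hold.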


   


C# Primer Plus