Strings

Characters and strings are very important data types in practical programming. C# provides a string type, which is an alias for the String class in the System namespace. As a class type, string is a reference type. Much string functionality, available in all .NET languages, is provided by the String class. The C# compiler provides additional support to make working with strings more concise and intuitive. In this section we will first look at characters and then outline the main features of the String class. We will look at string input, at the additional support provided by C#, and at the issues of string equality. The section that follows surveys some of the useful methods of the String class. The section after that discusses the StringBuilder class.

Characters

C# provides the primitive data type char to represent individual characters. A character literal is represented by a character enclosed in single quotes.

 char ch1 = 'a'; 

A C# char is represented internally as an unsigned two-byte integer. You can cast back and forth between char and integer data types.

 char ch1 = 'a';
 int n = (int) ch1;
 n++;
 ch1 = (char) n;          // ch1 is now 'b'

The relational operators == , < , > , and so on apply to char .

 char ch1 = 'a';
 char ch2 = 'b';
 if (ch1 < ch2)           // expression is true
    ...
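
The casts and comparisons described above can be collected into one small program. This is only a sketch: the class CharDemo and its helper methods Next and IsLowerAscii are hypothetical names chosen for illustration, not part of the book's sample code.

```csharp
// CharDemo.cs (hypothetical example)
using System;

public class CharDemo
{
    // Returns the character whose numeric code is one greater than ch's.
    public static char Next(char ch)
    {
        int n = (int) ch;   // a char converts to its numeric code
        n++;
        return (char) n;    // an explicit cast is needed to get back to char
    }

    // Returns true if ch lies between 'a' and 'z', inclusive.
    public static bool IsLowerAscii(char ch)
    {
        return ch >= 'a' && ch <= 'z';
    }

    public static void Main(string[] args)
    {
        Console.WriteLine(Next('a'));           // b
        Console.WriteLine((int) 'a');           // 97
        Console.WriteLine(IsLowerAscii('q'));   // True
        Console.WriteLine(IsLowerAscii('Q'));   // False
    }
}
```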
ASCII and Unicode

Traditionally, a 7-bit character code called ASCII, usually stored one character per byte, has been used to represent characters. ASCII is simple and compact, but it cannot represent the many different alphabets used throughout the world.

Modern computer systems use a character code called Unicode, which can represent most modern (and many ancient) alphabets. C# uses Unicode to represent characters, storing each char as a two-byte (16-bit) UTF-16 code unit. Characters outside the 16-bit range are represented by a pair of char values called a surrogate pair. ASCII is a subset of Unicode, corresponding to the first 128 Unicode character codes (0 through 127). For more information on Unicode, you can visit the Web site www.unicode.org.

Escape Sequences

You can represent a Unicode character in a C# program by using the special escape sequence \u followed by exactly four hexadecimal digits.

 char A = '\u0041'; // 41 (hex) is 65 (dec) or 'A' 

Special escape sequences are provided for a number of standard non-printing characters and for characters like quotation marks that would be difficult to represent otherwise. Table 3-4 shows the standard escape sequences in C#.

Table 3-4. Escape Characters in C#

 Escape Character   Name              Value
 \'                 Single quote      0x0027
 \"                 Double quote      0x0022
 \\                 Backslash         0x005C
 \0                 Null              0x0000
 \a                 Alert             0x0007
 \b                 Backspace         0x0008
 \f                 Form feed         0x000C
 \n                 New line          0x000A
 \r                 Carriage return   0x000D
 \t                 Horizontal tab    0x0009
 \v                 Vertical tab      0x000B
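
A short program can confirm a few of the values in the table. This is a minimal sketch: the class EscapeDemo and its helper Code are hypothetical names introduced only for this illustration.

```csharp
// EscapeDemo.cs (hypothetical example)
using System;

public class EscapeDemo
{
    // Returns the numeric code of a character.
    public static int Code(char ch)
    {
        return (int) ch;
    }

    public static void Main(string[] args)
    {
        Console.WriteLine('\u0041' == 'A');       // True
        Console.WriteLine(Code('\t'));            // 9  (0x0009)
        Console.WriteLine(Code('\n'));            // 10 (0x000A)
        Console.WriteLine(Code('\\'));            // 92 (0x005C)
        Console.WriteLine("a\tb".Length);         // 3: the escape is one character
        Console.WriteLine("She said \"hi\"");     // She said "hi"
    }
}
```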

String Class

The String class inherits directly from Object and is a sealed class, which means that you cannot further inherit from String. We will discuss inheritance and sealed classes in Chapter 4. When a class is sealed, the compiler can perform certain optimizations to make methods in the class more efficient.

Instances of String are immutable, which means that once a string object is created, it cannot be changed during its lifetime. Operations that appear to modify a string actually return a new string object. If, for the sake of efficiency, you need to modify a stringlike object directly, you can make use of the StringBuilder class, which we will discuss in a later section.
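
The effect of immutability can be seen by keeping a second reference to a string before "modifying" it. The following is a sketch under that assumption; the class ImmutableDemo and the helper UpperCopy are hypothetical names used only for illustration.

```csharp
// ImmutableDemo.cs (hypothetical example)
using System;

public class ImmutableDemo
{
    // Returns an uppercase copy; the string passed in is left untouched.
    public static string UpperCopy(string s)
    {
        return s.ToUpper();
    }

    public static void Main(string[] args)
    {
        string original = "hello";
        string alias = original;            // second reference to the same object
        original = UpperCopy(original);     // original now refers to a new string

        Console.WriteLine(original);        // HELLO
        Console.WriteLine(alias);           // hello (the old object is unchanged)
    }
}
```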

A string has a zero-based index, which can be used to access individual characters in a string. That means that the first character of the string str is str[0], the second character is str[1], and so on.

By default, comparison operations on strings are case-sensitive, although there is an overloaded version of the Compare method that permits case-insensitive comparisons.

The empty string should be distinguished from null. A string variable that does not refer to any string object holds a null reference, whereas the empty string is a real string object of length zero. Any string, including the empty string, compares greater than a null reference. Two null references compare equal to each other.
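
The difference between null and the empty string, and the comparison rules just stated, can be checked with the static Compare method. This is a minimal sketch; the class NullDemo and its helper Describe are hypothetical names, not part of the book's sample code.

```csharp
// NullDemo.cs (hypothetical example)
using System;

public class NullDemo
{
    // Classifies a string reference as "null", "empty", or "text".
    public static string Describe(string s)
    {
        if (s == null) return "null";        // no string object at all
        if (s.Length == 0) return "empty";   // a real object with no characters
        return "text";
    }

    public static void Main(string[] args)
    {
        string none = null;
        string empty = "";

        Console.WriteLine(Describe(none));                    // null
        Console.WriteLine(Describe(empty));                   // empty
        Console.WriteLine(String.Compare(empty, none) > 0);   // True
        Console.WriteLine(String.Compare(none, none) == 0);   // True
        Console.WriteLine(none == empty);                     // False
    }
}
```

Note that Describe tests for null before reading Length; calling a property on a null reference throws a NullReferenceException.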

Language Support

The C# language provides a number of features to make working with strings easier and more intuitive.

String Literals and Initialization

You can define a string literal by enclosing a string of characters in double quotes. Special characters can be represented using an escape sequence, as discussed earlier in the chapter. You may also define a "verbatim" string literal using the @ symbol. In a verbatim string, escape sequences are not converted but are used exactly as they appear. If you want to represent a double quote inside a verbatim string, use two double quotes.

The proper way to initialize a string variable with a literal value is to supply the literal after an equals sign. You do not need to use new as you do with other data types. Here are some examples of string literals and initializing string variables.

 string s1 = "bat";
 string path1 = "c:\\OI\\NetCs\\Chap3\\Concat";
 string path = @"c:\OI\NetCs\Chap3\Concat\";
 string greeting = @"""Hello, world""";
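
To see that a regular literal with escaped backslashes and a verbatim literal describe the same text, the two can be compared directly. This is a sketch only; the class VerbatimDemo and its constant names are hypothetical.

```csharp
// VerbatimDemo.cs (hypothetical example)
using System;

public class VerbatimDemo
{
    public const string Regular = "c:\\OI\\NetCs";      // each \\ is one backslash
    public const string Verbatim = @"c:\OI\NetCs";      // backslashes taken literally
    public const string Quoted = @"""Hello, world""";   // "" stands for one double quote

    public static void Main(string[] args)
    {
        Console.WriteLine(Regular);                         // c:\OI\NetCs
        Console.WriteLine(Regular == Verbatim);             // True
        Console.WriteLine(Verbatim.Length);                 // 11
        Console.WriteLine(Quoted == "\"Hello, world\"");    // True
    }
}
```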
Concatenation

The String class provides a method Concat for concatenating strings. In C# you can use the operators + and += to perform concatenation. The following program illustrates string literals and concatenation.

 // Concat.cs
 using System;

 public class Concat
 {
    public static void Main(string[] args)
    {
       string s1 = "bat";
       Console.WriteLine("s1 = {0}", s1);
       string s2 = "man";
       Console.WriteLine("s2 = {0}", s2);
       s1 += s2;
       Console.WriteLine(s1);
       string path1 = "c:\\OI\\NetCs\\Chap3\\Concat";
       Console.WriteLine("path1 = {0}", path1);
       string path = @"c:\OI\NetCs\Chap3\Concat\";
       string file = "Concat.cs";
       path = path + file;
       Console.WriteLine(path);
       string greeting = @"""Hello, world""";
       Console.WriteLine(greeting);
    }
 }

Here is the output:

 s1 = bat
 s2 = man
 batman
 path1 = c:\OI\NetCs\Chap3\Concat
 c:\OI\NetCs\Chap3\Concat\Concat.cs
 "Hello, world"
Index

You can extract an individual character from a string using square brackets and a zero-based index.

 string s1 = "bat";
 char ch = s1[0];   // contains 'b'
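
The index is often used inside a loop to visit every character. The sketch below counts how many times a character occurs in a string; the class IndexDemo and the method CountChar are hypothetical names chosen for this example.

```csharp
// IndexDemo.cs (hypothetical example)
using System;

public class IndexDemo
{
    // Counts the occurrences of target in s by visiting each index in turn.
    public static int CountChar(string s, char target)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == target)   // the index is read-only; s[i] = 'x' would not compile
                count++;
        }
        return count;
    }

    public static void Main(string[] args)
    {
        Console.WriteLine(CountChar("banana", 'a'));   // 3
        Console.WriteLine("bat"[2]);                   // t
    }
}
```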
Relational Operators

In general, for reference types, the == and != operators check whether the object references are the same, not whether the contents of the memory locations referred to are the same. However, the String class overloads these operators, so that the textual content of the strings is compared. The program StringRelation illustrates using these relational operators on strings. The ordering operators, such as < and >, are not available for strings; use the Compare method instead.

String Equality

To fully understand issues of string equality, you should be aware of how the compiler stores strings. When string literals are encountered, they are entered into an internal table of string identities. If a second literal is encountered with the same string data, an object reference will be returned to the existing string in the table; no second copy will be made. As a result of this compiler optimization, the two object references will be the same, as represented in Figure 3-3.

Figure 3-3. Object references to a string literal refer to the same storage.


You should not be misled by this fact to conclude that two object references to the same string data will always be the same. If the contents of the string get determined at runtime, for example, by the user inputting the data, the compiler has no way of knowing that the second string should have an identical object reference. Hence you will have two distinct object references, which happen to refer to the same data, as illustrated in Figure 3-4.

Figure 3-4. Two distinct object references, which happen to refer to the same data.


As discussed, when strings are checked for equality, either through the relational operator == or through the Equals method, a comparison is made of the contents of the strings, not of the object references. So in both the previous cases the strings a and b will check out as equal. You have to be more careful with other reference types, where reference equality is not the same as content equality.
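
The two situations can be reproduced in code by comparing references explicitly with object.ReferenceEquals. This is a sketch, not the book's StringRelation program: the class EqualityDemo and the helper BuildAtRuntime are hypothetical, and the helper stands in for any string whose contents are determined at runtime, such as user input.

```csharp
// EqualityDemo.cs (hypothetical example)
using System;

public class EqualityDemo
{
    // Creates a new string object at runtime with the same characters as text.
    public static string BuildAtRuntime(string text)
    {
        return new string(text.ToCharArray());
    }

    public static void Main(string[] args)
    {
        string a = "hello";
        string b = "hello";                    // same literal as a
        string c = BuildAtRuntime("hello");    // same characters, separate object

        Console.WriteLine(a == b);                          // True
        Console.WriteLine(object.ReferenceEquals(a, b));    // True: one shared literal
        Console.WriteLine(a == c);                          // True: contents match
        Console.WriteLine(a.Equals(c));                     // True: contents match
        Console.WriteLine(object.ReferenceEquals(a, c));    // False: distinct objects
    }
}
```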

String Comparison

The fundamental way to compare strings for equality is to use the Equals method of the String class. There are several overloaded versions of this function, including a static version that takes two string parameters and a nonstatic version that takes one string parameter that is compared with the current instance. These methods perform a case-sensitive comparison of the contents of the strings. A bool value of true or false is returned.

If you wish to perform a case-insensitive comparison, you may use the Compare method. This method has several overloaded versions, all of them static. Two strings, s1 and s2, are compared. An integer is returned expressing the lexical relationship between the two strings, as shown in Table 3-5.

Table 3-5. Return Values of the Compare Method

 Relationship          Return Value
 s1 less than s2       negative integer
 s1 equal to s2        0
 s1 greater than s2    positive integer

A third parameter allows you to control the case sensitivity of the comparison. If you use only two parameters, a case-sensitive comparison is performed. The third parameter is a bool. A value of false calls for a case-sensitive comparison, and a value of true calls for ignoring case.

The program StringCompare illustrates a number of comparisons, using both the Equals and Compare methods.
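
The following sketch shows the kinds of calls involved. It is not the StringCompare program itself: the class CompareDemo and the helper Order are hypothetical. Because Compare promises only the sign of its result, the helper reduces the return value to -1, 0, or 1.

```csharp
// CompareDemo.cs (hypothetical example)
using System;

public class CompareDemo
{
    // Reduces the result of String.Compare to -1, 0, or 1.
    public static int Order(string s1, string s2, bool ignoreCase)
    {
        return Math.Sign(String.Compare(s1, s2, ignoreCase));
    }

    public static void Main(string[] args)
    {
        Console.WriteLine(Order("apple", "banana", false));       // -1
        Console.WriteLine(Order("banana", "apple", false));       // 1
        Console.WriteLine(Order("Hello", "hello", true));         // 0: case ignored
        Console.WriteLine(Order("Hello", "hello", false) != 0);   // True: case matters
        Console.WriteLine(String.Equals("Hello", "hello"));       // False
        Console.WriteLine("hello".Equals("hello"));               // True
    }
}
```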

String Input

The Console class has methods for inputting characters and strings. The Read method reads in a single character (as an int). The ReadLine method reads in a line of input, terminated by a carriage return, line feed, or combination, and returns a string. In general, the ReadLine method is easier to use and synchronizes nicely with Write and WriteLine. The program ReadStrings illustrates reading in a first name, a middle initial, and a last name. All input is done via ReadLine. The middle initial as a character is determined by extracting the character at position 0.

Our InputWrapper class has a method getString, which provides a prompt and reads in a string.
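
The approach described for ReadStrings might look roughly like the sketch below. It is not the book's program: the class ReadNameDemo and the method ReadFullName are hypothetical. So that the example runs without keyboard input, Main reads from a StringReader; passing Console.In instead would read from the console, since both are TextReader objects with the same ReadLine method.

```csharp
// ReadNameDemo.cs (hypothetical example)
using System;
using System.IO;

public class ReadNameDemo
{
    // Reads three lines (first name, middle name or initial, last name)
    // and returns them in the form "First M. Last".
    // Assumes all three lines are present and the middle line is not empty.
    public static string ReadFullName(TextReader input)
    {
        string first = input.ReadLine();
        string middle = input.ReadLine();
        string last = input.ReadLine();
        char initial = middle[0];            // character at position 0
        return first + " " + initial + ". " + last;
    }

    public static void Main(string[] args)
    {
        // Sample input standing in for what a user would type.
        TextReader sample = new StringReader("Mary\nAnn\nSmith\n");
        Console.WriteLine(ReadFullName(sample));   // Mary A. Smith
    }
}
```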

String Methods and Properties

In this section we will survey a few useful methods and properties of the String class. Many of the methods have various overloaded versions. We show a representative version. Consult the online documentation for details on these and other methods. The program StringMethods demonstrates all the examples that follow.

Length
 public int Length {get;} 

This property returns the length of a string. Notice the convenient shorthand notation that is used for declaring a property.

 string str = "hello";
 int n = str.Length;                 // 5
ToUpper
 public string ToUpper (); 

This method returns a new string in which all characters of the original string have been converted to uppercase.

 str = "goodbye";
 str = str.ToUpper();              // GOODBYE
ToLower
 public string ToLower (); 

This method returns a new string in which all characters of the original string have been converted to lowercase.

 str = str.ToLower();                  // goodbye 
Substring
 public string Substring(int startIndex, int length); 

This method returns a substring that starts from a specified index position in the value and continues for a specified length. Remember that in C# the index of the first character in a string is 0.

 string sub = str.Substring(4,3);      // bye 
IndexOf
 public int IndexOf(string value); 

This method returns the index of the first occurrence of the specified string. If the string is not found, -1 is returned.

 str = "goodbye";
 int n1 = str.IndexOf("bye");    // 4
 int n2 = str.IndexOf("boo");    // -1
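
These methods are frequently combined. As an illustration, the sketch below splits a string of the form key=value by locating the separator with IndexOf and extracting each part with Substring. The class KeyValueDemo and its methods are hypothetical and are not part of the StringMethods program.

```csharp
// KeyValueDemo.cs (hypothetical example)
using System;

public class KeyValueDemo
{
    // Returns the text before the first "=" in uppercase, or null if there is no "=".
    public static string KeyOf(string pair)
    {
        int pos = pair.IndexOf("=");
        if (pos == -1) return null;
        return pair.Substring(0, pos).ToUpper();
    }

    // Returns the text after the first "=", or null if there is no "=".
    public static string ValueOf(string pair)
    {
        int pos = pair.IndexOf("=");
        if (pos == -1) return null;
        return pair.Substring(pos + 1, pair.Length - pos - 1);
    }

    public static void Main(string[] args)
    {
        string pair = "color=blue";
        Console.WriteLine(KeyOf(pair));      // COLOR
        Console.WriteLine(ValueOf(pair));    // blue
    }
}
```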

StringBuilder Class

As we have discussed, instances of the String class are immutable. As a result, when you manipulate instances of String , you are frequently obtaining new String instances. Depending on your applications, creating all these instances may be expensive. The .NET library provides a special class StringBuilder (located in the System.Text namespace) in which you may directly manipulate the underlying string without creating a new instance. When you are done, you can create a String instance out of an instance of StringBuilder by using the ToString method.

A StringBuilder instance has a capacity and a maximum capacity. These capacities can be specified in a constructor when the instance is created. By default, an empty StringBuilder instance starts out with a capacity of 16. As the stored string expands, the capacity will be increased automatically. The program StringBuilderDemo provides a simple demonstration of using the StringBuilder class. It shows the starting capacity and the capacity after strings are appended. At the end, a String is returned.

 // StringBuilderDemo.cs
 using System;
 using System.Text;

 public class StringBuilderDemo
 {
    public static void Main(string[] args)
    {
       StringBuilder build = new StringBuilder();
       Console.WriteLine("capacity = {0}", build.Capacity);
       build.Append("This is the first sentence.\n");
       Console.WriteLine("capacity = {0}", build.Capacity);
       build.Append("This is the second sentence.\n");
       Console.WriteLine("capacity = {0}", build.Capacity);
       build.Append("This is the last sentence.\n");
       Console.WriteLine("capacity = {0}", build.Capacity);
       string str = build.ToString();
       Console.Write(str);
    }
 }

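Here is the output. The capacity values reported after the initial 16 depend on the version of the .NET runtime, so you may see different numbers.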

 capacity = 16
 capacity = 34
 capacity = 70
 capacity = 142
 This is the first sentence.
 This is the second sentence.
 This is the last sentence.
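
A typical use of StringBuilder is assembling a string piece by piece inside a loop, where repeated use of += would create a new String instance on every pass. The sketch below assumes that scenario; the class BuilderLoopDemo, the method CountTo, and the starting capacity of 64 are all illustrative choices.

```csharp
// BuilderLoopDemo.cs (hypothetical example)
using System;
using System.Text;

public class BuilderLoopDemo
{
    // Builds "1,2,...,count" by appending to a single StringBuilder.
    public static string CountTo(int count)
    {
        StringBuilder build = new StringBuilder(64);   // starting capacity; grows if needed
        for (int i = 1; i <= count; i++)
        {
            if (i > 1)
                build.Append(",");
            build.Append(i);                           // Append has an overload for int
        }
        return build.ToString();                       // one String created at the end
    }

    public static void Main(string[] args)
    {
        Console.WriteLine(CountTo(5));   // 1,2,3,4,5
    }
}
```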
Application Development Using C# and .NET
ISBN: 013093383X
Year: 2001
Pages: 158