10.1. Strings

C# treats strings as first-class types that are flexible, powerful, and easy to use.

In C# programming you typically use the C# alias for a Framework type (e.g., int for Int32) but you are always free to use the underlying type. C# programmers thus use string (lowercase) and the underlying Framework type String (uppercase) interchangeably.

The declaration of the String class is (abbreviated here to the four interfaces discussed below):

public sealed class String : IComparable<String>, ICloneable, IConvertible, IEnumerable<char>

This declaration reveals that the class is sealed, meaning that it is not possible to derive from the String class. The class also implements four system interfaces (IComparable<String>, ICloneable, IConvertible, and IEnumerable<char>) that dictate functionality that String shares with other classes in the .NET Framework.

Each string object is an immutable sequence of Unicode characters. The fact that String is immutable means that methods that appear to change the string actually return a modified copy; the original string remains intact in memory until it is garbage-collected. This may have performance implications; if you plan to do significant repeated string manipulation, use a StringBuilder (described later).

As seen in Chapter 9, the IComparable<T> interface is implemented by types whose values can be ordered. Strings, for example, can be alphabetized; any given string can be compared with another string to determine which should come first in an ordered list.^[1] IComparable classes implement the CompareTo method. IEnumerable<char>, also discussed in Chapter 9, lets you use the foreach construct to enumerate a string as a collection of chars.

^[1] Ordering the string is one of a number of lexical operations that act on the value of the string and take into account culture-specific information based on the explicitly declared culture or the implicit current culture. Therefore, if the current culture is U.S. English (as is assumed throughout this book), the Compare method considers 'a' less than 'A'. CompareOrdinal performs an ordinal comparison, and thus regardless of culture, 'a' is greater than 'A'.
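To see both interfaces at work, here is a minimal sketch. It uses C# top-level statements for brevity, and the sample words are illustrative only. Array.Sort() relies on the strings' CompareTo method (a culture-sensitive comparison), CompareOrdinal() compares raw code values as the footnote describes, and foreach walks a string one char at a time:

```csharp
using System;

string[] fruits = { "pear", "apple", "fig" };

// Array.Sort uses the default comparer for string, which calls the
// strings' CompareTo method (a culture-sensitive comparison).
Array.Sort(fruits);
Console.WriteLine(string.Join(", ", fruits));   // apple, fig, pear

// CompareTo returns a negative number, zero, or a positive number.
int order = "apple".CompareTo("banana");
Console.WriteLine(order < 0);                   // True: "apple" sorts first

// CompareOrdinal looks only at UTF-16 code values: 'a' (97) > 'A' (65).
int ordinal = string.CompareOrdinal("a", "A");
Console.WriteLine(ordinal > 0);                 // True

// IEnumerable<char> lets foreach visit each character in turn.
int vowels = 0;
foreach (char c in "programming")
{
    if ("aeiou".IndexOf(c) >= 0)
    {
        vowels++;
    }
}
Console.WriteLine(vowels);                      // 3
```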

ICloneable objects can create new instances with the same value as the original instance. In this case, it is possible to clone a string to produce a new string with the same values (characters) as the original. ICloneable classes implement the Clone() method.

Actually, because strings are immutable, the Clone( ) method on String just returns a reference to the original string. Because that string can never change, any operation that seems to modify the clone (such as appending to it) creates a new String instead:

string s1 = "One Two Three Four";
string sx = (string)s1.Clone();
Console.WriteLine(Object.ReferenceEquals(s1, sx));
sx += " Five";
Console.WriteLine(Object.ReferenceEquals(s1, sx));
Console.WriteLine(sx);

In this case, sx is created as a clone of s1. The first WriteLine statement prints True: the two string variables refer to the same string in memory. When you append " Five" to sx, you actually create a new string from the first and assign it to sx, so the second ReferenceEquals call returns False, and the final WriteLine statement prints the contents of the original string with the word "Five" appended.

IConvertible classes provide methods to facilitate conversion to other primitive types such as ToInt32( ), ToDouble( ), ToDecimal(), etc.
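One detail worth knowing: String implements the IConvertible methods explicitly, so you reach them through an IConvertible reference (or indirectly through the Convert class). A small sketch, using top-level statements and an invariant culture so that the result does not depend on the machine's regional settings:

```csharp
using System;
using System.Globalization;

// String implements IConvertible explicitly, so ToInt32() and friends
// are called through an IConvertible reference.
IConvertible text = "42";
int number = text.ToInt32(CultureInfo.InvariantCulture);
Console.WriteLine(number + 1);   // 43

// The Convert class calls the same IConvertible methods for you.
double ratio = Convert.ToDouble("3.5", CultureInfo.InvariantCulture);
Console.WriteLine(ratio * 2);    // 7
```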

10.1.1. Creating Strings

The most common way to create a string is to assign a quoted string of characters, known as a string literal, to a user-defined variable of type string:

string newString = "This is a string literal";

Quoted strings can include escape characters, such as \n or \t, which begin with a backslash character (\). The two shown are used to indicate where line breaks or tabs are to appear, respectively.

Because the backslash is the escape character, if you want to put a backslash into a string (e.g., to create a path listing), you must escape the backslash with a second backslash (\\).
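Each escape sequence is written with two characters in the source but becomes a single character in the string. The following sketch, with made-up sample values, checks that by looking at the lengths:

```csharp
using System;

string tabbed = "Name\tValue";          // \t is stored as one tab character
string path = "C:\\Temp\\notes.txt";    // each \\ is stored as one backslash

Console.WriteLine(tabbed.Length);       // 10
Console.WriteLine(path);                // C:\Temp\notes.txt
Console.WriteLine(path.Length);         // 17
```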

Strings can also be created using verbatim string literals, which start with the at (@) symbol. This tells the compiler that the string should be used verbatim, even if it spans multiple lines or includes escape characters. In a verbatim string literal, backslashes and the characters that follow them are simply considered additional characters of the string. Thus, the following two definitions are equivalent:

string literalOne = "\\\\MySystem\\MyDirectory\\ProgrammingC#.cs";
string verbatimLiteralOne =
   @"\\MySystem\MyDirectory\ProgrammingC#.cs";

In the first line, a nonverbatim string literal is used, and so the backslash character (\) must be escaped. This means it must be preceded by a second backslash character. In the second line, a verbatim literal string is used, so the extra backslash is not needed. A second example illustrates multiline verbatim strings:

string literalTwo = "Line One\nLine Two";
string verbatimLiteralTwo = @"Line One
Line Two";

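Again, these declarations are interchangeable (strictly speaking, the verbatim literal contains whatever line break the source file itself uses, which is \r\n in a file saved with Windows line endings). Which one you use is a matter of convenience and personal style.

If you have double quotes within a verbatim string, you must escape them by doubling them ("") so that the compiler knows where the verbatim string ends.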
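Here is a short sketch of the doubled-quote rule, with an invented sample sentence. Inside a verbatim literal, "" stands for one quotation mark, while backslashes need no escaping. The regular literal spells out the same characters with \" and \\:

```csharp
using System;

// In a verbatim literal, "" stands for a single double-quote character.
string verbatim = @"She said ""hello"" to C:\Temp";
string regular = "She said \"hello\" to C:\\Temp";

Console.WriteLine(verbatim);              // She said "hello" to C:\Temp
Console.WriteLine(verbatim == regular);   // True
```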

10.1.2. The ToString() Method

Another common way to create a string is to call the ToString( ) method on an object and assign the result to a string variable. All the built-in types override this method to simplify the task of converting a value (often a numeric value) to a string representation of that value. In the following example, the ToString( ) method of an integer type is called to store its value in a string:

int myInteger = 5;
string integerString = myInteger.ToString();

The call to myInteger.ToString( ) returns a String object, which is then assigned to integerString.
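Other built-in types behave the same way, and several accept a format string or format provider. The sketch below uses top-level statements. The "X" (hexadecimal) specifier and the invariant culture are simply illustrative choices that keep the output independent of regional settings:

```csharp
using System;
using System.Globalization;

int myInteger = 5;
string integerString = myInteger.ToString();                 // "5"

// Other built-in types override ToString() as well.
string flag = true.ToString();                               // "True"
string hex = 255.ToString("X");                              // "FF"
string price = 3.5.ToString(CultureInfo.InvariantCulture);   // "3.5"

Console.WriteLine($"{integerString} {flag} {hex} {price}");  // 5 True FF 3.5
```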

The .NET String class provides a wealth of overloaded constructors that support a variety of techniques for assigning string values to string types. Some of these constructors enable you to create a string by passing in a character array or character pointer. The character-array constructors are CLS-compliant and can be called from ordinary safe code; the character-pointer constructors require the unsafe marker, as explained in Chapter 22.
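A sketch of the safe, array-based constructors (plus the repeated-character overload), with an invented character array:

```csharp
using System;

char[] letters = { 'C', 'L', 'R', ' ', 'r', 'o', 'c', 'k', 's' };

string whole = new string(letters);         // every character in the array
string part = new string(letters, 0, 3);    // start index 0, length 3
string stars = new string('*', 5);          // one character repeated 5 times

Console.WriteLine(whole);   // CLR rocks
Console.WriteLine(part);    // CLR
Console.WriteLine(stars);   // *****
```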

10.1.3. Manipulating Strings

The string class provides a host of methods for comparing, searching, and manipulating strings, the most important of which are shown in Table 10-1.

Table 10-1. Methods and fields for the string class
Method or field	Purpose
`Empty`	Public static field that represents the empty string.
`Compare( )`	Overloaded public static method that compares two strings.
`CompareOrdinal( )`	Overloaded public static method that compares two strings without regard to locale or culture.
`Concat( )`	Overloaded public static method that creates a new string from one or more strings.
`Copy( )`	Public static method that creates a new string by copying another.
`Equals( )`	Overloaded public static and instance method that determines if two strings have the same value.
`Format( )`	Overloaded public static method that formats a string using a format specification.
`Join( )`	Overloaded public static method that concatenates a specified string between each element of a string array.
`Chars`	The string indexer.
`Length`	The number of characters in the instance.
`CompareTo( )`	Compares this string with another.
`CopyTo( )`	Copies the specified number of characters to an array of Unicode characters.
`EndsWith( )`	Indicates whether the specified string matches the end of this string.
`Equals( )`	Determines if two strings have the same value.
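`IndexOf( )`	Reports the index of the first occurrence of a specified character or string within the string.
`Insert( )`	Returns a new string with the specified string inserted.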
`LastIndexOf( )`	Reports the index of the last occurrence of a specified character or string within the string.
`PadLeft( )`	Right-aligns the characters in the string, padding to the left with spaces or a specified character.
`PadRight( )`	Left-aligns the characters in the string, padding to the right with spaces or a specified character.
`Remove( )`	Deletes the specified number of characters.
`Split( )`	Returns the substrings delimited by the specified characters in a string array.
`StartsWith( )`	Indicates if the string starts with the specified characters.
`Substring( )`	Retrieves a substring.
`ToCharArray( )`	Copies the characters from the string to a character array.
`ToLower( )`	Returns a copy of the string in lowercase.
`ToUpper( )`	Returns a copy of the string in uppercase.
`Trim( )`	Removes all occurrences of a set of specified characters (whitespace by default) from the beginning and end of the string.
`TrimEnd( )`	Behaves like `Trim( )`, but only at the end.
`TrimStart( )`	Behaves like `Trim( )`, but only at the start.
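Example 10-1 exercises only some of the methods in Table 10-1. The following sketch uses top-level statements and made-up sample strings to try several of the others. Note that each call returns a new string and leaves its input unchanged:

```csharp
using System;

string raw = "  --report--  ";

string trimmed = raw.Trim();               // "--report--" (whitespace removed)
string core = trimmed.Trim('-');           // "report"
string padded = core.PadLeft(10, '.');     // "....report"
string shout = core.ToUpper();             // "REPORT"
string joined = string.Join("/", new[] { "a", "b", "c" });   // "a/b/c"
bool startsWithA = joined.StartsWith("a/", StringComparison.Ordinal);   // true
string removed = joined.Remove(1, 2);      // "a/c" (two characters from index 1)

Console.WriteLine(padded);
Console.WriteLine(shout);
Console.WriteLine($"{joined} {startsWithA} {removed}");
```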

Example 10-1 illustrates the use of some of these methods, including Compare( ), Concat() (and the overloaded + operator), Copy( ) (and the = operator), Insert(), EndsWith( ), and IndexOf( ).

Example 10-1. Working with strings

#region Using directives

using System;
using System.Collections.Generic;
using System.Text;

#endregion

namespace WorkingWithStrings
{
   public class StringTester
   {
      static void Main( )
      {
         // create some strings to work with
         string s1 = "abcd";
         string s2 = "ABCD";
         string s3 = @"Liberty Associates, Inc.
                 provides custom .NET development,
                 on-site Training and Consulting";
         int result;  // hold the results of comparisons

         // compare two strings, case sensitive
         result = string.Compare( s1, s2 );
         Console.WriteLine(
            "compare s1: {0}, s2: {1}, result: {2}\n",
            s1, s2, result );

         // overloaded compare, takes boolean "ignore case"
         //(true = ignore case)
         result = string.Compare( s1, s2, true );
         Console.WriteLine( "compare insensitive\n" );
         Console.WriteLine( "s4: {0}, s2: {1}, result: {2}\n",
            s1, s2, result );

         // concatenation method
         string s6 = string.Concat( s1, s2 );
         Console.WriteLine(
            "s6 concatenated from s1 and s2: {0}", s6 );

         // use the overloaded operator
         string s7 = s1 + s2;
         Console.WriteLine(
            "s7 concatenated from s1 + s2: {0}", s7 );

         // the string copy method
         string s8 = string.Copy( s7 );
         Console.WriteLine(
            "s8 copied from s7: {0}", s8 );

         // assignment copies the reference, not the characters
         string s9 = s8;
         Console.WriteLine( "s9 = s8: {0}", s9 );

         // three ways to compare.
         Console.WriteLine(
            "\nDoes s9.Equals(s8)?: {0}",
            s9.Equals( s8 ) );
         Console.WriteLine(
            "Does Equals(s9,s8)?: {0}",
            string.Equals( s9, s8 ) );
         Console.WriteLine(
            "Does s9==s8?: {0}", s9 == s8 );

         // Two useful properties: the index and the length
         Console.WriteLine(
            "\nString s9 is {0} characters long. ",
            s9.Length );
         Console.WriteLine(
            "The 5th character is {1}\n",
            s9.Length, s9[4] );

         // test whether a string ends with a set of characters
         Console.WriteLine( "s3:{0}\nEnds with Training?: {1}\n",
            s3,
            s3.EndsWith( "Training" ) );
         Console.WriteLine(
            "Ends with Consulting?: {0}",
            s3.EndsWith( "Consulting" ) );

         // return the index of the substring
         Console.WriteLine(
            "\nThe first occurrence of Training " );
         Console.WriteLine( "in s3 is {0}\n",
            s3.IndexOf( "Training" ) );

         // insert the word excellent before "Training"
         string s10 = s3.Insert( 101, "excellent " );
         Console.WriteLine( "s10: {0}\n", s10 );

         // you can combine the two as follows:
         string s11 = s3.Insert( s3.IndexOf( "Training" ),
            "excellent " );
         Console.WriteLine( "s11: {0}\n", s11 );
      }
   }
}

Output:
compare s1: abcd, s2: ABCD, result: -1

compare insensitive

s4: abcd, s2: ABCD, result: 0

s6 concatenated from s1 and s2: abcdABCD
s7 concatenated from s1 + s2: abcdABCD
s8 copied from s7: abcdABCD
s9 = s8: abcdABCD

Does s9.Equals(s8)?: True
Does Equals(s9,s8)?: True
Does s9==s8?: True

String s9 is 8 characters long.
The 5th character is A

s3:Liberty Associates, Inc.
                 provides custom .NET development,
                 on-site Training and Consulting
Ends with Training?: False

Ends with Consulting?: True

The first occurrence of Training
in s3 is 101

s10: Liberty Associates, Inc.
                 provides custom .NET development,
                 on-site excellent Training and Consulting

s11: Liberty Associates, Inc.
                 provides custom .NET development,
                 on-site excellent Training and Consulting

Example 10-1 begins by declaring three strings:

string s1 = "abcd";
string s2 = "ABCD";
string s3 = @"Liberty Associates, Inc.
                 provides custom .NET development,
                 on-site Training and Consulting";

The first two are string literals, and the third is a verbatim string literal. We begin by comparing s1 to s2. The Compare( ) method is a public static method of string, and it is overloaded. The first overloaded version takes two strings and compares them:

// compare two strings, case sensitive
result = string.Compare(s1, s2);
Console.WriteLine("compare s1: {0}, s2: {1}, result: {2}\n",
    s1, s2, result);

This is a case-sensitive comparison and returns different values, depending on the results of the comparison:

A negative integer, if the first string is less than the second string
0, if the strings are equal
A positive integer, if the first string is greater than the second string

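In this case, the output indicates that s1 is "less than" s2. That result comes from the culture-sensitive rules that Compare( ) applies: under U.S. English, a lowercase letter sorts before its uppercase counterpart. (In terms of raw Unicode or ASCII code values the opposite holds, since 'a' is 97 and 'A' is 65, which is why CompareOrdinal( ) ranks "abcd" after "ABCD".)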

compare s1: abcd, s2: ABCD, result: -1

The second comparison uses an overloaded version of Compare( ) that takes a third, Boolean parameter, whose value determines whether case should be ignored in the comparison. If the value of this "ignore case" parameter is true, the comparison is made without regard to case, as in the following:

result = string.Compare(s1, s2, true);
Console.WriteLine("compare insensitive\n");
Console.WriteLine("s4: {0}, s2: {1}, result: {2}\n",
    s1, s2, result);

The result is written with two WriteLine( ) statements to keep the lines short enough to print properly in this book.

This time the case is ignored and the result is 0, indicating that the two strings are identical (without regard to case):

compare insensitive

s4: abcd, s2: ABCD, result: 0
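Compare( ) also has overloads that take a StringComparison value (available since .NET 2.0), which spell out exactly which rules apply. This sketch, with top-level statements, uses the ordinal variants so that the results do not depend on the current culture:

```csharp
using System;

string s1 = "abcd";
string s2 = "ABCD";

// Ordinal comparisons ignore culture; OrdinalIgnoreCase also ignores case.
int ordinal = string.Compare(s1, s2, StringComparison.Ordinal);
int ignoreCase = string.Compare(s1, s2, StringComparison.OrdinalIgnoreCase);
bool same = string.Equals(s1, s2, StringComparison.OrdinalIgnoreCase);

Console.WriteLine(ordinal > 0);   // True: 'a' (97) comes after 'A' (65)
Console.WriteLine(ignoreCase);    // 0
Console.WriteLine(same);          // True
```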

Example 10-1 then concatenates some strings. There are a couple of ways to accomplish this. You can use the Concat( ) method, which is a static public method of string:

string s6 = string.Concat(s1,s2);

or you can simply use the overloaded concatenation (+) operator:

string s7 = s1 + s2;

In both cases, the output reflects that the concatenation was successful:

s6 concatenated from s1 and s2: abcdABCD
s7 concatenated from s1 + s2: abcdABCD

Similarly, creating a new copy of a string can be accomplished in two ways. First, you can use the static Copy( ) method:

string s8 = string.Copy(s7);

This actually creates two separate strings with the same values. Since strings are immutable, this is wasteful. It is better either to copy the reference with the assignment operator (=) or to call the Clone( ) method (mentioned earlier); both leave you with two variables pointing to the same string in memory:

string s9 = s8;
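Object.ReferenceEquals() makes the difference visible. In this sketch, the second string object is built with new string(char[]) rather than string.Copy(), because Copy() is marked obsolete in recent .NET versions. The sample value matches s8 from Example 10-1:

```csharp
using System;

string s8 = "abcdABCD";
string s9 = s8;                                   // copies the reference only
string separate = new string(s8.ToCharArray());   // builds a second string object

Console.WriteLine(object.ReferenceEquals(s8, s9));        // True
Console.WriteLine(object.ReferenceEquals(s8, separate));  // False
Console.WriteLine(s8 == separate);                        // True: same characters
```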

The .NET String class provides three ways to test for the equality of two strings. First, you can use the overloaded Equals( ) method and ask s9 directly whether s8 is of equal value:

Console.WriteLine("\nDoes s9.Equals(s8)?: {0}",
    s9.Equals(s8));

A second technique is to pass both strings to String's static method Equals():

Console.WriteLine("Does Equals(s9,s8)?: {0}",
    string.Equals(s9, s8));

A final method is to use the equality operator (==) of String:

Console.WriteLine("Does s9==s8?: {0}", s9 == s8);

In each case, the returned result is a Boolean value, as shown in the output:

Does s9.Equals(s8)?: True
Does Equals(s9,s8)?: True
Does s9==s8?: True
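All three tests compare values because String overloads == and overrides Equals(). The overloaded == is chosen at compile time, though, so once the strings are typed as object, == falls back to reference comparison. The sketch below builds the second string at run time (via StringBuilder) so that it is a distinct object. The variable names are illustrative:

```csharp
using System;
using System.Text;

string a = "hello";
string b = new StringBuilder("hel").Append("lo").ToString();   // a separate object

object oa = a;
object ob = b;

Console.WriteLine(a == b);          // True: string's == compares characters
Console.WriteLine(a.Equals(b));     // True
Console.WriteLine(oa == ob);        // False: object's == compares references
Console.WriteLine(oa.Equals(ob));   // True: the virtual Equals still compares values
```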

The next several lines in Example 10-1 use the index operator ([]) to retrieve the character at a particular position within a string (indexes start at zero, so s9[4] is the fifth character), and use the Length property to return the length of the entire string:

Console.WriteLine("\nString s9 is {0} characters long.",
    s9.Length);
Console.WriteLine("The 5th character is {1}\n",
    s9.Length, s9[4]);

Here's the output:

String s9 is 8 characters long.
The 5th character is A
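Valid indexes run from 0 to Length - 1, and the indexer throws IndexOutOfRangeException outside that range. A quick sketch using the value of s9 from the example:

```csharp
using System;

string s9 = "abcdABCD";
char fifth = s9[4];                // indexes start at zero
char last = s9[s9.Length - 1];     // the last valid index is Length - 1
Console.WriteLine($"{fifth} {last}");   // A D

bool threw = false;
try
{
    char past = s9[s9.Length];     // one position past the end
    Console.WriteLine(past);       // never reached
}
catch (IndexOutOfRangeException)
{
    threw = true;
}
Console.WriteLine(threw);          // True
```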

The EndsWith( ) method asks a string whether a substring is found at the end of the string. Thus, you might first ask s3 if it ends with Training (which it doesn't) and then if it ends with Consulting (which it does):

// test whether a string ends with a set of characters
Console.WriteLine("s3:{0}\nEnds with Training?: {1}\n",
    s3, s3.EndsWith("Training"));
Console.WriteLine("Ends with Consulting?: {0}",
    s3.EndsWith("Consulting"));

The output reflects that the first test fails and the second succeeds:

s3:Liberty Associates, Inc.
                 provides custom .NET development,
                 on-site Training and Consulting
Ends with Training?: False

Ends with Consulting?: True
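The single-argument EndsWith( ) and StartsWith( ) overloads use culture-sensitive matching. Overloads that take a StringComparison let you state the rules explicitly, including case-insensitive matching. In this sketch, s3 is a shortened, single-line stand-in for the example's string:

```csharp
using System;

string s3 = "on-site Training and Consulting";

// Passing a StringComparison states exactly how characters are matched.
bool endsTraining = s3.EndsWith("Training", StringComparison.Ordinal);
bool endsConsulting = s3.EndsWith("consulting", StringComparison.OrdinalIgnoreCase);
bool startsOnSite = s3.StartsWith("on-site", StringComparison.Ordinal);

Console.WriteLine(endsTraining);     // False
Console.WriteLine(endsConsulting);   // True
Console.WriteLine(startsOnSite);     // True
```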

The IndexOf( ) method locates a substring within our string, and the Insert( ) method inserts a new substring into a copy of the original string.

The following code locates the first occurrence of Training in s3:

Console.WriteLine("\nThe first occurrence of Training ");
Console.WriteLine("in s3 is {0}\n",
    s3.IndexOf("Training"));

The output indicates that the offset is 101:

The first occurrence of Training
in s3 is 101

You can then use that value to insert the word excellent, followed by a space, into that string. Strictly speaking, Insert( ) leaves s3 unchanged; it returns a new string containing the insertion, which is assigned to s10:

string s10 = s3.Insert(101, "excellent ");
Console.WriteLine("s10: {0}\n", s10);

Here's the output:

s10: Liberty Associates, Inc.
                 provides custom .NET development,
                 on-site excellent Training and Consulting

Finally, you can combine these operations:

string s11 = s3.Insert(s3.IndexOf("Training"), "excellent ");
Console.WriteLine("s11: {0}\n", s11);

to obtain the identical output:

s11: Liberty Associates, Inc.
                 provides custom .NET development,
                 on-site excellent Training and Consulting
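Combining the calls works here because "Training" is present. IndexOf( ) returns -1 when the substring is missing, and Insert( ) would then throw ArgumentOutOfRangeException. The following sketch guards against that. InsertBefore is a hypothetical helper, not a String method, and the short s3 is a stand-in for the example's string:

```csharp
using System;

// Hypothetical helper: inserts text before the first occurrence of target,
// or returns the source unchanged when target is not found.
static string InsertBefore(string source, string target, string text)
{
    int index = source.IndexOf(target, StringComparison.Ordinal);
    return index < 0 ? source : source.Insert(index, text);
}

string s3 = "on-site Training and Consulting";
string s11 = InsertBefore(s3, "Training", "excellent ");
string unchanged = InsertBefore(s3, "Mentoring", "excellent ");

Console.WriteLine(s11);               // on-site excellent Training and Consulting
Console.WriteLine(unchanged == s3);   // True
```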

10.1.4. Finding Substrings

The String type provides an overloaded Substring( ) method for extracting substrings from within strings. Both versions take an index indicating where to begin the extraction; one of the two versions takes a second argument that specifies how many characters to extract (a length, not an ending index). The Substring( ) method is illustrated in Example 10-2.

Example 10-2. Using the Substring() method

#region Using directives

using System;
using System.Collections.Generic;
using System.Text;

#endregion

namespace SubString
{
   public class StringTester
   {
      static void Main( )
      {
         // create some strings to work with
         string s1 = "One Two Three Four";

         int ix;

         // get the index of the last space
         ix = s1.LastIndexOf( " " );

         // get the last word.
         string s2 = s1.Substring( ix + 1 );

         // set s1 to the substring starting at 0 that is
         // ix characters long (everything before the last
         // space), thus s1 has "One Two Three"
         s1 = s1.Substring( 0, ix );

         // find the last space in s1 (after "Two")
         ix = s1.LastIndexOf( " " );

         // set s3 to the substring starting one character
         // after ix, the space after "Two",
         // thus s3 = "Three"
         string s3 = s1.Substring( ix + 1 );

         // reset s1 to the substring starting at 0
         // and ending before ix, thus the string "One Two"
         s1 = s1.Substring( 0, ix );

         // reset ix to the space between
         // "One" and "Two"
         ix = s1.LastIndexOf( " " );

         // set s4 to the substring starting one
         // character after ix, thus the substring "Two"
         string s4 = s1.Substring( ix + 1 );

         // reset s1 to the substring starting at 0
         // and ending before ix, thus "One"
         s1 = s1.Substring( 0, ix );

         // set ix to the last space, but there is
         // none so ix now = -1
         ix = s1.LastIndexOf( " " );

         // set s5 to the substring at one past
         // the last space. there was no last space
         // so this sets s5 to the substring starting
         // at zero
         string s5 = s1.Substring( ix + 1 );

         Console.WriteLine( "s2: {0}\ns3: {1}", s2, s3 );
         Console.WriteLine( "s4: {0}\ns5: {1}\n", s4, s5 );
         Console.WriteLine( "s1: {0}\n", s1 );
      }
   }
}

Output:
s2: Four
s3: Three
s4: Two
s5: One

s1: One

Example 10-2 is not an elegant solution to the problem of extracting words from a string, but it is a good first approximation, and it illustrates a useful technique. The example begins by creating a string, s1:

string s1 = "One Two Three Four";

Then ix is assigned the index of the last space in the string:

ix=s1.LastIndexOf(" ");

Then the substring that begins one character after that space is assigned to the new string, s2:

string s2 = s1.Substring(ix+1);

This extracts the characters from position ix+1 to the end of the string, assigning to s2 the value Four.

The next step is to remove the word Four from s1. You can do this by assigning to s1 the substring of s1 that begins at index 0 and is ix characters long, which is everything before the last space:

s1 = s1.Substring(0,ix);

Next, reassign ix to the last remaining space, which sits just before the word Three, and extract the characters after it into the string s3. Continue like this until s4 and s5 are populated. Finally, print the results:

s2: Four
s3: Three
s4: Two
s5: One

s1: One

It works, and it illustrates the use of Substring( ). Working with index offsets this way is not unlike using pointer arithmetic in C++, but without the pointers and unsafe code.
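Because the two-argument Substring( ) takes a start index and a length, the word in the middle of a string can be pulled out directly. A sketch using the example's sentence:

```csharp
using System;

string s1 = "One Two Three Four";

string two = s1.Substring(4, 3);                        // start at index 4, take 3 characters
string tail = s1.Substring(s1.LastIndexOf(' ') + 1);    // after the last space, to the end
string head = s1.Substring(0, s1.IndexOf(' '));         // everything before the first space

Console.WriteLine($"{two} {tail} {head}");              // Two Four One
```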

10.1.5. Splitting Strings

A more effective solution to the problem illustrated in Example 10-2 is to use the Split( ) method of String, whose job is to parse a string into substrings. To use Split( ), pass in an array of delimiters (characters that will indicate a split in the words), and the method returns an array of substrings. Example 10-3 illustrates.

Example 10-3. Using the Split() method

#region Using directives

using System;
using System.Collections.Generic;
using System.Text;

#endregion

namespace StringSplit
{
   public class StringTester
   {
      static void Main( )
      {
         // create some strings to work with
         string s1 = "One,Two,Three Liberty Associates, Inc.";

         // constants for the space and comma characters
         const char Space = ' ';
         const char Comma = ',';

         // array of delimiters to split the sentence with
         char[] delimiters = new char[]
         {
            Space,
            Comma
         };

         string output = "";
         int ctr = 1;

         // split the string and then iterate over the
         // resulting array of strings
         foreach ( string subString in s1.Split( delimiters ) )
         {
            output += ctr++;
            output += ": ";
            output += subString;
            output += "\n";
         }
         Console.WriteLine( output );
      }
   }
}

Output:
1: One
2: Two
3: Three
4: Liberty
5: Associates
6:
7: Inc.

You start by creating a string to parse:

string s1 = "One,Two,Three Liberty Associates, Inc.";

The delimiters are set to the space and comma characters. You then call Split( ) on this string, and pass the results to the foreach loop:

foreach (string subString in s1.Split(delimiters))

Because Split( ) declares its delimiter parameter with the params keyword, you can reduce your code to:

foreach (string subString in s1.Split(' ', ','))

This eliminates the declaration of the array entirely.
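The shortened call can be checked on its own. This sketch prints each element in brackets, which makes the empty entry between "Associates," and "Inc." visible:

```csharp
using System;

string s1 = "One,Two,Three Liberty Associates, Inc.";
string[] words = s1.Split(' ', ',');   // params: no explicit array needed

Console.WriteLine(words.Length);       // 7
for (int i = 0; i < words.Length; i++)
{
    Console.WriteLine($"{i + 1}: [{words[i]}]");   // entry 6 prints as "6: []"
}
```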

Start by initializing output to an empty string and then build up the output string in four steps: concatenate the value of ctr, then add the colon, then the substring returned by Split( ), then the newline. With each concatenation, a new copy of the string is made, and all four steps are repeated for each substring found by Split( ). This repeated copying of strings is terribly inefficient.

The problem is that the string type is not designed for this kind of operation. What you want is to create a new string by appending a formatted string each time through the loop. The class you need is StringBuilder.

10.1.6. Manipulating Dynamic Strings

The System.Text.StringBuilder class is used for creating and modifying strings. The important members of StringBuilder are summarized in Table 10-2.

Table 10-2. StringBuilder methods
Method or property	Explanation
`Chars`	The indexer.
`Length`	Retrieves or assigns the length of the `StringBuilder`.
`Append( )`	Overloaded public method that appends a string of characters to the end of the current `StringBuilder`.
`AppendFormat( )`	Overloaded public method that replaces format specifiers with the formatted value of an object.
`Insert( )`	Overloaded public method that inserts a string of characters at the specified position.
`Remove( )`	Removes the specified characters.
`Replace( )`	Overloaded public method that replaces all instances of specified characters with new characters.

Unlike String, StringBuilder is mutable; when you modify a StringBuilder, you change its contents in place rather than creating a new copy. Example 10-4 replaces the String object in Example 10-3 with a StringBuilder object.

Example 10-4. Using a StringBuilder

#region Using directives

using System;
using System.Collections.Generic;
using System.Text;

#endregion

namespace UsingStringBuilder
{
   public class StringTester
   {
      static void Main( )
      {
         // create some strings to work with
         string s1 = "One,Two,Three Liberty Associates, Inc.";

         // constants for the space and comma characters
         const char Space = ' ';
         const char Comma = ',';

         // array of delimiters to split the sentence with
         char[] delimiters = new char[]
         {
            Space,
            Comma
         };

         // use a StringBuilder class to build the
         // output string
         StringBuilder output = new StringBuilder( );
         int ctr = 1;

         // split the string and then iterate over the
         // resulting array of strings
         foreach ( string subString in s1.Split( delimiters ) )
         {
            // AppendFormat appends a formatted string
            output.AppendFormat( "{0}: {1}\n", ctr++, subString );
         }
         Console.WriteLine( output );
      }
   }
}

Only the last part of the program is modified. Instead of using the concatenation operator to modify the string, use the AppendFormat( ) method of StringBuilder to append new, formatted strings as you create them. This is more efficient. The output is identical to that of Example 10-3:

1: One
2: Two
3: Three
4: Liberty
5: Associates
6:
7: Inc.
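The other members in Table 10-2 work on the same object in place. A short sketch with an invented starting value. Each call modifies the single StringBuilder sb, and ToString() produces an ordinary string at the end:

```csharp
using System;
using System.Text;

var sb = new StringBuilder("Associates");

sb.Insert(0, "Liberty ");                     // Liberty Associates
sb.Append(", Inc.");                          // Liberty Associates, Inc.
sb.Replace("Associates", "Partners");         // Liberty Partners, Inc.
sb.Remove(16, 6);                             // Liberty Partners (drops ", Inc.")
sb.AppendFormat(" ({0} chars)", sb.Length);   // Liberty Partners (16 chars)

string result = sb.ToString();
Console.WriteLine(result);
```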

Delimiter Limitations

Because you passed in delimiters of both comma and space, the empty string between the comma and the space that separate "Associates" and "Inc." is returned as a word, numbered 6 as shown. That is not what you want. To eliminate this you need to tell Split( ) to match a comma (as between One, Two, and Three), or a space (as between Liberty and Associates), or a comma followed by a space. It is that last bit that is tricky and requires that you use a regular expression.
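As a preview, here is a minimal sketch of that idea using Regex.Split() from System.Text.RegularExpressions. The alternation pattern ", |,| " is one possible expression, not the only one. It lists ", " first so that a comma followed by a space is consumed as a single delimiter. For comparison, the sketch also shows StringSplitOptions.RemoveEmptyEntries. It produces the same six words for this particular input, but it discards every empty field, which is not always what you want:

```csharp
using System;
using System.Text.RegularExpressions;

string s1 = "One,Two,Three Liberty Associates, Inc.";

// Try ", " first so a comma followed by a space counts as one delimiter.
string[] byRegex = Regex.Split(s1, ", |,| ");

// Alternative for this input: drop the empty entries that Split() produces.
string[] byOptions = s1.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(string.Join("|", byRegex));     // One|Two|Three|Liberty|Associates|Inc.
Console.WriteLine(string.Join("|", byOptions));   // One|Two|Three|Liberty|Associates|Inc.
```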