10.1 Strings

C# treats strings as first-class types that are flexible, powerful, and easy to use. Each string object is an immutable sequence of Unicode characters. In other words, methods that appear to change the string actually return a modified copy; the original string remains intact.

When you declare a C# string using the string keyword, you are in fact declaring the object to be of the type System.String, one of the built-in types provided by the .NET Framework Class Library. A C# string type is a System.String type, and we will use the names interchangeably throughout the chapter.

The declaration of the System.String class is:

public sealed class String : IComparable, ICloneable, IConvertible, IEnumerable

This declaration reveals that the class is sealed, meaning that it is not possible to derive from the string class. The class also implements four system interfaces (IComparable, ICloneable, IConvertible, and IEnumerable) that dictate functionality that System.String shares with other classes in the .NET Framework.

C and C++ programmers take note: C# strings are immutable, so a string is never changed in place; every operation that appears to modify a string creates a new one. This can have performance implications. If you plan to do significant string manipulation, use a StringBuilder.

As seen in Chapter 9, the IComparable interface is implemented by types whose values can be ordered. Strings, for example, can be alphabetized; any given string can be compared with another string to determine which should come first in an ordered list. IComparable classes implement the CompareTo method. IEnumerable, also discussed in Chapter 9, lets you use the foreach construct to enumerate a string as a collection of chars.
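As a small sketch of both interfaces at work, the following compares two strings with CompareTo( ) and walks a string's characters with foreach. The class and helper names (ComparableDemo, Order, CountVowels) are hypothetical. CompareTo( ) only guarantees the sign of its result, so Order normalizes it to -1, 0, or 1.

```csharp
using System;

public static class ComparableDemo
{
    // Returns -1, 0, or 1 depending on how a sorts relative to b.
    // CompareTo( ) (IComparable) only promises the sign, so normalize it.
    public static int Order(string a, string b)
    {
        return Math.Sign(a.CompareTo(b));
    }

    // Counts vowels by enumerating the string as a collection of chars.
    public static int CountVowels(string s)
    {
        int count = 0;
        foreach (char c in s)   // IEnumerable lets foreach visit each char
        {
            if ("aeiouAEIOU".IndexOf(c) >= 0)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(Order("apple", "banana"));            // -1
        Console.WriteLine(CountVowels("Liberty Associates"));   // 7
    }
}
```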

ICloneable objects can create new instances with the same value as the original instance. ICloneable classes implement the Clone( ) method. For strings, the result has the same value (characters) as the original. Because strings are immutable, String's Clone( ) does not allocate a copy; it simply returns a reference to the same instance.

IConvertible classes provide methods to facilitate conversion to other primitive types such as ToInt32( ), ToDouble( ), ToDecimal( ), etc.
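The sketch below checks both interfaces directly. Clone( ) on a string hands back the very same object, and the IConvertible methods can be called through an interface reference. The class and method names are hypothetical. Passing CultureInfo.InvariantCulture as the format provider is a choice made here so the parse does not depend on regional settings. In everyday code, Convert.ToInt32( ) or int.Parse( ) is the more common route to the same result.

```csharp
using System;
using System.Globalization;

public static class CloneConvertDemo
{
    // String.Clone( ) returns a reference to the same instance; sharing it
    // is safe because the string can never change.
    public static bool CloneIsSameInstance(string s)
    {
        object clone = s.Clone();
        return object.ReferenceEquals(clone, s);
    }

    // Calls String's IConvertible implementation explicitly.
    public static int ParseViaIConvertible(string s)
    {
        IConvertible convertible = s;
        return convertible.ToInt32(CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(CloneIsSameInstance("abc"));       // True
        Console.WriteLine(ParseViaIConvertible("42") + 1);   // 43
    }
}
```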

10.1.1 Creating Strings

The most common way to create a string is to assign a quoted string of characters, known as a string literal, to a user-defined variable of type string:

string newString = "This is a string literal";

Quoted strings can include escape sequences, such as "\n" or "\t", which begin with a backslash character (\) and indicate where line breaks or tabs are to appear. Because the backslash introduces an escape sequence, a literal backslash in a quoted string (as in a Windows directory path) must be preceded by another backslash.
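The following sketch shows that each escape sequence becomes a single character in the finished string, which the Length values confirm. The class and constant names are hypothetical.

```csharp
using System;

public static class EscapeDemo
{
    // "\t" and "\n" are one character each in the compiled string.
    public const string Tabbed = "Name\tValue\n";          // 11 characters

    // "\\" is one backslash; the path below holds two backslashes.
    public const string WindowsPath = "C:\\Temp\\notes.txt"; // C:\Temp\notes.txt

    // "\"" is one double-quote character.
    public const string Quoted = "She said \"hi\"";        // She said "hi"

    public static void Main()
    {
        Console.Write(Tabbed);
        Console.WriteLine(WindowsPath);
        Console.WriteLine(WindowsPath.Length);   // 17
        Console.WriteLine(Quoted);
    }
}
```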

Strings can also be created using verbatim string literals, which start with the at (@) symbol. This tells the compiler that the string should be used verbatim, even if it spans multiple lines or includes escape characters. In a verbatim string literal, backslashes and the characters that follow them are simply considered additional characters of the string. Thus, the following two definitions are equivalent:

string literalOne = "\\\\MySystem\\MyDirectory\\ProgrammingC#.cs";
string verbatimLiteralOne = @"\\MySystem\MyDirectory\ProgrammingC#.cs";

In the first line, a nonverbatim string literal is used, so each backslash character (\) must be escaped by preceding it with a second backslash. In the second line, a verbatim string literal is used, so the extra backslashes are not needed. A second example illustrates multiline verbatim strings:

string literalTwo = "Line One\nLine Two";
string verbatimLiteralTwo = @"Line One
Line Two";

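Again, these declarations are interchangeable, with one caveat: the line break inside a verbatim literal is whatever line ending the source file contains, so in a file saved with Windows (CRLF) line endings it is "\r\n" rather than "\n". Otherwise, which one you use is a matter of convenience and personal style.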
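A quick way to confirm the equivalence is to compare the two path literals at run time. The sketch also shows one rule of verbatim literals that the examples above do not cover: a double quote inside a verbatim literal is written as two double quotes. The class and constant names are hypothetical.

```csharp
using System;

public static class VerbatimDemo
{
    // The same path, written as a regular and as a verbatim literal.
    public const string LiteralOne = "\\\\MySystem\\MyDirectory\\ProgrammingC#.cs";
    public const string VerbatimLiteralOne = @"\\MySystem\MyDirectory\ProgrammingC#.cs";

    // Inside a verbatim literal, "" stands for one double-quote character.
    public const string VerbatimQuote = @"He said ""verbatim""";

    public static void Main()
    {
        Console.WriteLine(LiteralOne == VerbatimLiteralOne);   // True
        Console.WriteLine(VerbatimQuote);                      // He said "verbatim"
    }
}
```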

10.1.2 The ToString( ) Method

Another common way to create a string is to call the ToString( ) method on an object and assign the result to a string variable. All the built-in types override this method to simplify the task of converting a value (often a numeric value) to a string representation of that value. In the following example, the ToString( ) method of an integer type is called to store its value in a string:

int myInteger = 5;
string integerString = myInteger.ToString( );

The call to myInteger.ToString( ) returns a String object, which is then assigned to integerString.
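The sketch below calls ToString( ) on a few built-in types. It also shows a user-defined class overriding the method so that its objects convert to a readable string in the same way. The Point class and the ToStringDemo name are hypothetical, introduced only for illustration.

```csharp
using System;

// Hypothetical type used only to show a ToString( ) override.
public class Point
{
    private readonly int x;
    private readonly int y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    // Overriding ToString( ) controls how a Point is rendered as text.
    public override string ToString()
    {
        return "(" + x + ", " + y + ")";
    }
}

public static class ToStringDemo
{
    public static void Main()
    {
        int myInteger = 5;
        Console.WriteLine(myInteger.ToString());        // 5
        Console.WriteLine(true.ToString());             // True
        Console.WriteLine(new Point(3, 4).ToString());  // (3, 4)
    }
}
```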

The .NET String class also provides a wealth of overloaded constructors that support a variety of techniques for assigning string values to string types. Some of these constructors enable you to create a string by passing in a character array or a character pointer. The character-array constructors are CLS-compliant, so any .NET language can use them. The constructors that take a character pointer are not CLS-compliant and can be called only from unsafe code.
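The safe, CLS-compliant constructors are easy to try. The sketch below uses three of them: the whole array, a repeated character, and a slice given as start index plus count. The pointer-based constructors are omitted because they require compiling with unsafe code enabled. The class and field names are hypothetical.

```csharp
using System;

public static class ConstructorDemo
{
    static readonly char[] Letters = { 'C', 'L', 'S' };

    // String(char[]) copies the whole array into a new string.
    public static readonly string FromArray = new string(Letters);     // "CLS"

    // String(char, int) repeats one character a given number of times.
    public static readonly string Dashes = new string('-', 5);         // "-----"

    // String(char[], int, int) copies a slice: start index, then count.
    public static readonly string Slice = new string(Letters, 1, 2);   // "LS"

    public static void Main()
    {
        Console.WriteLine(FromArray);
        Console.WriteLine(Dashes);
        Console.WriteLine(Slice);
    }
}
```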

10.1.3 Manipulating Strings

The string class provides a host of methods for comparing, searching, and manipulating strings, as shown in Table 10-1.

Table 10-1. Methods and fields for the string class

Method or field      Purpose
Empty                Public static field that represents the empty string.
Compare( )           Overloaded public static method that compares two strings.
CompareOrdinal( )    Overloaded public static method that compares two strings by the numeric values of their characters, without regard to locale or culture.
Concat( )            Overloaded public static method that creates a new string from one or more strings.
Copy( )              Public static method that creates a new string by copying another.
Equals( )            Overloaded public static and instance method that determines if two strings have the same value.
Format( )            Overloaded public static method that formats a string using a format specification.
Intern( )            Public static method that returns the system's reference to the interned string with the same value, adding the string to the intern pool if it is not already there.
IsInterned( )        Public static method that returns the interned reference for the string, or null if the string is not in the intern pool.
Join( )              Overloaded public static method that concatenates a specified string between each element of a string array.
Chars                The string indexer.
Length               The number of characters in the instance.
Clone( )             Returns a reference to this same string instance.
CompareTo( )         Compares this string with another.
CopyTo( )            Copies the specified number of characters to an array of Unicode characters.
EndsWith( )          Indicates whether the specified string matches the end of this string.
Equals( )            Determines if two strings have the same value.
IndexOf( )           Reports the index of the first occurrence of a specified character or string within the string.
Insert( )            Returns a new string with the specified string inserted.
LastIndexOf( )       Reports the index of the last occurrence of a specified character or string within the string.
PadLeft( )           Right-aligns the characters in the string, padding to the left with spaces or a specified character.
PadRight( )          Left-aligns the characters in the string, padding to the right with spaces or a specified character.
Remove( )            Returns a new string with the specified number of characters deleted.
Split( )             Returns the substrings delimited by the specified characters in a string array.
StartsWith( )        Indicates if the string starts with the specified characters.
Substring( )         Retrieves a substring.
ToCharArray( )       Copies the characters from the string to a character array.
ToLower( )           Returns a copy of the string in lowercase.
ToUpper( )           Returns a copy of the string in uppercase.
Trim( )              Removes all occurrences of a set of specified characters (white space by default) from the beginning and end of the string.
TrimEnd( )           Behaves like Trim( ), but only at the end.
TrimStart( )         Behaves like Trim( ), but only at the start.

Example 10-1 illustrates the use of some of these methods, including Compare( ), Concat( ) (and the overloaded + operator), Copy( ) (and the = operator), Insert( ), EndsWith( ), and IndexOf( ).

Example 10-1. Working with strings
namespace Programming_CSharp
{
   using System;

   public class StringTester
   {
      static void Main( )
      {
         // create some strings to work with
         string s1 = "abcd";
         string s2 = "ABCD";
         string s3 = @"Liberty Associates, Inc.
                provides custom .NET development,
                on-site Training and Consulting";

         int result;  // hold the results of comparisons

         // compare two strings, case sensitive
         result = string.Compare(s1, s2);
         Console.WriteLine(
            "compare s1: {0}, s2: {1}, result: {2}\n",
            s1, s2, result);

         // overloaded compare, takes boolean "ignore case"
         //(true = ignore case)
         result = string.Compare(s1,s2, true);
         Console.WriteLine("compare insensitive\n");
         Console.WriteLine("s4: {0}, s2: {1}, result: {2}\n",
            s1, s2, result);

         // concatenation method
         string s6 = string.Concat(s1,s2);
         Console.WriteLine(
            "s6 concatenated from s1 and s2: {0}", s6);

         // use the overloaded operator
         string s7 = s1 + s2;
         Console.WriteLine(
            "s7 concatenated from s1 + s2: {0}", s7);

         // the string copy method
         string s8 = string.Copy(s7);
         Console.WriteLine(
            "s8 copied from s7: {0}", s8);

         // assignment copies the reference to the same string
         string s9 = s8;
         Console.WriteLine("s9 = s8: {0}", s9);

         // three ways to compare.
         Console.WriteLine(
            "\nDoes s9.Equals(s8)?: {0}",
            s9.Equals(s8));
         Console.WriteLine(
            "Does Equals(s9,s8)?: {0}",
            string.Equals(s9,s8));
         Console.WriteLine(
            "Does s9==s8?: {0}", s9 == s8);

         // Two useful properties: the index and the length
         Console.WriteLine(
            "\nString s9 is {0} characters long. ",
            s9.Length);
         Console.WriteLine(
            "The 5th character is {1}\n",
            s9.Length, s9[4]);

         // test whether a string ends with a set of characters
         Console.WriteLine("s3:{0}\nEnds with Training?: {1}\n",
            s3,
            s3.EndsWith("Training") );
         Console.WriteLine(
            "Ends with Consulting?: {0}",
            s3.EndsWith("Consulting"));

         // return the index of the substring
         Console.WriteLine(
            "\nThe first occurrence of Training ");
         Console.WriteLine ("in s3 is {0}\n",
            s3.IndexOf("Training"));

         // insert the word excellent before "training"
         // (101 is the offset reported above; it depends on the
         // spaces and line endings inside s3)
         string s10 = s3.Insert(101,"excellent ");
         Console.WriteLine("s10: {0}\n",s10);

         // you can combine the two as follows:
         string s11 = s3.Insert(s3.IndexOf("Training"),
            "excellent ");
         Console.WriteLine("s11: {0}\n",s11);
      }
   }
}
Output:
compare s1: abcd, s2: ABCD, result: -1

compare insensitive

s4: abcd, s2: ABCD, result: 0

s6 concatenated from s1 and s2: abcdABCD
s7 concatenated from s1 + s2: abcdABCD
s8 copied from s7: abcdABCD
s9 = s8: abcdABCD

Does s9.Equals(s8)?: True
Does Equals(s9,s8)?: True
Does s9==s8?: True

String s9 is 8 characters long.
The 5th character is A

s3:Liberty Associates, Inc.
                provides custom .NET development,
                on-site Training and Consulting
Ends with Training?: False

Ends with Consulting?: True

The first occurrence of Training
in s3 is 101

s10: Liberty Associates, Inc.
                provides custom .NET development,
                on-site excellent Training and Consulting

s11: Liberty Associates, Inc.
                provides custom .NET development,
                on-site excellent Training and Consulting

Example 10-1 begins by declaring three strings:

string s1 = "abcd";
string s2 = "ABCD";
string s3 = @"Liberty Associates, Inc.
                provides custom .NET development,
                on-site Training and Consulting";

The first two are string literals, and the third is a verbatim string literal. We begin by comparing s1 to s2. The Compare( ) method is a public static method of string, and it is overloaded. The first overloaded version takes two strings and compares them:

// compare two strings, case sensitive
result = string.Compare(s1, s2);
Console.WriteLine("compare s1: {0}, s2: {1}, result: {2}\n",
    s1, s2, result);

This is a case-sensitive comparison and returns different values, depending on the results of the comparison:

  • A negative integer, if the first string is less than the second string

  • 0, if the strings are equal

  • A positive integer, if the first string is greater than the second string

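Here, the result indicates that s1 is "less than" s2. That ordering comes from the culture-sensitive rules that Compare( ) applies, under which a lowercase letter sorts before its uppercase equivalent. The character codes themselves run the other way: in both Unicode and ASCII, 'A' is 65 and 'a' is 97.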

compare s1: abcd, s2: ABCD, result: -1
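To see the difference between culture-sensitive and code-point ordering, compare the same two strings with CompareOrdinal( ), which compares character values directly. In this sketch, which uses hypothetical names, the ordinal result is positive because 'a' (97) is greater than 'A' (65). The culture-sensitive result depends on the current culture's rules and is shown only as "typically" negative.

```csharp
using System;

public static class OrdinalDemo
{
    // Ordinal: compares the numeric character codes, 'a' (97) vs 'A' (65).
    public static int OrdinalSign(string a, string b)
    {
        return Math.Sign(string.CompareOrdinal(a, b));
    }

    // Culture-sensitive: uses the current culture's sorting rules.
    public static int CultureSign(string a, string b)
    {
        return Math.Sign(string.Compare(a, b));
    }

    public static void Main()
    {
        Console.WriteLine(CultureSign("abcd", "ABCD"));   // typically -1
        Console.WriteLine(OrdinalSign("abcd", "ABCD"));   // 1
    }
}
```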

The second comparison uses an overloaded version of Compare( ) that takes a third, Boolean parameter, whose value determines whether case should be ignored in the comparison. If the value of this "ignore case" parameter is true, the comparison is made without regard to case, as in the following:

result = string.Compare(s1,s2, true);
Console.WriteLine("compare insensitive\n");
Console.WriteLine("s4: {0}, s2: {1}, result: {2}\n",
    s1, s2, result);

The result is written with two WriteLine( ) statements to keep the lines short enough to print properly in this book.

This time the case is ignored and the result is 0, indicating that the two strings are identical (without regard to case):

compare insensitive

s4: abcd, s2: ABCD, result: 0

Example 10-1 then concatenates some strings. There are a couple of ways to accomplish this. You can use the Concat( ) method, which is a static public method of string:

string s6 = string.Concat(s1,s2);

or you can simply use the overloaded concatenation (+) operator:

string s7 = s1 + s2;

In both cases, the output reflects that the concatenation was successful:

s6 concatenated from s1 and s2: abcdABCD
s7 concatenated from s1 + s2: abcdABCD
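Both forms accept more than two operands. The sketch below (hypothetical names) uses the three-string overload of Concat( ), which treats a null argument as an empty string. It also shows that when one operand of + is not a string, that operand is converted with ToString( ) before concatenation.

```csharp
using System;

public static class ConcatDemo
{
    // Concat( ) has overloads for two, three, and four strings, plus a
    // params array; a null argument is treated as an empty string.
    public static string Three(string a, string b, string c)
    {
        return string.Concat(a, b, c);
    }

    // With +, the int operand is converted to text via ToString( ).
    public static string Labeled(string label, int value)
    {
        return label + value;
    }

    public static void Main()
    {
        Console.WriteLine(Three("ab", "cd", "ABCD"));   // abcdABCD
        Console.WriteLine(Labeled("ctr = ", 7));        // ctr = 7
    }
}
```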

Similarly, creating a new copy of a string can be accomplished in two ways. First, you can use the static Copy( ) method:

string s8 = string.Copy(s7);

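Alternatively, for convenience, you can use the assignment operator (=). Strictly speaking, assignment does not copy the characters: afterward, s9 and s8 refer to the same string object. Because strings are immutable, however, the shared reference behaves like a copy for all practical purposes: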

string s9 = s8;

Once again, the output reflects that each method has worked:

s8 copied from s7: abcdABCD
s9 = s8: abcdABCD
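You can observe the difference between sharing a reference and making a separate copy with object.ReferenceEquals( ). The sketch below uses hypothetical names and calls new string(char[]) to force a distinct object with the same characters.

```csharp
using System;

public static class AssignmentDemo
{
    // Assigning one string variable to another copies only the reference.
    public static bool AssignmentSharesObject(string s8)
    {
        string s9 = s8;
        return object.ReferenceEquals(s9, s8);
    }

    // Builds a separate string object holding the same characters.
    public static string DistinctCopy(string s)
    {
        return new string(s.ToCharArray());
    }

    public static void Main()
    {
        string s8 = "abcdABCD";
        string copy = DistinctCopy(s8);
        Console.WriteLine(AssignmentSharesObject(s8));         // True
        Console.WriteLine(object.ReferenceEquals(copy, s8));   // False
        Console.WriteLine(copy == s8);                         // True (same value)
    }
}
```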

The .NET String class provides three ways to test for the equality of two strings. First, you can use the overloaded Equals( ) method and ask s9 directly whether s8 is of equal value:

Console.WriteLine("\nDoes s9.Equals(s8)?: {0}",
    s9.Equals(s8));

A second technique is to pass both strings to String's static method Equals( ):

Console.WriteLine("Does Equals(s9,s8)?: {0}",
    string.Equals(s9,s8));

A final method is to use the overloaded equality operator (==) of String:

Console.WriteLine("Does s9==s8?: {0}", s9 == s8);

In each of these cases, the returned result is a Boolean value, as shown in the output:

Does s9.Equals(s8)?: True
Does Equals(s9,s8)?: True
Does s9==s8?: True

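The equality operator is the most natural choice when you have two string variables. If you overload == in your own classes, however, be sure to override the Equals( ) instance method as well, because some .NET languages (such as VB.NET before the 2005 release) do not support operator overloading and must call Equals( ) instead.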
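There is a related pitfall: the compiler chooses an operator overload from the compile-time types of the operands. When two strings are held in object-typed variables, == therefore compares references, while the virtual Equals( ) still compares values. The helper names in this sketch are hypothetical.

```csharp
using System;

public static class EqualityDemo
{
    // Parameters are typed object, so == is reference equality here;
    // String's overloaded == is not selected.
    public static bool ObjectOperatorEquals(object a, object b)
    {
        return a == b;
    }

    // Equals( ) is virtual, so String's value comparison still runs.
    public static bool VirtualEquals(object a, object b)
    {
        return a.Equals(b);
    }

    public static void Main()
    {
        string s8 = "abcdABCD";
        string other = new string(s8.ToCharArray());   // same text, new object

        Console.WriteLine(s8 == other);                       // True
        Console.WriteLine(ObjectOperatorEquals(s8, other));   // False
        Console.WriteLine(VirtualEquals(s8, other));          // True
    }
}
```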

The next several lines in Example 10-1 use the index operator ([]) to retrieve a particular character within a string, and the Length property to return the length of the entire string. Indexes are zero-based, so s9[4] is the fifth character:

Console.WriteLine("\nString s9 is {0} characters long. ",
    s9.Length);
Console.WriteLine("The 5th character is {1}\n",
    s9.Length, s9[4]);

Here's the output:

String s9 is 8 characters long.
The 5th character is A
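Because indexing is zero-based, the last character is at Length - 1. Reading past either end throws IndexOutOfRangeException. The string indexer is also read-only, so an assignment such as s9[0] = 'x' does not compile. The sketch below shows these points; CharAtOrDefault is a hypothetical helper that returns '\0' instead of throwing.

```csharp
using System;

public static class IndexerDemo
{
    // Returns the character at a zero-based position, or '\0' when the
    // position is outside the string (the indexer itself would throw).
    public static char CharAtOrDefault(string s, int index)
    {
        if (index < 0 || index >= s.Length)
        {
            return '\0';
        }
        return s[index];
    }

    public static void Main()
    {
        string s9 = "abcdABCD";
        Console.WriteLine(s9[4]);               // A (fifth character)
        Console.WriteLine(s9[s9.Length - 1]);   // D (last character)

        try
        {
            char c = s9[8];                     // one past the end
            Console.WriteLine(c);
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("index 8 is out of range");
        }
    }
}
```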

The EndsWith( ) method asks a string whether a substring is found at the end of the string. Thus, you might first ask s3 if it ends with Training (which it does not) and then if it ends with Consulting (which it does):

// test whether a string ends with a set of characters
Console.WriteLine("s3:{0}\nEnds with Training?: {1}\n",
    s3, s3.EndsWith("Training") );
Console.WriteLine("Ends with Consulting?: {0}",
    s3.EndsWith("Consulting"));

The output reflects that the first test fails and the second succeeds:

s3:Liberty Associates, Inc.
                provides custom .NET development,
                on-site Training and Consulting
Ends with Training?: False

Ends with Consulting?: True
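EndsWith( ) is an exact suffix test, so a trailing space or newline makes it fail even when the last word matches. The sketch below trims the end first. It passes StringComparison.Ordinal to request a plain character-by-character match; that is a choice made here, not something Example 10-1 does. EndsWithWord is a hypothetical helper.

```csharp
using System;

public static class EndsWithDemo
{
    // Trim trailing white space, then test for an exact ordinal suffix.
    public static bool EndsWithWord(string text, string word)
    {
        return text.TrimEnd().EndsWith(word, StringComparison.Ordinal);
    }

    public static void Main()
    {
        string s3 = "on-site Training and Consulting\n";
        Console.WriteLine(s3.EndsWith("Consulting", StringComparison.Ordinal)); // False
        Console.WriteLine(EndsWithWord(s3, "Consulting"));                      // True
        Console.WriteLine(EndsWithWord(s3, "Training"));                        // False
    }
}
```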

The IndexOf( ) method locates a substring within our string, and the Insert( ) method inserts a new substring into a copy of the original string.

The following code locates the first occurrence of Training in s3:

Console.WriteLine("\nThe first occurrence of Training ");
Console.WriteLine ("in s3 is {0}\n",
    s3.IndexOf("Training"));

The output indicates that the offset is 101:

The first occurrence of Training
in s3 is 101

You can then use that value to insert the word excellent, followed by a space, into the string. Strictly speaking, Insert( ) leaves s3 unchanged; it returns a new string containing the insertion, which is assigned to s10:

string s10 = s3.Insert(101,"excellent ");
Console.WriteLine("s10: {0}\n",s10);

Here's the output:

s10: Liberty Associates, Inc.
                provides custom .NET development,
                on-site excellent Training and Consulting

Finally, you can combine these operations into a single statement that looks up the offset rather than hard-coding it:

string s11 = s3.Insert(s3.IndexOf("Training"),"excellent ");
Console.WriteLine("s11: {0}\n",s11);

with the identical output:

s11: Liberty Associates, Inc.
                provides custom .NET development,
                on-site excellent Training and Consulting
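The combined statement has one weak spot. If the substring is missing, IndexOf( ) returns -1, and Insert( ) throws ArgumentOutOfRangeException for a negative index. The sketch below guards against that case. InsertBefore is a hypothetical helper, and the ordinal search is a choice made here.

```csharp
using System;

public static class InsertDemo
{
    // Inserts text just before the first occurrence of marker, or returns
    // source unchanged when marker is absent (IndexOf( ) gives -1).
    public static string InsertBefore(string source, string marker, string text)
    {
        int index = source.IndexOf(marker, StringComparison.Ordinal);
        if (index < 0)
        {
            return source;
        }
        return source.Insert(index, text);   // a new string; source is untouched
    }

    public static void Main()
    {
        string s3 = "on-site Training and Consulting";
        Console.WriteLine(InsertBefore(s3, "Training", "excellent "));
        Console.WriteLine(InsertBefore(s3, "Seminars", "excellent "));
        Console.WriteLine(s3);   // unchanged
    }
}
```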

10.1.4 Finding Substrings

The String type provides an overloaded Substring( ) method for extracting substrings from within strings. Both versions take an index indicating where to begin the extraction; the second version also takes a length, the number of characters to extract. Note that this second argument is a count, not an ending index. The Substring( ) method is illustrated in Example 10-2.
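The start-plus-length convention is easy to confuse with a start-plus-end convention. The sketch below uses both overloads on the same sentence as Example 10-2. It adds a hypothetical Between helper that converts an exclusive end index into the length Substring( ) expects.

```csharp
using System;

public static class SubstringDemo
{
    // Characters from start up to, but not including, end.
    // Substring's second argument is a length, so pass end - start.
    public static string Between(string s, int start, int end)
    {
        return s.Substring(start, end - start);
    }

    public static void Main()
    {
        string s1 = "One Two Three Four";
        Console.WriteLine(s1.Substring(14));     // Four  (index 14 to the end)
        Console.WriteLine(s1.Substring(4, 3));   // Two   (3 characters from index 4)
        Console.WriteLine(Between(s1, 8, 13));   // Three (indexes 8 through 12)
    }
}
```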

Example 10-2. Using the Substring( ) method
namespace Programming_CSharp
{
   using System;
   using System.Text;

   public class StringTester
   {
      static void Main( )
      {
         // create some strings to work with
         string s1 = "One Two Three Four";

         int ix;

         // get the index of the last space
         ix=s1.LastIndexOf(" ");

         // get the last word.
         string s2 = s1.Substring(ix+1);

         // set s1 to the substring starting at 0
         // and ending at ix (the start of the last word)
         // thus s1 has "One Two Three"
         s1 = s1.Substring(0,ix);

         // find the last space in s1 (after Two)
         ix = s1.LastIndexOf(" ");

         // set s3 to the substring starting at
         // ix, the space after "Two" plus one more
         // thus s3 = "Three"
         string s3 = s1.Substring(ix+1);

         // reset s1 to the substring starting at 0
         // and ending at ix, thus the string "One Two"
         s1 = s1.Substring(0,ix);

         // reset ix to the space between
         // "One" and "Two"
         ix = s1.LastIndexOf(" ");

         // set s4 to the substring starting one
         // space after ix, thus the substring "Two"
         string s4 = s1.Substring(ix+1);

         // reset s1 to the substring starting at 0
         // and ending at ix, thus "One"
         s1 = s1.Substring(0,ix);

         // set ix to the last space, but there is
         // none so ix now = -1
         ix = s1.LastIndexOf(" ");

         // set s5 to the substring at one past
         // the last space. there was no last space
         // so this sets s5 to the substring starting
         // at zero
         string s5 = s1.Substring(ix+1);

         Console.WriteLine ("s2: {0}\ns3: {1}",s2,s3);
         Console.WriteLine ("s4: {0}\ns5: {1}\n",s4,s5);
         Console.WriteLine ("s1: {0}\n",s1);
      }
   }
}
Output:
s2: Four
s3: Three
s4: Two
s5: One

s1: One

Example 10-2 is not an elegant solution to the problem of extracting words from a string, but it is a good first approximation, and it illustrates a useful technique. The example begins by creating a string, s1:

string s1 = "One Two Three Four"; 

Then ix is assigned the value of the last space in the string:

ix=s1.LastIndexOf(" "); 

Then the substring that begins one space later is assigned to the new string, s2:

string s2 = s1.Substring(ix+1); 

This extracts the characters from ix+1 to the end of the string, assigning to s2 the value Four.

The next step is to remove the word Four from s1. You can do this by assigning to s1 the substring of s1 that begins at 0 and is ix characters long, so it ends just before the space at ix:

s1 = s1.Substring(0,ix);           

Reassign ix to the last (remaining) space, which lies just before the word Three, and extract that word into string s3. Continue like this until s4 and s5 are populated. Finally, print the results:

s2: Four
s3: Three
s4: Two
s5: One

s1: One

This isn't elegant, but it works and it illustrates the use of Substring. This is not unlike using pointer arithmetic in C++, but without the pointers and unsafe code.
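The repeated steps in Example 10-2 collapse naturally into a loop. The sketch below is one way to generalize it under the same assumption that words are separated by single spaces. It peels off the text after the last space until none remain. WordsFromEnd is a hypothetical helper, and it returns the words in the same reverse order the example produces them.

```csharp
using System;
using System.Collections.Generic;

public static class WordsFromEndDemo
{
    // Repeatedly take the text after the last space, then shorten s.
    public static List<string> WordsFromEnd(string s)
    {
        List<string> words = new List<string>();
        while (true)
        {
            int ix = s.LastIndexOf(' ');     // -1 once no space remains
            words.Add(s.Substring(ix + 1));   // the last remaining word
            if (ix < 0)
            {
                break;                        // that was the first word
            }
            s = s.Substring(0, ix);           // drop the space and that word
        }
        return words;
    }

    public static void Main()
    {
        foreach (string word in WordsFromEnd("One Two Three Four"))
        {
            Console.WriteLine(word);   // Four, Three, Two, One
        }
    }
}
```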

10.1.5 Splitting Strings

A more effective solution to the problem illustrated in Example 10-2 is to use the Split( ) method of String, whose job is to parse a string into substrings. To use Split( ), pass in an array of delimiters (characters that will indicate a split in the words), and the method returns an array of substrings. Example 10-3 illustrates.

Example 10-3. Using the Split( ) method
namespace Programming_CSharp
{
   using System;
   using System.Text;

   public class StringTester
   {
      static void Main( )
      {
         // create some strings to work with
         string s1 = "One,Two,Three Liberty Associates, Inc.";

         // constants for the space and comma characters
         const char Space = ' ';
         const char Comma = ',';

         // array of delimiters to split the sentence with
         char[] delimiters = new char[]
         {
            Space,
            Comma
         };

         string output = "";
         int ctr = 1;

         // split the string and then iterate over the
         // resulting array of strings
         foreach (string subString in s1.Split(delimiters))
         {
            output += ctr++;
            output += ": ";
            output += subString;
            output += "\n";
         }
         Console.WriteLine(output);
      }
   }
}
Output:
1: One
2: Two
3: Three
4: Liberty
5: Associates
6:
7: Inc.

You start by creating a string to parse:

string s1 = "One,Two,Three Liberty Associates, Inc."; 

The delimiters are set to the space and comma characters. You then call Split( ) on this string, and pass the results to the foreach loop:

foreach (string subString in s1.Split(delimiters))

Start by initializing output to an empty string, and then build up the output string in four steps: concatenate the value of ctr, then the colon and space, then the substring returned by Split( ), then the newline. With each concatenation, a new copy of the string is made, and all four steps are repeated for each substring found by Split( ). This repeated copying of strings is terribly inefficient.

The problem is that the string type is not designed for this kind of operation. What you want is to create a new string by appending a formatted string each time through the loop. The class you need is StringBuilder.

10.1.6 Manipulating Dynamic Strings

The System.Text.StringBuilder class is used for creating and modifying strings. Semantically, it is a mutable buffer in which you assemble the characters of a string; when you are finished, calling its ToString( ) method produces an ordinary String. The important members of StringBuilder are summarized in Table 10-2.

Table 10-2. StringBuilder methods

Method or property   Explanation
Capacity             Retrieves or assigns the number of characters the StringBuilder can hold without allocating more memory.
Chars                The indexer.
Length               Retrieves or assigns the length of the StringBuilder.
MaxCapacity          Retrieves the maximum capacity of the StringBuilder.
Append( )            Overloaded public method that appends a typed object to the end of the current StringBuilder.
AppendFormat( )      Overloaded public method that appends a string in which format specifiers are replaced with the formatted values of objects.
EnsureCapacity( )    Ensures the current StringBuilder has a capacity at least as large as the specified value.
Insert( )            Overloaded public method that inserts an object at the specified position.
Remove( )            Removes the specified range of characters.
Replace( )           Overloaded public method that replaces all instances of specified characters with new characters.

Unlike String, StringBuilder is mutable; when you modify a StringBuilder, you modify the actual string, not a copy. Example 10-4 replaces the String object in Example 10-3 with a StringBuilder object.
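The sketch below shows that mutability directly. Each call changes the same StringBuilder and returns that same instance, which is why calls can be chained. Setting Length to a smaller value truncates the contents. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    public static string Build()
    {
        StringBuilder sb = new StringBuilder("Associates");
        sb.Insert(0, "Liberty ");              // "Liberty Associates"
        sb.Append(", Inc.");                   // "Liberty Associates, Inc."
        sb.Replace("Inc.", "Incorporated");    // "Liberty Associates, Incorporated"
        return sb.ToString();                  // an ordinary, immutable string
    }

    // Append( ) returns the builder it was called on.
    public static bool AppendReturnsSameBuilder()
    {
        StringBuilder sb = new StringBuilder();
        return object.ReferenceEquals(sb.Append("x"), sb);
    }

    // Assigning a smaller Length truncates the contents in place.
    public static string Truncate()
    {
        StringBuilder sb = new StringBuilder("abcdABCD");
        sb.Length = 4;
        return sb.ToString();                  // "abcd"
    }

    public static void Main()
    {
        Console.WriteLine(Build());
        Console.WriteLine(AppendReturnsSameBuilder());   // True
        Console.WriteLine(Truncate());
    }
}
```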

Example 10-4. Using a StringBuilder
namespace Programming_CSharp
{
   using System;
   using System.Text;

   public class StringTester
   {
      static void Main( )
      {
         // create some strings to work with
         string s1 = "One,Two,Three Liberty Associates, Inc.";

         // constants for the space and comma characters
         const char Space = ' ';
         const char Comma = ',';

         // array of delimiters to split the sentence with
         char[] delimiters = new char[]
         {
            Space,
            Comma
         };

         // use a StringBuilder class to build the
         // output string
         StringBuilder output = new StringBuilder( );
         int ctr = 1;

         // split the string and then iterate over the
         // resulting array of strings
         foreach (string subString in s1.Split(delimiters))
         {
            // AppendFormat appends a formatted string
            output.AppendFormat("{0}: {1}\n",ctr++,subString);
         }
         Console.WriteLine(output);
      }
   }
}

Only the last part of the program is modified. Rather than using the concatenation operator to modify the string, use the AppendFormat( ) method of StringBuilder to append new, formatted strings as you create them. This is much easier and far more efficient. The output is identical:

1: One
2: Two
3: Three
4: Liberty
5: Associates
6:
7: Inc.

Delimiter Limitations

Because you passed in delimiters of both comma and space, the empty string between the comma and the space that separate "Associates" and "Inc." is returned as a word, numbered 6 as shown. That is not what you want. To eliminate this, you need to tell Split( ) to match a comma (as between One, Two, and Three), or a space (as between Liberty and Associates), or a comma followed by a space. It is that last bit that is tricky and requires that you use a regular expression.
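For this particular input, versions of .NET from 2.0 onward offer a simpler workaround than a regular expression: StringSplitOptions.RemoveEmptyEntries discards the empty entry. The sketch below assumes that API is available. It also drops empty fields caused by any consecutive delimiters, which is unsuitable when empty fields carry meaning (as in comma-separated data). SplitWords is a hypothetical helper.

```csharp
using System;

public static class SplitOptionsDemo
{
    // Same delimiters as Example 10-3, but empty entries are discarded.
    public static string[] SplitWords(string s)
    {
        char[] delimiters = { ' ', ',' };
        return s.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string s1 = "One,Two,Three Liberty Associates, Inc.";
        int ctr = 1;
        foreach (string subString in SplitWords(s1))
        {
            Console.WriteLine("{0}: {1}", ctr++, subString);   // 1: One ... 6: Inc.
        }
    }
}
```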



Programming C#
C# Programming: From Problem Analysis to Program Design
ISBN: 1423901460
EAN: 2147483647
Year: 2003
Pages: 182
Authors: Barbara Doyle
