
5.4. Searching, Modifying, and Encoding a String's Content

This section describes string methods that perform diverse but familiar tasks: locating a substring within a string, changing the case of a string, replacing or removing text, splitting a string into delimited substrings, and trimming leading and trailing spaces. It concludes with encoding, which converts a string to and from a sequence of bytes.

Searching the Contents of a String

A string can be treated as a read-only, zero-based array of chars: the indexer syntax string[n] returns the character at position n. To locate a substring of one or more characters, the string class offers the IndexOf, LastIndexOf, IndexOfAny, and LastIndexOfAny methods. Table 5-2 summarizes these.

Table 5-2. Ways to Examine Characters Within a String

String Member

Description

[n]

Returns the 16-bit char at position n within a string. The indexer is read-only; it cannot be used to change a character.

    string poem = "Kubla Khan";
    int ndx = 0;
    while (ndx < poem.Length)
    {
       Console.Write(poem[ndx]);   // Kubla Khan
       ndx += 1;
    }

IndexOf/LastIndexOf(string or char, [int start], [int count])

start. Position at which the search begins (LastIndexOf searches backward from it).

count. Number of chars to examine.

Returns the index of the first/last occurrence of a specified string or char within the string. Returns -1 if there is no match.

    string poem = "Kubla Khan";
    int n = poem.IndexOf("la");   // 3
    n = poem.IndexOf('K');        // 0
    n = poem.IndexOf('K', 4);     // 6

IndexOfAny/LastIndexOfAny(char[], [int start], [int count])

Returns the index of the first/last occurrence in the string of any character from an array of Unicode characters. Returns -1 if none is found.

    string poem = "Kubla Khan";
    char[] vowels = new char[5] {'a', 'e', 'i', 'o', 'u'};
    int n = poem.IndexOfAny(vowels);     // 1
    n = poem.LastIndexOfAny(vowels);     // 8
    n = poem.IndexOfAny(vowels, 2);      // 4
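The start parameter of IndexOf makes it easy to find every occurrence of a substring, not just the first. The following is a minimal sketch written as a C# 9 top-level program. The helper name FindAll is hypothetical, and passing StringComparison.Ordinal (an overload not listed in Table 5-2) is an assumption that requests a plain character-by-character comparison rather than a culture-sensitive one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns every index at which target occurs in text.
// Each search resumes one character past the previous match, so
// overlapping matches are reported too.
static List<int> FindAll(string text, string target)
{
    var positions = new List<int>();
    int start = 0;
    while (start < text.Length)
    {
        int pos = text.IndexOf(target, start, StringComparison.Ordinal);
        if (pos == -1)
            break;                    // -1: no further match
        positions.Add(pos);
        start = pos + 1;
    }
    return positions;
}

string poem = "Kubla Khan";
List<int> hits = FindAll(poem, "a");
Console.WriteLine(string.Join(", ", hits));   // 4, 8
```

Advancing by one character rather than by the length of target is a design choice: FindAll("aaa", "aa") reports both 0 and 1.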


Searching a String That Contains Surrogates

All of these techniques assume that a string consists of a sequence of 16-bit characters. Suppose, however, that your application must work with characters whose code points lie above U+FFFF, such as the supplementary CJK ideographs found in some Far Eastern text. Such a character does not fit in 16 bits, so it is stored as a surrogate pair: a high surrogate followed by a low surrogate, each a 16-bit value. This presents a problem for an expression such as poem[ndx], which returns only half of a surrogate pair.

For applications that must work with surrogates, .NET provides the StringInfo class (in the System.Globalization namespace), which treats a string as a sequence of text elements and automatically detects whether a character is a single 16-bit value or a surrogate pair. Its most important member is the static GetTextElementEnumerator method, which returns an enumerator that iterates through the text elements of a string.

    // StringInfo and TextElementEnumerator are in System.Globalization
    TextElementEnumerator tEnum =
           StringInfo.GetTextElementEnumerator(poem);
    while (tEnum.MoveNext())             // Step through the string
    {
       Console.WriteLine(tEnum.Current); // Print current text element
    }

Recall from the discussion of enumerators in Chapter 4, "Working with Objects in C#," that MoveNext() and Current are members implemented by all enumerators.
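To see the surrogate problem concretely, the sketch below builds a string containing U+1D11E (MUSICAL SYMBOL G CLEF), a character above U+FFFF chosen purely for illustration, and compares String.Length with the number of text elements StringInfo reports. It assumes a C# 9 or later compiler for the top-level statements; LengthInTextElements and GetTextElement are StringInfo and TextElementEnumerator members not discussed above.

```csharp
using System;
using System.Globalization;

// U+1D11E lies above U+FFFF, so C# stores it as two 16-bit chars.
string s = "A\U0001D11EB";

Console.WriteLine(s.Length);                    // 4 chars, not 3 characters
Console.WriteLine(char.IsHighSurrogate(s[1]));  // True: s[1] is half a character
Console.WriteLine(char.IsLowSurrogate(s[2]));   // True

// StringInfo counts the surrogate pair as a single text element.
var info = new StringInfo(s);
Console.WriteLine(info.LengthInTextElements);   // 3

// Enumerating text elements never splits the pair.
TextElementEnumerator tEnum = StringInfo.GetTextElementEnumerator(s);
int count = 0;
while (tEnum.MoveNext())
{
    string element = tEnum.GetTextElement();    // "A", the clef (2 chars), "B"
    Console.WriteLine($"{element} ({element.Length} char(s))");
    count++;
}
```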

String Transformations

Table 5-3 summarizes the most important string class methods for modifying a string. Because strings are immutable, none of these methods changes the original string; any modified string they produce is a new string with its own allocated memory.
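A common slip is to call one of these methods and ignore its return value. This short sketch (a C# 9 top-level program with illustrative variable names) shows that the original string is left untouched and that the result is a distinct object.

```csharp
using System;

string title = "Kubla Khan";

// The returned string is discarded, so title is unchanged.
title.ToUpper();
Console.WriteLine(title);                                 // Kubla Khan

// Capture the result instead.
string upper = title.ToUpper();
Console.WriteLine(upper);                                 // KUBLA KHAN

// The original and the result are different objects.
Console.WriteLine(object.ReferenceEquals(title, upper)); // False
```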

Table 5-3. Methods for Manipulating and Transforming Strings

String Member

Description

Insert(int, string)

Inserts a string at the specified position.

    string mariner = "and he stoppeth three";
    string verse = mariner.Insert(
           mariner.IndexOf(" three"), " one of");
    // verse --> "and he stoppeth one of three"

PadRight/PadLeft(int, [char])

Pads a string on the right (PadRight) or on the left (PadLeft) with a given character until it reaches the specified total width. If no character is specified, spaces are used.

    string rem = "and so on";
    rem = rem.PadRight(rem.Length + 3, '.');
    // rem --> "and so on..."

Remove(p, n)

Removes n characters beginning at position p.

    string verse = "It is an Ancient Mariner";
    string newverse = verse.Remove(0, 9);
    // newverse --> "Ancient Mariner"

Replace(A, B)

Replaces all occurrences of A with B, where A and B are chars or strings.

    string aString = "nap ace sap path";
    string iString = aString.Replace('a', 'i');
    // iString --> "nip ice sip pith"

Split(char[])

Breaks a string into substrings wherever any of the delimiter characters in the char array occurs, and returns the substrings as elements of a string array.

    string words = "red,blue orange ";
    string[] split = words.Split(new char[] {' ', ','});
    Console.WriteLine(split[2]);   // orange

ToUpper(), ToUpper(CultureInfo), ToLower(), ToLower(CultureInfo)

Returns an upper- or lowercase copy of the string.

    // CultureInfo is in System.Globalization
    string poem2 = "Kubla Khan";
    poem2 = poem2.ToUpper(CultureInfo.InvariantCulture);
    // poem2 --> "KUBLA KHAN"

Trim(), Trim(params char[])

Removes all leading and trailing whitespace. If a char array is provided, all leading and trailing occurrences of the characters in the array are removed instead.

    string name = "  Samuel Coleridge";
    name = name.Trim();   // "Samuel Coleridge"

TrimEnd(params char[]), TrimStart(params char[])

TrimEnd removes all trailing characters, and TrimStart all leading characters, that appear in the char array. If null is specified, whitespace is removed.

    string name = "  Samuel Coleridge";
    string trimName  = name.TrimStart(null);        // "Samuel Coleridge"
    string shortName = name.TrimEnd('e', 'g', 'i');
    // shortName --> "  Samuel Colerid" (leading spaces remain)

Substring(n), Substring(n, l)

Returns the substring that begins at position n and has length l; if l is omitted, the substring extends to the end of the string.

    string title = "Kubla Khan";
    Console.WriteLine(title.Substring(2, 3));   // bla

ToCharArray(), ToCharArray(n, l)

Copies the characters of a string (or l characters starting at position n) into an array of Unicode characters.

    string myVowels = "aeiou";
    char[] vowelArr = myVowels.ToCharArray();
    Console.WriteLine(vowelArr[1]);   // e
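One detail of Split that the table example hides: the trailing space in "red,blue orange " is itself a delimiter, so the resulting array ends with an empty string. The sketch below assumes the Split(char[], StringSplitOptions) overload (available since .NET 2.0) and a C# 9 compiler for top-level statements; it shows the extra element and how StringSplitOptions.RemoveEmptyEntries discards it.

```csharp
using System;

string words = "red,blue orange ";
char[] delimiters = { ' ', ',' };

// The trailing space produces an empty final element.
string[] split = words.Split(delimiters);
Console.WriteLine(split.Length);                  // 4
Console.WriteLine(split[3] == string.Empty);      // True

// RemoveEmptyEntries drops zero-length substrings.
string[] clean = words.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(clean.Length);                  // 3
Console.WriteLine(string.Join("|", clean));       // red|blue|orange
```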


Most of these methods have analogues in other languages and behave as you would expect. Somewhat surprisingly, as we will see in the next section, most of them are not available in the StringBuilder class; only Replace, Remove, and Insert are included.

String Encoding

Encoding comes into play when you need to convert between strings and bytes for operations such as writing a string to a file or streaming it across a network. Character encoding and decoding offer two major benefits: efficiency and interoperability. Most English text consists of characters that can be represented in 8 bits, so encoding can strip the extra byte of the 16-bit Unicode in-memory representation before the text is transmitted or stored. Encoding is also important because it allows an application to interoperate with legacy or third-party data encoded in different formats.

The .NET Framework supports many forms of character encoding and decoding. The most frequently used include the following:

  • UTF-8. Each character is encoded as a sequence of 1 to 4 bytes, depending on its value. ASCII characters (0x0000 to 0x007F) are stored in 1 byte; characters from 0x0080 to 0x07FF are stored in 2 bytes; and characters from 0x0800 to 0xFFFF (excluding surrogates) are stored in 3 bytes. A surrogate pair is written as 4 bytes. UTF-8 (which stands for UCS Transformation Format, 8-bit form) is the default for many .NET classes, such as StreamWriter, when no encoding is specified.

  • UTF-16. Each character is encoded as 2 bytes (a surrogate pair takes 4), which matches how characters are represented internally in .NET. This is also referred to as Unicode encoding.

  • ASCII. Encodes each character as a single byte holding its 7-bit ASCII value. Use it only when all characters are in the ASCII range (0x00 to 0x7F). In current versions of .NET, the default ASCII encoder replaces any character outside that range with a question mark (?), so the original value is lost.
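The byte counts listed above can be checked with GetByteCount, a member of the Encoding class described next. This sketch (a C# 9 top-level program) picks one sample character from each UTF-8 size class; the specific characters are arbitrary choices. It also shows what the ASCII encoder does with a non-ASCII character: in current versions of .NET the default behavior is to substitute a question mark (0x3F).

```csharp
using System;
using System.Text;

// 'A' (U+0041), 'é' (U+00E9), '€' (U+20AC), and U+1D11E (a surrogate pair).
string[] samples = { "A", "\u00E9", "\u20AC", "\U0001D11E" };

foreach (string s in samples)
{
    int utf8 = Encoding.UTF8.GetByteCount(s);
    int utf16 = Encoding.Unicode.GetByteCount(s);
    Console.WriteLine($"U+{char.ConvertToUtf32(s, 0):X4}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes");
}

// ASCII cannot represent U+00E9, so the encoder substitutes '?' (0x3F).
byte[] ascii = Encoding.ASCII.GetBytes("caf\u00E9");
Console.WriteLine(BitConverter.ToString(ascii));   // 63-61-66-3F
```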

Encoding and decoding are performed using the Encoding class found in the System.Text namespace. This abstract class has several static properties that return an object used to implement a specific encoding technique. These properties include ASCII, UTF8, and Unicode. The latter is used for UTF-16 encoding.

An encoding object offers several methods, each with several overloads, for converting between characters and bytes. Here is an example that illustrates two of the most useful: GetBytes, which converts a string to a byte array, and GetString, which reverses the process and converts a byte array to a string.

    // Requires: using System.Text;
    string text = "In Xanadu did Kubla Khan";
    Encoding UTF8Encoder = Encoding.UTF8;
    byte[] textChars = UTF8Encoder.GetBytes(text);
    Console.WriteLine(textChars.Length);         // 24
    // Store using UTF-16
    textChars = Encoding.Unicode.GetBytes(text);
    Console.WriteLine(textChars.Length);         // 48
    // Decode, treating each character as two bytes
    string decodedText = Encoding.Unicode.GetString(textChars);
    Console.WriteLine(decodedText);              // In Xanadu did Kubla Khan

You can also instantiate the encoding classes directly. In this example, the UTF-8 object could be created with:

 UTF8Encoding UTF8Encoder = new UTF8Encoding(); 

With the exception of ASCIIEncoding, these classes have constructors that take parameters giving more control over the encoding process. For example, you can specify whether an exception is thrown when invalid bytes are detected.
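As an illustration, UTF8Encoding has a constructor UTF8Encoding(bool encoderShouldEmitUTF8Identifier, bool throwOnInvalidBytes). The sketch below compares a lenient decoder, which replaces a malformed byte with U+FFFD, against a strict one, which throws DecoderFallbackException. The byte values are arbitrary, and the code assumes a C# 9 or later compiler for top-level statements.

```csharp
using System;
using System.Text;

// First argument false: no byte order mark from GetPreamble.
// Second argument: whether malformed input throws or is silently replaced.
var strict = new UTF8Encoding(false, true);
var lenient = new UTF8Encoding(false, false);

byte[] bad = { 0x4B, 0xFF, 0x6E };   // 'K', an invalid byte, 'n'

// Lenient decoding substitutes U+FFFD for the invalid byte.
string repaired = lenient.GetString(bad);
Console.WriteLine(repaired.Length);  // 3

// Strict decoding rejects the input.
bool threw = false;
try
{
    strict.GetString(bad);
}
catch (DecoderFallbackException ex)
{
    threw = true;
    Console.WriteLine("Invalid UTF-8: " + ex.Message);
}
```

Strict decoding suits input that must be validated, such as data arriving over a network; lenient decoding keeps going and marks the damage.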



    Core C# and .NET
    ISBN: 131472275
    EAN: N/A
    Year: 2005
    Pages: 219
