Recipe 5.14. Removing Extra Whitespace


Problem

You want to remove all extra whitespace characters from a string, leaving a single space character between each word.

Solution

Sample code folder: Chapter 05\RemoveWhitespace

There are several possible ways to remove extra whitespace from a string. One approach, presented here, is to test each character of the string to see if it is whitespace and to build up the resulting string using a StringBuilder:

    ' Build a test string padded with spaces, a form feed, and tabs.
    Dim source As String = _
       Space(17) & "This string had " & Chr(12) & _
       StrDup(5, Chr(9)) & "extra whitespace. " & Space(27)
    Dim thisIsWhiteSpace As Boolean
    Dim prevIsWhiteSpace As Boolean
    Dim result As New System.Text.StringBuilder(source.Length)
    Dim counter As Integer

    For counter = 0 To source.Length - 1
       ' Remember whether the previous character was whitespace.
       prevIsWhiteSpace = thisIsWhiteSpace
       thisIsWhiteSpace = _
          Char.IsWhiteSpace(source.Chars(counter))
       If (thisIsWhiteSpace = False) Then
          ' Emit one space before a word that follows whitespace,
          ' except at the very start of the result.
          If (prevIsWhiteSpace = True) AndAlso _
             (result.Length > 0) Then result.Append(Space(1))
          result.Append(source.Chars(counter))
       End If
    Next counter
    MsgBox("<" & result.ToString() & ">")

Discussion

The preceding code first builds a test string composed of words separated by extra spaces, tabs, and other whitespace characters. It then replaces each run of whitespace characters between words with a single space and drops the leading and trailing whitespace entirely. The resulting string is displayed between angle brackets for inspection, as shown in Figure 5-12.

Figure 5-12. The test string after zapping extra whitespace characters


Another straightforward approach to removing extra whitespace is to use a series of Replace() calls, first to replace tabs and other whitespace characters with spaces, and then to replace multiple spaces with single ones. This works, but the disadvantage is that many temporary strings are built in memory, because strings are immutable and each Replace() call returns a new one. The code presented here copies each character only once, or not at all if the character is an extra whitespace character.
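
For comparison, the Replace() approach might be sketched as follows. This is only an illustration, not code from the recipe's sample folder: the module and function names (ReplaceWhitespaceSketch, CollapseWithReplace) are hypothetical, and the list of characters converted to spaces is an assumption that covers only tab, line feed, vertical tab, form feed, and carriage return. Note that one pass of replacing double spaces only halves a run of spaces, so the replacement is repeated until no double space remains.

```vb
Module ReplaceWhitespaceSketch
    ' Hypothetical helper, named here only for illustration.
    Public Function CollapseWithReplace(ByVal source As String) As String
        Dim result As String = source

        ' Step 1: turn common non-space whitespace characters into spaces.
        ' Assumption: only these five control characters are handled.
        For Each ch As Char In New Char() _
            {Chr(9), Chr(10), Chr(11), Chr(12), Chr(13)}
            result = result.Replace(ch, " "c)
        Next ch

        ' Step 2: collapse runs of spaces. Each pass creates a new
        ' temporary string, which is the cost the text describes.
        Do While result.Contains("  ")
            result = result.Replace("  ", " ")
        Loop

        ' Step 3: remove the single leading or trailing space left over.
        Return result.Trim()
    End Function

    Sub Main()
        Dim source As String = _
            Space(17) & "This string had " & Chr(12) & _
            StrDup(5, Chr(9)) & "extra whitespace. " & Space(27)
        Console.WriteLine("<" & CollapseWithReplace(source) & ">")
    End Sub
End Module
```

Unlike the StringBuilder loop, this version allocates a fresh string for every Replace() call, so the number of temporary strings grows with the length of the longest run of spaces.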

Another good approach is to use regular expressions to grab an array of the words and then piece them back together with single spaces using a StringBuilder.
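
A minimal sketch of the regular-expression idea is shown below. It is an assumption about how such code could look rather than the listing from the related recipe: the names RegexWhitespaceSketch and CollapseWithRegex are hypothetical, and the pattern \S+ (one or more non-whitespace characters) is one reasonable choice for picking out the words. The regular expression engine's notion of whitespace may differ slightly from Char.IsWhiteSpace for unusual Unicode characters.

```vb
Imports System.Text
Imports System.Text.RegularExpressions

Module RegexWhitespaceSketch
    ' Hypothetical helper, named here only for illustration.
    Public Function CollapseWithRegex(ByVal source As String) As String
        Dim result As New StringBuilder(source.Length)

        ' \S+ matches each maximal run of non-whitespace characters,
        ' so every match is one word of the source string.
        For Each word As Match In Regex.Matches(source, "\S+")
            ' Separate words with exactly one space.
            If (result.Length > 0) Then result.Append(" "c)
            result.Append(word.Value)
        Next word

        Return result.ToString()
    End Function

    Sub Main()
        Dim source As String = _
            Space(17) & "This string had " & Chr(12) & _
            StrDup(5, Chr(9)) & "extra whitespace. " & Space(27)
        Console.WriteLine("<" & CollapseWithRegex(source) & ">")
    End Sub
End Module
```

Because the matches never include whitespace, leading and trailing whitespace disappears without any extra trimming step.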

See Also

Recipe 5.42 shows how to use regular expressions to attack the multiwhitespace problem.




Visual Basic 2005 Cookbook: Solutions for VB 2005 Programmers (Cookbooks (OReilly))
ISBN: 0596101775
Year: 2006
Pages: 400
