You want to remove all extra whitespace characters from a string, leaving a single space character between each word.
Sample code folder: Chapter 05\RemoveWhitespace
There are several possible ways to remove extra whitespace from a string. One approach, presented here, is to test each character of the string to see if it is whitespace and to build up the resulting string using a StringBuilder:
    ' Build a test string padded with spaces, a form feed, and tabs.
    Dim source As String = _
        Space(17) & "This string had " & Chr(12) & _
        StrDup(5, Chr(9)) & "extra whitespace. " & Space(27)
    Dim thisIsWhiteSpace As Boolean
    Dim prevIsWhiteSpace As Boolean
    Dim result As New System.Text.StringBuilder(source.Length)
    Dim counter As Integer

    For counter = 0 To source.Length - 1
        ' Remember whether the previous character was whitespace.
        prevIsWhiteSpace = thisIsWhiteSpace
        thisIsWhiteSpace = _
            Char.IsWhiteSpace(source.Chars(counter))
        If (thisIsWhiteSpace = False) Then
            ' Emit one separating space before a word, but never a leading one.
            If (prevIsWhiteSpace = True) AndAlso _
                (result.Length > 0) Then result.Append(Space(1))
            result.Append(source.Chars(counter))
        End If
    Next counter

    MsgBox("<" & result.ToString() & ">")
The preceding code first builds a test string composed of words separated by extra spaces, tabs, and other whitespace characters. It then replaces each run of whitespace characters with a single space, dropping leading and trailing whitespace entirely, and displays the resulting string between angle brackets for inspection, as shown in Figure 5-12.
Figure 5-12. The test string after zapping extra whitespace characters
Another straightforward approach to removing extra whitespace is to use a series of Replace() calls, first to replace tabs and other whitespace characters with spaces, and finally to replace multiple spaces with single ones. This works, but it has a disadvantage: because strings are immutable, each Replace() call that changes anything builds a new temporary string in memory. The StringBuilder code presented in this recipe copies each character only once, or not at all if the character is extra whitespace.
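For comparison, the Replace()-based alternative might look something like the following minimal sketch. It is not part of the recipe's sample code: the module and function names are hypothetical, and the list of whitespace characters it maps to spaces is an assumption that covers only the common control characters, not everything Char.IsWhiteSpace() recognizes.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module ReplaceWhitespaceSketch
    ' Hypothetical helper (not from the recipe): collapses whitespace
    ' using only String.Replace() and Trim().
    Function CollapseWithReplace(ByVal source As String) As String
        Dim result As String = source

        ' Assumed set of whitespace characters to convert to plain spaces.
        Dim others() As Char = {ControlChars.Tab, ControlChars.Cr, _
            ControlChars.Lf, ControlChars.FormFeed, ControlChars.VerticalTab}
        For Each ch As Char In others
            ' Each call that changes something allocates a new string.
            result = result.Replace(ch, " "c)
        Next

        ' Shrink runs of spaces until no double space remains.
        Do While result.Contains("  ")
            result = result.Replace("  ", " ")
        Loop

        ' Remove the single leading or trailing space that may be left.
        Return result.Trim()
    End Function

    Sub Main()
        Dim source As String = _
            Space(17) & "This string had " & Chr(12) & _
            StrDup(5, Chr(9)) & "extra whitespace. " & Space(27)
        ' Prints: <This string had extra whitespace.>
        Console.WriteLine("<" & CollapseWithReplace(source) & ">")
    End Sub
End Module
```

Every pass through either loop can create another intermediate string, which is the memory cost described above.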
Another good approach is to use regular expressions to grab an array of the words and then piece them back together with single spaces using a StringBuilder.
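A rough sketch of that idea follows. It is an illustration only, not the code from the regular expression recipe: the module and function names are hypothetical, and the pattern \S+ (one or more non-whitespace characters) is an assumed way of defining a "word."

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexWhitespaceSketch
    ' Hypothetical helper (not from the recipe): finds each word with a
    ' regular expression and rejoins the words with single spaces.
    Function CollapseWithRegex(ByVal source As String) As String
        Dim result As New StringBuilder(source.Length)

        ' \S+ matches each run of non-whitespace characters.
        For Each word As Match In Regex.Matches(source, "\S+")
            ' Add a separator before every word except the first.
            If result.Length > 0 Then result.Append(" "c)
            result.Append(word.Value)
        Next

        Return result.ToString()
    End Function

    Sub Main()
        Dim source As String = _
            Space(17) & "This string had " & Chr(12) & _
            StrDup(5, Chr(9)) & "extra whitespace. " & Space(27)
        ' Prints: <This string had extra whitespace.>
        Console.WriteLine("<" & CollapseWithRegex(source) & ">")
    End Sub
End Module
```

Because the matches never include whitespace, leading and trailing runs disappear without any special handling.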
Recipe 5.42 shows how to use regular expressions to attack the multiwhitespace problem.