Recipe 5.11. Converting Strings to and from Byte Arrays


Problem

You need to convert a string to bytes, and back to a string from a byte array. This enables you to work with the exact binary data comprising the string.

Solution

Sample code folder: Chapter 05\StringsAndByteArrays

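Use the shared members of the System.Text.Encoding class to convert to and from bytes. If you know the string data consists entirely of ASCII characters, use UTF8 encoding to minimize the length of the byte array, because each ASCII character then occupies a single byte. Unicode encoding, which uses two bytes per character instead of one, matches the 16-bit character format .NET uses for strings, so the conversion never loses data. UTF8 also preserves non-ASCII characters, but it needs more than one byte for each of them, so byte positions no longer line up with character positions.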
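The point about data loss can be checked with a short round-trip test. The following is a minimal sketch, not part of the recipe's sample code: the module name and the SurvivesRoundTrip helper are hypothetical, and Encoding.ASCII is included only as a contrast, since it substitutes a question mark for any character it cannot represent.

```vb
Imports System
Imports System.Text

Module RoundTripDemo
    ' Hypothetical helper: encodes the string, decodes it again,
    ' and reports whether the original text survived intact.
    Function SurvivesRoundTrip(ByVal source As String, ByVal enc As Encoding) As Boolean
        Dim bytes() As Byte = enc.GetBytes(source)
        Return enc.GetString(bytes) = source
    End Function

    Sub Main()
        ' "cafe" with an accented e (U+00E9), which is outside the ASCII range.
        Dim sample As String = "caf" & ChrW(233)

        Console.WriteLine(SurvivesRoundTrip(sample, Encoding.ASCII))    ' False
        Console.WriteLine(SurvivesRoundTrip(sample, Encoding.UTF8))     ' True
        Console.WriteLine(SurvivesRoundTrip(sample, Encoding.Unicode))  ' True
    End Sub
End Module
```

Both UTF8 and Unicode return the original string; only the 7-bit ASCII encoding drops the accented character.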

Discussion

The following sample code shows both UTF8 and Unicode encoding methods:

    Dim quote As String = "The important thing is not to " & _
        "stop questioning. --Albert Einstein"
    Dim bytes() As Byte
    Dim result As String

    ' ----- Assumed to be all ASCII characters.
    bytes = System.Text.Encoding.UTF8.GetBytes(quote)
    bytes(46) = 33  ' ASCII exclamation point
    result = System.Text.Encoding.UTF8.GetString(bytes)
    MsgBox(result)

    ' ----- Works with all character sets.
    bytes = System.Text.Encoding.Unicode.GetBytes(quote)
    bytes(92) = 63  ' ASCII question mark
    bytes(93) = 0
    result = System.Text.Encoding.Unicode.GetString(bytes)
    MsgBox(result)

When UTF8 encoding is applied to a string that contains only ASCII characters, the number of bytes in the array is the same as the number of characters in the string. The character at zero-based index 46 in the string is a period. The first part of the example changes this period to an exclamation point and displays the resulting string, a result identical to that previously shown in Figure 5-8.
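The one-byte-per-character relationship holds only for ASCII text. As a rough illustration (the module name and sample strings below are assumptions, not taken from the recipe's sample code), GetByteCount shows how the sizes diverge once an accented character appears:

```vb
Imports System
Imports System.Text

Module ByteCountDemo
    Sub Main()
        Dim plain As String = "stop questioning."            ' 17 ASCII characters
        Dim accented As String = "na" & ChrW(239) & "ve"     ' 5 characters, one non-ASCII (U+00EF)

        Console.WriteLine(Encoding.UTF8.GetByteCount(plain))        ' 17
        Console.WriteLine(Encoding.UTF8.GetByteCount(accented))     ' 6
        Console.WriteLine(Encoding.Unicode.GetByteCount(accented))  ' 10
    End Sub
End Module
```

Because the accented character takes two bytes in UTF8, indexing directly into a UTF8 byte array by character position, as the recipe does with bytes(46), is safe only when every preceding character is ASCII.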

A Unicode-encoded byte array contains twice as many bytes as the number of characters in the original string. This makes sense when you consider that each character in a .NET string is 16 bits (two bytes) in size. Take a close look at the byte array modifications in the second part of the example code. The byte at position 92 (twice as far into the array as in the UTF8 variation) is set to the desired ASCII value (63 in this case, for the question mark). But because each character now consumes two bytes in the array, with the low-order byte stored first, you must set both bytes. Setting the byte at position 93 to zero clears the high-order half of the two-byte pair. Figure 5-9 shows the resulting string, now sporting a question mark at character index 46.

Figure 5-9. Changing the Unicode character at byte locations 92 and 93 to a question mark
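The two-byte arithmetic used for positions 92 and 93 can be wrapped in a small routine. This is a sketch only: SetUnicodeChar and the module name are hypothetical, and the code assumes the array came from Encoding.Unicode (little-endian UTF-16) and that the replacement is a single 16-bit character rather than a surrogate pair.

```vb
Imports System
Imports System.Text

Module ReplaceCharDemo
    ' Hypothetical helper: overwrites the character at charIndex in a byte
    ' array produced by Encoding.Unicode. Low-order byte first, then high-order.
    Sub SetUnicodeChar(ByVal bytes() As Byte, ByVal charIndex As Integer, ByVal newChar As Char)
        Dim code As Integer = AscW(newChar)
        bytes(charIndex * 2) = CByte(code And &HFF)
        bytes(charIndex * 2 + 1) = CByte(code >> 8)
    End Sub

    Sub Main()
        Dim quote As String = "The important thing is not to " & _
            "stop questioning. --Albert Einstein"
        Dim bytes() As Byte = Encoding.Unicode.GetBytes(quote)

        ' Locate the period rather than hard-coding index 46.
        Dim position As Integer = quote.IndexOf("."c)
        SetUnicodeChar(bytes, position, "?"c)

        Console.WriteLine(Encoding.Unicode.GetString(bytes))
    End Sub
End Module
```

Computing both bytes from the character code means the same routine works for replacement characters above the ASCII range, where the high-order byte is not zero.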





Visual Basic 2005 Cookbook: Solutions for VB 2005 Programmers
ISBN: 0596101775
Year: 2006
Pages: 400
