Recipe 16.6. Compressing and Decompressing a String


Problem

You want to compress and later decompress a string to save memory or file space.

Solution

Sample code folder: Chapter 16\Compression

Use the GZipStream class, new in version 2.0 of the .NET Framework, to compress and decompress the string's bytes.

Discussion

The System.IO.Compression namespace contains the GZipStream class, which compresses or decompresses bytes as they pass through the stream. It uses the DEFLATE algorithm, the same one behind the standard ZIP compression found in many programs, and provides decent lossless compression at high speed.
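Because GZipStream wraps another stream, the underlying stream does not have to live in memory. The following is a minimal sketch, not part of the book's sample code, that sends compressed bytes straight to a file and reads them back. The module name and the helpers WriteCompressedFile() and ReadCompressedFile() are hypothetical names chosen for illustration.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module GZipFileSketch
   ' Hypothetical helper: write a string to a gzip-compressed file.
   Public Sub WriteCompressedFile( _
         ByVal fileName As String, ByVal content As String)
      Dim workBytes() As Byte = Encoding.UTF8.GetBytes(content)
      Using fileOut As New FileStream(fileName, FileMode.Create)
         ' Bytes written to zipper arrive in the file compressed.
         Using zipper As New GZipStream( _
               fileOut, CompressionMode.Compress)
            zipper.Write(workBytes, 0, workBytes.Length)
         End Using
      End Using
   End Sub

   ' Hypothetical helper: read the string back from that file.
   Public Function ReadCompressedFile( _
         ByVal fileName As String) As String
      Using fileIn As New FileStream(fileName, FileMode.Open)
         ' Bytes read from unzipper arrive decompressed.
         Using unzipper As New GZipStream( _
               fileIn, CompressionMode.Decompress)
            Using reader As New StreamReader(unzipper, Encoding.UTF8)
               Return reader.ReadToEnd()
            End Using
         End Using
      End Using
   End Function

   Sub Main()
      Dim fileName As String = Path.GetTempFileName()
      WriteCompressedFile(fileName, _
         "Text destined for a compressed file.")
      Console.WriteLine(ReadCompressedFile(fileName))
      File.Delete(fileName)
   End Sub
End Module
```

Closing the GZipStream (here through End Using) is what flushes the final compressed block and the gzip trailer, so the file is complete only after that point.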

This compression works best on longer strings. In the sample code for this recipe, the contents of the workText string are repeated several times to build a highly redundant string that compresses well.

The compression and decompression calls are wrapped in the functions StringCompress() and BytesDecompress(), contained in the Compress module (Compress.vb).

The compression function accepts a string and returns a byte array, and the decompression function accepts a byte array and returns a string. The compressed byte array can contain any possible byte value, so it is not valid text in any particular encoding. Keeping the data as a byte array avoids the subtle corruption that can occur if you try to convert those bytes directly to a string:

Imports System.IO
Imports System.IO.Compression
Imports System.Text

Public Function StringCompress( _
      ByVal originalText As String) As Byte()
   ' ----- Generate a compressed version of a string.
   '       First, convert the string to a byte array.
   Dim workBytes() As Byte = _
      Encoding.UTF8.GetBytes(originalText)

   ' ----- Bytes will flow through a memory stream.
   Dim memoryStream As New MemoryStream()

   ' ----- Use the newly created memory stream for the
   '       compressed data.
   Dim zipStream As New GZipStream(memoryStream, _
      CompressionMode.Compress, True)
   zipStream.Write(workBytes, 0, workBytes.Length)
   zipStream.Flush()

   ' ----- Close the compression stream. This writes the
   '       final compressed block and the gzip trailer.
   zipStream.Close()

   ' ----- Return the compressed bytes.
   Return memoryStream.ToArray()
End Function

Public Function BytesDecompress( _
      ByVal compressed() As Byte) As String
   ' ----- Uncompress a previously compressed string.
   '       Extract the length for the decompressed string.
   '       The last four bytes of a gzip stream hold the
   '       uncompressed length (modulo 2^32).
   Dim lastFour(3) As Byte
   Array.Copy(compressed, compressed.Length - 4, _
      lastFour, 0, 4)
   Dim bufferLength As Integer = _
      BitConverter.ToInt32(lastFour, 0)

   ' ----- Create an uncompressed bytes buffer.
   Dim buffer(bufferLength - 1) As Byte

   ' ----- Bytes will flow through a memory stream.
   Dim memoryStream As New MemoryStream(compressed)

   ' ----- Create the decompression stream.
   Dim decompressedStream As New GZipStream( _
      memoryStream, CompressionMode.Decompress, True)

   ' ----- Read and decompress the data into the buffer.
   '       A single Read() call may return fewer bytes than
   '       requested, so keep reading until the buffer is full.
   Dim totalRead As Integer = 0
   Do While totalRead < bufferLength
      Dim bytesRead As Integer = decompressedStream.Read( _
         buffer, totalRead, bufferLength - totalRead)
      If bytesRead = 0 Then Exit Do
      totalRead += bytesRead
   Loop
   decompressedStream.Close()

   ' ----- Convert the bytes to a string.
   Return Encoding.UTF8.GetString(buffer, 0, totalRead)
End Function
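Two points about the listing above can be illustrated together. First, if the compressed bytes ever need to travel as text (in an XML file or a database text column, for example), Base64 encoding is one safe way to do it, at the cost of roughly one-third more characters than bytes. Second, decompression does not have to depend on the length stored at the end of the gzip data; it can simply read until the stream is exhausted. The following is a minimal sketch under those assumptions. CompressToBase64() and DecompressFromBase64() are hypothetical helpers, not part of the book's Compress module.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module CompressedTextSketch
   ' Hypothetical helper: gzip a string and return Base64 text.
   Public Function CompressToBase64( _
         ByVal originalText As String) As String
      Dim workBytes() As Byte = Encoding.UTF8.GetBytes(originalText)
      Using target As New MemoryStream()
         Using zipper As New GZipStream( _
               target, CompressionMode.Compress, True)
            zipper.Write(workBytes, 0, workBytes.Length)
         End Using
         ' The GZipStream is closed, so target is complete.
         Return Convert.ToBase64String(target.ToArray())
      End Using
   End Function

   ' Hypothetical helper: reverse CompressToBase64() without
   ' using the length stored in the gzip trailer.
   Public Function DecompressFromBase64( _
         ByVal encoded As String) As String
      Dim compressed() As Byte = Convert.FromBase64String(encoded)
      Using source As New MemoryStream(compressed)
         Using unzipper As New GZipStream( _
               source, CompressionMode.Decompress)
            Using plain As New MemoryStream()
               ' Read in chunks until Read() reports end of stream.
               Dim chunk(4095) As Byte
               Dim bytesRead As Integer = _
                  unzipper.Read(chunk, 0, chunk.Length)
               Do While bytesRead > 0
                  plain.Write(chunk, 0, bytesRead)
                  bytesRead = unzipper.Read(chunk, 0, chunk.Length)
               Loop
               Return Encoding.UTF8.GetString(plain.ToArray())
            End Using
         End Using
      End Using
   End Function

   Sub Main()
      Dim original As String = "Some text, some text, some text."
      Dim encoded As String = CompressToBase64(original)
      Dim restored As String = DecompressFromBase64(encoded)
      Console.WriteLine(encoded)
      Console.WriteLine(restored = original)
   End Sub
End Module
```

The chunked loop costs an extra memory stream, but it keeps working when the stored length cannot be trusted, such as when the data did not come from StringCompress().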

The following code demonstrates these functions by building a moderately long redundant string, passing it to StringCompress(), then passing the compressed byte array to BytesDecompress() to recover the original string:

Dim result As New System.Text.StringBuilder
Dim workText As String = ""
For counter As Integer = 1 To 9
   workText &= "This redundant string will be compressed" & _
      vbNewLine
Next counter

' ----- Compress the string, then recover it.
Dim compressed() As Byte = StringCompress(workText)
Dim uncompressed As String = BytesDecompress(compressed)

' ----- Report the sizes at each step. AppendLine() expects
'       a string, so convert the lengths explicitly.
result.AppendLine(workText)
result.Append("Original size: ")
result.AppendLine(workText.Length.ToString())
result.AppendLine()
result.Append("Compressed size: ")
result.AppendLine(compressed.Length.ToString())
result.AppendLine()
result.AppendLine(uncompressed)
result.AppendLine()
result.Append("Uncompressed size: ")
result.Append(uncompressed.Length)
MsgBox(result.ToString())

Figure 16-5 displays the original string and its length, followed by the length of the compressed byte array, and finally the resulting decompressed string and its length. Longer strings with redundancies, such as this one, compress better than shorter ones.
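To see how length and redundancy affect the result, the following minimal sketch compresses the same line repeated 1, 9, and 100 times and prints each size. It is an illustration only: CompressedSize() is a hypothetical helper, and the exact byte counts depend on the .NET version in use. Every gzip stream carries a fixed header and trailer (18 bytes in the basic format), so a very short string gains little and may even grow.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text
Imports Microsoft.VisualBasic

Module SizeComparisonSketch
   ' Hypothetical helper: bytes produced by gzip for a string.
   Public Function CompressedSize(ByVal sample As String) As Integer
      Dim workBytes() As Byte = Encoding.UTF8.GetBytes(sample)
      Using target As New MemoryStream()
         Using zipper As New GZipStream( _
               target, CompressionMode.Compress, True)
            zipper.Write(workBytes, 0, workBytes.Length)
         End Using
         ' Measure only after the GZipStream has been closed.
         Return CInt(target.Length)
      End Using
   End Function

   Sub Main()
      Dim line As String = _
         "This redundant string will be compressed" & vbNewLine
      Dim repeatCounts() As Integer = {1, 9, 100}
      For Each repeats As Integer In repeatCounts
         Dim builder As New StringBuilder()
         For counter As Integer = 1 To repeats
            builder.Append(line)
         Next counter
         Dim sample As String = builder.ToString()
         Console.WriteLine("{0} chars -> {1} compressed bytes", _
            sample.Length, CompressedSize(sample))
      Next repeats
   End Sub
End Module
```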

See Also

Recipe 16.9 includes the full source code for the Compress module.




Visual Basic 2005 Cookbook: Solutions for VB 2005 Programmers (Cookbooks (O'Reilly))
ISBN: 0596101775
Year: 2006
Pages: 400
