Recipe 12.28. Compressing and Decompressing Your Files

Problem

You need a way to compress the data you write to a file using one of the stream-based classes. In addition, you need a way to decompress the data from this compressed file when you read it back in.

Solution

Use the System.IO.Compression.DeflateStream or the System.IO.Compression.GZipStream class to read and write compressed data to a file. The CompressFile, DeCompressFile, and Decompress methods shown in Example 12-16 demonstrate how to use these classes to compress and expand data on the fly.

Example 12-16. The CompressFile, DeCompressFile, and Decompress methods

    // Requires the System.Collections.Generic, System.IO, and
    // System.IO.Compression namespaces.

    public static void CompressFile(Stream strm, byte[] data,
                                    CompressionType compressionType)
    {
        // Determine how to compress the file.
        Stream deflate = null;
        if (compressionType == CompressionType.Deflate)
        {
            using (deflate = new DeflateStream(strm, CompressionMode.Compress))
            {
                // Write compressed data to the file.
                deflate.Write(data, 0, data.Length);
            }
        }
        else
        {
            using (deflate = new GZipStream(strm, CompressionMode.Compress))
            {
                // Write compressed data to the file.
                deflate.Write(data, 0, data.Length);
            }
        }
    }

    public static byte[] DeCompressFile(Stream strm,
                                        CompressionType compressionType)
    {
        // Determine how to decompress the file.
        Stream reInflate = null;
        if (compressionType == CompressionType.Deflate)
        {
            using (reInflate = new DeflateStream(strm, CompressionMode.Decompress))
            {
                return (Decompress(reInflate));
            }
        }
        else
        {
            using (reInflate = new GZipStream(strm, CompressionMode.Decompress))
            {
                return (Decompress(reInflate));
            }
        }
    }

    public static byte[] Decompress(Stream reInflate)
    {
        List<byte> data = new List<byte>();
        int retVal = 0;

        // Read the data one byte at a time, decompressing as we go.
        // ReadByte returns -1 once the end of the stream is reached.
        while ((retVal = reInflate.ReadByte()) != -1)
        {
            data.Add((byte)retVal);
        }

        return (data.ToArray());
    }

The CompressionType enumeration is defined as follows:

    public enum CompressionType
    {
        Deflate,
        GZip
    }

Discussion

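The CompressFile method accepts a Stream object, data in the form of a byte array, and a CompressionType enumeration value indicating which type of compression algorithm to use (Deflate or GZip). This method writes the compressed data to the stream; when the stream is a FileStream, the result is a file containing the compressed data. Note that disposing of the compression stream also closes the underlying stream passed in.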

The DeCompressFile method accepts a Stream object and a CompressionType enumeration value indicating which type of decompression algorithm to use (Deflate or GZip). This method calls the Decompress method, which reads from the compressed stream and places the uncompressed bytes into a generic List<byte> collection object. This collection is then converted to a byte[] and returned to the calling method.
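Reading one byte at a time keeps Decompress easy to follow, but it makes one call per byte. The following is a minimal sketch of a block-based alternative, not part of the recipe's code. It assumes .NET Framework 4.0 or later, where Stream.CopyTo is available; the class name BufferedDecompressSketch and its method names are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

public static class BufferedDecompressSketch
{
    // Hypothetical alternative to Decompress: drains the decompression
    // stream into memory in blocks rather than one byte at a time.
    public static byte[] DecompressBuffered(Stream reInflate)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            // CopyTo keeps reading until the source reports end of data.
            reInflate.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Round-trip helper that exists only to exercise the method above.
    public static byte[] RoundTrip(byte[] data)
    {
        using (MemoryStream compressed = new MemoryStream())
        {
            // leaveOpen = true keeps the MemoryStream usable after the
            // DeflateStream is disposed (disposal flushes the final block).
            using (DeflateStream deflate =
                   new DeflateStream(compressed, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }

            compressed.Position = 0;
            using (DeflateStream inflate =
                   new DeflateStream(compressed, CompressionMode.Decompress))
            {
                return DecompressBuffered(inflate);
            }
        }
    }
}
```

DecompressBuffered takes the same argument as Decompress, so it could be called from DeCompressFile in the same way. The trade-off is the same in both versions: the whole uncompressed payload is held in memory.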

The TestCompressNewFile method shown in Example 12-17 exercises the CompressFile and DeCompressFile methods defined in the Solution section of this recipe. It also uses another method, NormalFile (shown first), that creates an uncompressed file to show how the file sizes differ.

Example 12-17. Using the CompressFile and DeCompressFile methods

    // Method to write out an uncompressed file to compare sizes
    public static void NormalFile(Stream strm, byte[] data)
    {
        BinaryWriter normal = new BinaryWriter(strm);
        normal.Write(data);
        normal.Close();
    }

    public static void TestCompressNewFile()
    {
        byte[] data = new byte[10000000];
        for (int i = 0; i < data.Length; i++)
            data[i] = 254;

        // FileMode.Create truncates any existing file, so stale bytes from
        // an earlier, larger file cannot trail the newly written data.
        using (FileStream fs = new FileStream(@"C:\NewNormalFile.txt",
               FileMode.Create, FileAccess.Write, FileShare.None))
            NormalFile(fs, data);

        using (FileStream fs = new FileStream(@"C:\NewCompressedFile.txt",
               FileMode.Create, FileAccess.Write, FileShare.None))
            CompressFile(fs, data, CompressionType.Deflate);

        using (FileStream fs = new FileStream(@"C:\NewCompressedFile.txt",
               FileMode.Open, FileAccess.Read, FileShare.None))
        {
            byte[] retData = DeCompressFile(fs, CompressionType.Deflate);
            Console.WriteLine("Deflated file bytes count == " + retData.Length);
        }

        using (FileStream fs = new FileStream(@"C:\NewGzCompressedFile.txt",
               FileMode.Create, FileAccess.Write, FileShare.None))
            CompressFile(fs, data, CompressionType.GZip);

        using (FileStream fs = new FileStream(@"C:\NewGzCompressedFile.txt",
               FileMode.Open, FileAccess.Read, FileShare.None))
        {
            byte[] retData = DeCompressFile(fs, CompressionType.GZip);
            Console.WriteLine("GZipped file bytes count == " + retData.Length);
        }
    }

When this test code is run, we get three files with different sizes. The first file, NewNormalFile.txt, is 10,000,000 bytes in size. The NewCompressedFile.txt file is 85,095 bytes. The final file, NewGzCompressedFile.txt, is 85,113 bytes. As you can see, there is not much difference between the sizes of the files compressed with the DeflateStream class and the GZipStream class. The reason is that both compression classes use the same compression/decompression algorithm (i.e., the lossless Deflate algorithm described in RFC 1951, the DEFLATE Compressed Data Format Specification version 1.3). The 18-byte difference matches the minimal wrapper that the gzip format (RFC 1952) places around the Deflate data: a 10-byte header and an 8-byte trailer holding a CRC-32 and the uncompressed length.
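The size comparison can also be made without touching the disk. The following is a minimal sketch that compresses into a MemoryStream and reports how many bytes each class produced. The class name CompressedSizeSketch and its method names are hypothetical, and the exact byte counts depend on the .NET version and its compression settings, so none are claimed here.

```csharp
using System.IO;
using System.IO.Compression;

public static class CompressedSizeSketch
{
    // Number of bytes DeflateStream produces for the given data.
    public static long DeflateSize(byte[] data)
    {
        using (MemoryStream target = new MemoryStream())
        {
            // leaveOpen = true so target.Length can be read after the
            // compression stream has been disposed and fully flushed.
            using (DeflateStream deflate =
                   new DeflateStream(target, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }
            return target.Length;
        }
    }

    // Number of bytes GZipStream produces for the same data.
    public static long GZipSize(byte[] data)
    {
        using (MemoryStream target = new MemoryStream())
        {
            using (GZipStream gzip =
                   new GZipStream(target, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return target.Length;
        }
    }
}
```

Passing the same byte array to both methods and subtracting the results shows how much the gzip header and trailer add on top of the raw Deflate data.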

You may be wondering why you would pick one class over the other if they use the same algorithm. There is one good reason: the GZipStream class adds a CRC (cyclic redundancy check) to the file so that corruption can be detected when the data is decompressed. If the file has been corrupted, an InvalidDataException is thrown with the message "The CRC in GZip footer does not match the CRC calculated from the decompressed data." By catching this exception, you can determine whether your data is corrupted.
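The following is a minimal sketch of catching that exception, assuming the compressed data is small enough to hold in memory. The class name GZipIntegritySketch and its method names are hypothetical. The exception message text may differ between .NET versions, so the sketch relies only on the exception type.

```csharp
using System.IO;
using System.IO.Compression;

public static class GZipIntegritySketch
{
    // Compresses data into an in-memory gzip byte array.
    public static byte[] GZip(byte[] data)
    {
        using (MemoryStream target = new MemoryStream())
        {
            using (GZipStream gzip =
                   new GZipStream(target, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return target.ToArray();
        }
    }

    // Returns true if the data decompresses cleanly, or false if
    // GZipStream reports it as invalid (for example, a CRC mismatch).
    public static bool TryGunzip(byte[] compressed, out byte[] result)
    {
        result = null;
        try
        {
            using (MemoryStream source = new MemoryStream(compressed))
            using (GZipStream gzip =
                   new GZipStream(source, CompressionMode.Decompress))
            using (MemoryStream buffer = new MemoryStream())
            {
                // Read to the end so the gzip trailer is processed.
                gzip.CopyTo(buffer);
                result = buffer.ToArray();
                return true;
            }
        }
        catch (InvalidDataException)
        {
            return false;
        }
    }
}
```

To try it, compress a buffer with GZip, alter one of the stored CRC bytes (for example, packed[packed.Length - 8] ^= 0xFF), and pass the array to TryGunzip. A DeflateStream offers no equivalent check, because the raw Deflate format carries no checksum.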