(Optional) The zlib module provides support for "zlib" compression. (The compression method used inside the zlib format is known as "deflate.")
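A zlib stream is a thin wrapper around deflate data: a 2-byte header, the raw deflate stream, and a 4-byte Adler-32 checksum of the uncompressed data. The following sketch, written for Python 3 (where zlib works on bytes), takes a compressed buffer apart to show that layout. The sample data is arbitrary, and the `wbits` values used are the ones the zlib module documents for raw deflate (negative) and gzip (16 + `MAX_WBITS`) streams.

```python
import gzip
import zlib

data = b"life of brian" * 10

wrapped = zlib.compress(data)
header, body, trailer = wrapped[:2], wrapped[2:-4], wrapped[-4:]

# 2-byte zlib header; 0x78 in the first byte means "deflate, 32K window".
print(header.hex())
# 4-byte trailer: Adler-32 checksum of the *uncompressed* data, big-endian.
print(trailer == zlib.adler32(data).to_bytes(4, "big"))
# What sits in between is a raw deflate stream; negative wbits means "no header".
print(zlib.decompress(body, -zlib.MAX_WBITS) == data)
# wbits = 16 + MAX_WBITS selects the gzip wrapper around the same deflate data.
print(zlib.decompress(gzip.compress(data), 16 + zlib.MAX_WBITS) == data)
```

The same deflate data can therefore be read as zlib, raw deflate, or gzip, depending only on which wrapper `wbits` tells the module to expect.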
As Example 2-43 shows, the compress and decompress functions take string arguments.
Example 2-43. Using the zlib Module to Compress a String
File: zlib-example-1.py

import zlib

MESSAGE = "life of brian"

compressed_message = zlib.compress(MESSAGE)
decompressed_message = zlib.decompress(compressed_message)

print "original:", repr(MESSAGE)
print "compressed message:", repr(compressed_message)
print "decompressed message:", repr(decompressed_message)

$ python zlib-example-1.py
original: 'life of brian'
compressed message: 'x\234\313\311LKU\310OSH*\312L\314\003\000!\010\004\302'
decompressed message: 'life of brian'
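Under Python 3, the same functions take and return bytes rather than text strings, so a str has to be encoded first. The following is a minimal Python 3 sketch of Example 2-43, assuming UTF-8 as the text encoding; passing a str directly raises TypeError.

```python
import zlib

MESSAGE = "life of brian"

# zlib only accepts bytes-like objects in Python 3.
try:
    zlib.compress(MESSAGE)
except TypeError as exc:
    print("str rejected:", exc)

# Encode the text before compressing, decode it after decompressing.
compressed_message = zlib.compress(MESSAGE.encode("utf-8"))
decompressed_message = zlib.decompress(compressed_message).decode("utf-8")

print("original:", repr(MESSAGE))
print("compressed message:", repr(compressed_message))
print("decompressed message:", repr(decompressed_message))
```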
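The compression ratio varies a lot depending on the contents of the file, as you can see in Example 2-44. Data that is already compressed (the .gz, .tgz, and .jpg samples) barely shrinks or even grows, while the highly redundant tar file shrinks to 1% of its original size.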
Example 2-44. Using the zlib Module to Compress a Group of Files
File: zlib-example-2.py

import zlib
import glob

for file in glob.glob("samples/*"):

    indata = open(file, "rb").read()
    outdata = zlib.compress(indata, zlib.Z_BEST_COMPRESSION)

    print file, len(indata), "=>", len(outdata),
    print "%d%%" % (len(outdata) * 100 / len(indata))

$ python zlib-example-2.py
samples\sample.au 1676 => 1109 66%
samples\sample.gz 42 => 51 121%
samples\sample.htm 186 => 135 72%
samples\sample.ini 246 => 190 77%
samples\sample.jpg 4762 => 4632 97%
samples\sample.msg 450 => 275 61%
samples\sample.sgm 430 => 321 74%
samples\sample.tar 10240 => 125 1%
samples\sample.tgz 155 => 159 102%
samples\sample.txt 302 => 220 72%
samples\sample.wav 13260 => 10992 82%
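To see the effect of content without a samples directory, the following Python 3 sketch compresses two synthetic inputs at levels 1 (fastest), 6 (the default), and 9 (`zlib.Z_BEST_COMPRESSION`). The inputs are hypothetical stand-ins: a highly repetitive buffer, and random bytes that behave like already-compressed data.

```python
import os
import zlib

# Hypothetical sample inputs: one highly redundant, one incompressible.
samples = {
    "repetitive": b"abc" * 10000,
    "random": os.urandom(30000),
}

for name, data in samples.items():
    for level in (1, 6, zlib.Z_BEST_COMPRESSION):
        size = len(zlib.compress(data, level))
        # Integer percentage, like the "%d%%" output of zlib-example-2.py
        print(f"{name:10} level {level}: {len(data)} => {size} ({size * 100 // len(data)}%)")
```

The level mostly trades speed for size; it cannot make random data smaller, which is why the .gz, .tgz, and .jpg samples above end up at or above 100%.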
You can also compress or decompress data on the fly, which Example 2-45 demonstrates.
Example 2-45. Using the zlib Module to Decompress Streams
File: zlib-example-3.py

import zlib

encoder = zlib.compressobj()

data = encoder.compress("life")
data = data + encoder.compress(" of ")
data = data + encoder.compress("brian")
data = data + encoder.flush()

print repr(data)
print repr(zlib.decompress(data))

$ python zlib-example-3.py
'x\234\313\311LKU\310OSH*\312L\314\003\000!\010\004\302'
'life of brian'
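Decompression can be done on the fly as well, using a decompressor object. The following Python 3 sketch compresses the same three pieces incrementally and then feeds the result to `zlib.decompressobj()` four bytes at a time. The 4-byte chunk size is arbitrary and is chosen only to force many small calls.

```python
import zlib

# Compress incrementally: compress() may return b"" until enough input has
# accumulated; flush() emits whatever is still buffered plus the trailer.
encoder = zlib.compressobj()
chunks = [encoder.compress(part) for part in (b"life", b" of ", b"brian")]
chunks.append(encoder.flush())
data = b"".join(chunks)

# Decompress incrementally, a few bytes of compressed input at a time.
decoder = zlib.decompressobj()
pieces = []
for i in range(0, len(data), 4):
    pieces.append(decoder.decompress(data[i:i + 4]))
pieces.append(decoder.flush())
result = b"".join(pieces)

print(repr(data))
print(repr(result))
print(decoder.eof)  # True once the end of the compressed stream was seen
```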
Example 2-46 makes it more convenient to read a compressed file by wrapping a decompressor object in a file-like class.
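In Python 3, the io module can take over most of the buffering and line handling that Example 2-46 does by hand: a raw stream only has to implement `readinto`, and `io.BufferedReader` supplies `read`, `readline`, `readlines`, and iteration. The following is a minimal sketch under that assumption. The `ZlibReader` class is hypothetical and not part of the zlib module, and unlike Example 2-46 it does not support `seek`.

```python
import io
import zlib


class ZlibReader(io.RawIOBase):
    """Hypothetical read-only raw stream over zlib-compressed input."""

    CHUNK_SIZE = 16384  # compressed bytes read per refill

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._decomp = zlib.decompressobj()
        self._buffer = b""
        self._eof = False

    def readable(self):
        return True

    def readinto(self, b):
        # Refill until some decompressed data is available or input runs out.
        while not self._buffer and not self._eof:
            chunk = self._fileobj.read(self.CHUNK_SIZE)
            if chunk:
                self._buffer = self._decomp.decompress(chunk)
            else:
                self._buffer = self._decomp.flush()
                self._eof = True
        # Copy as much as fits into the caller's buffer; 0 signals EOF.
        n = min(len(b), len(self._buffer))
        b[:n] = self._buffer[:n]
        self._buffer = self._buffer[n:]
        return n


compressed = zlib.compress(b"first line\nsecond line\nthird line\n")
stream = io.BufferedReader(ZlibReader(io.BytesIO(compressed)))
lines = stream.readlines()
for line in lines:
    print(line)
```

Wrapping `stream` in `io.TextIOWrapper` with an explicit encoding would yield str lines instead of bytes.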
Example 2-46. Emulating a File Object for Compressed Streams
File: zlib-example-4.py

import zlib
import string, StringIO

class ZipInputStream:

    def __init__(self, file):
        self.file = file
        self.__rewind()

    def __rewind(self):
        self.zip = zlib.decompressobj()
        self.pos = 0 # position in zipped stream
        self.offset = 0 # position in unzipped stream
        self.data = ""

    def __fill(self, bytes):
        if self.zip:
            # read until we have enough bytes in the buffer
            while not bytes or len(self.data) < bytes:
                self.file.seek(self.pos)
                data = self.file.read(16384)
                if not data:
                    self.data = self.data + self.zip.flush()
                    self.zip = None # no more data
                    break
                self.pos = self.pos + len(data)
                self.data = self.data + self.zip.decompress(data)

    def seek(self, offset, whence=0):
        if whence == 0:
            position = offset
        elif whence == 1:
            position = self.offset + offset
        else:
            raise IOError, "Illegal argument"
        if position < self.offset:
            raise IOError, "Cannot seek backwards"

        # skip forward, in 16k blocks
        while position > self.offset:
            if not self.read(min(position - self.offset, 16384)):
                break

    def tell(self):
        return self.offset

    def read(self, bytes = 0):
        self.__fill(bytes)
        if bytes:
            data = self.data[:bytes]
            self.data = self.data[bytes:]
        else:
            data = self.data
            self.data = ""
        self.offset = self.offset + len(data)
        return data

    def readline(self):
        # make sure we have an entire line
        while self.zip and "\n" not in self.data:
            self.__fill(len(self.data) + 512)
        i = string.find(self.data, "\n") + 1
        if i <= 0:
            return self.read()
        return self.read(i)

    def readlines(self):
        lines = []
        while 1:
            s = self.readline()
            if not s:
                break
            lines.append(s)
        return lines

#
# try it out

data = open("samples/sample.txt").read()
data = zlib.compress(data)

file = ZipInputStream(StringIO.StringIO(data))
for line in file.readlines():
    print line[:-1]

We will perhaps eventually be writing only small modules which are identified by name as they are used to build larger ones, so that devices like indentation, rather than delimiters, might become feasible for expressing local structure in the source language.
-- Donald E. Knuth, December 1974