Compressing and Archiving Files with Gzip and Tar

Problem

You want to write compressed data to a file to save space, or uncompress the contents of a compressed file. If you're compressing data, you might want to compress multiple files into a single archive file.

Solution

The most common compression format on Unix systems is gzip. Ruby's zlib library lets you read from and write to gzipped I/O streams as though they were normal files. The most useful classes in this library are GzipWriter and GzipReader.[3]

[3] The compressed strings in these examples are actually larger than the originals. This is because I used very short strings to save space in the book, and short strings don't compress well. Any compression technique introduces some overhead; with gzip, you don't actually save any space by compressing a text string of less than about 100 bytes.

Here's GzipWriter being used to create a compressed file, and GzipReader decompressing the same file:

	require 'zlib'

	file = 'compressed.gz'
	Zlib::GzipWriter.open(file) do |gzip|
	  gzip << "For my next trick, I'll be written to a compressed file."
	  gzip.close
	end

	open(file, 'rb') { |f| f.read(10) }
	# => "\037\213\010\000\201\2766D\000\003"

	Zlib::GzipReader.open(file) { |gzip| gzip.read }
	# => "For my next trick, I'll be written to a compressed file."
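
The ten bytes read back above are the gzip header, which always begins with the two bytes 0x1f and 0x8b (octal \037 and \213). The following is a minimal sketch of using that fact to tell whether a file on disk looks gzipped before handing it to GzipReader. The gzipped? helper is a hypothetical name introduced for this illustration, not part of the zlib library.

```ruby
require 'zlib'
require 'tempfile'

# Hypothetical helper (not part of zlib): true if the file starts with
# the two-byte gzip magic number, 0x1f 0x8b.
def gzipped?(path)
  File.open(path, 'rb') { |f| f.read(2) } == "\x1f\x8b".b
end

Tempfile.create('example') do |tmp|
  tmp.close

  # Write a gzipped file, then overwrite it with plain text.
  Zlib::GzipWriter.open(tmp.path) { |gzip| gzip << 'Some compressed text.' }
  puts gzipped?(tmp.path)

  File.write(tmp.path, 'Some plain text.')
  puts gzipped?(tmp.path)
end
```

A check like this only inspects the header. A truncated or corrupt file can still pass it and fail later, when GzipReader actually tries to read the data.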

Discussion

GzipWriter and GzipReader are most commonly used to write to files on disk, but you can wrap any file-like object in the appropriate class and automatically compress everything you write to it, or decompress everything you read from it.

The following code works the same way as the compression code in the Solution, but it's more flexible: the File object that's passed into the Zlib::GzipWriter constructor could just as easily be a Socket or other file-like object.

	open('compressed.gz', 'wb') do |file|
	  gzip = Zlib::GzipWriter.new(file)
	  gzip << "For my next trick, I'll be written to a compressed file."
	  gzip.close
	end
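
To illustrate the "any file-like object" point without a socket, here is a small sketch that wraps a StringIO in GzipWriter and GzipReader, so the compressed data never touches the disk. The method names gzip_string and gunzip_string are made up for this example.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: gzip +text+ into an in-memory buffer.
def gzip_string(text)
  buffer = StringIO.new(String.new)   # empty binary string as the "file"
  gzip = Zlib::GzipWriter.new(buffer)
  gzip << text
  gzip.finish                         # write the gzip trailer, keep the StringIO open
  buffer.string
end

# Hypothetical helper: read gzipped data back out of a StringIO.
def gunzip_string(data)
  gzip = Zlib::GzipReader.new(StringIO.new(data))
  text = gzip.read
  gzip.close
  text
end

original = "For my next trick, I'll be written to an in-memory buffer."
compressed = gzip_string(original)
puts gunzip_string(compressed) == original
```

GzipWriter#finish is used here instead of close because close would also close the underlying object; with a StringIO that is harmless, but with a socket you may want to keep the connection open.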

If you need to compress or decompress a string, use the Zlib::Deflate or Zlib::Inflate classes rather than constructing a StringIO object:

	deflated = Zlib::Deflate.deflate("I'm a compressed string.")
	# => "x\234\363T\317UHTH…"
	Zlib::Inflate.inflate(deflated)
	# => "I'm a compressed string."
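
As the footnote in the Solution points out, very short strings tend to grow rather than shrink. The sketch below compares sizes before and after deflating a short string and a longer, repetitive one; deflate_sizes is a hypothetical helper, and the exact byte counts will depend on the input and the zlib version, so none are shown.

```ruby
require 'zlib'

# Hypothetical helper: byte counts before and after deflating +text+.
def deflate_sizes(text)
  deflated = Zlib::Deflate.deflate(text, Zlib::BEST_COMPRESSION)
  [text.bytesize, deflated.bytesize]
end

short = "I'm a compressed string."
long  = 'All work and no play makes Jack a dull boy. ' * 50

[short, long].each do |text|
  before, after = deflate_sizes(text)
  puts "#{before} bytes -> #{after} bytes"
end
```

The second argument to Zlib::Deflate.deflate is an optional compression level; Zlib::BEST_COMPRESSION trades speed for size, and Zlib::BEST_SPEED does the opposite.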

Tar files

Gzip compresses a single file. What if you want to smash multiple files together into a single archive file? The standard archive format for Unix is tar, and tar files are sometimes called tarballs. A tarball might also be compressed with gzip to save space, but on Unix the archiving and the compression are separate steps (unlike on Windows, where a ZIP file both archives multiple files and compresses them).

The Minitar library is the simplest way to create tarballs in pure Ruby. It's available as the archive-tar-minitar gem.[4]

[4] The RubyGems package defines the Gem::Package::TarWriter and Gem::Package::TarReader classes, which expose an interface similar to Minitar's. You can use these classes if you're fanatical about minimizing your dependencies, but I don't recommend it. These classes only implement the bare-bones functionality necessary to pack and unpack gem-like tarballs, and they also make your code look like it has something to do with RubyGems.

Here's some code that creates a tarball containing two files and a directory. Note the Unix permission modes (0644, 0755, and 0600). These are the permissions the files will have when they're extracted, perhaps by the Unix tar command.

	require 'rubygems'
	require 'archive/tar/minitar'

	open('tarball.tar', 'wb') do |f|
	  Archive::Tar::Minitar::Writer.open(f) do |w|

	    w.add_file('file1', :mode => 0644, :mtime => Time.now) do |stream, io|
	      stream.write('This is file 1.')
	    end

	    w.mkdir('subdirectory', :mode => 0755, :mtime => Time.now)

	    w.add_file('subdirectory/file2', :mode => 0600,
	               :mtime => Time.now) do |stream, io|
	      stream.write('This is file 2.')
	    end
	  end
	end

Here's a method that reads a tarball and prints out its contents:

	def browse_tarball(filename)
	  open(filename, 'rb') do |f|
	    Archive::Tar::Minitar::Reader.open(f).each do |entry|
	      puts %{I see a file "#{entry.name}" that's #{entry.size} bytes long.}
	    end
	  end
	end

	browse_tarball('tarball.tar')
	# I see a file "file1" that's 15 bytes long.
	# I see a file "subdirectory" that's 0 bytes long.
	# I see a file "subdirectory/file2" that's 15 bytes long.

And here's a simple method for archiving a number of disk files into a compressed tarball. Note how the Minitar Writer is wrapped within a GzipWriter, which automatically compresses the data as it's written. Minitar doesn't have to know about the GzipWriter, because all file-like objects look more or less the same.

	def make_tarball(destination, *paths)
	  Zlib::GzipWriter.open(destination) do |gzip|
	    out = Archive::Tar::Minitar::Output.new(gzip)
	    paths.each do |file|
	      puts "Packing #{file}"
	      Archive::Tar::Minitar.pack_file(file, out)
	    end
	    out.close
	  end
	end

This code creates some files and tars them up:

	Dir.mkdir('colors')
	paths = ['colors/burgundy', 'colors/beige', 'colors/clear']
	paths.each do |path|
	  open(path, 'w') do |f|
	    f.puts %{This is a dummy file.}
	  end
	end

	make_tarball('new_tarball.tgz', *paths)

	# Packing colors/burgundy
	# Packing colors/beige
	# Packing colors/clear
	# => #
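
For comparison, here is a rough sketch of the same tar-inside-gzip layering using only the Gem::Package::TarWriter and Gem::Package::TarReader classes mentioned in the footnote, with the whole archive held in memory. It is an illustration under assumptions rather than a recommended replacement for Minitar: make_tgz and list_tgz are hypothetical names, and the sketch uses add_file_simple, which takes the entry size up front and so can write to a stream that can't seek, such as a GzipWriter.

```ruby
require 'rubygems/package'
require 'zlib'
require 'stringio'

# Hypothetical helper: build a gzipped tarball in memory from a hash
# that maps entry names to their contents.
def make_tgz(files)
  buffer = StringIO.new(String.new)
  gzip = Zlib::GzipWriter.new(buffer)
  Gem::Package::TarWriter.new(gzip) do |tar|
    files.each do |name, contents|
      tar.add_file_simple(name, 0644, contents.bytesize) do |io|
        io.write(contents)
      end
    end
  end
  gzip.finish                 # finish the gzip stream, keep the StringIO usable
  buffer.string
end

# Hypothetical helper: list [name, size] pairs, decompressing on the fly.
def list_tgz(data)
  gzip = Zlib::GzipReader.new(StringIO.new(data))
  entries = Gem::Package::TarReader.new(gzip).map do |entry|
    [entry.full_name, entry.header.size]
  end
  gzip.close
  entries
end

tgz = make_tgz('file1' => 'This is file 1.',
               'subdirectory/file2' => 'This is file 2.')
list_tgz(tgz).each do |name, size|
  puts %{I see a file "#{name}" that's #{size} bytes long.}
end
```

As with Minitar, neither tar class needs to know it is talking to a gzip stream; the reader and writer only call ordinary I/O methods on whatever object they are given.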

See Also

  • On Windows, both compression and archiving are usually handled with ZIP files; see the next recipe, Recipe 12.11, "Reading and Writing ZIP Files," for details
  • Recipe 14.3, "Customizing HTTP Request Headers," uses zlib to decompress the gzipped body of a response from a web server



Ruby Cookbook
Ruby Cookbook (Cookbooks (O'Reilly))
ISBN: 0596523696