Recipe 12.10. Compressing and Archiving Files with Gzip and Tar
Problem
You want to write compressed data to a file to save space, or uncompress the contents of a compressed file. If you're compressing data, you might want to compress multiple files into a single archive file.
Solution
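The most common compression format on Unix systems is gzip. Ruby's zlib library lets you read from and write to gzipped I/O streams as though they were normal files, through its GzipReader and GzipWriter classes.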
Here's GzipWriter being used to create a compressed file, and GzipReader decompressing the same file:
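require 'zlib'
file = 'compressed.gz'
# The block form of GzipWriter.open closes the file when the block ends.
Zlib::GzipWriter.open(file) do |gzip|
  gzip << "For my next trick, I'll be written to a compressed file."
end
# The file on disk holds raw gzip data, not the original text.
open(file, 'rb') { |f| f.read(10) }
# => 10 bytes of binary data, beginning with the gzip magic number "\x1F\x8B"
Zlib::GzipReader.open(file) { |gzip| gzip.read }
# => "For my next trick, I'll be written to a compressed file."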
Discussion
GzipWriter and GzipReader are most commonly used to write to files on disk, but you can wrap any file-like object in the appropriate class and automatically compress everything you write to it, or decompress everything you read from it. The following code works the same way as the compression code in the Solution, but it's more flexible: the File object that's passed into the Zlib::GzipWriter constructor could just as easily be a Socket or other file-like object.
open('compressed.gz', 'wb') do |file|
  gzip = Zlib::GzipWriter.new(file)
  gzip << "For my next trick, I'll be written to a compressed file."
  gzip.close   # also closes the underlying file
end
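As a minimal sketch of wrapping something other than a File, the following uses a StringIO as a stand-in for a socket or other file-like object. The variable names are illustrative. It calls GzipWriter#finish instead of close so that the gzip trailer is written but the underlying buffer stays open.

```ruby
require 'zlib'
require 'stringio'

message = "For my next trick, I'll be written to a compressed buffer."

# An in-memory binary buffer stands in for a socket or other file-like object.
buffer = StringIO.new(''.b)
gzip = Zlib::GzipWriter.new(buffer)
gzip << message
gzip.finish                      # write the gzip trailer, leave the buffer open
compressed = buffer.string

# Wrap a second file-like object in a GzipReader to get the text back.
restored = Zlib::GzipReader.new(StringIO.new(compressed)).read
puts restored
```

The same pattern should apply to any object that responds to write (for the writer) or read (for the reader).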
If you need to compress or decompress a string, use the Zlib::Deflate or Zlib::Inflate classes rather than constructing a StringIO object:
deflated = Zlib::Deflate.deflate("I'm a compressed string.")
# => "x43T7UHTH…"
Zlib::Inflate.inflate(deflated)
# => "I'm a compressed string."
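Zlib::Deflate.deflate also accepts an optional compression level. The sketch below, which assumes a deliberately repetitive sample string, compares the fastest and the most thorough settings and confirms that the data survives the round trip. The exact byte counts depend on the zlib version, so none are shown.

```ruby
require 'zlib'

# Repetitive input compresses well, which makes the effect easy to see.
text = "I'm a compressed string. " * 100

fast  = Zlib::Deflate.deflate(text, Zlib::BEST_SPEED)
small = Zlib::Deflate.deflate(text, Zlib::BEST_COMPRESSION)

puts "original:         #{text.bytesize} bytes"
puts "BEST_SPEED:       #{fast.bytesize} bytes"
puts "BEST_COMPRESSION: #{small.bytesize} bytes"

# Either result inflates back to the original string.
restored = Zlib::Inflate.inflate(small)
```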
Tar files
Gzip compresses a single file. What if you want to smash multiple files together into a single archive file? The standard archive format for Unix is tar, and tar files are sometimes called tarballs. A tarball might also be compressed with gzip to save space, but on Unix the archiving and the compression are separate steps (unlike on Windows, where a ZIP file both archives multiple files and compresses them). The Minitar library is the simplest way to create tarballs in pure Ruby. It's available as the archive-tar-minitar gem. [4]
Here's some code that creates a tarball containing two files and a directory. Note the Unix permission modes (0644, 0755, and 0600). These are the permissions the files will have when they're extracted, perhaps by the Unix tar command.
require 'rubygems'
require 'archive/tar/minitar'
open('tarball.tar', 'wb') do |f|
  Archive::Tar::Minitar::Writer.open(f) do |w|
    w.add_file('file1', :mode => 0644, :mtime => Time.now) do |stream, io|
      stream.write('This is file 1.')
    end
    w.mkdir('subdirectory', :mode => 0755, :mtime => Time.now)
    w.add_file('subdirectory/file2', :mode => 0600,
               :mtime => Time.now) do |stream, io|
      stream.write('This is file 2.')
    end
  end
end
Here's a method that reads a tarball and prints out its contents:
def browse_tarball(filename)
  open(filename, 'rb') do |f|
    Archive::Tar::Minitar::Reader.open(f).each do |entry|
      puts %{I see a file "#{entry.name}" that's #{entry.size} bytes long.}
    end
  end
end
browse_tarball('tarball.tar')
# I see a file "file1" that's 15 bytes long.
# I see a file "subdirectory" that's 0 bytes long.
# I see a file "subdirectory/file2" that's 15 bytes long.
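If the Minitar gem isn't installed, a similar tarball can be written and browsed with the tar classes bundled with RubyGems. This is a sketch of a different technique from the Minitar code above: Gem::Package::TarWriter and Gem::Package::TarReader are part of RubyGems, not Minitar, and the in-memory StringIO is an assumption made to keep the example self-contained. Unlike browse_tarball, it also reads each file's contents.

```ruby
require 'stringio'
require 'rubygems/package'

# Build a small tarball in memory with the tar writer bundled with RubyGems.
# add_file_simple needs the size of each file up front.
tarball = StringIO.new(''.b)
Gem::Package::TarWriter.new(tarball) do |tar|
  tar.add_file_simple('file1', 0644, 15) { |io| io.write('This is file 1.') }
  tar.mkdir('subdirectory', 0755)
  tar.add_file_simple('subdirectory/file2', 0600, 15) do |io|
    io.write('This is file 2.')
  end
end

# Walk the archive and record each entry's name, size, and contents.
tarball.rewind
entries = []
Gem::Package::TarReader.new(tarball).each do |entry|
  contents = entry.file? ? entry.read : nil   # directories have no contents
  entries << [entry.full_name, entry.header.size, contents]
end

entries.each do |name, size, contents|
  puts %{I see "#{name}" (#{size} bytes): #{contents.inspect}}
end
```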
And here's a simple method for archiving a number of disk files into a compressed tarball. Note how the Minitar Writer is wrapped within a GzipWriter, which automatically compresses the data as it's written. Minitar doesn't have to know about the GzipWriter, because all file-like objects look more or less the same.
def make_tarball(destination, *paths)
  Zlib::GzipWriter.open(destination) do |gzip|
    out = Archive::Tar::Minitar::Output.new(gzip)
    paths.each do |file|
      puts "Packing #{file}"
      Archive::Tar::Minitar.pack_file(file, out)
    end
    out.close
  end
end
This code creates some files and tars them up:
Dir.mkdir('colors')
paths = ['colors/burgundy', 'colors/beige', 'colors/clear']
paths.each do |path|
  open(path, 'w') do |f|
    f.puts %{This is a dummy file.}
  end
end
make_tarball('new_tarball.tgz', *paths)
# Packing colors/burgundy
# Packing colors/beige
# Packing colors/clear
# => #<File:new_tarball.tgz (closed)>
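Reading a compressed tarball back works the same way in reverse: wrap the compressed stream in a GzipReader and hand that to a tar reader. The sketch below uses the Gem::Package tar classes bundled with RubyGems rather than Minitar, works entirely in memory, and defines two hypothetical helpers, targz_string and untargz_string, that are not part of either library.

```ruby
require 'zlib'
require 'stringio'
require 'rubygems/package'

# Hypothetical helper: pack a {name => contents} hash into a gzipped tarball.
def targz_string(files)
  buffer = StringIO.new(''.b)
  gzip = Zlib::GzipWriter.new(buffer)
  # The tar writer only sees a file-like object; gzip compresses what it writes.
  Gem::Package::TarWriter.new(gzip) do |tar|
    files.each do |name, contents|
      tar.add_file_simple(name, 0644, contents.bytesize) { |io| io.write(contents) }
    end
  end
  gzip.finish                    # write the gzip trailer, keep the buffer open
  buffer.string
end

# Hypothetical helper: unpack a gzipped tarball into a {name => contents} hash.
def untargz_string(data)
  files = {}
  gzip = Zlib::GzipReader.new(StringIO.new(data))
  Gem::Package::TarReader.new(gzip).each do |entry|
    files[entry.full_name] = entry.read if entry.file?
  end
  gzip.close
  files
end

colors = {
  'colors/burgundy' => "This is a dummy file.\n",
  'colors/beige'    => "This is a dummy file.\n",
  'colors/clear'    => "This is a dummy file.\n"
}
archive = targz_string(colors)
unpacked = untargz_string(archive)
unpacked.each { |name, contents| puts "Unpacked #{name} (#{contents.bytesize} bytes)" }
```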
See Also