Recipe 2.9. Reading Data from zip Files

Credit: Paul Prescod, Alex Martelli

Problem

You want to directly examine some or all of the files contained in an archive in zip format, without expanding them on disk.

Solution

zip files are a popular, cross-platform way of archiving files. The Python Standard Library comes with a zipfile module to access such files easily:

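import zipfile

# Open the archive for reading; note the mode is "r", not "rb".
z = zipfile.ZipFile("zipfile.zip", "r")
for filename in z.namelist():
    # read() returns the member's full uncompressed content.
    data = z.read(filename)
    print('File: %s has %d bytes' % (filename, len(data)))
z.close()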

Discussion

Python can work directly with data in zip files. You can look at the list of items in the archive's directory and work with the data files themselves. This recipe is a snippet that lists the names and content lengths of all the files included in the zip archive zipfile.zip.
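As a hedged illustration of examining the archive's directory separately from the data, here is a minimal sketch. It builds a small archive in memory with io.BytesIO so that it is self-contained; the member names readme.txt and data/values.csv are made up for the example and are not part of the recipe.

```python
import io
import zipfile

# Build a small archive in memory so the sketch is self-contained.
# The member names and contents are invented for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as z:
    z.writestr('readme.txt', 'hello')
    z.writestr('data/values.csv', 'a,b\n1,2\n')

buffer.seek(0)
with zipfile.ZipFile(buffer, 'r') as z:
    # infolist() returns ZipInfo records from the archive's directory;
    # file_size is the uncompressed size, known without reading the data.
    sizes = dict((info.filename, info.file_size) for info in z.infolist())
    # read() returns the whole uncompressed content of one member.
    first_line = z.read('data/values.csv').splitlines()[0]

print(sizes)
print(first_line)
```

Using infolist rather than read to get sizes avoids decompressing members you only want to list, which may matter for large archives.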

The zipfile module does not currently handle multidisk zip files, nor zip files with appended comments. Take care to use 'r' as the flag argument, not 'rb', which might seem more natural (e.g., on Windows). ZipFile does not interpret the flag the same way that opening an ordinary file does, and 'rb' is not recognized. The 'r' flag handles the inherently binary nature of all zip files on all platforms.
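The point about the flag argument can be checked with a short sketch. It assumes that an unrecognized mode makes ZipFile raise an exception; the exact exception type has varied between Python versions, so the code catches both ValueError and RuntimeError. The member name a.txt is hypothetical.

```python
import os
import tempfile
import zipfile

# Create a throwaway archive on disk (a.txt is an invented member name).
handle, path = tempfile.mkstemp('.zip')
os.close(handle)
try:
    with zipfile.ZipFile(path, 'w') as z:
        z.writestr('a.txt', 'abc')

    # Mode 'rb' is expected to be rejected. The exception type is an
    # assumption that depends on the Python version, so catch both.
    try:
        zipfile.ZipFile(path, 'rb')
        rb_accepted = True
    except (ValueError, RuntimeError):
        rb_accepted = False

    # Mode 'r' still yields binary data, on every platform.
    with zipfile.ZipFile(path, 'r') as z:
        content = z.read('a.txt')
finally:
    os.unlink(path)

print(rb_accepted)
print(content)
```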

When a zip file contains some Python modules (meaning .py or preferably .pyc files), possibly in addition to other (data) files, you can add the zip file's path to Python's sys.path and then use the import statement to import modules from the zip file. Here's a toy, self-contained, purely demonstrative example that creates such a zip file on the fly, imports a module from it, then removes it, all just to show you how it's done:

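import zipfile, tempfile, os, sys

# Make an empty temporary file to hold the archive.
handle, filename = tempfile.mkstemp('.zip')
os.close(handle)

# Add a one-function module to the archive straight from a string.
z = zipfile.ZipFile(filename, 'w')
z.writestr('hello.py', 'def f(): return "hello world from " + __file__\n')
z.close()

# Put the archive itself on sys.path so that import can find hello.py.
sys.path.insert(0, filename)
import hello
print(hello.f())
os.unlink(filename)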

Running this script emits something like:

hello world from /tmp/tmpESVzeY.zip/hello.py

Besides illustrating Python's ability to import from a zip file, this snippet also shows how to make (and later remove) a temporary file, and how to use the writestr method to add a member to a zip file without placing that member into a disk file first.

Note that the path to the zip file from which you import is treated somewhat like a directory. (In this specific example run, that path is /tmp/tmpESVzeY.zip, but of course, since we're dealing with a temporary file, the exact value of the path can change at each run, depending also on your platform.) In particular, the __file__ global variable, within the module hello, which is imported from the zip file, has a value of /tmp/tmpESVzeY.zip/hello.py. This is a pseudo-path, made up of the zip file's path seen as a "directory" followed by the relative path of hello.py within the zip file. If you import from a zip file a module that computes paths relative to itself in order to get to data files, you need to adapt the module accordingly, because you cannot just open such a pseudo-path to get a file object. Rather, to read or write files inside a zip file, you must use functions from the standard library module zipfile, as shown in the solution.
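One possible way to adapt such a module is sketched below. The helper read_from_pseudo_path is hypothetical (it is not part of zipfile or of this recipe): it walks up a pseudo-path until it finds a real file, treats that file as the archive, and reads the remainder of the path as a member name. The member name pkg/data.txt is likewise made up.

```python
import os
import tempfile
import zipfile

def read_from_pseudo_path(pseudo_path):
    """Hypothetical helper: read bytes for a path like /tmp/x.zip/pkg/data.txt."""
    archive = pseudo_path
    # Walk up the path until we reach something that exists as a real file.
    while archive and not os.path.isfile(archive):
        parent = os.path.dirname(archive)
        if parent == archive:  # reached the filesystem root
            archive = ''
        else:
            archive = parent
    if not archive or not zipfile.is_zipfile(archive):
        raise IOError('no zip archive found in %r' % (pseudo_path,))
    # Member names inside a zip file always use forward slashes.
    member = os.path.relpath(pseudo_path, archive).replace(os.sep, '/')
    with zipfile.ZipFile(archive, 'r') as z:
        return z.read(member)

# Usage: build a throwaway archive, then read a member through a pseudo-path.
handle, zpath = tempfile.mkstemp('.zip')
os.close(handle)
try:
    with zipfile.ZipFile(zpath, 'w') as z:
        z.writestr('pkg/data.txt', 'payload')
    result = read_from_pseudo_path(os.path.join(zpath, 'pkg', 'data.txt'))
finally:
    os.unlink(zpath)

print(result)
```

A module imported from a zip file could call such a helper with os.path.join(os.path.dirname(__file__), 'data.txt') in place of a plain open call. This is only a sketch under the stated assumptions and does not handle nested archives.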

For more information about importing modules from a zip file, see Recipe 16.12. While that recipe is Unix-specific, the information in the recipe's Discussion about importing from zip files is also valid for Windows.

See Also

Documentation for the zipfile module in the Library Reference and Python in a Nutshell; modules tempfile, os, sys; for archiving a tree of files, see Recipe 2.11; for more information about importing modules from a zip file, Recipe 16.12.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420
