Recipe 2.15. Adapting a File-like Object to a True File Object

Credit: Michael Kent

Problem

You need to pass a file-like object (e.g., the results of a call such as urllib.urlopen) to a function or method that insists on receiving a true file object (e.g., a function such as marshal.load).

Solution

To cooperate with such type-checking, we need to write all data from the file-like object into a temporary file on disk. Then, we can use the (true) file object for that temporary disk file. Here's a function that implements this idea:

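import tempfile
CHUNK_SIZE = 16 * 1024
def adapt_file(fileObj):
    # A true file object (Python 2 built-in type file) needs no adaptation
    if isinstance(fileObj, file): return fileObj
    # Otherwise, copy the data, chunk by chunk, into a temporary disk file
    tmpFileObj = tempfile.TemporaryFile()
    while True:
        data = fileObj.read(CHUNK_SIZE)
        if not data: break
        tmpFileObj.write(data)
    fileObj.close()
    # Rewind, so that the caller reads the data from the start
    tmpFileObj.seek(0)
    return tmpFileObj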
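
The recipe relies on the built-in type file, which exists only in Python 2. A minimal sketch of the same idea for Python 3 follows. It is not part of the original recipe: the name adapt_file_py3 is hypothetical, and it assumes that "true file" can be read as "object with a working fileno()", which may or may not be what the type-checking function you call actually requires.

```python
import io
import shutil
import tempfile

CHUNK_SIZE = 16 * 1024

def adapt_file_py3(file_obj):
    """Return an object backed by a real OS-level file (illustrative sketch)."""
    # Assumption: an object whose fileno() works is already a "true" file.
    try:
        file_obj.fileno()
        return file_obj
    except (AttributeError, OSError):
        # io.UnsupportedOperation, raised by io.BytesIO, subclasses OSError
        pass
    # Copy everything into a temporary disk file (binary mode by default)
    tmp_file = tempfile.TemporaryFile()
    shutil.copyfileobj(file_obj, tmp_file, CHUNK_SIZE)
    file_obj.close()
    # Rewind, so that the caller reads the data from the start
    tmp_file.seek(0)
    return tmp_file

# Usage: an in-memory stream has no file descriptor, so it gets copied
source = io.BytesIO(b"x" * 50000)
adapted = adapt_file_py3(source)
```

Here shutil.copyfileobj replaces the explicit read/write loop; the chunked copy it performs is equivalent in effect.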

Discussion

This recipe demonstrates an unusual Pythonic application of the Adapter Design Pattern (i.e., what to do when you have an X and you need a Y instead). While design patterns are most normally thought of in an object-oriented way, and therefore implemented by writing classes, nothing is intrinsically necessary about that. In this case, for example, we don't really need to introduce any new class, since the adapt_file function is obviously sufficient. Therefore, we respect Occam's Razor and do not introduce entities without necessity.

One way or another, you should think in terms of adaptation, in preference to type testing, even when you need to rely on some lower-level utility that insists on precise types. Instead of raising an exception when you get passed an object that's perfectly adequate save for the technicality of type membership, think of the possibility of adapting what you get passed to what you need. In this way, your code will be more flexible and more suitable for reuse.

See Also

Documentation on built-in file objects, and modules tempfile and marshal, in the Library Reference and Python in a Nutshell.


Recipe 2.16. Walking Directory Trees

Credit: Robin Parmar, Alex Martelli

Problem

You need to examine a directory, or an entire directory tree rooted in a certain directory, and iterate on the files (and optionally folders) that match certain patterns.

Solution

The generator os.walk from the Python Standard Library module os is sufficient for this task, but we can dress it up a bit by coding our own function to wrap os.walk:

import os, fnmatch
def all_files(root, patterns='*', single_level=False, yield_folders=False):
    # Expand patterns from semicolon-separated string to list
    patterns = patterns.split(';')
    for path, subdirs, files in os.walk(root):
        if yield_folders:
            # Treat folder names like filenames for matching purposes
            files.extend(subdirs)
        files.sort()
        for name in files:
            for pattern in patterns:
                if fnmatch.fnmatch(name, pattern):
                    yield os.path.join(path, name)
                    # Yield each name at most once, even if several patterns match
                    break
        if single_level:
            # Stop after the root directory itself
            break

Discussion

The standard directory tree traversal generator os.walk is powerful, simple, and flexible. However, as it stands, os.walk lacks a few niceties that applications may need, such as selecting files according to some patterns, flat (linear) looping on all files (and optionally folders) in sorted order, and the ability to examine a single directory (without entering its subdirectories). This recipe shows how easily these kinds of features can be added, by wrapping os.walk into another simple generator and using standard library module fnmatch to check filenames for matches to patterns.
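
To see these options at work without touching real directories, the following sketch repeats all_files so that it runs on its own and applies it to a throwaway tree. The helper build_demo_tree and the file names in it are hypothetical, chosen only for illustration.

```python
import fnmatch
import os
import shutil
import tempfile

def all_files(root, patterns='*', single_level=False, yield_folders=False):
    # Same logic as the recipe's all_files, repeated so this block runs alone
    patterns = patterns.split(';')
    for path, subdirs, files in os.walk(root):
        if yield_folders:
            files.extend(subdirs)
        files.sort()
        for name in files:
            for pattern in patterns:
                if fnmatch.fnmatch(name, pattern):
                    yield os.path.join(path, name)
                    break
        if single_level:
            break

def build_demo_tree():
    # Hypothetical layout: root/a.py, root/b.txt, root/sub/c.py
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, 'sub'))
    for rel_name in ('a.py', 'b.txt', os.path.join('sub', 'c.py')):
        with open(os.path.join(root, rel_name), 'w') as f:
            f.write('')
    return root

def relative(paths, root):
    # Strip the temporary root so that results are easy to compare
    return [os.path.relpath(p, root) for p in paths]

root = build_demo_tree()
# Whole tree, Python files only
whole_tree = relative(all_files(root, '*.py'), root)
# Root directory only, without entering sub
top_only = relative(all_files(root, '*.py', single_level=True), root)
# Root directory only, files and folders together, in sorted order
with_dirs = relative(all_files(root, '*', single_level=True,
                               yield_folders=True), root)
shutil.rmtree(root)

print(whole_tree)
print(top_only)
print(with_dirs)
```

With this layout, the first call finds a.py and sub/c.py, the second only a.py, and the third lists a.py, b.txt, and the folder sub.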

The file patterns are possibly case-insensitive (that's platform-dependent) but otherwise Unix-style, as supplied by the standard fnmatch module, which this recipe uses. To specify multiple patterns, join them with a semicolon. Note that this means that semicolons themselves can't be part of a pattern.

For example, you can easily get a list of all Python and HTML files in directory /tmp or any subdirectory thereof:

thefiles = list(all_files('/tmp', '*.py;*.htm;*.html'))

Should you just want to process these files' paths one at a time (e.g., print them, one per line), you do not need to build a list: you can simply loop on the result of calling all_files:

for path in all_files('/tmp', '*.py;*.htm;*.html'):
    print(path)

If your platform is case-sensitive, and you want case-insensitive matching, then you need to specify the patterns more laboriously, e.g., '*.[Hh][Tt][Mm][Ll]' instead of just '*.html'.
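
The effect of such a bracketed pattern can be checked with fnmatch.fnmatchcase, which, unlike fnmatch.fnmatch, never normalizes case and so behaves the same on every platform. A minimal sketch, with made-up file names:

```python
import fnmatch

# Hypothetical file names, differing only in letter case
names = ['index.html', 'INDEX.HTML', 'Page.Html', 'notes.txt']

# A plain pattern, compared case-sensitively, matches only the exact case
plain = [n for n in names if fnmatch.fnmatchcase(n, '*.html')]

# Bracketed character classes accept either case for each letter
bracketed = [n for n in names
             if fnmatch.fnmatchcase(n, '*.[Hh][Tt][Mm][Ll]')]

print(plain)      # ['index.html']
print(bracketed)  # ['index.html', 'INDEX.HTML', 'Page.Html']
```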

See Also

Documentation for the os.path module and the os.walk generator, as well as the fnmatch module, in the Library Reference and Python in a Nutshell.