Recipe 2.16. Walking Directory Trees

Credit: Robin Parmar, Alex Martelli

Problem

You need to examine a directory, or an entire directory tree rooted in a certain directory, and iterate over the files (and optionally folders) that match certain patterns.

Solution

The generator os.walk from the Python Standard Library module os is sufficient for this task, but we can dress it up a bit by coding our own function to wrap os.walk:

import os, fnmatch

def all_files(root, patterns='*', single_level=False, yield_folders=False):
    # Expand patterns from semicolon-separated string to list
    patterns = patterns.split(';')
    for path, subdirs, files in os.walk(root):
        if yield_folders:
            files.extend(subdirs)
        files.sort()
        for name in files:
            for pattern in patterns:
                if fnmatch.fnmatch(name, pattern):
                    yield os.path.join(path, name)
                    break
        if single_level:
            break

Discussion

The standard directory tree traversal generator os.walk is powerful, simple, and flexible. However, as it stands, os.walk lacks a few niceties that applications may need, such as selecting files according to some patterns, flat (linear) looping on all files (and optionally folders) in sorted order, and the ability to examine a single directory (without entering its subdirectories). This recipe shows how easily these kinds of features can be added, by wrapping os.walk into another simple generator and using standard library module fnmatch to check filenames for matches to patterns.
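To make the single-directory case concrete, here is a minimal sketch of the mechanism the wrapper relies on: os.walk yields (path, subdirs, files) triples top-down, so stopping after the first triple examines only the root. The helper name top_level_entries and the throwaway directory layout are hypothetical, chosen only for illustration.

```python
import os
import tempfile

def top_level_entries(root, yield_folders=False):
    # os.walk yields (path, subdirs, files) triples, top-down by default.
    # The first triple describes root itself, so returning from inside the
    # loop's first pass avoids descending into subdirectories.
    for path, subdirs, files in os.walk(root):
        names = list(files)
        if yield_folders:
            names.extend(subdirs)
        names.sort()
        return [os.path.join(path, name) for name in names]
    return []  # os.walk yielded nothing (e.g., root is not a directory)

# Hypothetical throwaway tree: <demo>/a.py and <demo>/sub/b.py
demo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo, 'sub'))
for relative in ('a.py', os.path.join('sub', 'b.py')):
    open(os.path.join(demo, relative), 'w').close()

print(top_level_entries(demo))                      # only a.py
print(top_level_entries(demo, yield_folders=True))  # a.py and sub
```

The file inside sub never appears, because the loop stops before os.walk reaches that subdirectory.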

The file patterns are possibly case-insensitive (that's platform-dependent) but otherwise Unix-style, as supplied by the standard fnmatch module, which this recipe uses. To specify multiple patterns, join them with a semicolon. Note that this means that semicolons themselves can't be part of a pattern.
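The pattern rules can be tried out without touching the filesystem. The following is a small sketch, not part of the recipe: the helper matches_any is a hypothetical name that applies the same semicolon-splitting rule to a single filename.

```python
import fnmatch

def matches_any(name, patterns):
    # Same splitting rule as the recipe: ';' separates patterns, so a
    # literal semicolon can never reach fnmatch as part of a pattern.
    for pattern in patterns.split(';'):
        if fnmatch.fnmatch(name, pattern):
            return True
    return False

# Unix-style wildcards: * (any run), ? (one character), [seq], [!seq]
print(matches_any('index.html', '*.py;*.htm;*.html'))    # True
print(matches_any('notes.txt', '*.py;*.htm;*.html'))     # False
print(matches_any('chapter1.txt', 'chapter[0-9].txt'))   # True
print(matches_any('chapterx.txt', 'chapter[!0-9].txt'))  # True

# fnmatch.fnmatch normalizes case only where the platform does, so this
# result is platform-dependent and no expected output is shown.
print(fnmatch.fnmatch('README.TXT', '*.txt'))

# fnmatch.fnmatchcase always compares case-sensitively.
print(fnmatch.fnmatchcase('README.TXT', '*.txt'))        # False
```

If platform-independent behavior matters more than following local conventions, fnmatch.fnmatchcase is the function to reach for.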

For example, you can easily get a list of all Python and HTML files in directory /tmp or any subdirectory thereof:

thefiles = list(all_files('/tmp', '*.py;*.htm;*.html'))

Should you just want to process these files' paths one at a time (e.g., print them, one per line), you do not need to build a list: you can simply loop on the result of calling all_files:

for path in all_files('/tmp', '*.py;*.htm;*.html'):
    print(path)

If your platform is case-sensitive, and you want case-insensitive matching, then you need to specify the patterns more laboriously, e.g., '*.[Hh][Tt][Mm][Ll]' instead of just '*.html'.
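Bracket patterns such as '*.[Hh][Tt][Mm][Ll]' can be generated mechanically rather than typed by hand. The helper below, case_insensitive_pattern, is a hypothetical addition and not part of the recipe; it assumes the input pattern uses only ASCII letters and contains no bracket expressions of its own, since letters inside an existing [...] would be rewritten incorrectly.

```python
import fnmatch
import string

def case_insensitive_pattern(pattern):
    # Replace each ASCII letter with a two-character class, e.g. h -> [Hh].
    # Assumption: the pattern has no [...] expressions already.
    pieces = []
    for char in pattern:
        if char in string.ascii_letters:
            pieces.append('[%s%s]' % (char.upper(), char.lower()))
        else:
            pieces.append(char)
    return ''.join(pieces)

pattern = case_insensitive_pattern('*.html')
print(pattern)                                      # *.[Hh][Tt][Mm][Ll]
print(fnmatch.fnmatchcase('INDEX.HTML', pattern))   # True
print(fnmatch.fnmatchcase('index.htm', pattern))    # False
```

The result can be passed to all_files like any other pattern, for example all_files('/tmp', case_insensitive_pattern('*.html')).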

See Also

Documentation for the os.path module and the os.walk generator, as well as the fnmatch module, in the Library Reference and Python in a Nutshell.



Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420
