Recipe 2.16. Walking Directory Trees

Credit: Robin Parmar, Alex Martelli

Problem

You need to examine a directory, or an entire directory tree rooted in a certain directory, and iterate on the files (and optionally folders) that match certain patterns.

Solution

The generator os.walk from the Python Standard Library module os is sufficient for this task, but we can dress it up a bit by coding our own function to wrap os.walk:

    import os, fnmatch

    def all_files(root, patterns='*', single_level=False, yield_folders=False):
        # Expand patterns from semicolon-separated string to list
        patterns = patterns.split(';')
        for path, subdirs, files in os.walk(root):
            if yield_folders:
                files.extend(subdirs)
            files.sort()
            for name in files:
                for pattern in patterns:
                    if fnmatch.fnmatch(name, pattern):
                        yield os.path.join(path, name)
                        break
            if single_level:
                break

Discussion

The standard directory tree traversal generator os.walk is powerful, simple, and flexible. However, as it stands, os.walk lacks a few niceties that applications may need, such as selecting files according to some patterns, flat (linear) looping on all files (and optionally folders) in sorted order, and the ability to examine a single directory (without entering its subdirectories). This recipe shows how easily these kinds of features can be added, by wrapping os.walk into another simple generator and using standard library module fnmatch to check filenames for matches to patterns.

The file patterns are possibly case-insensitive (that's platform-dependent) but otherwise Unix-style, as supplied by the standard fnmatch module, which this recipe uses. To specify multiple patterns, join them with a semicolon. Note that this means that semicolons themselves can't be part of a pattern.
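As a minimal sketch of the semicolon convention just described, the matching step can be isolated into a small helper. The name matches_any is hypothetical (it is not part of the recipe), and the sketch uses fnmatch.fnmatchcase rather than fnmatch.fnmatch so that its results do not depend on the platform's case rules:

```python
import fnmatch

def matches_any(name, patterns='*'):
    """Return True if name matches at least one semicolon-separated pattern."""
    # Hypothetical helper for illustration only. fnmatchcase always compares
    # case-sensitively, unlike fnmatch.fnmatch, which normalizes case on
    # some platforms.
    for pattern in patterns.split(';'):
        if fnmatch.fnmatchcase(name, pattern):
            return True
    return False

print(matches_any('index.html', '*.py;*.htm;*.html'))  # True
print(matches_any('notes.txt', '*.py;*.htm;*.html'))   # False
```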
For example, you can easily get a list of all Python and HTML files in directory /tmp or any subdirectory thereof:

    thefiles = list(all_files('/tmp', '*.py;*.htm;*.html'))

Should you just want to process these files' paths one at a time (e.g., print them, one per line), you do not need to build a list: you can simply loop on the result of calling all_files:

    for path in all_files('/tmp', '*.py;*.htm;*.html'):
        print(path)

If your platform is case-sensitive, and you want case-insensitive matching, then you need to specify the patterns more laboriously, e.g., '*.[Hh][Tt][Mm][Ll]' instead of just '*.html'.

See Also

Documentation for the os.path module and the os.walk generator, as well as the fnmatch module, in the Library Reference and Python in a Nutshell.
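The laborious case-insensitive patterns mentioned in the Discussion could be generated rather than typed by hand. The following is only a sketch under stated assumptions: the helper name case_insensitive_pattern is hypothetical, and it assumes the input pattern contains no bracket expressions of its own (letters inside an existing [...] would be expanded incorrectly).

```python
import fnmatch

def case_insensitive_pattern(pattern):
    """Expand each letter into a character class, e.g. 'h' -> '[Hh]'."""
    # Hypothetical helper: assumes the pattern has no [...] expressions.
    parts = []
    for ch in pattern:
        if ch.isalpha():
            parts.append('[%s%s]' % (ch.upper(), ch.lower()))
        else:
            # Wildcards, dots, digits, and so on pass through unchanged
            parts.append(ch)
    return ''.join(parts)

expanded = case_insensitive_pattern('*.html')
print(expanded)                                     # *.[Hh][Tt][Mm][Ll]
print(fnmatch.fnmatchcase('INDEX.HTML', expanded))  # True
```

The expanded string can then be passed wherever the recipe expects a pattern, for instance as one of the semicolon-separated patterns given to all_files.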