4.24 Computing Directory Sizes in a Cross-Platform Way


Credit: Frank Fejes

4.24.1 Problem

You need to compute the total size of a directory (or set of directories) in a way that works under both Windows and Unix-like platforms.

4.24.2 Solution

There are easier platform-dependent solutions, such as Unix's du, but Python also makes it quite feasible to have a cross-platform solution:

import os
from os.path import join, isdir, islink

class DirSizeError(Exception): pass

def dir_size(start, follow_links=0, start_depth=0, max_depth=0, skip_errs=0,
             units='b'):
    # Get a list of all names of files and subdirectories in directory start
    try:
        dir_list = os.listdir(start)
    except OSError:
        # If start is a directory, we probably have permission problems
        if os.path.isdir(start):
            raise DirSizeError('Cannot list directory %s' % start)
        else:  # otherwise, just re-raise the error so that it propagates
            raise
    total = 0
    for item in dir_list:
        # Get statistics on each item--file and subdirectory--of start
        path = join(start, item)
        try:
            stats = os.stat(path)
        except OSError:
            if not skip_errs:
                raise DirSizeError('Cannot stat %s' % path)
            continue  # skip this item: there are no stats to add
        # The size in bytes is in the seventh item of the stats tuple, so:
        total += stats[6]
        # recursive descent if warranted (pass skip_errs and units down, too)
        if isdir(path) and (follow_links or not islink(path)):
            nbytes = dir_size(path, follow_links, start_depth+1, max_depth,
                              skip_errs, units)
            total += nbytes
            if max_depth and (start_depth < max_depth):
                print_path(path, nbytes, units)
    return total

def print_path(path, nbytes, units='b'):
    if units == 'k':
        print('%-8d%s' % (nbytes // 1024, path))
    elif units == 'm':
        print('%-5d%s' % (nbytes // 1024 // 1024, path))
    else:
        print('%-11d%s' % (nbytes, path))

def usage(name):
    print("usage: %s [-bkLm] [-d depth] directory [directory...]" % name)
    print('\t-b\t\tDisplay in Bytes (default)')
    print('\t-k\t\tDisplay in Kilobytes')
    print('\t-m\t\tDisplay in Megabytes')
    print('\t-L\t\tFollow symbolic links (meaningful on Unix only)')
    print('\t-d, --depth\t# of directories down to print (default = 0)')

if __name__ == '__main__':
    # When used as a script:
    import sys, getopt
    units = 'b'
    follow_links = 0
    depth = 0
    try:
        opts, args = getopt.getopt(sys.argv[1:], "bkLmd:", ["depth="])
    except getopt.GetoptError:
        usage(sys.argv[0])
        sys.exit(1)
    for o, a in opts:
        if o == '-b': units = 'b'
        elif o == '-k': units = 'k'
        elif o == '-L': follow_links = 1
        elif o == '-m': units = 'm'
        elif o in ('-d', '--depth'):
            try:
                depth = int(a)
            except ValueError:
                print("Not a valid integer: (%s)" % a)
                usage(sys.argv[0])
                sys.exit(1)
    if len(args) < 1:
        print("No directories specified")
        usage(sys.argv[0])
        sys.exit(1)
    else:
        paths = args
    for path in paths:
        try:
            nbytes = dir_size(path, follow_links, 0, depth, units=units)
        except DirSizeError as x:
            print("Error:", x)
        else:
            print_path(path, nbytes, units)
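The comment in dir_size says the size lives in the seventh item of the stat tuple. A quick sketch confirms that index 6, the stat module's ST_SIZE constant, and the st_size attribute of an os.stat result are the same value (the temporary file and its 1234-byte size are arbitrary choices for the demonstration):

```python
import os
import stat
import tempfile

# Write a file with a known number of bytes into a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"x" * 1234)

    st = os.stat(path)
    # Three spellings of the same field: tuple index, named constant, attribute
    size_by_index = st[6]
    size_by_const = st[stat.ST_SIZE]
    size_by_attr = st.st_size

print(size_by_index, size_by_const, size_by_attr)  # 1234 1234 1234
```

In new code, stats.st_size says the same thing without the magic number.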

4.24.3 Discussion

Unix-like platforms have the du command, but that doesn't help when you need to get information about disk-space usage in a cross-platform way. This recipe has been tested under both Windows and Unix, although it is most useful under Windows, where the normal way of getting this information requires using a GUI. In any case, the recipe's code can be used either as a module (in which case you'll normally call only the dir_size function) or as a command-line script. Typical use as a script is:

C:\> python dir_size.py "c:\Program Files"

This will give you some idea of where all your disk space has gone. To help you narrow the search, you can, for example, also display the size of each immediate subdirectory:

C:\> python dir_size.py --depth=1 "c:\Program Files"
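How does that command line reach dir_size? getopt splits it into (option, value) pairs and positional arguments; the script then converts the --depth value with int() and passes it on as max_depth. A small sketch using the recipe's own option spec, with made-up argument lists standing in for sys.argv[1:]:

```python
import getopt

# The option spec the recipe uses: -b -k -L -m flags, -d taking a value,
# plus the long form --depth=VALUE
SHORT_OPTS = "bkLmd:"
LONG_OPTS = ["depth="]

# Stand-in for sys.argv[1:] of: python dir_size.py --depth=1 "c:\Program Files"
opts, args = getopt.getopt(["--depth=1", "c:\\Program Files"], SHORT_OPTS, LONG_OPTS)
print(opts)  # [('--depth', '1')]
print(args)  # ['c:\\Program Files']

# Short flags can be bundled, and -d takes the next argument as its value
opts2, args2 = getopt.getopt(["-kL", "-d", "2", "/tmp"], SHORT_OPTS, LONG_OPTS)
print(opts2)  # [('-k', ''), ('-L', ''), ('-d', '2')]
print(args2)  # ['/tmp']

# The value arrives as a string; the script converts it for max_depth
max_depth = int(dict(opts)["--depth"])
print(max_depth)  # 1
```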

The recipe's operation is based on recursive descent. os.listdir provides a list of the names of all files and subdirectories in a given directory, and whenever dir_size finds a subdirectory, it calls itself recursively on it. An alternative architecture might be based on os.path.walk, which handles the recursion on our behalf and simply calls a function we specify once for each directory it visits (os.path.walk exists only in Python 2; Python 3 offers the generator os.walk instead). However, here we need to control how deep the reporting goes: the useful --depth command-line option becomes the max_depth argument of dir_size, and each recursive call passes start_depth+1 so that every subdirectory knows its own depth. That bookkeeping is easiest when we administer the recursion directly, rather than letting os.path.walk handle it on our behalf.
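For comparison, here is a sketch of the walk-based alternative, written for Python 3, where os.walk (a generator yielding (dirpath, dirnames, filenames) tuples) takes the place of the callback-style os.path.walk. The function name walk_size and its return shape are hypothetical. Because os.walk owns the recursion, each directory's depth has to be recovered afterwards from its path relative to start instead of being passed down as an argument, which is the extra bookkeeping the recipe avoids. Unlike the recipe, this sketch counts only file sizes, not the sizes reported for directory entries themselves, and it silently skips unreadable directories (os.walk's behavior when no onerror callback is given).

```python
import os

def walk_size(start, max_depth=0, follow_links=False):
    """Sum file sizes under start using os.walk (hypothetical sketch).

    Returns (total, subtotals), where subtotals maps every subdirectory at
    most max_depth levels below start to the bytes of the files beneath it.
    """
    total = 0
    subtotals = {}
    start = os.path.normpath(start)
    for dirpath, dirnames, filenames in os.walk(start, followlinks=follow_links):
        # Path components of dirpath relative to start ([] for start itself)
        rel = os.path.relpath(dirpath, start)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        size = 0
        for name in filenames:
            try:
                size += os.stat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # e.g. a broken symlink: skipped in this sketch
        total += size
        # Credit these bytes to each reported ancestor: start/a, start/a/b, ...
        for depth in range(1, min(len(parts), max_depth) + 1):
            key = os.path.join(start, *parts[:depth])
            subtotals[key] = subtotals.get(key, 0) + size
    return total, subtotals
```

For example, walk_size(r"c:\Program Files", max_depth=1) returns the grand total plus one entry per immediate subdirectory, roughly what --depth=1 reports, apart from the directory-entry sizes the recipe also adds.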

4.24.4 See Also

Documentation for the os.path and getopt modules in the Library Reference.



Python Cookbook
ISBN: 0596007973
Year: 2005
Pages: 346
