Recipe 2.11. Archiving a Tree of Files into a Compressed tar FileCredit: Ed Gordon, Ravi Teja Bhupatiraju ProblemYou need to archive all of the files and folders in a subtree into a tar archive file, compressing the data with either the popular gzip approach or the higher-compressing bzip2 approach. SolutionThe Python Standard Library's tarfile module directly supports either kind of compression: you just need to specify the kind of compression you require, as part of the option string that you pass when you call tarfile.TarFile.open to create the archive file. For example: import tarfile, os def make_tar(folder_to_backup, dest_folder, compression='bz2'): if compression: dest_ext = '.' + compression else: dest_ext = '' arcname = os.path.basename(folder_to_backup) dest_name = '%s.tar%s' % (arcname, dest_ext) dest_path = os.path.join(dest_folder, dest_name) if compression: dest_cmp = ':' + compression else: dest_cmp = '' out = tarfile.TarFile.open(dest_path, 'w'+dest_cmp) out.add(folder_to_backup, arcname) out.close( ) return dest_path DiscussionYou can pass, as argument compression to function make_tar, the string 'gz' to get gzip compression instead of the default bzip2, or you can pass the empty string '' to get no compression at all. Besides making the file extension of the result either .tar, .tar.gz, or .tar.bz2, as appropriate, your choice for the compression argument determines which string is passed as the second argument to tarfile.TarFile.open: 'w', when you want no compression, or 'w:gz' or 'w:bz2' to get two kinds of compression. Class tarfile.TarFile offers several other classmethods, besides open, which you could use to generate a suitable instance. I find open handier and more flexible because it takes the compression information as part of the mode string argument. However, if you want to ensure bzip2 compression is used unconditionally, for example, you could choose to call classmethod bz2open instead. Once we have an instance of class tarfile.TarFile that is set to use the kind of compression we desire, the instance's method add does all we require. In particular, when string folder_to_backup names a "directory" (or folder), rather than an ordinary file, add recursively adds all of the subtree rooted in that directory. If on some other occasion, we wanted to change this behavior to get precise control on what is archived, we could pass to add an additional named argument recursive=False to switch off this implicit recursion. After calling add, all that's left for function make_tar to do is to close the TarFile instance and return the path on which the tar file has been written, just in case the caller needs this information. See AlsoLibrary Reference docs on module tarfile. |