Recipe 14.6. Resuming the HTTP Download of a File

Credit: Chris Moffitt

Problem

You need to resume an HTTP download of a file that has been partially transferred.

Solution

Downloads of large files are sometimes interrupted. However, a good HTTP server that supports the Range header lets you resume the download from where it was interrupted. The standard Python module urllib lets you access this functionality almost seamlessly: you just have to add the required header and intercept the error code that the server sends to confirm that it will respond with a partial file. Here is a function, with a little helper class, to perform this task:

    import urllib, os

    class myURLOpener(urllib.FancyURLopener):
        """ Subclass to override err 206 (partial file being sent); okay for us """
        def http_error_206(self, url, fp, errcode, errmsg, headers, data=None):
            pass    # Ignore the expected "non-error" code

    def getrest(dlFile, fromUrl, verbose=0):
        myUrlclass = myURLOpener( )
        if os.path.exists(dlFile):
            outputFile = open(dlFile, "ab")
            existSize = os.path.getsize(dlFile)
            # If the file exists, then download only the remainder
            myUrlclass.addheader("Range","bytes=%s-" % (existSize))
        else:
            outputFile = open(dlFile, "wb")
            existSize = 0
        webPage = myUrlclass.open(fromUrl)
        if verbose:
            for k, v in webPage.headers.items( ):
                print k, "=", v
        # If we already have the whole file, there is no need to download it again
        numBytes = 0
        webSize = int(webPage.headers['Content-Length'])
        if webSize == existSize:
            if verbose:
                print "File (%s) was already downloaded from URL (%s)" % (
                    dlFile, fromUrl)
        else:
            if verbose:
                print "Downloading %d more bytes" % (webSize-existSize)
            while True:
                data = webPage.read(8192)
                if not data:
                    break
                outputFile.write(data)
                numBytes = numBytes + len(data)
        webPage.close( )
        outputFile.close( )
        if verbose:
            print "downloaded", numBytes, "bytes from", webPage.url
        # Return the number of bytes written during this call
        return numBytes

Discussion

The HTTP Range header lets the web server know that you want only a certain range of data to be downloaded,
and this recipe takes advantage of this header. Of course, the server needs to support the Range header, but since the header is part of the HTTP 1.1 specification, it's widely supported. This recipe has been tested with Apache 1.3 as the server, but I expect no problems with other reasonably modern servers.

The recipe lets urllib.FancyURLopener do all the hard work of adding a new header, as well as the normal handshaking. I had to subclass the standard class from urllib only to make it known that the error 206 is not really an error in this case, so you can proceed normally. In the function, I also perform extra checks to quit the download if I've already downloaded the entire file.

Check out the HTTP 1.1 RFC (2616) to learn more about the meaning of the headers. You may find a header that is especially useful, and Python's urllib lets you send any header you want.

See Also

Documentation on the urllib standard library module in the Library Reference and Python in a Nutshell; the HTTP 1.1 RFC (http://www.ietf.org/rfc/rfc2616.txt).
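
In Python 3, the same Range-based idea can be sketched with urllib.request, which treats a 206 reply as a normal success, so no subclass is needed. What follows is a minimal sketch and not part of the original recipe: the names range_header, open_mode_for, and resume_download are hypothetical, and the code assumes that a 206 reply carries only the missing bytes, that any other successful reply carries the whole file, and that a 416 reply means the local copy is already complete.

```python
import os
import urllib.error
import urllib.request


def range_header(existing_size):
    """Return a Range value asking for every byte from existing_size onward."""
    return "bytes=%d-" % existing_size


def open_mode_for(status, existing_size):
    """Pick the file mode for writing the response body.

    206 means the server honored the Range header, so append to the
    partial file. Any other success (typically 200) is assumed to carry
    the whole file, so start over instead of appending.
    """
    if status == 206 and existing_size > 0:
        return "ab"
    return "wb"


def resume_download(url, path, chunk_size=8192):
    """Download url to path, resuming a partial file when the server allows it.

    Returns the number of bytes written during this call.
    """
    existing_size = os.path.getsize(path) if os.path.exists(path) else 0
    request = urllib.request.Request(url)
    if existing_size:
        request.add_header("Range", range_header(existing_size))
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        if err.code == 416 and existing_size:
            # Assumption: "Range Not Satisfiable" means nothing is left to fetch
            return 0
        raise
    written = 0
    mode = open_mode_for(response.status, existing_size)
    with response, open(path, mode) as out:
        while True:
            data = response.read(chunk_size)
            if not data:
                break
            out.write(data)
            written += len(data)
    return written
```

A call such as resume_download(some_url, "big.iso") can simply be repeated after each interruption. Choosing the file mode only after the status code is known avoids appending a full copy onto a partial file when a server ignores the Range header.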