Monitoring Download Progress

One potential weakness in the examples presented so far in this chapter is that there hasn't been a way to monitor a download in progress. Sure, it's nice that a Deferred will pass you the results of a page once it's completely downloaded, but sometimes what you really need is to keep an eye on the download as it's happening.

3.5.1. How Do I Do That?

Again, the utility functions provided by twisted.web.client don't give you quite enough control. Define a subclass of client.HTTPDownloader, the factory class used for downloading a web page to a file. By overriding a couple of methods, you can keep track of a download in progress. The webdownload.py script in Example 3-6 shows how.

Example 3-6. webdownload.py


    from twisted.web import client

    class HTTPProgressDownloader(client.HTTPDownloader):

        def gotHeaders(self, headers):
            if self.status == '200': # page data is on the way
                if headers.has_key('content-length'):
                    self.totalLength = int(headers['content-length'][0])
                else:
                    self.totalLength = 0
                self.currentLength = 0.0
                print ''
            return client.HTTPDownloader.gotHeaders(self, headers)

        def pagePart(self, data):
            if self.status == '200':
                self.currentLength += len(data)
                if self.totalLength:
                    percent = "%i%%" % (
                        (self.currentLength/self.totalLength)*100)
                else:
                    percent = '%dK' % (self.currentLength/1000)
                print "\033[1FProgress: " + percent
            return client.HTTPDownloader.pagePart(self, data)

    def downloadWithProgress(url, file, contextFactory=None, *args, **kwargs):
        scheme, host, port, path = client._parse(url)
        factory = HTTPProgressDownloader(url, file, *args, **kwargs)
        if scheme == 'https':
            from twisted.internet import ssl
            if contextFactory is None:
                contextFactory = ssl.ClientContextFactory()
            reactor.connectSSL(host, port, factory, contextFactory)
        else:
            reactor.connectTCP(host, port, factory)
        return factory.deferred

    if __name__ == "__main__":
        import sys
        from twisted.internet import reactor

        def downloadComplete(result):
            print "Download Complete."
            reactor.stop()

        def downloadError(failure):
            print "Error:", failure.getErrorMessage()
            reactor.stop()

        url, outputFile = sys.argv[1:]
        downloadWithProgress(url, outputFile).addCallback(
            downloadComplete).addErrback(
            downloadError)
        reactor.run()

Run webdownload.py with two arguments: the URL of a page to download and a filename in which to save it. As the command works, it will print updates on the download progress:


    $ python webdownload.py http://www.oreilly.com/ oreilly.html
    Progress: 100%    <- updated during the download
    Download Complete.

If the web server doesn't return a Content-Length header indicating the total length of the download, it isn't possible to calculate the percentage complete. In this case, webdownload.py prints the number of kilobytes downloaded:


    $ python webdownload.py http://www.slashdot.org/ slashdot.html
    Progress: 60K    <- updated during the download
    Download Complete.
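The choice between the two report formats can be looked at in isolation. The following is a minimal sketch, not part of webdownload.py: formatProgress is a hypothetical helper name, and the code is written to run under both Python 2 (which the listing uses) and Python 3. It mirrors the arithmetic in pagePart, including the division by 1000 for the kilobyte count.

```python
def formatProgress(currentLength, totalLength):
    """Return a short progress string for a download in flight.

    totalLength is 0 when the server sent no Content-Length header,
    matching the convention used in the listing.
    """
    if totalLength:
        # Known size: report the percentage of bytes received so far.
        return "%i%%" % (float(currentLength) / totalLength * 100)
    # Unknown size: fall back to a running count in units of 1000 bytes.
    return "%dK" % (currentLength // 1000)


if __name__ == "__main__":
    print(formatProgress(512, 2048))   # size known from Content-Length
    print(formatProgress(60000, 0))    # no Content-Length header
```

Treating a total of 0 as "unknown" keeps the check to a single truth test, at the cost of not distinguishing a missing header from a genuinely empty response.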

3.5.2. How Does That Work?

HTTPProgressDownloader is a subclass of client.HTTPDownloader. It overrides the gotHeaders method to check for a Content-Length header that would indicate the total size of the page being downloaded. It also overrides the pagePart method, which is called each time a chunk of page data is received, to keep track of the number of bytes downloaded so far.
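To see the bookkeeping without a network connection, the two overridden methods can be imitated with a small stand-in class. This is only a sketch under stated assumptions: ProgressTracker is a hypothetical name, it does not inherit from client.HTTPDownloader, and it assumes the header shape the listing implies (lowercase header names mapped to lists of string values). It runs under Python 2 and Python 3.

```python
class ProgressTracker(object):
    """Hypothetical, Twisted-free stand-in for the two overridden methods."""

    def __init__(self):
        self.totalLength = 0
        self.currentLength = 0

    def gotHeaders(self, headers):
        # Assumed shape: {'content-length': ['1234'], ...}, as implied by
        # headers['content-length'][0] in the listing.
        values = headers.get('content-length')
        self.totalLength = int(values[0]) if values else 0
        self.currentLength = 0

    def pagePart(self, data):
        # Called once per chunk. Returns the fraction complete, or None
        # when the total size is unknown.
        self.currentLength += len(data)
        if self.totalLength:
            return float(self.currentLength) / self.totalLength
        return None


if __name__ == "__main__":
    tracker = ProgressTracker()
    tracker.gotHeaders({'content-length': ['10']})
    for chunk in (b'abcd', b'efghij'):
        print(tracker.pagePart(chunk))
```

In the real subclass each method finishes by calling the corresponding client.HTTPDownloader method, so the normal work of saving the page to a file still happens; the stand-in omits that step.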

Each time a chunk of data comes in, HTTPProgressDownloader prints out a progress report. The string \033[1F is a terminal escape sequence that moves the cursor to the start of the previous line, so each progress report is written over the preceding one. This effect makes it look like the progress information is being updated in place. The empty line printed in gotHeaders gives the first report a line to overwrite.
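The overwrite effect can be tried on its own in a terminal that understands ANSI escape sequences. The sketch below is illustrative rather than taken from webdownload.py: progressLine and demo are hypothetical names, the percentages are made up, and the delay exists only so the updates are visible. It runs under Python 2 and Python 3.

```python
import sys
import time

# ESC [ 1 F: move the cursor to the start of the line above.
CURSOR_PREVIOUS_LINE = "\033[1F"


def progressLine(percent):
    """Build one progress report that overwrites the previous report."""
    return CURSOR_PREVIOUS_LINE + "Progress: " + percent + "\n"


def demo(stream=sys.stdout, delay=0.2):
    # Print a blank line first so the first report has a line to overwrite.
    stream.write("\n")
    for done in (25, 50, 75, 100):
        stream.write(progressLine("%d%%" % done))
        stream.flush()
        time.sleep(delay)


if __name__ == "__main__":
    demo()
```

The escape sequence only moves the cursor; it does not erase the old text. That is harmless here because the reports never get shorter, but a report that could shrink would leave stray characters behind.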

The downloadWithProgress function contains code similar to that in Example 3-5 for parsing the requested URL, creating the HTTPProgressDownloader factory object, and initializing the connection. downloadComplete and downloadError are simple callback and errback handlers that print a message and stop the reactor.
