Web pages can require authentication. If you're developing an HTTP client application, be prepared to handle this case, and give the user a way to enter a login name and password.
3.2.1. How Do I Do That?
If an HTTP request fails with a 401 status code, authentication is required. Try the request again, this time passing a user-supplied login and password in an Authorization header, as shown in the script webcat3.py in Example 3-3.
Example 3-3. webcat3.py
from twisted.web import client, error as weberror
from twisted.internet import reactor
import sys, getpass, base64

def printPage(data):
    print data
    reactor.stop()

def checkHTTPError(failure, url):
    # let anything that is not an HTTP error pass on to the next errback
    failure.trap(weberror.Error)
    if failure.value.status == '401':
        print >> sys.stderr, failure.getErrorMessage()
        # prompt for user name and password
        username = raw_input("User name: ")
        password = getpass.getpass("Password: ")
        basicAuth = base64.encodestring("%s:%s" % (username, password))
        authHeader = "Basic " + basicAuth.strip()
        # try to fetch the page again with authentication
        return client.getPage(
            url, headers={"Authorization": authHeader})
    else:
        return failure

def printError(failure):
    print >> sys.stderr, "Error:", failure.getErrorMessage()
    reactor.stop()

if len(sys.argv) == 2:
    url = sys.argv[1]
    client.getPage(url).addErrback(
        checkHTTPError, url).addCallback(
        printPage).addErrback(
        printError)
    reactor.run()
else:
    print "Usage: %s <URL>" % sys.argv[0]
Run webcat3.py with a URL as the first argument, and it will attempt to download and print the page. If it receives a 401 error, it will ask for a username and password and try the request again:
$ python webcat3.py http://example.com/protected/page
401 Authorization Required
User name: User
Password:
A Password Protected Page
...
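The credential encoding that checkHTTPError performs can be tried on its own. The following is a minimal sketch, written for Python 3 rather than the Python 2 of Example 3-3, so it uses base64.b64encode (base64.encodestring is not available in current Python 3 releases). The helper name basic_auth_header and the sample credentials are illustrative assumptions, not part of webcat3.py.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header.

    Hypothetical helper for illustration; it mirrors the encoding step
    in checkHTTPError but is not taken from the original script.
    """
    # Join the credentials with a colon, then base64-encode the bytes.
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


# Made-up sample credentials.
header = basic_auth_header("Aladdin", "open sesame")
print(header)  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Unlike encodestring, b64encode does not append a trailing newline, so no strip call is needed. Keep in mind that base64 is an encoding, not encryption: anyone who can read the header can recover the password.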
3.2.2. How Does That Work?
This example uses an extra error handler, checkHTTPError. It is added to the Deferred returned from client.getPage first, before adding the printPage and printError handler functions. This gives checkHTTPError the opportunity to handle any errors returned by client.getPage before either of the other handler functions is called.
As an errback handler, checkHTTPError will be called with a twisted.python.failure.Failure object. The Failure object encapsulates the exception that was raised, records a traceback at the time of the exception, and adds several useful methods. checkHTTPError starts by using the Failure.trap method to verify that the exception was of type twisted.web.error.Error. If it wasn't, trap will re-raise the exception, exiting the current function and letting the error pass on to the next errback handler in line, printError.
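To make the effect of trap concrete, here is a minimal sketch in plain Python 3 that imitates it without importing Twisted. ToyFailure and WebError are hypothetical stand-ins for twisted.python.failure.Failure and twisted.web.error.Error; they model only the behavior described above, and the real classes do considerably more.

```python
class WebError(Exception):
    """Hypothetical stand-in for twisted.web.error.Error."""

    def __init__(self, status):
        super().__init__("HTTP status " + status)
        self.status = status


class ToyFailure:
    """Hypothetical stand-in for Failure; models only .value and .trap()."""

    def __init__(self, value):
        self.value = value

    def trap(self, *error_types):
        # Return the matching type if the wrapped exception is expected...
        for error_type in error_types:
            if isinstance(self.value, error_type):
                return error_type
        # ...otherwise re-raise it, which exits the calling handler.
        raise self.value


def check_http_error(failure):
    failure.trap(WebError)  # anything else escapes right here
    return "handled status " + failure.value.status


# An HTTP error gets past trap and is handled.
handled = check_http_error(ToyFailure(WebError("401")))

# Any other exception is re-raised before the handler body runs.
try:
    check_http_error(ToyFailure(ValueError("not an HTTP error")))
    escaped = None
except ValueError as exc:
    escaped = exc
```

In a real Deferred chain the re-raised exception is caught by the Deferred itself and handed to the next errback, which is how the error reaches printError.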
Next, checkHTTPError checks the HTTP response status code. failure.value is an Exception object, and since it's already been verified as type twisted.web.error.Error, it's known to have a status attribute containing the status code. If the status is not 401, the original failure is simply returned, which again has the effect of letting the error pass through to printError.
But if the status is 401, checkHTTPError takes action. It prompts the user for a login name and password, and encodes the results into an HTTP Authorization header. Then it calls client.getPage again, and returns the resulting Deferred. This causes something very cool to happen: the original Deferred pauses its chain of handlers until this second call to getPage produces a result, and then calls printPage or printError with that result. In effect, checkHTTPError is saying, "Handling the error resulted in another Deferred: wait for the result of that Deferred and pass it to the next event handler in line." This technique is very powerful and is used many times in Twisted applications.
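The pausing behavior can be illustrated with a toy model. The sketch below is written in plain Python 3 and does not use Twisted: ToyDeferred is a hypothetical, heavily simplified stand-in for a Deferred (errors are represented as plain exception objects rather than Failure instances), and HTTPError, log, and the snake_case handler names are likewise assumptions made for the illustration. It shows only one idea: when a handler returns another deferred, the remaining handlers wait for that deferred's result.

```python
class ToyDeferred:
    """Toy model of a Deferred. NOT Twisted's implementation."""

    def __init__(self):
        self._handlers = []   # (callback, errback) pairs, in order
        self._fired = False
        self._paused = False
        self._result = None

    def addCallbacks(self, callback, errback):
        self._handlers.append((callback, errback))
        self._run()
        return self

    def addCallback(self, callback):
        return self.addCallbacks(callback, None)

    def addErrback(self, errback):
        return self.addCallbacks(None, errback)

    def callback(self, result):
        self._fired, self._result = True, result
        self._run()

    errback = callback  # in this toy, an error is just an Exception result

    def _resume(self, result):
        # Called when the inner deferred finishes: adopt its result, go on.
        self._paused, self._result = False, result
        self._run()

    def _run(self):
        if not self._fired or self._paused:
            return
        while self._handlers:
            callback, errback = self._handlers.pop(0)
            is_error = isinstance(self._result, Exception)
            handler = errback if is_error else callback
            if handler is None:
                continue  # no handler of this kind; pass the result along
            try:
                self._result = handler(self._result)
            except Exception as exc:
                self._result = exc
            if isinstance(self._result, ToyDeferred):
                # A handler returned another deferred: pause until it fires.
                self._paused = True
                self._result.addCallbacks(self._resume, self._resume)
                return


class HTTPError(Exception):
    def __init__(self, status):
        super().__init__(status)
        self.status = status


log = []
second_request = ToyDeferred()  # plays the role of the retried getPage


def check_http_error(exc):
    if isinstance(exc, HTTPError) and exc.status == "401":
        log.append("retrying with credentials")
        return second_request  # the chain pauses here
    return exc  # any other error passes through unchanged


def print_page(data):
    log.append("page: " + data)


def print_error(exc):
    log.append("error: %s" % exc)


first_request = ToyDeferred()
first_request.addErrback(check_http_error).addCallback(
    print_page).addErrback(print_error)

first_request.errback(HTTPError("401"))
after_first = list(log)  # only check_http_error has run so far

second_request.callback("<html>secret</html>")
print(log)
```

After the first request fails, only check_http_error has run. Neither print_page nor print_error is called until second_request fires, at which point its result flows into the rest of the original chain.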
The end result is the same as in previous examples: either printPage is called with the downloaded page data, or printError is called with an error. Of course, if the initial request was successful, and didn't require authentication, checkHTTPError is never called at all, and the result passes directly to printPage.