The urllib Module

The urllib module provides a unified client interface for HTTP, FTP, and gopher. It automatically picks the right protocol handler based on the uniform resource locator (URL) passed to the library.

Fetching data from a URL is straightforward. Call the urlopen function and read from the returned stream object, as shown in Example 7-14.

Example 7-14. Using the urllib Module to Fetch a Remote Resource

File: urllib-example-1.py

import urllib

fp = urllib.urlopen("http://www.python.org")

op = open("out.html", "wb")

n = 0

while 1:
    s = fp.read(8192)
    if not s:
        break
    op.write(s)
    n = n + len(s)

fp.close()
op.close()

for k, v in fp.headers.items():
    print k, "=", v

print "copied", n, "bytes from", fp.url

server = Apache/1.3.6 (Unix)
content-type = text/html
accept-ranges = bytes
date = Mon, 11 Oct 1999 20:11:40 GMT
connection = close
etag = "741e9-7870-37f356bf"
content-length = 30832
last-modified = Thu, 30 Sep 1999 12:25:35 GMT
copied 30832 bytes from http://www.python.org
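The chunked copy loop in Example 7-14 carries over to Python 3, where urlopen lives in urllib.request. The following is a minimal sketch under that assumption, not part of the original example: the helper name copy_url and its parameters are hypothetical, and the demonstration reads a local file: URL so that it runs without network access.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def copy_url(url, dest_path, chunk_size=8192):
    """Copy the resource at url to dest_path and return the byte count.

    copy_url is a hypothetical helper written for this illustration.
    """
    copied = 0
    with urlopen(url) as response, open(dest_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                # An empty read means the stream is exhausted.
                break
            out.write(chunk)
            copied += len(chunk)
    return copied


# Demonstration with a local file so no server is contacted.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "source.html"
    source.write_bytes(b"<html>hello</html>")
    target = Path(workdir) / "out.html"
    count = copy_url(source.as_uri(), str(target))
    print("copied", count, "bytes from", source.as_uri())
```

Reading in fixed-size chunks keeps memory use bounded no matter how large the remote resource is, which is the same design choice the original loop makes.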

Note that the stream object provides some nonstandard attributes: headers is a Message object (as defined by the mimetools module), and url contains the actual URL. The latter is updated if the server redirects the client to a new URL.
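As a hedged illustration of these two attributes, the sketch below uses Python 3's urllib.request, which is an assumption since the listing above targets the older urllib. There, headers is a message object from the standard email package (or a subclass of it) rather than a mimetools Message, but it supports the same items() iteration. A data: URL is used so that nothing is fetched over the network.

```python
from urllib.request import urlopen

# A data: URL embeds its payload, so no server is contacted.
with urlopen("data:text/plain,hello") as response:
    body = response.read()
    headers = response.headers   # message object with the response headers
    final_url = response.url     # the URL the data was actually read from

# Header lookup on the message object is case-insensitive.
for name, value in headers.items():
    print(name, "=", value)

print("read", len(body), "bytes from", final_url)
```

With an HTTP URL, final_url may differ from the URL that was requested if the server issued a redirect.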

The urlopen function is actually a helper function, which creates an instance of the FancyURLopener class and calls its open method. To get special behavior, you can subclass that class. For instance, the class in Example 7-15 automatically logs in to the server when necessary.

Example 7-15. Using the urllib Module with Automatic Authentication

File: urllib-example-3.py

import urllib

class myURLOpener(urllib.FancyURLopener):
    # read a URL, with automatic HTTP authentication

    def setpasswd(self, user, passwd):
        self.__user = user
        self.__passwd = passwd

    def prompt_user_passwd(self, host, realm):
        # called by the opener when the server asks for credentials
        return self.__user, self.__passwd

urlopener = myURLOpener()
urlopener.setpasswd("mulder", "trustno1")

fp = urlopener.open("http://www.secretlabs.com")
print fp.read()
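FancyURLopener is deprecated in Python 3, so the subclassing approach above does not carry over directly. A rough equivalent, sketched here as a swapped-in technique rather than the method used in Example 7-15, is to build an opener around HTTPBasicAuthHandler from urllib.request. The helper name make_auth_opener is hypothetical; the host and credentials are the ones from the example.

```python
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)


def make_auth_opener(base_url, user, passwd):
    """Return (opener, password manager) for HTTP basic authentication.

    make_auth_opener is a hypothetical helper written for this illustration.
    """
    password_mgr = HTTPPasswordMgrWithDefaultRealm()
    # realm=None: offer these credentials for any realm under base_url.
    password_mgr.add_password(None, base_url, user, passwd)
    opener = build_opener(HTTPBasicAuthHandler(password_mgr))
    return opener, password_mgr


opener, password_mgr = make_auth_opener(
    "http://www.secretlabs.com", "mulder", "trustno1"
)

# No request is sent here. Calling opener.open("http://www.secretlabs.com")
# would fetch the page and answer a 401 challenge with the stored credentials.
print(password_mgr.find_user_password(None, "http://www.secretlabs.com/"))
```

The password manager plays the role that prompt_user_passwd plays in the subclass: it is consulted only when the server actually asks for credentials.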

Python Standard Library
Python Standard Library (Nutshell Handbooks) with
ISBN: 0596000960
EAN: 2147483647
Year: 2000
Pages: 252
Authors: Fredrik Lundh
