Opening HTML Documents


import urllib
u = urllib.urlopen(webURL)
u = urllib.urlopen(localURL)
buffer = u.read()
print u.info()
print "Read %d bytes from %s.\n" % \
    (len(buffer), u.geturl())

The urllib and urllib2 modules included with Python 2 provide functions for opening and fetching data from URLs, including HTML documents. (In Python 3, this functionality lives in urllib.request.)

To open an HTML document with the urllib module, pass the URL of the document, including the filename, to the urlopen(url [,data]) function. urlopen opens the location, which can be a web address or, when the string has no scheme identifier, a local file, and returns a file-like object that can be used to read data from the HTML document.
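As a rough illustration of the call described above, the following sketch opens a local HTML document and reads it. It assumes Python 3, where urlopen lives in urllib.request and a local file has to be given as a file:// URL. The temporary test.html file and its contents are hypothetical stand-ins, not part of the original listing.

```python
import pathlib
import tempfile
import urllib.request

# Hypothetical document contents, used only for this sketch.
html = b"<html><body><p>Hello</p></body></html>"

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "test.html"
    path.write_bytes(html)

    # Assumption: Python 3, where urlopen expects a full URL,
    # so the local path is converted to a file:// URL first.
    local_url = path.as_uri()

    # urlopen returns a file-like object; the with block closes it.
    with urllib.request.urlopen(local_url) as u:
        data = u.read()  # bytes in Python 3, not str

print("Read %d bytes from a local document." % len(data))
```

Writing the file inside a temporary directory keeps the sketch self-contained; with a real document you would pass its URL to urlopen directly.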

Once you have opened the HTML document, you can read it with the read([nbytes]), readline(), and readlines() methods, just as you would a normal file. To read the entire contents of the HTML document, call read() with no argument; it returns the contents as a string.
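The three read methods can be compared side by side in a short sketch. It assumes Python 3's urllib.request, where the methods return bytes rather than strings, and uses a hypothetical five-line test.html written to a temporary directory.

```python
import pathlib
import tempfile
import urllib.request

# Hypothetical five-line document, used only for this sketch.
html = b"<html>\n<body>\n<p>Hello</p>\n</body>\n</html>\n"

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "test.html"
    path.write_bytes(html)

    with urllib.request.urlopen(path.as_uri()) as u:
        head = u.read(6)      # read at most 6 bytes
        rest = u.readline()   # read the remainder of the current line
        tail = u.readlines()  # read all remaining lines into a list

print(head, rest, len(tail))
```

Each call continues from where the previous one stopped, so the three results together make up the whole document.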

After you open a location, you can retrieve the URL of the document using the geturl() method. geturl() returns the URL as a string, taking into account any redirection that might have taken place when accessing the HTML file.
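A minimal sketch of geturl(), assuming Python 3's urllib.request and a hypothetical local test.html. A file:// URL involves no redirection, so the retrieved URL here simply names the same document; with a web URL that redirects, the two values could differ.

```python
import pathlib
import tempfile
import urllib.request

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical document, created only for this sketch.
    path = pathlib.Path(tmp) / "test.html"
    path.write_bytes(b"<html></html>")

    requested = path.as_uri()
    with urllib.request.urlopen(requested) as u:
        # geturl() reports the URL that was actually retrieved.
        final_url = u.geturl()

print("Requested:", requested)
print("Retrieved:", final_url)
```

Recent Python 3 releases also expose the same value as the url attribute of the returned object.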

Note

Another helpful method on the file-like object returned from urlopen is info(). It returns the available metadata about the URL location, such as the content length and content type.
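The metadata can also be read field by field, as in the sketch below. It assumes Python 3's urllib.request, where info() returns an email.message.Message object with case-insensitive header lookup; the test.html file is a hypothetical stand-in.

```python
import pathlib
import tempfile
import urllib.request

# Hypothetical document contents, used only for this sketch.
html = b"<html><body><p>Hello</p></body></html>"

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "test.html"
    path.write_bytes(html)

    with urllib.request.urlopen(path.as_uri()) as u:
        meta = u.info()
        # For local files, the content type is guessed from the extension.
        content_type = meta.get_content_type()
        content_length = int(meta["Content-Length"])

print("Type:", content_type)
print("Length:", content_length)
```

For a web URL, the same object holds whatever headers the server sent, so fields such as Server or Last-Modified may or may not be present.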


import urllib

webURL = "http://www.python.org"
localURL = "/books/python/CH8/code/test.html"

#Open web-based URL
u = urllib.urlopen(webURL)
buffer = u.read()
print u.info()
print "Read %d bytes from %s.\n" % \
    (len(buffer), u.geturl())

#Open local-based URL
u = urllib.urlopen(localURL)
buffer = u.read()
print u.info()
print "Read %d bytes from %s." % \
    (len(buffer), u.geturl())


html_open.py

Date: Tue, 18 Jul 2006 18:28:19 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 mod_python/3.1.3 Python/2.3.5 mod_ssl/2.0.54 OpenSSL/0.9.7e
Last-Modified: Mon, 17 Jul 2006 23:06:04 GMT
ETag: "601f6-351c-1310af00"
Accept-Ranges: bytes
Content-Length: 13596
Connection: close
Content-Type: text/html

Web-Based URL
Read 13596 bytes from http://www.python.org.

Content-Type: text/html
Content-Length: 433
Last-modified: Thu, 13 Jul 2006 22:07:53 GMT

Local-Based URL
Read 433 bytes from file:///books/python/CH8/code/test.html.


Output from html_open.py code




Python Phrasebook: Essential Code and Commands
ISBN: 0672329107
Pages: 138
Authors: Brad Dayley
