Retrieving Cookies in HTML Documents


import urllib2
import cookielib
from urllib2 import urlopen, Request

cJar = cookielib.LWPCookieJar()
opener = urllib2.build_opener(
    urllib2.HTTPCookieProcessor(cJar))
urllib2.install_opener(opener)
r = Request(testURL)
h = urlopen(r)
for ind, cookie in enumerate(cJar):
    print "%d - %s" % (ind, cookie)
cJar.save(cookieFile)

The Python language includes a cookielib module that provides classes for automatic handling of HTTP cookies. This is necessary when requesting HTML documents from servers that require cookies to be set on the client.

To retrieve the cookies sent with an HTML document, first create a cookie jar by calling the LWPCookieJar() constructor of the cookielib module. LWPCookieJar() returns an object that can load cookies from disk and save them to disk.
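
As a minimal sketch of this first step, the snippet below creates a jar and inspects it before any request is made. The file name passed to the constructor is only an illustration, and the fallback import assumes Python 3, where cookielib was renamed http.cookiejar.

```python
try:
    import cookielib                      # Python 2 name, as used in this phrase
except ImportError:
    import http.cookiejar as cookielib    # assumption: Python 3 name of the same module

# Hypothetical file name, used only for this illustration
cJar = cookielib.LWPCookieJar("cookies.dat")

jar_size = len(cJar)       # a newly created jar holds no cookies
jar_file = cJar.filename   # default file used later by save() and load()
print("%d cookies, file: %s" % (jar_size, jar_file))
```

Creating the jar does not touch the disk. The file is only read or written when load() or save() is called.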

Next, create an opener, using the build_opener([handler, . . .]) function of the urllib2 module, which will handle the cookies when the HTML file is opened. The build_opener function accepts zero or more handlers that will be chained together in the order in which they are specified and returns an opener object.
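
The sketch below shows one way to confirm that the cookie handler ended up in the opener. It assumes the same Python 2 and Python 3 import fallback, and it peeks at the opener's handlers list, which is an implementation detail used here only for inspection.

```python
try:
    import urllib2
    import cookielib
except ImportError:
    # assumption: Python 3 names of the same modules
    import urllib.request as urllib2
    import http.cookiejar as cookielib

cJar = cookielib.LWPCookieJar()
cookie_handler = urllib2.HTTPCookieProcessor(cJar)

# build_opener() adds the default handlers plus the one passed in
opener = urllib2.build_opener(cookie_handler)

# Inspection only: "handlers" is an internal list of the opener
has_cookie_handler = cookie_handler in opener.handlers
# The handler stores cookies in the jar it was given
shares_jar = cookie_handler.cookiejar is cJar
print("%s %s" % (has_cookie_handler, shares_jar))
```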

Note

If you want urlopen() to use the opener object to open HTML files, call the install_opener(opener) function and pass in the opener object. Otherwise, use the open(url) function of the opener object to open the HTML files.
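
To illustrate the second option without relying on an external site, the following sketch starts a small local server that sets one cookie, then fetches it with opener.open() and no install_opener() call. The server class, the cookie name, and the empty ProxyHandler (which skips any system proxy for this local test) are assumptions added for the illustration.

```python
import threading
try:
    import urllib2
    import cookielib
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
except ImportError:
    # assumption: Python 3 names of the same modules
    import urllib.request as urllib2
    import http.cookiejar as cookielib
    from http.server import HTTPServer, BaseHTTPRequestHandler

class CookieHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: answers every GET with one cookie."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the console quiet

# Port 0 asks the operating system for any free port
server = HTTPServer(("127.0.0.1", 0), CookieHandler)
worker = threading.Thread(target=server.serve_forever)
worker.daemon = True
worker.start()
testURL = "http://127.0.0.1:%d/" % server.server_address[1]

cJar = cookielib.LWPCookieJar()
opener = urllib2.build_opener(
    urllib2.ProxyHandler({}),            # no proxy for the local test server
    urllib2.HTTPCookieProcessor(cJar))

# The opener is used directly; urlopen() is not involved
h = opener.open(testURL)
h.read()
h.close()

names = [cookie.name for cookie in cJar]
print(names)

server.shutdown()
server.server_close()
```

Using opener.open() keeps the cookie handling local to this opener, whereas install_opener() changes what every later urlopen() call in the process does.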


Once the opener has been created and installed, create a Request object using the Request(url) function of the urllib2 module, and then open the HTML file using the urlopen(request) function.
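
A Request object can be examined before it is sent. The short sketch below builds one for the URL used in this phrase and reads back its URL and HTTP method; no network traffic occurs until the request is opened. The import fallback assumes Python 3, where Request lives in urllib.request.

```python
try:
    from urllib2 import Request
except ImportError:
    from urllib.request import Request   # assumption: Python 3 location

testURL = "http://maps.google.com/"

# Constructing the Request does not contact the server
r = Request(testURL)

full_url = r.get_full_url()
method = r.get_method()   # GET, because no data was supplied
print("%s %s" % (method, full_url))
```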

Once the HTML page has been opened, any cookies set by the server's response are stored in the LWPCookieJar object. You can then use the save(filename) method of the LWPCookieJar object to write the cookies to a file.
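
The save step can be tried without a network connection. In this sketch a cookie is built by hand (its name, value, and domain are made up for the illustration), saved to a temporary file, and read back into a second jar with load(). The import fallback again assumes Python 3.

```python
import os
import tempfile
import time
try:
    import cookielib
except ImportError:
    import http.cookiejar as cookielib   # assumption: Python 3 name

# Hypothetical cookie, normally created from a server response
cookie = cookielib.Cookie(
    version=0, name="PREF", value="demo",
    port=None, port_specified=False,
    domain=".example.com", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True,
    secure=False, expires=int(time.time()) + 3600, discard=False,
    comment=None, comment_url=None, rest={})

# Temporary location so the example does not overwrite a real file
cookieFile = os.path.join(tempfile.mkdtemp(), "cookies.dat")

cJar = cookielib.LWPCookieJar()
cJar.set_cookie(cookie)
cJar.save(cookieFile)

# A second jar reads the same file back
restored = cookielib.LWPCookieJar()
restored.load(cookieFile)
restored_names = [c.name for c in restored]
print(restored_names)
```

By default save() skips cookies that are marked to be discarded at the end of the session. Passing ignore_discard=True to save() and load() keeps them.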

import os
import urllib2
import cookielib
from urllib2 import urlopen, Request

cookieFile = "cookies.dat"
testURL = 'http://maps.google.com/'

#Create instance of cookie jar
cJar = cookielib.LWPCookieJar()

#Create HTTPCookieProcessor opener object
opener = urllib2.build_opener(
    urllib2.HTTPCookieProcessor(cJar))

#Install the HTTPCookieProcessor opener
urllib2.install_opener(opener)

#Create a Request object
r = Request(testURL)

#Open the HTML file
h = urlopen(r)

print "Page Header\n======================"
print h.info()

print "Page Cookies\n======================"
for ind, cookie in enumerate(cJar):
    print "%d - %s" % (ind, cookie)

#Save the cookies
cJar.save(cookieFile)


html_cookie.py

Page Header
======================
Cache-Control: private
Set-Cookie: PREF=ID=fac1f1fcb33dae16:TM=1153336398:LM=1153336398:S=CpIvoPKTNq6KhCx1; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Content-Type: text/html; charset=ISO-8859-1
Server: mfe
Content-Length: 28271
Date: Wed, 19 Jul 2006 19:13:18 GMT

Page Cookies
======================
0 - <Cookie PREF=ID=fac1f1fcb33dae16:TM=1153336398:LM=1153336398:S=CpIvoPKTNq6KhCx1 for .google.com/>


Output from html_cookie.py code



Python Phrasebook: Essential Code and Commands
ISBN: 0672329107
Pages: 138
Authors: Brad Dayley
