Chapter 8. Processing HTML


Several modules included with Python provide virtually all the tools necessary to parse and process HTML documents without needing a web server or web browser. Parsing HTML files is becoming much more commonplace in applications such as search engines, document indexing, document conversion, data retrieval, and site backup or migration, among others.

Because this chapter cannot cover the full extent of the options Python provides for HTML processing, the first two phrases focus on specific Python modules that simplify opening HTML documents locally and on the Web. The remaining phrases discuss how to use Python modules to quickly parse the data in HTML files and process specific items, such as links, images, and cookies. The final phrase uses the example of fixing HTML files that do not have properly formatted tag data to demonstrate how to process the entire contents of an HTML file.
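To give a feel for the kind of parsing the later phrases describe, here is a minimal sketch that pulls links and image sources out of an HTML string. It assumes Python 3 and the standard library's html.parser module, which may differ from the module names used in the phrases themselves. The class name LinkAndImageCollector, the helper collect_links_and_images, and the sample markup are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class LinkAndImageCollector(HTMLParser):
    """Hypothetical helper: gather href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names are lowercased by the parser before they arrive here.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])


def collect_links_and_images(html_text):
    """Parse html_text and return a (links, images) tuple of lists."""
    parser = LinkAndImageCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links, parser.images


# Sample markup made up for this sketch; no file or network access is needed.
SAMPLE_HTML = """<html><body>
<a href="http://www.example.com/index.html">Home</a>
<img src="logo.png" alt="Logo">
<a href="/about.html">About</a>
</body></html>"""

links, images = collect_links_and_images(SAMPLE_HTML)
print(links)
print(images)
```

The same collector could be fed text read from a local file or fetched from the Web, since the parser only needs a string and does not depend on a web server or browser.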



Python Phrasebook: Essential Code and Commands
ISBN: 0672329107
Pages: 138
Authors: Brad Dayley
