(New in 2.0) The robotparser module reads robots.txt files, which are used to implement the Robot Exclusion Protocol (http://info.webcrawler.com/mak/projects/robots/robots.html).
If you're implementing an HTTP robot that will visit arbitrary sites on the Net (not just your own), it's a good idea to use this module to check that you really are welcome. Example 7-21 demonstrates the robotparser module.
Example 7-21. Using the robotparser Module
File: robotparser-example-1.py

import robotparser

r = robotparser.RobotFileParser()
r.set_url("http://www.python.org/robots.txt")
r.read()

if r.can_fetch("*", "/index.html"):
    print "may fetch the home page"

if r.can_fetch("*", "/tim_one/index.html"):
    print "may fetch the tim peters archive"

may fetch the home page
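A robot that visits many different sites needs to check a different robots.txt file for each site. The following sketch (not part of the original example; the fetch_if_allowed helper and its structure are made up for illustration) shows one way to combine robotparser with urllib, building the robots.txt location from the target URL and fetching the page only if the parser allows it:

import robotparser
import urllib
import urlparse

def fetch_if_allowed(agent, url):
    # hypothetical helper: consult the site's robots.txt before fetching.
    # note that this re-reads robots.txt on every call; a real robot
    # would cache one RobotFileParser instance per site.
    scheme, netloc = urlparse.urlparse(url)[:2]
    r = robotparser.RobotFileParser()
    r.set_url(scheme + "://" + netloc + "/robots.txt")
    r.read()
    if not r.can_fetch(agent, url):
        return None # the robot is not welcome here
    return urllib.urlopen(url).read()

page = fetch_if_allowed("*", "http://www.python.org/index.html")
if page:
    print "fetched", len(page), "bytes"

Note that can_fetch accepts either a path or a full URL; the parser extracts the path portion and matches it against the rules for the given user agent.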