The robotparser Module

(New in 2.0) The robotparser module reads robots.txt files, which are used to implement the Robot Exclusion Protocol.
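A robots.txt file is a plain-text list of records. Each record names one or more user agents and the path prefixes those agents should not fetch. The sketch below uses Python 3, where this module is available as `urllib.robotparser`. It feeds a made-up robots.txt to `parse()`, so no network access is needed. The rules and the `BadBot` agent name are assumptions: they were chosen to match the output of Example 7-21 and are not a copy of any real site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption for illustration only).
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /tim_one/
"""

rp = RobotFileParser()
# parse() takes an iterable of lines; a blank line ends a record.
rp.parse(ROBOTS_TXT.splitlines())

# "*" matches only the catch-all record, which blocks the /tim_one/ prefix.
print(rp.can_fetch("*", "/index.html"))            # True
print(rp.can_fetch("*", "/tim_one/index.html"))    # False

# The agent name before "/" is matched case-insensitively against the
# User-agent lines, so "BadBot/2.1" hits the record that disallows everything.
print(rp.can_fetch("BadBot/2.1", "/index.html"))   # False
```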

If you're implementing an HTTP robot that will visit arbitrary sites on the Net (not just your own sites), it's a good idea to use this module to check that you really are welcome. Example 7-21 demonstrates the robotparser module.

Example 7-21. Using the robotparser Module


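import robotparser

r = robotparser.RobotFileParser()
r.set_url("http://www.python.org/robots.txt")
r.read()  # download and parse the file before asking any questions

# "*" asks on behalf of any robot that has no record of its own
if r.can_fetch("*", "/index.html"):
    print "may fetch the home page"

if r.can_fetch("*", "/tim_one/index.html"):
    print "may fetch the tim peters archive"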

may fetch the home page
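A robot that visits many sites has to locate each host's robots.txt itself. The following minimal sketch shows one way to do that, again in Python 3. `RobotsCache` is a hypothetical helper, not part of the standard library. It derives the robots.txt URL from a page URL and keeps one parser per site, so each file is downloaded only once. Network access happens in `read()`. In CPython's implementation, a 401 or 403 response makes the parser refuse every URL, other 4xx responses allow every URL, and connection failures raise an exception.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Hypothetical helper: one RobotFileParser per site, fetched lazily."""

    def __init__(self, agent):
        self.agent = agent      # the User-agent name your robot sends
        self.parsers = {}       # robots.txt URL -> RobotFileParser

    def robots_url(self, page_url):
        # robots.txt is always looked up at the root of the host
        parts = urlsplit(page_url)
        return f"{parts.scheme}://{parts.netloc}/robots.txt"

    def parser_for(self, page_url):
        url = self.robots_url(page_url)
        rp = self.parsers.get(url)
        if rp is None:
            rp = RobotFileParser(url)
            rp.read()           # network access happens here
            self.parsers[url] = rp
        return rp

    def can_fetch(self, page_url):
        return self.parser_for(page_url).can_fetch(self.agent, page_url)
```

Passing a full URL to `can_fetch()` works because the parser discards the scheme and host and matches only the path. For testing, you can store a parser filled in with `parse()` in `parsers` under the site's robots.txt URL to skip the download.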

