The robotparser Module

(New in 2.0) The robotparser module reads robots.txt files, which are used to implement the Robot Exclusion Protocol (http://info.webcrawler.com/mak/projects/robots/robots.html).

If you're implementing an HTTP robot that will visit arbitrary sites on the Net (not just your own sites), it's a good idea to use this module to check that you really are welcome. Example 7-21 demonstrates the robotparser module.

Example 7-21. Using the robotparser Module

File: robotparser-example-1.py

import robotparser

r = robotparser.RobotFileParser()
r.set_url("http://www.python.org/robots.txt")
r.read()

if r.can_fetch("*", "/index.html"):
    print "may fetch the home page"

if r.can_fetch("*", "/tim_one/index.html"):
    print "may fetch the tim peters archive"

may fetch the home page
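In Python 3, this module was moved to urllib.robotparser; the class and its methods are otherwise unchanged. The following sketch shows the same check without going over the network, by feeding a small hypothetical robots.txt to the parse method instead of calling read (the rules shown are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, parsed from memory rather than fetched.
rules = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# can_fetch(useragent, url) returns True if the rules allow the fetch.
print(rp.can_fetch("*", "/index.html"))    # True: not disallowed
print(rp.can_fetch("*", "/private/data"))  # False: under /private/
```

Parsing from memory like this is also a convenient way to unit-test a crawler's politeness logic without depending on a live server.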

Python Standard Library (Nutshell Handbooks)
Author: Fredrik Lundh
ISBN: 0596000960
Year: 2000
Pages: 252
