urllib.robotparser — Internet Spider Access Control — PyMOTW 3

urllib.robotparser implements a parser for the robots.txt file format, including a method, RobotFileParser.can_fetch(), that checks whether a given user agent can access a resource. It is intended for use in well-behaved spiders and other crawler applications that need to be throttled or otherwise restricted.
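A minimal sketch of the access check, assuming Python 3.6 or later (needed for crawl_delay()). The robots.txt rules, the user agent name "PyMOTW", and the example.com URLs are all hypothetical. The rules are passed to parse() as a list of lines, so the example runs without network access. A real crawler would instead call set_url() and read() to fetch the site's own robots.txt.

```python
from urllib import robotparser

# Hypothetical robots.txt contents, supplied inline instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10
"""

parser = robotparser.RobotFileParser()
# parse() accepts an iterable of lines and records the parse time,
# so can_fetch() will consult the rules rather than refusing everything.
parser.parse(ROBOTS_TXT.splitlines())

AGENT = "PyMOTW"  # hypothetical user agent name

for url in ["https://example.com/", "https://example.com/admin/index.html"]:
    # can_fetch() applies the matching User-agent entry (here the "*" default).
    print(parser.can_fetch(AGENT, url), url)

# crawl_delay() returns the Crawl-delay value (in seconds) for the agent,
# which a polite crawler can use to throttle its requests.
print("crawl delay:", parser.crawl_delay(AGENT))
```

The site root is allowed and the page under /admin/ is refused, because the only rule disallows paths that start with /admin/. The crawl delay of 10 comes from the same "*" entry.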


This post is part of the Python Module of the Week series for Python 3. See PyMOTW.com for more articles from the series.