urllib.robotparser — Internet Spider Access Control — PyMOTW 3
Originally posted
urllib.robotparser implements a parser for the robots.txt file format, including a method that checks whether a given user agent can access a resource. It is intended for well-behaved spiders and other crawler applications that need to be throttled or otherwise restricted.
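As a rough illustration of the access check described above, here is a minimal sketch using RobotFileParser. The robots.txt rules, the example.com URLs, and the ExampleSpider agent name are all assumptions made up for this example; the rules are parsed from an inline string so the snippet runs without network access.

```python
from urllib import robotparser

# Hypothetical robots.txt content, inlined so no network request is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = robotparser.RobotFileParser()
# parse() accepts an iterable of lines from a robots.txt file.
parser.parse(ROBOTS_TXT.splitlines())

AGENT = 'ExampleSpider'  # hypothetical user agent name

# can_fetch() reports whether the agent may access the given URL.
allowed_public = parser.can_fetch(AGENT, 'https://example.com/index.html')
allowed_private = parser.can_fetch(AGENT, 'https://example.com/private/data.html')

# crawl_delay() (Python 3.6+) returns the Crawl-delay value, or None if unset.
delay = parser.crawl_delay(AGENT)

print('public page allowed :', allowed_public)
print('private page allowed:', allowed_private)
print('crawl delay (seconds):', delay)
```

A real crawler would more likely point the parser at a live file with set_url() and read(), then call can_fetch() before each request and sleep for the crawl delay between requests.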
This post is part of the Python Module of the Week series for Python 3. See PyMOTW.com for more articles from the series.