A small web crawler written in Python 2.7.
Basic Features:
- Stops crawling once a maximum depth is reached
- Can crawl URLs outside the starting domain
- Checks for an HTTP 200 response
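
The features above can be illustrated with a minimal sketch. This is not the crawler's actual code: the names `crawl`, `fetch`, and `stay_on_domain` are hypothetical, and `fetch` is assumed to be a callable that returns a `(status_code, links)` pair for a URL, so the example runs without network access. The imports are written to work on both Python 2.7 and Python 3.

```python
from collections import deque

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def crawl(start_url, max_depth, fetch, stay_on_domain=False):
    """Breadth-first crawl that stops following links at max_depth.

    fetch(url) is a hypothetical callable returning (status_code, links).
    Returns the list of URLs that answered with HTTP 200, in visit order.
    """
    start_host = urlparse(start_url).netloc
    seen = set([start_url])
    queue = deque([(start_url, 0)])
    crawled = []

    while queue:
        url, depth = queue.popleft()
        status, links = fetch(url)
        if status != 200:
            continue  # skip pages that did not return HTTP 200
        crawled.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in links:
            if link in seen:
                continue  # already queued or visited
            if stay_on_domain and urlparse(link).netloc != start_host:
                continue  # optionally ignore links to other domains
            seen.add(link)
            queue.append((link, depth + 1))
    return crawled


# Made-up in-memory "site" standing in for real HTTP requests.
site = {
    "http://a.test/": (200, ["http://a.test/1", "http://b.test/"]),
    "http://a.test/1": (200, ["http://a.test/2"]),
    "http://b.test/": (404, []),
    "http://a.test/2": (200, []),
}


def fake_fetch(url):
    return site.get(url, (404, []))


pages = crawl("http://a.test/", 1, fake_fetch)
```

With `max_depth=1`, the start page and the pages it links to are fetched, but links found on those pages are not followed.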
Needs:
- Multiprocessing support
- Handle 301 and 302 redirects
- Add crawled data into a database
- Respect robots.txt
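
As one possible starting point for the robots.txt item, the standard library's `RobotFileParser` can decide whether a URL may be fetched. The sketch below is an assumption about how this could be wired in, not existing code: `build_robots_checker` is a hypothetical helper, and the rules are passed in as lines so the example runs offline. A real crawler would more likely call `set_url()` and `read()` on the parser to download each site's robots.txt.

```python
try:
    from urllib.robotparser import RobotFileParser  # Python 3
except ImportError:
    from robotparser import RobotFileParser  # Python 2.7


def build_robots_checker(robots_lines, user_agent="*"):
    """Return a function that reports whether a URL may be crawled.

    robots_lines is the content of a robots.txt file as a list of lines.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# Made-up rules for illustration only.
rules = ["User-agent: *", "Disallow: /private/"]
allowed = build_robots_checker(rules)
```

The crawler would call `allowed(url)` before each request and skip any URL for which it returns a false value.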