An SEO Crawler meant to act as a freeware alternative to ScreamingFrog. There is a lot of work to be done.
```
usage: run.py [-h] [-t THREADS] [-a AGENT] [-p PROXY] [-o TIMEOUT] [-r ROBOTS]
              [-m MAX_URLS] [-d DATA_FORMAT]
              url

positional arguments:
  url                   url to start the crawl from

optional arguments:
  -h, --help            show this help message and exit
  -t THREADS, --threads THREADS
                        number of threads - scale with caution
  -a AGENT, --agent AGENT
                        user-agent string to send with requests
  -p PROXY, --proxy PROXY
                        proxy to route crawler requests through
  -o TIMEOUT, --timeout TIMEOUT
                        stop the crawl after this long with no new urls found
  -r ROBOTS, --robots ROBOTS
                        whether to obey robots.txt rules
  -m MAX_URLS, --max_urls MAX_URLS
                        stop crawling once data has been collected from this
                        many urls
  -d DATA_FORMAT, --data_format DATA_FORMAT
                        output data format, either csv or sql
```
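For example (the start URL and option values here are purely illustrative):

```
python run.py -t 10 -a "MyCrawler/1.0" -m 500 -d csv https://example.com
```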
Still to do:

- Work out the most efficient way to write to SQLite when SQL output is selected; the current implementation is too slow (see the batched-write sketch after this list).
- Add support for MongoDB.
- Package as a command-line executable (a packaging sketch also follows below).
- Extend the SEO parser to capture redirect history and other useful information.
- Add proper logging.
- Argument parsing from the command line.
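On the SQLite point above, the usual fix is to batch inserts and commit once per batch instead of once per row. A minimal sketch, assuming a hypothetical `pages` table and row tuples of `(url, status, title)`; the real schema will differ:

```python
import sqlite3

def write_rows(db_path, rows, batch_size=500):
    """Write crawl results to SQLite in batches, with one transaction
    per batch rather than a commit after every row."""
    conn = sqlite3.connect(db_path)
    # Trade a little durability for write speed; reasonable for crawl output.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url TEXT PRIMARY KEY, status INTEGER, title TEXT)"
    )
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # one transaction (and one fsync) per batch
            conn.executemany(
                "INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", batch
            )
    conn.close()
```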
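For packaging as a command-line executable, a setuptools `console_scripts` entry point is the standard route; `pip install` then puts a command on the PATH. A sketch, assuming the code is moved into a `seocrawler` package whose `run` module exposes a `main()` function (all names here are assumptions):

```python
# setup.py (sketch; the package and module names are placeholders)
from setuptools import setup, find_packages

setup(
    name="seocrawler",
    version="0.1.0",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # installs a `seocrawler` command that calls seocrawler/run.py:main()
            "seocrawler = seocrawler.run:main",
        ]
    },
)
```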