ScreamingCrawl

An SEO crawler meant to act as a freeware alternative to ScreamingFrog. There is still a lot of work to be done.

usage: run.py [-h] [-t THREADS] [-a AGENT] [-p PROXY] [-o TIMEOUT] [-r ROBOTS]
              [-m MAX_URLS] [-d DATA_FORMAT]
              url

positional arguments:
  url                   url to start the crawl from

optional arguments:
  -h, --help            show this help message and exit
  -t THREADS, --threads THREADS
                        number of threads - scale with caution
  -a AGENT, --agent AGENT
                        user-agent
  -p PROXY, --proxy PROXY
                        proxy to use with crawler
  -o TIMEOUT, --timeout TIMEOUT
                        time to stop crawl after no new urls are found
  -r ROBOTS, --robots ROBOTS
                        whether you obey robots.txt rules
  -m MAX_URLS, --max_urls MAX_URLS
                        stop crawling after data collected from a list of urls
  -d DATA_FORMAT, --data_format DATA_FORMAT
                        data format, either csv or sql

TODO

  • Work out the most efficient way to write to SQLite when SQL output is set; it is currently too slow (see the sketch after this list).
  • Add support for MongoDB.
  • Package as a command-line executable.
  • Extend the SEO parser to capture redirect history and other useful information.
  • Add proper logging.
  • Parse arguments from the command line.
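
For the SQLite item above, a common fix is to buffer crawl results and insert them in batches inside a single transaction instead of committing per URL. A minimal sketch, assuming a hypothetical pages table and (url, status, title) rows rather than the project's actual schema:

  import sqlite3

  def write_batch(db_path, rows):
      """Insert a batch of (url, status, title) tuples in one transaction."""
      conn = sqlite3.connect(db_path)
      try:
          conn.execute(
              "CREATE TABLE IF NOT EXISTS pages (url TEXT, status INTEGER, title TEXT)"
          )
          # executemany inside a single transaction avoids a commit (and fsync)
          # per row, which is the usual cause of slow per-URL inserts.
          with conn:
              conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", rows)
      finally:
          conn.close()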
