
Manta Crawler

A simple tool for crawling data from manta.com

Install


  • Clone this repo and have virtualenv installed (recommended).
  • $ cd /path/to/repo
  • $ virtualenv env && source env/bin/activate
  • (env)$ pip install -r requirements.txt

Configurations


There isn't much to configure, but a proxy is required.
Change the timezone accordingly.
Read configs.py for details.
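
Below is a rough sketch of the kind of values you can expect to set in configs.py. The variable names (PROXIES, TIMEZONE, DELAY) and the proxy-dict format are illustrative assumptions, not the file's actual contents; treat configs.py itself as the source of truth.

    # Illustrative sketch only -- the real configs.py may use different names.
    # Manta only accepts SSL requests, so the proxy must support HTTPS.
    PROXIES = {
        'https': 'https://user:password@proxy.example.com:8080',
    }

    # Timezone setting; change it accordingly.
    TIMEZONE = 'America/New_York'

    # Delay (in seconds) between requests; don't make it too short.
    DELAY = 5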

Get Started


  • Create a keywords.csv file
    $ touch keywords.csv
  • Add entries to keywords.csv in the format keyword,city,state
    $ echo 'burger,new york,ny' >> keywords.csv
  • Run searchUrlCrawler.py
    $ python searchUrlCrawler.py
  • If it succeeds, run pagerCrawler.py
    $ python pagerCrawler.py
  • If it succeeds, run detailListCrawler.py
    $ python detailListCrawler.py
  • If it succeeds, run detailCrawler.py, or your own crawler, to crawl the page details
    $ python detailCrawler.py
  • Done! Your results will be written to details.csv at the root of the repo (a sketch for chaining these steps follows this list)
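
If you'd rather not run the four crawlers by hand, a small driver script can chain them. The sketch below (run_pipeline.py is a hypothetical helper, not part of this repo) assumes each crawler exits with a non-zero status on failure, which may not hold here, so treat it as a convenience wrapper only.

    # run_pipeline.py -- hypothetical helper, not part of this repo.
    # Runs the four crawlers in order and stops at the first failure.
    import subprocess
    import sys

    STAGES = [
        'searchUrlCrawler.py',
        'pagerCrawler.py',
        'detailListCrawler.py',
        'detailCrawler.py',  # or swap in your own detail crawler
    ]

    for stage in STAGES:
        print(f'Running {stage} ...')
        result = subprocess.run([sys.executable, stage])
        if result.returncode != 0:  # assumes the crawlers signal failure via exit codes
            sys.exit(f'{stage} failed; fix the issue and re-run from this stage.')

    print('Done! Results are in details.csv at the root of the repo.')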

P.S.

  • Manta only accepts SSL requests, so make sure your proxies support SSL.
  • Don't make the delay too short, just in case.
  • Change proxies if you encounter an HTTP 405 error.
  • Write your own detail crawler for customized results (a rough sketch follows this list).
  • All crawled results are appended to the existing output files; delete or truncate them if you want clean results.
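
If the stock detailCrawler.py doesn't capture the fields you need, you can write your own detail crawler against the URLs collected by the earlier stages. The sketch below is only a starting point: the input file name detail_urls.csv and the CSS selectors are placeholders (check what detailListCrawler.py actually writes), and it assumes the requests and beautifulsoup4 packages are available along with an SSL-capable proxy.

    # myDetailCrawler.py -- rough sketch of a custom detail crawler.
    # File names, selectors, and the proxy URL are placeholders; adapt them to your setup.
    import csv
    import time

    import requests                   # assumed to be installed via requirements.txt
    from bs4 import BeautifulSoup     # assumed dependency for HTML parsing

    PROXIES = {'https': 'https://user:password@proxy.example.com:8080'}  # must support SSL
    DELAY = 5                         # seconds between requests; keep it generous

    with open('detail_urls.csv', newline='') as infile, \
         open('details.csv', 'a', newline='') as outfile:  # results are appended
        writer = csv.writer(outfile)
        for row in csv.reader(infile):
            if not row:
                continue
            url = row[0]
            response = requests.get(url, proxies=PROXIES, timeout=30)
            if response.status_code == 405:   # rotate proxies if you hit HTTP 405
                print(f'Got 405 for {url}; switch proxies and retry later.')
                continue
            soup = BeautifulSoup(response.text, 'html.parser')
            name = soup.select_one('h1')      # placeholder selector
            phone = soup.select_one('.phone') # placeholder selector
            writer.writerow([
                url,
                name.get_text(strip=True) if name else '',
                phone.get_text(strip=True) if phone else '',
            ])
            time.sleep(DELAY)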
