Simple tool to crawl from manta.com
- Clone this repo; having virtualenv available is recommended.
$ cd /path/to/repo
$ virtualenv env && source env/bin/activate
(env)$ pip install -r requirements.txt
There is not much to configure, but a proxy is required.
Change the timezone accordingly.
Read configs.py for details.
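For reference, a hypothetical sketch of what configs.py might contain; every name and value below is an assumption, so check the real file for the actual settings:

```python
# Hypothetical sketch of configs.py; names and values are assumptions,
# check the real file for the actual settings.
PROXY = "http://user:pass@proxy.example.com:8080"  # required; must support SSL
TIMEZONE = "America/New_York"  # change to your local timezone
DELAY = 5  # seconds between requests; don't set this too short
```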
- Create a keywords.csv file
$ touch keywords.csv
- Add entries to keywords.csv in this format: keyword,city,state
$ echo 'burger,new york,ny' >> keywords.csv
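To sanity-check the file format, here is a quick Python sketch (not part of the repo) that appends the example row and reads the file back:

```python
import csv

# Append an example row in the keyword,city,state format.
with open("keywords.csv", "a", newline="") as f:
    csv.writer(f).writerow(["burger", "new york", "ny"])

# Read the file back to verify each row parses into three fields.
with open("keywords.csv", newline="") as f:
    rows = list(csv.reader(f))
print(rows[-1])  # ['burger', 'new york', 'ny']
```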
- Run Crawler
searchUrlCrawler.py
$ python searchUrlCrawler.py
- If it succeeds, run Crawler
pagerCrawler.py
$ python pagerCrawler.py
- If it succeeds, run Crawler
detailListCrawler.py
$ python detailListCrawler.py
- If it succeeds, run Crawler
detailCrawler.py
or your own crawler to crawl page details
$ python detailCrawler.py
- Done! Your results will be written to details.csv at the root of the repo.
- Manta only accepts SSL requests; make sure your proxies support SSL.
- Don't set the delay too short, just in case.
- Switch proxies if you encounter an HTTP 405 error.
- Write your own Detail Crawler for customized crawl results.
- All crawled results are appended to the existing output files; delete or truncate the files if you want a clean run.
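As a starting point for a custom Detail Crawler, here is a minimal, hypothetical sketch. The HTML patterns and field choices below are assumptions, not Manta's actual markup, and the fetching step (through your SSL proxy) is left as a comment:

```python
import csv
import re

# Hypothetical custom detail-crawler step: extract a business name and phone
# number from a detail page's HTML and append them to details.csv.
# The tag patterns below are assumptions, not Manta's actual markup.

def parse_detail(html):
    """Pull (name, phone) out of a detail page; fields are None if absent."""
    name = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.S)
    phone = re.search(r"(\(\d{3}\)\s*\d{3}-\d{4})", html)
    return (name.group(1).strip() if name else None,
            phone.group(1) if phone else None)

def append_detail(row, path="details.csv"):
    # Results are appended, matching the tool's behavior noted above.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)

# In a real crawler you would fetch each URL through your SSL proxy, e.g.:
#   html = requests.get(url, proxies={"https": PROXY}).text
# Here we parse a hard-coded sample snippet instead.
sample = "<h1>Best Burger Co</h1> Call us at (212) 555-0100 today"
row = parse_detail(sample)
append_detail(row)
print(row)  # ('Best Burger Co', '(212) 555-0100')
```

Remember to keep a delay between requests and rotate proxies on HTTP 405, per the notes above.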