$ # build docker image
$ docker build --tag scraper .
$ # execute container
$ docker run --interactive --tty \
    --name capstone-scraper \
    --mount type=bind,source="$(pwd)",target=/app \
    scraper
$ # create spider(s)
$ scrapy startproject Scraper
$ cd Scraper
$ scrapy genspider -t crawl cardekho cardekho.com
$ scrapy genspider -t crawl zigwheels zigwheels.com
$ # restart the existing container and open a shell inside it
$ docker start capstone-scraper
$ docker exec -it capstone-scraper bash
$ # run the spiders; -o appends the scraped items to the output file
$ scrapy crawl cardekho -o data/data.csv
$ scrapy crawl zigwheels -o data/data.csv
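
Both crawl commands above write to the same data/data.csv, and -o appends to an existing file, so rows from both spiders end up together. If you would rather export each spider to its own file and combine them afterwards, the sketch below shows one way to do it. It is not part of this project: the merge_csv helper and the per-spider file names are hypothetical, and it assumes every input file has the same header row.

```shell
# Sketch only: merge per-spider CSV exports into one file with a single header.
# Assumes all inputs share the same header row; merge_csv is a hypothetical helper.
merge_csv() {
    out="$1"; shift
    # Copy the header line from the first input file.
    head -n 1 "$1" > "$out"
    # Append the data rows (everything after line 1) of every input file.
    for f in "$@"; do
        tail -n +2 "$f" >> "$out"
    done
}

# Example usage (hypothetical file names), after crawling each spider separately:
#   scrapy crawl cardekho -o data/cardekho.csv
#   scrapy crawl zigwheels -o data/zigwheels.csv
#   merge_csv data/merged.csv data/cardekho.csv data/zigwheels.csv
```

Keeping one file per spider makes it easier to re-run a single spider without touching the other's output. The merge only makes sense if both spiders yield the same fields in the same order.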
Open clean.ipynb in Google Colab and use cars.csv, located in Scraper/spiders/data, as input.