A web crawler written in Python using requests and BeautifulSoup
- Clone this repo
$ git clone git@github.com:ishankhare07/scrapper.git && cd scrapper
- Create a virtual environment (assuming Python 3)
$ python3 -m venv venv
$ source venv/bin/activate
- Install the requirements with pip (again, assuming pip3 for Python 3)
$ pip3 install -r requirements.txt
- Basic usage, from a Python 3 interpreter
>>> from main import Scrapper
>>> s = Scrapper("http://news.ycombinator.com/", "hacker_news") #url, filename to store data
>>> s.start_scrapping()
- We can also specify a maximum recursion depth and a maximum number of URLs to scan
>>> from main import Scrapper
>>> s = Scrapper("http://news.ycombinator.com/", #url
"hacker_news", #filename to store data
20, #max-recursion depth
30) #max-urls to scan
>>> s.start_scrapping()
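To illustrate what the depth and URL limits above do, here is a minimal, self-contained sketch of a bounded breadth-first crawl. The link graph and the `fetch_links` helper are stand-ins invented for this example (a real crawl would fetch each page with requests and extract links with BeautifulSoup); this is not the actual implementation in this repo.

```python
from collections import deque

# Stand-in for fetching a page and parsing its links with BeautifulSoup.
LINK_GRAPH = {
    "http://news.ycombinator.com/": ["http://a.example/", "http://b.example/"],
    "http://a.example/": ["http://c.example/"],
    "http://b.example/": [],
    "http://c.example/": [],
}

def fetch_links(url):
    return LINK_GRAPH.get(url, [])

def crawl(start_url, max_depth=20, max_urls=30):
    """Breadth-first crawl bounded by recursion depth and total URL count."""
    seen = set()
    order = []                       # URLs in the order they were visited
    queue = deque([(start_url, 0)])  # (url, depth) pairs
    while queue and len(order) < max_urls:
        url, depth = queue.popleft()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        order.append(url)
        for link in fetch_links(url):
            queue.append((link, depth + 1))
    return order

print(crawl("http://news.ycombinator.com/"))
```

With `max_depth=0` only the start URL is visited; with `max_urls=2` the crawl stops after two pages regardless of how many links remain queued.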
- Viewing the data
>>> import shelve
>>> from pprint import pprint
>>> db = shelve.open('hacker_news')
>>> pprint(list(db.items()))
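Since the shelve database behaves like a persistent dict, individual entries can also be written and looked up by key. A small self-contained sketch follows; the file name and stored values here are made up for illustration and are not real scraped output.

```python
import os
import shelve
import tempfile

# Use a throwaway path so this demo never clobbers real scraped data.
path = os.path.join(tempfile.mkdtemp(), "hacker_news_demo")

# Write a couple of entries, as the scraper might for each crawled URL.
with shelve.open(path) as db:
    db["http://news.ycombinator.com/"] = "Hacker News front page text"
    db["http://example.com/"] = "Example Domain"

# Read them back; keys are strings, values can be any picklable object.
with shelve.open(path) as db:
    titles = dict(db)

print(titles["http://example.com/"])
```

Opening the shelf with a `with` block ensures it is closed (and flushed to disk) even if an exception occurs mid-read or mid-write.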