Awesome Web-crawling Robotz
After you clone the repository, enter the directory and perform the following setup.
- Create a Python virtual environment (optional, but recommended):
  $ python -m virtualenv gpython
- Activate the virtual environment (repeat this in every terminal where you want to run GabyBots):
  $ source ./gpython/bin/activate
- Install the required Python packages:
  $ pip install -r requirements.txt
- Create your database:
  $ ./manage.py syncdb
  $ ./manage.py migrate
- Add the default scrapers (you can load 'minimal' instead to create a database with no web sources):
  $ ./manage.py loaddata starter
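For reference, a Django fixture such as `starter` is just a serialized list of model rows. A hypothetical single-entry fixture is sketched below; the app, model, field names, and URL are invented for illustration and are not this project's actual schema:

```json
[
  {
    "model": "scrapers.source",
    "pk": 1,
    "fields": {
      "name": "Google News - World",
      "feed_url": "https://example.com/rss/world"
    }
  }
]
```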
You can now run the starter spider, which scrapes the Google News - World RSS feed. First, change into the gbots directory:
$ cd gbots/
Then run Scrapy:
$ scrapy crawl google-news -a id=1
To also save the scraped articles to the database, pass do_action=yes:
$ scrapy crawl google-news -a id=1 -a do_action=yes
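Scrapy passes each `-a name=value` pair to the spider's `__init__` as a string keyword argument. The minimal sketch below shows how a spider like google-news might read `id` and `do_action`; the class body and attribute names are illustrative assumptions, not the project's actual spider, and scrapy itself is deliberately not imported so the snippet runs standalone:

```python
class GoogleNewsSpider:
    """Illustrative stand-in for the project's google-news spider."""

    name = "google-news"

    def __init__(self, id=None, do_action=None, **kwargs):
        # All -a values arrive as strings: `-a id=1` yields id="1".
        self.scraper_id = int(id) if id is not None else None
        # Only write scraped articles to the database when
        # `-a do_action=yes` was passed on the command line.
        self.do_action = (do_action == "yes")


# Equivalent of: scrapy crawl google-news -a id=1 -a do_action=yes
spider = GoogleNewsSpider(id="1", do_action="yes")
print(spider.scraper_id, spider.do_action)  # 1 True
```

Any extra `-a` arguments would arrive the same way, so a dry run (omitting `do_action`) simply leaves the flag False.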