scrapi

Getting started

You will need to:
- Install the Python requirements
- Install Elasticsearch
- Install the consumers
- Install RabbitMQ

Requirements

Create and activate a virtual environment for scrapi, then go to the top-level project directory. From there, run

$ pip install -r requirements.txt

and the Python requirements for the project will be downloaded and installed.
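
The environment setup described above could be sketched as follows. This is only an illustration: it assumes the virtualenv tool is installed, and the helper name setup_scrapi_env and the default directory name venv are made up for this example rather than taken from the project.

```shell
# Sketch only: assumes the `virtualenv` tool is installed. The function name
# and the default "venv" directory are illustrative, not project conventions.
setup_scrapi_env() {
    env_dir="${1:-venv}"
    # Create the virtual environment.
    virtualenv "$env_dir" || return 1
    # Activate it in the current shell.
    . "$env_dir/bin/activate" || return 1
    # Install the project's Python requirements into the environment.
    pip install -r requirements.txt
}
```

Calling setup_scrapi_env from the top-level project directory would create the environment, activate it, and install the requirements in one step.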

Installing Elasticsearch

Note: JDK 7 must be installed for Elasticsearch to run.
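
A quick way to see whether Java is available before installing Elasticsearch might look like the following. The helper have_cmd is hypothetical and only checks that a java executable is on the PATH; the reported version still has to be read by hand.

```shell
# Hypothetical helper: succeeds if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # Print the installed Java version (java writes it to stderr).
    java -version 2>&1 || echo "java is on PATH but could not report a version"
else
    echo "java not found on PATH; install a JDK before starting Elasticsearch"
fi
```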

Mac OS X

$ brew install elasticsearch

Now, just run

$ elasticsearch

or

$ invoke elasticsearch

and you should be good to go.

Ubuntu

$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.deb 
$ sudo dpkg -i elasticsearch-1.2.1.deb

Now, just run

$ sudo service elasticsearch start

or

$ invoke elasticsearch

and you should be good to go.
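
On either platform, one way to confirm that Elasticsearch is actually answering is to poll its HTTP endpoint. The sketch below assumes curl is installed and that Elasticsearch listens on its default HTTP port, 9200, on localhost; the helper name wait_for_http is made up for this example.

```shell
# Hypothetical helper: poll a URL until it answers successfully or the
# given number of attempts (default 10) runs out.
wait_for_http() {
    url="$1"
    tries="${2:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s: quiet, -f: treat HTTP errors as failures,
        # --max-time: give up on a single attempt after 5 seconds.
        if curl -sf -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

For example, wait_for_http http://localhost:9200 30 would wait for roughly 30 attempts and return a non-zero status if the node never responds.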

Running the server

Just run

$ python main.py

from the scrapi/website/ directory, and the server should be up and running!

Consumers

Just run

$ invoke install_consumers

and the consumers specified in the worker_manager manifest files will be installed, along with their requirements.

RabbitMQ

Mac OS X

$ brew install rabbitmq

Ubuntu

$ sudo apt-get install rabbitmq-server

Running the scheduler

From the top-level project directory, run

$ invoke celery_beat

to start the scheduler, and

$ invoke celery_worker

to start the worker.

Testing

To run the tests for the project, just type

$ invoke test

and all of the tests in the 'tests/' directory will be run.
