A web scraper for NORI, a centralized club and athlete management system in Iceland.

busla/norix-scraper

NORIX

A spider for NORI, a centralized club and athlete management system in Iceland. Over 80% of all sports clubs in Iceland use Nori to manage their subscriptions and payments.

The spider is written with Scrapy and runs behind ScrapyRT, which exposes it through a REST API. ScrapyRT forwards the payload (club, username, password) to the spider, which logs into Nori, scrapes all data accessible to that user and saves the results to MongoDB.

Because the spider writes the scraped data straight to the database, it does not return any results. To start scraping, you can use norix-ui, which communicates with norix-api, which in turn sends a request to ScrapyRT. norix-api uses JSON Web Tokens (JWT) to encrypt the password stored in our DB and sends an authorization token back to the user.

You can also run the scraper on its own: POST the correct payload to ScrapyRT and then view the results in your MongoDB database.

Example:

curl 127.0.0.1:9080/crawl.json -d '{"spider_name":"norix", "request": {"url": "http://nameofclub.felog.is/UsersLogin.aspx", "meta": {"user": "yourusername", "password": "yourpassword"}}}'
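Editing the quotes inside that JSON by hand is error-prone, so the payload can also be built from shell arguments. This is a minimal sketch: the function name `norix_payload` is hypothetical, it assumes the club name is the `felog.is` subdomain (as `nameofclub` is in the URL above), and it assumes the club, username and password contain no characters that need JSON escaping (double quotes, backslashes, control characters), because the values are inserted verbatim.

```shell
#!/bin/sh
# Hypothetical helper: prints the ScrapyRT payload for the norix spider.
#   $1 = club subdomain, $2 = Nori username, $3 = Nori password
# Assumption: none of the values need JSON escaping; they are inserted as-is.
norix_payload() {
  printf '{"spider_name":"norix", "request": {"url": "http://%s.felog.is/UsersLogin.aspx", "meta": {"user": "%s", "password": "%s"}}}\n' \
    "$1" "$2" "$3"
}

# Usage (assumes ScrapyRT is listening on 127.0.0.1:9080, as in the example above):
# curl 127.0.0.1:9080/crawl.json -d "$(norix_payload nameofclub yourusername yourpassword)"
```

The output is the same JSON document as the curl example, so the request ScrapyRT receives is unchanged; only the way it is assembled differs.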

Install MongoDB

OS dependencies

The spider depends on lxml (http://lxml.de/), which in turn needs a C compiler such as GCC to build.

OSX

Install Xcode.

Debian/Ubuntu

This command should do it:

$ sudo apt-get install python-dev libffi-dev libssl-dev libxml2-dev libxslt1-dev

Install the project

$ git clone https://github.com/busla/norix
$ cd norix/norix
$ which python  # path to Python 2.7; run $ python and check the version
$ virtualenv -p [path to python] venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ scrapyrt

Note: scrapyrt must be launched from the spider project directory (where scrapy.cfg is located).
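If you are somewhere deeper inside the checkout, a small helper can locate the project directory before starting ScrapyRT. This is a sketch under assumptions: the function name `find_scrapy_project` is hypothetical, and it simply walks up from a starting directory (the current one by default) until it finds a `scrapy.cfg` file or reaches `/`.

```shell
#!/bin/sh
# Hypothetical helper: print the nearest directory, starting at $1 (default: the
# current directory) and walking up towards /, that contains scrapy.cfg.
# Returns 1 if the start directory is invalid or no scrapy.cfg is found.
find_scrapy_project() {
  dir=$(cd "${1:-.}" && pwd) || return 1
  while :; do
    if [ -f "$dir/scrapy.cfg" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    # Stop once the filesystem root has been checked.
    [ "$dir" = "/" ] && return 1
    dir=$(dirname "$dir")
  done
}

# Usage: start ScrapyRT from the project root, wherever you are in the checkout.
# project=$(find_scrapy_project) && cd "$project" && scrapyrt
```

Walking upwards mirrors how Scrapy itself looks for `scrapy.cfg` in parent directories, so the helper picks the same project root Scrapy would use.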

Web API

See norix-api

Web UI

See norix-ui
