ArtExIn is short for Article Extraction and Indexing. It is a set of modules for fetching HTML pages, extracting the relevant articles from them, and indexing the extracted text.
ArtExIn is developed by Outernet Inc and powers the preparation of web pages for broadcast over the Outernet network.
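To make the fetch, extract, and index steps concrete, here is a minimal sketch that uses only the Python standard library. It is not ArtExIn's API: the names `fetch`, `extract_text`, `build_index`, and `ParagraphExtractor` are hypothetical, and the "article" is naively assumed to be the text inside `<p>` tags, which is much cruder than real article extraction.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a page and decode it (hypothetical helper, not called below)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> tags, ignoring <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._skip = 0
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "p" and self._in_p:
            # Join the collected fragments and collapse runs of whitespace.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skip:
            self._buf.append(data)


def extract_text(html):
    """Return the paragraph text of a page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


# An inline page stands in for fetch(url) so the sketch runs offline.
page = (
    "<html><body><nav>Home</nav>"
    "<p>Solar <b>power</b> basics.</p>"
    "<script>var x;</script></body></html>"
)
text = extract_text(page)
index = build_index({"page-1": text})
```

The navigation text is dropped because it sits outside any paragraph, and looking up a word in `index` gives the ids of the pages that mention it.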
Install artexin using pip:
pip install git+git://github.com/Outernet-Project/artexin.git
Run the unit tests with:
python setup.py test
or, if you have tox installed:
tox
Please report all bugs to our issue tracker.