News Context Explorer

About

News Context Explorer (NCE) finds the people, places, organizations and other nouns referenced in a text file or web article. It was built with translators and people who read news in a second language in mind. After you upload a text file or provide an article URL, NCE identifies names, places, organizations and other relevant words and links them to Wikipedia, along with images, maps and other contextual information in 10 languages.

Features:

Uploads a txt file or processes a URL
Supports English, German and Spanish as input languages
Cleans HTML for text processing and display
Provides an editor for working on the source text and downloading it as HTML
Highlights found entities
Allows the user to explore entities in a target language (10 languages currently supported) and download the references to a CSV file
Geocodes found locations and marks them on a Google map
Retrieves photos of found entities and displays them in a gallery

Screenshots

Requirements

NCE requires:

Python 2.7.6 or later
Flask
Java (JRE), to run the Stanford NER and POS taggers
A Google API key for geocoding and map display
Memcache and memcached to store entities in cache

Python libraries listed in requirements.txt

Installing NCE

It is best to start with a virtual environment. To install virtualenv and create an environment in your project directory:

$ sudo pip install virtualenv

$ cd ~/code/myproject/

$ virtualenv env

To activate the virtual environment:

$ source env/bin/activate
Once the virtual environment is active, use pip to install the required libraries from requirements.txt:

$ env/bin/pip install -r requirements.txt
Clone this repo into your project directory.
You need to add 2 keys:
- A Flask API key in controller.py
- A Google API key in templates/base.html
Download and unzip Stanford NER 3.5.0 and the Stanford POS English tagger 3.5.0 into your project directory. I renamed the folders stanford-ner and stanford-postagger inside the app, but you should double-check the paths in german_processing.py and spanish_processing.py.
Run the English NER classifier as a Java server on port 8080:

java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/english.muc.7class.distsim.crf.ser.gz -port 8080 -outputFormat inlineXML

The German and Spanish NER models do not run as a server, so they are noticeably slower. You can adapt the code to run either way.
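As a rough illustration of how a client could talk to the English NER server started above, here is a minimal sketch. It is not the code in text_processing.py. The function names `tag_text` and `parse_inline_xml` are hypothetical, and the sketch assumes the server answers one line of text per TCP connection and replies in the inlineXML format requested on the command line.

```python
import re
import socket

# Matches inline XML entities such as <PERSON>Angela Merkel</PERSON>.
ENTITY_RE = re.compile(r"<([A-Z]+)>(.+?)</\1>")


def parse_inline_xml(tagged):
    """Return (entity_text, entity_type) pairs found in inlineXML output."""
    return [(text, label) for label, text in ENTITY_RE.findall(tagged)]


def tag_text(text, host="localhost", port=8080):
    """Send text to the NER server and return its tagged reply.

    Assumption: the server reads a single line per connection, so all
    whitespace (including newlines) is collapsed before sending.
    """
    line = " ".join(text.split()) + "\n"
    sock = socket.create_connection((host, port))
    try:
        sock.sendall(line.encode("utf-8"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    finally:
        sock.close()
    return b"".join(chunks).decode("utf-8")
```

With the server running, `parse_inline_xml(tag_text("Angela Merkel visited Paris."))` would return the tagged entities as a list of pairs. The parsing step can be tried on its own without Java.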
Create 'uploads' and 'downloads' folders in your static directory.
Download, install and run memcached. If memcached is not running, the app will still work, but it will be slower and make more requests to the Wikipedia API.
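The fallback behaviour described here follows a common cache-aside pattern. The sketch below shows one way it could look; it is an assumption about the approach, not the app's actual code. `cached_lookup` and `make_key` are hypothetical names, and `cache` stands for any client object with `get(key)` and `set(key, value, ttl)` methods, which is the shape the usual Python memcached clients expose.

```python
import hashlib


def make_key(language, name):
    """Build a memcached-safe key (no spaces, short) for an entity."""
    raw = (u"%s:%s" % (language, name)).encode("utf-8")
    return "nce:" + hashlib.sha1(raw).hexdigest()


def cached_lookup(cache, key, fetch, ttl=86400):
    """Return the cached value for key, or call fetch(key) on a miss.

    Any cache error is treated as a miss, so the caller keeps working
    (only slower) when memcached is not running.
    """
    try:
        value = cache.get(key)
    except Exception:
        value = None
    if value is not None:
        return value
    value = fetch(key)  # e.g. a request to the Wikipedia API
    try:
        cache.set(key, value, ttl)
    except Exception:
        pass  # could not store the value; return it anyway
    return value
```

Hashing the key avoids memcached's restrictions on spaces and key length, which entity names such as "Angela Merkel" would otherwise violate.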
To add or remove target languages, add or remove the corresponding item in the templates/editor.html dropdown menu and update the lancodes dictionary in controller.py. Wikipedia has articles in 128 locales.
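To make the role of the language codes concrete, here is a small sketch of how a dictionary like lancodes could drive a Wikipedia lookup. The dictionary contents and the `langlinks_url` helper are illustrative assumptions; check controller.py for the real keys and values. The query parameters (`action=query`, `prop=langlinks`, `lllang`) belong to the public MediaWiki API.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

# Hypothetical excerpt: language name shown in the dropdown -> Wikipedia code.
lancodes = {"English": "en", "German": "de", "Spanish": "es"}


def langlinks_url(title, target_language, source_code="en"):
    """Build a MediaWiki API URL asking for the article's title in
    the target language."""
    params = [
        ("action", "query"),
        ("prop", "langlinks"),
        ("titles", title),
        ("lllang", lancodes[target_language]),
        ("format", "json"),
    ]
    return "https://%s.wikipedia.org/w/api.php?%s" % (source_code, urlencode(params))
```

Under these assumptions, supporting another language is a one-line change such as `lancodes["French"] = "fr"`, plus the matching dropdown entry in templates/editor.html.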

Acknowledgments

I used excellent code and examples from:

Stanford NLP. Wikipedia API. jQuery Highlight plugin. Medium editor. Magnific Popup. HTML sanitizer. Front page tutorial.

Contact info

This project was completed during Hackbright, a 10-week engineering fellowship for women.

If you want to know more about this project, find me on Twitter @lenazun
