Geiger

(work in progress - for details, see the proposal.)

Setup

Install requirements:

$ pip install -r requirements.txt

Set up the config as necessary:

$ cp config-sample.py config.py; vi config.py

Download the necessary corpora:

$ python -m textblob.download_corpora

You will need to prepare some data:

$ python prep.py train_phrases
$ python prep.py train_idf

You can train these on any corpus; I used the body text of about 120k NYT articles, which has worked well. More data will most likely give better results.

The phrase model is used to better identify phrases in text, and the IDF (inverse document frequency) model gives some notion of how salient a term is.
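As a rough illustration of what an IDF model captures, here is a minimal sketch using the plain log(N / df) definition. The function name train_idf_sketch and the toy documents are assumptions made for this example; this is not necessarily how prep.py computes or stores its values.

```python
import math
from collections import Counter


def train_idf_sketch(docs):
    """Return {term: idf} for a list of tokenized documents.

    Uses the plain definition idf = log(N / df), where N is the number of
    documents and df is the number of documents containing the term.
    (Hypothetical helper; Geiger's own training code may differ.)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


if __name__ == "__main__":
    toy_docs = [
        ["the", "reactor"],
        ["the", "budget"],
        ["the", "reactor", "leak"],
    ]
    for term, value in sorted(train_idf_sketch(toy_docs).items()):
        print(term, round(value, 3))
```

Terms that appear in every document (here "the") get an IDF of zero, while rarer terms score higher, which is why a larger training corpus tends to give more useful salience estimates.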

Usage

Run the server:

$ python server.py

Then try out the demo:

localhost:5001/

Development

If you are developing and need to reload Geiger a lot, you are in for a bad time. The phrase, IDF, and Word2Vec models take a very long time to load.

Fortunately, things are set up so that you can run each of these models in its own process, and those processes don't need to be reloaded when Geiger restarts. If you set remote=True in config.py, the functions which rely on the phrase and Word2Vec models will call out to these separate processes instead of loading the models directly. Just make sure you set remote=False when deploying for production.

Then you can run these processes separately like so:

$ python dev.py word2vec
$ python dev.py phrases
$ python dev.py idf

The downside is that calling out to separate processes makes each model call slower, but you'll likely save time overall.
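The general pattern described in this section can be sketched with the standard library's multiprocessing.connection module. This is only an illustration under stated assumptions: the names load_idf_model, serve, remote_lookup, lookup, and AUTHKEY are hypothetical, the toy dictionary stands in for a slow-loading model, and dev.py may use a different mechanism entirely.

```python
import functools
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"geiger-dev"  # hypothetical shared secret for this sketch


@functools.lru_cache(maxsize=None)
def load_idf_model():
    """Stand-in for a slow model load; returns a toy term -> IDF mapping."""
    return {"the": 0.1, "geiger": 3.2}


def serve(listener, model):
    """Answer one lookup per connection until a client sends None."""
    while True:
        with listener.accept() as conn:
            term = conn.recv()
            if term is None:
                return
            conn.send(model.get(term, 0.0))


def remote_lookup(address, term):
    """Ask the long-running process for a value instead of loading the model."""
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send(term)
        return conn.recv()


def lookup(term, remote, address=None):
    """Dispatch on a remote flag, similar in spirit to the config switch."""
    if remote:
        return remote_lookup(address, term)
    return load_idf_model().get(term, 0.0)


if __name__ == "__main__":
    # A thread plays the role of the separate process to keep the demo
    # in one file; port 0 lets the OS pick a free port.
    listener = Listener(("localhost", 0), authkey=AUTHKEY)
    worker = threading.Thread(
        target=serve, args=(listener, load_idf_model()), daemon=True
    )
    worker.start()

    print(lookup("geiger", remote=True, address=listener.address))
    print(lookup("geiger", remote=False))

    # Tell the serving loop to stop, then clean up.
    with Client(listener.address, authkey=AUTHKEY) as conn:
        conn.send(None)
    worker.join(timeout=5)
    listener.close()
```

Each remote call pays for a connection and a round trip, which is the slowdown mentioned above; the benefit is that the model stays loaded while the calling code is restarted.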
