Habakkuk

Habakkuk is an application for filtering tweets that contain Christian Bible references. The goal is to capture the book name, chapter number, verse number, and tweet text for further analysis.

Django

This project uses Django for project organization. Run the following commands to set up the virtual environment and install the dependencies.

$ virtualenv .
$ . ./bin/activate
$ pip install -r requirements.txt

Storm

This project uses a Storm topology to analyze tweets from the Twitter sample stream. The entry point is a Storm spout that uses twitter4j to access the stream with a username and password. Tweets are then passed to a Storm shell bolt, implemented in Python, that applies a regular expression to detect Christian Bible references. Finally, a bolt receives the tuple with a Bible reference tag and stores it in Elasticsearch.
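The matching step can be illustrated with a minimal sketch. This is not the project's actual pattern (that lives in bible_verse_matching/); the book list, the `REFERENCE_RE` pattern, and the `find_references` helper are all hypothetical names chosen for illustration, and the sketch only handles the simple "Book chapter:verse" form.

```python
import re

# Hypothetical, abbreviated book list for illustration only; the real
# project builds a much fuller pattern in bible_verse_matching/.
BOOKS = ["Genesis", "Psalms", "Psalm", "Habakkuk", "Matthew", "John", "Romans"]

# Try longer names before shorter ones that are their prefixes.
_BOOK_ALT = "|".join(
    sorted((re.escape(b) for b in BOOKS), key=len, reverse=True)
)

# Matches "<book> <chapter>:<verse>", case-insensitively.
REFERENCE_RE = re.compile(
    r"\b(?P<book>" + _BOOK_ALT + r")\s+"
    r"(?P<chapter>\d{1,3})\s*:\s*(?P<verse>\d{1,3})\b",
    re.IGNORECASE,
)


def find_references(text):
    """Return one dict per Bible reference found in the tweet text."""
    return [
        {
            "book": m.group("book").title(),
            "chapter": int(m.group("chapter")),
            "verse": int(m.group("verse")),
            "text": text,
        }
        for m in REFERENCE_RE.finditer(text)
    ]
```

Each returned dict carries the four fields the project aims to capture (book, chapter, verse, tweet text), so a downstream bolt could serialize it as a document for storage. A tweet with no reference yields an empty list.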

For more information, see the Storm concepts wiki. I also have a Habakkuk starter page that provides some background.

Elasticsearch

This project uses Elasticsearch as backend storage. See the Elasticsearch site for details.

Accumulo

I experimented with using Apache Accumulo. The code has been disabled, but the bolt is still there if anyone wants to try it. It works fine, but I found Elasticsearch worked better for this project.

Hadoop

Scripts in analysis/ depend on Cloudera Hadoop CDH3.

Sub-Directories

java - Storm application
bible_verse_matching - Tools to build and test the Bible reference regular expressions. Also contains dictionary files for Pig and Mahout.
elasticsearch - Index templates and tools to query Elasticsearch
accumulo - Table initialization scripts
config - Configuration files for setting up Storm with supervisord
analysis - Pig scripts for data analysis
