GitHub - AaronTR/SLU_Software_Engineering_Group_A: Group A's Project for Software Engineering Class

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 240 Commits
Tools		Tools
res		res
.gitattributes		.gitattributes
.gitignore		.gitignore
GoldmineGUI.py		GoldmineGUI.py
README.txt		README.txt
Sprint 3 Reflection.doc		Sprint 3 Reflection.doc
The power of voice.pdf		The power of voice.pdf
corpora_trainer.py		corpora_trainer.py
get_tweets.py		get_tweets.py
getyql.py		getyql.py
graphanalysis.py		graphanalysis.py
json_appender.py		json_appender.py
master_tweet_sample.json		master_tweet_sample.json
product_backlog.txt		product_backlog.txt
sample.db		sample.db
stock_loaddaily.py		stock_loaddaily.py
tweet_Tri-Gram_Classifier.py		tweet_Tri-Gram_Classifier.py
tweet_trender.py		tweet_trender.py
tweetcache.py		tweetcache.py
zmq_srv.py		zmq_srv.py
zmq_stockpull.py		zmq_stockpull.py
zmq_stockpull_graphic.py		zmq_stockpull_graphic.py
zmq_stockpull_simplegraphic.py		zmq_stockpull_simplegraphic.py
zmq_stockpush.py		zmq_stockpush.py

Repository files navigation

Group A Readme
--------------

Link to python-twitter downloads: https://github.com/bear/python-twitter/wiki
Link to Twitter documentation: https://dev.twitter.com/docs


Getting Started
---------------
The entire system centers around zmq_srv.py, this is the server that
hosts the database that the individual clients will connect to and 
expect to do a number of things, from pushing tweets and stocks to from 
repsective APIs, to pulling existing tweets and stocks for analysis, to
storing that analysis

start the server with "python zmq_srv.py"
If the database isn't already created, run "python getyql.py"

### Populating your stock database

The easiset way to generate data for analysis is to pull historical daily 
stock information for a specific ticker symbol. zmq_loaddaily.py was written
to do just that! This program will take a ticker symbol, a start date, and an
end date, and download the daily stock values for each trading day within
that range inclusive of the dates given.

### Populating your tweet database

To populate the database with tweet information, you must run 
sentiment_analyzer.py. This is then followed by the get_tweets.py program which
will take in a company name as its search term. get_tweets.py will output a raw
tweet object to the sentiment_analyzer, which would then process the 
information and store into the database a formatted dictionary with:
   1) "id" of the tweet.
   2) "date" of the tweet, which is the creation date of that tweet.
   3) "sentiment" of the tweet, either positive or negative.
   4) "company" that the tweet is refering to. 

Database Info
----------------
Timestamp information should be stored as a string now. When pulling
historical stock data, this timestamp should include a date in text format
"YYYY-MM-DD HH:MM", OR "YYYY-MM-DD high" OR "YYYY-MM-DD low". 

Supplementary Tools
________________
A few tools have been created to enable the users a greater degree in flexibility
with the corpora they want to use in order to train the sentiment analyzer which
are as follows:

### sanders_tweet_corpora_extractor ###

This tool is focused primarily on Sander's twitter corpora which contains roughly
4000 usable, individually sorted tweets with categories of positive, negative, and
neutral. This tool will take the supplied information Sander's corpora download 
and distill the csv and rawdata into a more usable format, such as a json or a text 
file. Just stick this program into the folder where you installed Sander's corpora.

USAGE: "python sanders_tweet_corpora_extractor <mode>"

The modes are as follows:
    -jM (For a json of all sentiments)
    -jS (For separate jsons of positive, negative, and neutral)
    -txtM (For a text file of all sentiments)
    -txtS (For separate text files of positive, negative, and neutral)

### search_tweets ###

Search tweets is a program built to supply data for a user in use with either the
json_filter or the text_filter manual sorting programs. This program will
create a single json named "manual_tweet.json" with the keyword a user specifies.
With this keyword, search_tweets will attempt to search twitter for all tweets with
that keyword including a scattering of positive, negative, and financial terms to
zone in on possibly useful tweets.

USAGE: "python search_tweets.py <keyword>"

### text_filter ###

The text filter manual sorting program is built for the purpose of supplying the
nltk trainer with more data. text filter will display the text of a tweet, provide
the various options for sorting, and build several text files of the respective 
sentiment.

USAGE: "python text_filter.py"

       z = Positive
       x = Negative
       c = Trash
       a = Useful Trash
       q = Save and Quit
       [Enter] to skip the tweet

### json_filter ###

The json filter manual sorting program is built for the purpose of supplying 
training data for the tri_gram sentiment classifier. json filter will take the
manual_tweet.json file and traverse, one by one, each tweet object. It will display
the text for the user and allow the user to sort into either positive, negative,
neutral, or trash. json_filter will create 4 files, 1 for positive, negative, neutral,
and 1 for the master json that contains all 3 of these sentiments. Json filter will
also append a sentiment key to the tweet object before writing it to the file.

USAGE: "python json_filter.py"
       z = Positive
       x = Negative
       c = Neutral
       a = Trash
       q = Save and Quit
       [Enter] to skip the tweet.