Market Pulse

This project uses sentiment analysis of tweets from Twitter to help make predictions about the stock market. Each tweet is associated with a particular stock symbol when a #(stock symbol) or $(stock symbol) is found in it. For example, a tweet containing #SP500 or $SP500 is assumed to be related to the S&P 500 index.
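The tagging rule above can be sketched with a regular expression. This is a minimal illustration, not the project's actual code: the pattern, the six-character length limit, and the function name `extract_symbols` are all assumptions made for the example.

```python
import re

# Assumed pattern: '#' or '$' followed by a short ticker-like token that starts
# with a letter, e.g. "#AAPL", "$TSLA", "#SP500". Dollar amounts such as "$5"
# and long hashtags such as "#stockmarket" are not matched.
SYMBOL_PATTERN = re.compile(r"(?<!\w)[#$]([A-Za-z][A-Za-z0-9]{0,5})\b")


def extract_symbols(tweet_text):
    """Return the unique symbols tagged in a tweet, upper-cased, in first-seen order."""
    seen = []
    for match in SYMBOL_PATTERN.finditer(tweet_text):
        symbol = match.group(1).upper()
        if symbol not in seen:
            seen.append(symbol)
    return seen


print(extract_symbols("Buying more $aapl today, #SP500 looks strong $AAPL"))
```

Short ordinary hashtags (for example #money) would also match this pattern, so in practice the result would likely need to be filtered against a list of known symbols.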

Gathering Tweets

Tweets were gathered using the [Tweepy](http://www.tweepy.org/) Python library. They were streamed in real time and saved to a MongoDB database. Between 4 and 6 million tweets were gathered per day.

See save_stock_tweets.py for the code.

Streaming Stock Quotes

Both historical and current stock quotes were gathered via the [Yahoo Finance](https://pypi.python.org/pypi/yahoo-finance) Python library.

See yahoo_quotes.py for the code. This includes some data cleaning and preliminary modeling.

First Attempt
My first attempt at getting stock data involved scraping the NASDAQ website in real time for current and historical stock quotes. See scrape_nasdaq.py for the code. I ended up not using this method because fetching quotes this way was very slow, which made it impractical given that I wanted to live stream quotes in a web app.

Exploratory Data Analysis

An easy way to get an idea of what your data is doing is to visualize it. For this project I used TF-IDF and Nonnegative Matrix Factorization (NMF) to get an easily interpretable result to graph and model.

So what does this tell me? The blue line represents the closing price for a stock symbol on a given day, and the red lines represent the NMF values for that stock symbol on that day. What I can see from this is that when the red lines go up, the stock market also goes up the next day. The same may be true when the market goes down.

See clustering.py for the code.
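As a rough illustration of the TF-IDF and NMF step, here is a minimal sketch using scikit-learn. It is not the code in clustering.py. It assumes one text document per stock symbol per day (all of that day's tweets joined together), and the toy documents, the function name `nmf_features`, and the parameter values are made up for the example.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer


def nmf_features(documents, n_components=2, random_state=0):
    """Turn raw text documents into non-negative NMF weights via TF-IDF."""
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(documents)  # sparse (n_docs, n_terms) matrix
    model = NMF(
        n_components=n_components,
        init="nndsvda",
        random_state=random_state,
        max_iter=500,
    )
    weights = model.fit_transform(tfidf)  # dense (n_docs, n_components) array
    return weights, model, vectorizer


# Made-up stand-ins for "all tweets about one symbol on one day".
toy_documents = [
    "apple earnings beat expectations stock rally",
    "apple iphone sales strong rally",
    "market selloff fears recession losses",
    "recession fears drive losses selloff",
]

weights, model, vectorizer = nmf_features(toy_documents, n_components=2)
print(weights.shape)
```

Each row of `weights` gives one document's loading on each NMF component. Values like these, tracked per day, are the kind of series that could be plotted against the closing price.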

I can also get an idea of what people are saying about a particular stock symbol by looking at the most used words that relate to it. Enter the word cloud:

Word Cloud for #AAPL or Apple

Word Cloud for #YHOO or Yahoo
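A word cloud is driven by word frequencies, so the underlying counts can be sketched in a few lines. This is an assumed, simplified version: the stop-word list, the tokenizing rule, and the function name `top_words` are illustrative, and the sample tweets are made up.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "for", "after", "with", "this", "that", "are", "was"}


def top_words(tweets, n=10):
    """Return the n most frequent words across tweets as (word, count) pairs."""
    counts = Counter()
    for text in tweets:
        for word in re.findall(r"[a-z']+", text.lower()):
            if len(word) > 2 and word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(n)


sample_tweets = [
    "Apple earnings beat estimates",
    "Apple stock up after earnings",
    "earnings season",
]
print(top_words(sample_tweets, 2))
```

The resulting counts are what a word cloud renderer would scale its font sizes by.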

Modeling

To start, I used a Random Forest Classifier to see if I could simply identify whether a particular stock symbol would increase or decrease in value the following day. This approach reached close to 70% accuracy, so I decided to move on to a Random Forest Regression model. For that model I used the Root Mean Squared Error (RMSE) and the Mean Squared Error (MSE) to measure how close the predicted next-day closing price was to the actual one.
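The two targets described above can be made concrete with a short sketch: a binary up/down label for the classifier and the MSE and RMSE for the regressor. The function names and the sample numbers are assumptions for illustration. In particular, treating an unchanged close as "not up" is a choice made here, not something the project states.

```python
import math


def next_day_direction(closes):
    """Label each day 1 if the next day's close is higher, else 0.

    The last day has no following close, so it gets no label.
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]


def mse(actual, predicted):
    """Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error, in the same units as the prices."""
    return math.sqrt(mse(actual, predicted))


print(next_day_direction([10.0, 11.0, 10.5, 10.5]))
print(mse([10.0, 12.0], [11.0, 10.0]))
```

Because the RMSE is in the same units as the price, it is often the easier of the two to interpret when judging how far off a predicted close is.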

This image shows a week's worth of closing prices for the TSLA (Tesla) stock symbol. The red box to the right of the graph shows where my model is predicting the market will close for that day. (You will probably notice that two points are missing here. This is because those dates fell on a Saturday and a Sunday, when there are no closing prices.)

NMF and Regression
When working with Nonnegative Matrix Factorization (NMF), you need a way to figure out the best number of features to use. For this I gauged how different numbers of features changed the MSE of the regression model. That code can be found in model_validation.py. It is essentially my own version of a grid search over different numbers of NMF features and different Random Forest settings.
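The idea of searching over the number of NMF features and the Random Forest settings can be sketched as a plain exhaustive loop. This is not the code in model_validation.py. The `grid_search` helper is hypothetical, and `toy_mse` is a made-up stand-in for "fit NMF and the regressor with these settings and return the validation MSE".

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every parameter combination; lower scores are better.

    Returns (best_params, best_score).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_mse(n_components, n_estimators):
    """Made-up score surface standing in for a real fit-and-validate step."""
    return (n_components - 5) ** 2 + (n_estimators - 100) ** 2 / 1000.0


best_params, best_score = grid_search(
    {"n_components": [2, 5, 10], "n_estimators": [50, 100]},
    toy_mse,
)
print(best_params, best_score)
```

In a real run, the scoring function would refit the NMF features and the regression model for each combination and score them on held-out days, which is the slow part of the search.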

Web App

Finally, I wanted to turn this project into a usable application. To do this I used Flask to create a web application that lets a user search different stock symbols, live stream stock quotes, view historical stock data, and see the predictions my model makes for the different stock symbols.
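To give a feel for the search part of such an app, here is a minimal Flask sketch. The `/search` route, the `q` query parameter, the JSON response shape, and the hard-coded symbol list are all assumptions for illustration and are not taken from the project's app.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical symbol list; a real app would load this from its data store.
KNOWN_SYMBOLS = ["AAPL", "TSLA", "YHOO"]


@app.route("/search")
def search():
    """Return the known symbols that start with the user's query string."""
    query = request.args.get("q", "").strip().upper()
    matches = [s for s in KNOWN_SYMBOLS if query and s.startswith(query)]
    return jsonify({"query": query, "matches": matches})
```

Saved as a module, this can be served with Flask's development server (`flask run`). A request to `/search?q=ts` would then return the matching symbols as JSON, and the streaming and prediction pages could be added as further routes in the same way.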

Search Page

Streaming Page

Prediction Page

Conclusion

In the end I believe that using unsupervised learning techniques, like Nonnegative Matrix Factorization, is a great way to fuel supervised learning techniques like Random Forest Regression. I used a lot of new technologies in this project and learned a lot in the process. I hope that this project has shown that I am a capable Data Scientist, Application Developer, and Interface Designer. These are three areas that I greatly enjoy working in.
