

sequent/XGBoost_stock_prediction

 
 


XGBoost_stock_prediction

XGBoost is known to be fast and to achieve good prediction results compared with regular gradient boosting libraries. This project attempts to predict stock price direction, using the stock's daily data and indicators derived from that data as predictors. It is therefore a classification problem.
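As a rough illustration of how daily data can be turned into a classification dataset, the sketch below derives a few simple indicators and a next-day direction label with pandas. The column name `close`, the helper `build_dataset`, and the chosen indicators are assumptions made for this example; they are not taken from the project's notebook.

```python
import pandas as pd


def build_dataset(daily: pd.DataFrame) -> pd.DataFrame:
    """Derive simple indicators and a next-day direction label.

    Assumes a hypothetical 'close' column ordered by date.
    """
    out = pd.DataFrame(index=daily.index)
    # Indicators computed only from current and past prices.
    out["return_1d"] = daily["close"].pct_change()
    out["sma_3"] = daily["close"].rolling(3).mean()
    out["close_over_sma_3"] = daily["close"] / out["sma_3"]
    # Label: 1 if the next day's close is higher than today's, else 0.
    next_close = daily["close"].shift(-1)
    out["target_up"] = (next_close > daily["close"]).astype(int)
    # Drop the last row (no next day) and rows with undefined indicators.
    return out[next_close.notna()].dropna()


daily = pd.DataFrame({"close": [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]})
dataset = build_dataset(daily)
print(dataset)
```

The label uses only the following day's close, while every indicator uses only current and past values, so no feature leaks information from the day being predicted.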

Data Investigation & Preprocessing

The histograms illustrate the distribution of each feature

Feature data histograms

Pairwise correlation between the features

Feature data correlation heatmap

Pairwise correlation between the features after feature selection

Selected Feature data correlation heatmap
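The README does not state which feature selection algorithm was used. Purely as an illustration of how a correlation matrix can guide selection, the sketch below drops one feature from every pair whose absolute correlation exceeds a threshold. The helper `drop_correlated`, the 0.9 threshold, and the toy columns are assumptions for this example.

```python
import numpy as np
import pandas as pd


def drop_correlated(features: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return the column names kept after removing highly correlated ones."""
    corr = features.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # A column is dropped if it correlates strongly with an earlier column.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return [col for col in features.columns if col not in to_drop]


features = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [3.0, 5.0, 7.0, 9.0, 11.0, 13.0],  # exactly 2 * a + 1
    "c": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
})
kept = drop_correlated(features)
print(kept)
```

Here `b` is a linear function of `a`, so it is removed, while `c` is only moderately correlated with `a` and is kept.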

Importance of each selected feature according to the feature selection algorithm

Feature data importance

Results

Training accuracy

Training Accuracy

Training loss

Training Loss

Model Testing

Testing on XGBoost Model

The XGBoost Classification Tree

The XGBoost Tree

Improvement suggestion

Before arriving at XGBoostCV, two approaches were tried: GridSearchCV (tuning all hyperparameters at once) and XGBoosting (tuning one hyperparameter at a time). The former took a long time to train and achieved a lacklustre result (below 0.7 accuracy); the latter ran much faster but overfitted severely. Even if the current result doesn't overfit, a test accuracy of about 0.7 is lacklustre given the number of features available to learn from. I suspect this is due to the autocorrelated and autoregressive nature of the time-series data, and that slicing the data at the wrong place disrupts its learnability. It may be necessary to combine XGBoost with other models, such as an econometric model or another non-linear model, to learn well from time-series stock data.
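One common way to avoid slicing time-series data in the wrong place is to validate with walk-forward splits, where each test block comes strictly after its training block. The sketch below is a minimal, dependency-free version of that idea; the function `walk_forward_splits` and its parameters are hypothetical and are not part of the project's code.

```python
def walk_forward_splits(n_samples, n_splits, min_train):
    """Yield (train_idx, test_idx) pairs in chronological order.

    The training window expands with each split, and every test block
    contains only samples that come after the whole training window.
    """
    test_size = (n_samples - min_train) // n_splits
    if test_size < 1:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        train_end = min_train + k * test_size
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(walk_forward_splits(n_samples=10, n_splits=3, min_train=4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Index pairs of this shape can be passed as the `cv` argument of scikit-learn's `GridSearchCV`, which accepts an iterable of (train, test) index splits, so hyperparameter tuning never trains on data from after the validation period.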

Instructions

To execute the program, run the following at a command prompt: python P5.py

Prerequisites

Python 3.6 or Anaconda with Python 3.6 environment

The code was written on a Windows machine and has been tested on Linux Ubuntu 16.04 and Windows 10 Pro
