Installation and setup consists of installing some basic dependancies, Theano, Lasagne, CUDA, BLAS.
The non-trivial part of this installation is getting CUDA and Theano to work.
For detailed Theano, CUDA and BLAS guide, Please visit.
http://deeplearning.net/software/theano/install_ubuntu.html#install-ubuntu
The guide below is a modification of https://github.com/Lasagne/Lasagne/wiki/From-Zero-to-Lasagne-on-Ubuntu-14.04
As I prefer using Conda over apt-get python install.
sudo apt-get install -y gcc g++ gfortran build-essential git wget libopenblas-dev
wget "http://repo.continuum.io/archive/Anaconda2-4.0.0-Linux-x86_64.sh"
bash Anaconda2-4.0.0-Linux-x86_64.sh
pip install --upgrade https://github.com/Theano/Theano/archive/master.zip
pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
(Use the flag --user if you are not working as root and want it in your user directory)
mkdir -p ~/code/mnist
cd ~/code/mnist
wget https://github.com/Lasagne/Lasagne/raw/master/examples/mnist.py
THEANO_FLAGS=device=cpu python mnist.py mlp 5
output should look something like this.
Loading data...
Downloading train-images-idx3-ubyte.gz
Downloading train-labels-idx1-ubyte.gz
Downloading t10k-images-idx3-ubyte.gz
Downloading t10k-labels-idx1-ubyte.gz
Building model and compiling functions...
Starting training...
Epoch 1 of 5 took 11.483s
training loss: 1.225167
validation loss: 0.407454
validation accuracy: 88.55 %
Epoch 2 of 5 took 11.299s
training loss: 0.565556
validation loss: 0.309781
validation accuracy: 90.92 %
Disclaimer, have not tested this as I have it setup on my machines already.
Please follow this guide from the BLAS part
https://github.com/Lasagne/Lasagne/wiki/From-Zero-to-Lasagne-on-Ubuntu-14.04
remember to make the ~/.theanorc file as mentioned in the guide.
A cool tip: use the command.
nvidia-smi
To check for available GPU resources. Should look something like.
Mon Apr 25 04:42:52 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63 Driver Version: 352.63 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 0000:05:00.0 Off | N/A |
| 58% 84C P2 141W / 250W | 11864MiB / 12287MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... Off | 0000:06:00.0 Off | N/A |
| 22% 57C P8 17W / 250W | 23MiB / 12287MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX TIT... Off | 0000:09:00.0 Off | N/A |
| 24% 61C P8 18W / 250W | 23MiB / 12287MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX TIT... Off | 0000:0A:00.0 Off | N/A |
| 53% 84C P2 211W / 250W | 11864MiB / 12287MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 8592 C python 11838MiB |
| 3 2619 C python 11839MiB |
+-----------------------------------------------------------------------------+
mkdir -p ~/code/mnist
cd ~/code/mnist
wget https://github.com/Lasagne/Lasagne/raw/master/examples/mnist.py
THEANO_FLAGS=device=gpu python mnist.py mlp 5
output should look something like this (notice it is much faster!):
Using gpu device 1: GeForce GTX TITAN X (CNMeM is disabled, cuDNN Version is too old. Update to v5, was 3007.)
Loading data...
Building model and compiling functions...
Starting training...
Epoch 1 of 5 took 1.878s
training loss: 1.198237
validation loss: 0.402080
validation accuracy: 88.43 %
Epoch 2 of 5 took 1.898s
training loss: 0.561074
validation loss: 0.306817
validation accuracy: 91.06 %
Go to where you want to install the repo.
mkdir EEG_projects && cd EEG_projects
Now given the train/valid dataset (you can enquire Justin Dauwels at jdauwels at ntu.edu.sg for access).
Place the train and validation data in the following folder $PATH_TO_DIR/EEG_Dauwelslab/data/csv/train/
Such that.
cd $PATH_TO_DIR/EEG_Dauwelslab/data/csv/train/
ls
Btrn1.csv Btrn3.csv Btrn5.csv Btrn7.csv Bval1.csv
Bval3.csv Bval5.csv Bval7.csv Strn1.csv Strn3.csv
Strn5.csv Strn7.csv Sval1.csv Sval3.csv Sval5.csv
Sval7.csv Btrn2.csv Btrn4.csv Btrn6.csv Btrn8.csv
Bval2.csv Bval4.csv Bval6.csv Bval8.csv Strn2.csv
Strn4.csv Strn6.csv Strn8.csv Sval2.csv Sval4.csv
Sval6.csv Sval8.csv
Make the data into a numpy format
cd $PATH_TO_DIR/EEG_Dauwelslab/
python create_data train
you csv path is: ./data/csv/train/*
Converting data ...
Opening: ./data/csv/train/Sval2.csv
Saved to ./data/numpy/train/Sval2.npy.gz
Opening: ./data/csv/train/Bval6.csv
Saved to ./data/numpy/train/Bval6.npy.gz
...
Opening: ./data/csv/train/Bval8.csv
Saved to ./data/numpy/train/Bval8.npy.gz
cd $PATH_TO_DIR/EEG_Dauwelslab/
Training models is performed by the following command.
Usage: python train.py <config_name> <CVNumber1,2,3> <num_epochs>
python train.py RNN 1 150
Using gpu device 1: GeForce GTX TITAN X (CNMeM is disabled, cuDNN Version is too old. Update to v5, was 3007.)
Setting sys parameters ...
Defining symbolic variables ...
Loading config file: 'RNN'
Setting config params ...
Optimizer: rmsprop
Lambda: 0.00010
Batch size: 2048
Experiment id: RNN-1-20160425-052845
Loading data ...
loadData started!
Preprocesssing data ...
Data shapes ...
(405429, 64, 1)
(64446, 64, 1)
(405429, 1)
(64446, 1)
(12763, 64, 1)
(2010, 64, 1)
(12763, 1)
(2010, 1)
DEBUG: max train values
84.267
Building network ...
number of parameters: 41401
layer output shapes:
InputLayer (2048, 64, 1)
LSTMLayer (2048, 64, 100)
SliceLayer (2048, 100)
DenseLayer (2048, 1)
Building cost function ...
Computing updates ...
Getting gradients ...
Print configuring updates ...
cut_norm: 20
Compiling functions ...
@@@@STARTING TO TRAIN@@@@
--------- Validation ----------
validating: train loss
average evaluation accuracy (train): 0.54948
average evaluation AUC (train): 0.72353
validating: valid loss
average evaluation accuracy (valid): 0.64233
average evaluation AUC (valid): 0.73729
Epoch 1 of 151
setting learning rate to 0.0010000
Shuffling data
---------- Train ----------
average training loss: 0.55832
average training accuracy: 0.76965
average auc: 0.82934
00:01:05 since start (65.98 s)
saving parameters and metadata
stored in metadata/dump_RNN-1-20160425-052845
--------- Validation ----------
validating: train loss
average evaluation accuracy (train): 0.89933
average evaluation AUC (train): 0.94912
validating: valid loss
average evaluation accuracy (valid): 0.90007
average evaluation AUC (valid): 0.92259
Epoch 2 of 151
Shuffling data
---------- Train ----------
average training loss: 0.32418
average training accuracy: 0.89727
average auc: 0.94564
00:02:13 since start (67.51 s)
saving parameters and metadata
stored in metadata/dump_RNN-1-20160425-052845
--------- Validation ----------
validating: train loss
average evaluation accuracy (train): 0.93125
average evaluation AUC (train): 0.98539
validating: valid loss
average evaluation accuracy (valid): 0.96249
average evaluation AUC (valid): 0.98593
Cool tip: use nohup <commands> > file &
to run script in background.
nohup python train.py RNN 1 151 > out_file &
ctrl+c
Cool tip: use nvidia-smi
to see usage.
Cool tip: In the ~/.theanorc
file you specify the device. Setting device=gpu0
will make it use gpu numbered as 0, whereas device=gpu1 will make it use device 1. Using this will allow you to utilize multiple GPUs concurrently.
Cool tip: during "testing" the script, you will generate metadata files. You are probably not interested in these. To remove, do as follows.
cd $PATH_TO_DIR/EEG_Dauwelslab/metadata/
rm -rf dump*
Remember to keep the other files, such that a ls
in the metadata directory will give you
ls
FOR_DEBUGGING FOR_ENSEMBLE __init__.py
This will take ~24 hours on a GPU
cd $PATH_TO_DIR/EEG_Dauwelslab/
bash batch_train.sh
Use the debugging tool debug_model.py
to find maximas on the validation set.
This tool will take a given path
(being the latest epoch, epoch no. 149) and a topX
argument. The topX
argument will list the highest validation epochs in sorted order.
Usage python debug_model.py <path> <topX>
Debug all of the models, you should get the following response.
Example python debug_model.py metadata/dump_RNN-1-20160425-055244-149.pkl 20
Do this for every split (...RNN-1-...
, ...RNN-2-...
, ..., ...RNN-8-...
)
Next step is choosing which epoch to sample weigths for the neural network from, given the use of debug_model
you will have validations for each epoch and train/validation for every epoch.
The rule used when picking the best epoch is as follows:
- The epoch must be after the first 20 epochs (the network will often not have picked up features of high describtive value before after a certain amount of training)
- Pick highest valued validation.
- For ensemble, epochs picked must be at least 10 epochs apart (often you will find well performing epochs following each other, as they have very similar weights. Picking epochs apart will increase variance).
Using these rules you should end up with:
Given no ensemble:
Epoch for 1st split: 22
Epoch for 2nd split: 39
Epoch for 3rd split: 20
Epoch for 4th split: 137
Epoch for 5th split: 26
Epoch for 6th split: 35
Epoch for 7th split: 81
Epoch for 8th split: 140
Given ensemble:
Epochs for 1st split: 22, 39, 64
Epochs for 2nd split: 39, 64, 129
Epochs for 3rd split: 20, 30, 40
Epochs for 4th split: 137, 93, 121
Epochs for 5th split: 26, 41, 53
Epochs for 6th split: 35, 46, 56
Epochs for 7th split: 81, 131, 146
Epochs for 8th split: 140, 91, 109
Move the epochs from:
$PATH_TO_DIR/EEG_Dauwelslab/metadata/dump_RNN-...
To:
$PATH_TO_DIR/EEG_Dauwelslab/metadata/FOR_ENSEMBLE/RNN/<split>
Example:
cd $PATH_TO_DIR/EEG_Dauwelslab/metadata/
mv metadata/dump_RNN-1-20160425-055244-149.pkl FOR_ENSEMBLE/RNN/1
Now given the test dataset (you can enquire Justin Dauwels at jdauwels at ntu.edu.sg for access).
Place the test data in the following folder $PATH_TO_DIR/EEG_Dauwelslab/data/csv/test/
Such that.
cd $PATH_TO_DIR/EEG_Dauwelslab/data/csv/train/
ls
Btst1.csv Btst3.csv Btst5.csv Btst7.csv Stst1.csv
Stst3.csv Stst5.csv Stst7.csv Btst2.csv Btst4.csv
Btst6.csv Btst8.csv Stst2.csv Stst4.csv Stst6.csv
Stst8.csv
Make the data into a numpy format.
cd $PATH_TO_DIR/EEG_Dauwelslab/
python create_data test
you csv path is: ./data/csv/test/*
Converting data ...
Opening: ./data/csv/test/Btst7.csv
Saved to ./data/numpy/test/Btst7.npy.gz
Opening: ./data/csv/test/Stst2.csv
Saved to ./data/numpy/test/Stst2.npy.gz
...
Opening: ./data/csv/test/Stst4.csv
Saved to ./data/numpy/test/Stst4.npy.gz
cd $PATH_TO_DIR/EEG_Dauwelslab/
bash batch_predictions.sh
(predictions will be saved in $PATH_TO_DIR/EEG_Dauwelslab/predictions/RNN/)
cd $PATH_TO_DIR/EEG_Dauwelslab/
Please note that eval_predictions.py
takes p
being probability for spike/background as first system argument.
eval_predictions.py <probability> <model>
For p=0.5
python eval_predictions.py 0.5 RNN
(eval_predictions.py
will average probabilities over all predictions in $PATH_TO_DIR/EEG_Dauwelslab/predictions/RNN/<split>
)
For no ensemble you should get the following results from eval_predictions.py 0.5 RNN
:
-- SPLIT 1 of 8 --
Using gpu device 1: GeForce GTX TITAN X (CNMeM is disabled, cuDNN Version is too old. Update to v5, was 3007.)
loadTest started!
TP 3198, FN 345
FP 1095, TN 35298
TPR: 0.90262
SPC: 0.96991
PPV: 0.74493
AUC (test) is: 0.98532
-- SPLIT 2 of 8 --
loadTest started!
TP 1971, FN 39
FP 0, TN 64446
TPR: 0.98060
SPC: 1.00000
PPV: 1.00000
AUC (test) is: 0.99959
-- SPLIT 3 of 8 --
loadTest started!
TP 1426, FN 336
FP 421, TN 46761
TPR: 0.80931
SPC: 0.99108
PPV: 0.77206
AUC (test) is: 0.99484
-- SPLIT 4 of 8 --
loadTest started!
TP 2735, FN 3
FP 0, TN 68887
TPR: 0.99890
SPC: 1.00000
PPV: 1.00000
AUC (test) is: 1.00000
-- SPLIT 5 of 8 --
loadTest started!
TP 807, FN 35
FP 0, TN 53119
TPR: 0.95843
SPC: 1.00000
PPV: 1.00000
AUC (test) is: 0.99974
-- SPLIT 6 of 8 --
loadTest started!
TP 1746, FN 0
FP 1780, TN 57430
TPR: 1.00000
SPC: 0.96994
PPV: 0.49518
AUC (test) is: 0.99991
-- SPLIT 7 of 8 --
loadTest started!
TP 2595, FN 215
FP 865, TN 92763
TPR: 0.92349
SPC: 0.99076
PPV: 0.75000
AUC (test) is: 0.98956
-- SPLIT 8 of 8 --
loadTest started!
TP 2810, FN 55
FP 6236, TN 77167
TPR: 0.98080
SPC: 0.92523
PPV: 0.31063
AUC (test) is: 0.99241
FINAL RESULTS
[[ 17288. 1028.]
[ 10397. 495871.]]
AUC = 0.99517
TPR = 0.94427
SPC = 0.98086
PPV = 0.75910
For no ensemble you should get the following results from eval_predictions.py 0.5 RNN
:
-- SPLIT 1 of 8 --
Using gpu device 1: GeForce GTX TITAN X (CNMeM is disabled, cuDNN Version is too old. Update to v5, was 3007.)
loadTest started!
TP 3464, FN 79
FP 1095, TN 35298
TPR: 0.97770
SPC: 0.96991
PPV: 0.75982
AUC (test) is: 0.98117
-- SPLIT 2 of 8 --
loadTest started!
TP 2005, FN 5
FP 0, TN 64446
TPR: 0.99751
SPC: 1.00000
PPV: 1.00000
AUC (test) is: 0.99992
-- SPLIT 3 of 8 --
loadTest started!
TP 1758, FN 4
FP 823, TN 46359
TPR: 0.99773
SPC: 0.98256
PPV: 0.68113
AUC (test) is: 0.99966
-- SPLIT 4 of 8 --
loadTest started!
TP 2737, FN 1
FP 0, TN 68887
TPR: 0.99963
SPC: 1.00000
PPV: 1.00000
AUC (test) is: 1.00000
-- SPLIT 5 of 8 --
loadTest started!
TP 808, FN 34
FP 0, TN 53119
TPR: 0.95962
SPC: 1.00000
PPV: 1.00000
AUC (test) is: 1.00000
-- SPLIT 6 of 8 --
loadTest started!
TP 1746, FN 0
FP 54, TN 59156
TPR: 1.00000
SPC: 0.99909
PPV: 0.97000
AUC (test) is: 1.00000
-- SPLIT 7 of 8 --
loadTest started!
TP 2595, FN 215
FP 433, TN 93195
TPR: 0.92349
SPC: 0.99538
PPV: 0.85700
AUC (test) is: 0.99348
-- SPLIT 8 of 8 --
loadTest started!
TP 2810, FN 55
FP 3905, TN 79498
TPR: 0.98080
SPC: 0.95318
PPV: 0.41847
AUC (test) is: 0.99472
FINAL RESULTS
[[ 1.79230000e+04 3.93000000e+02]
[ 6.31000000e+03 4.99958000e+05]]
AUC = 0.99612
TPR = 0.97956
SPC = 0.98751
PPV = 0.83580
This github project is dedicated to spike detection in EEG data from epileptical patients.
The code is Produced by A. Rosenberg Johansen1.
The project is supervisioned by PhD. Justin Dauwels2 lab.
The data is supplied by MD. Sydney Cash3 and MD. M. Brandon Westover3.
The code as been used for ICASSP 2016 submission, using a dataset of five patients, and is now being used for a journal paper with a dataset of 100 patients.
1: Technical University of Denmark, DTU Compute, Lyngby, Denmark
2: Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore
3: Massechusetts General Hospital, Neurology Department, and Harvard Medical School, USA
The data set consists of 30 minutes EEG recordings sampled from over 100 patients. The dataset has been preprocessed into 64 length sliding windows, and labelled by M.D. to either be containing "epileptical spike" or "no epileptical spike". The data is split into eight equal size training, validation and test splits with no overlapping patients.
The code can be run through train.py, dumps will be saved in metadata/ and can be evaluated using debug_metadata.py. Configurations for the different models used can be found in configurations/
Training AUC: 0.999 Valid AUC: 0.999 Test AUC: 0.998
As of 4'th April 2016
Model config: https://github.com/alrojo/EEG_DauwelsLab/blob/master/configurations/RNN.py
Training AUC: 0.999
Validation AUC: 0.999
As of 23'th December 2015
Model config: https://github.com/alrojo/EEG_DauwelsLab/blob/master/configurations/ConvBN_DenseBN_LSTM.py
Training AUC: 0.998
Validation AUC: 0.997
As of 11'th December 2015
Model config: https://github.com/alrojo/EEG_DauwelsLab/blob/master/configurations/MultiConv.py
Training AUC: 0.996
Validation AUC: 0.961
As of 11'th December
Model config: https://github.com/alrojo/EEG_DauwelsLab/blob/master/configurations/MLP.py
Training AUC: 0.998
Validation AUC: 0.973
As of 5'th December 2015
Model config: https://github.com/alrojo/EEG_DauwelsLab/blob/master/configurations/LogisticRegression.py
Training AUC: 0.835
Validation AUC: 0.672
As of 5'th December 2015
Model config: https://github.com/alrojo/EEG_DauwelsLab/blob/master/configurations/MLP.py
Training AUC: 0.999
Validation AUC: 0.972
Done training with LSTM, giving 0.999 AUC Train, 0.999 AUC valid and 0.998 AUC test.