Matlab and Python files for PhD topic "Ensemble Learning of High Dimensional Datasets"
This folder contains the files needed to reproduce the results and figures in "Ensemble Learning of High Dimensional Datasets". External codes are not included in the repository but can be downloaded from their original sources. The external codes include:
- Linear Discriminant Analysis https://www.mathworks.com/matlabcentral/fileexchange/29673-lda-linear-discriminant-analysis
- L1-Magic https://statweb.stanford.edu/~candes/l1magic/
- MatConvNet http://www.vlfeat.org/matconvnet/
Also not included are the weights for the deep neural networks and the image, audio, and UCI datasets. These can, however, be downloaded from:
- Deep Neural Network Weights : http://www.vlfeat.org/matconvnet/pretrained/
- Imagenet ILSVRC 2012 : http://www.image-net.org/challenges/LSVRC/2012/
- Images: http://sipi.usc.edu/database/
- Audio:
- Danse Arabe :- https://freesound.org/people/FreqMan/sounds/42956/
- Nature sounds :- https://freesound.org/people/IchBinChrist/sounds/424288/
- Human Speech :- https://freesound.org/people/tim.kahn/sounds/71744/
- UCI datasets:
The folder Utility contains helper codes that should be included via the addpath command. Details of the codes in this folder are described in the next section. The other folders organize the codes by the chapters they are used in, including codes used for analysis that are not discussed anywhere in the thesis.
- CreateAxes :- creates an MxN grid of axes according to the specified dimensions and settings. The grid has shared legends. The code was written later in the research, when it became obvious that manually arranging ~1000 figures was distracting from more productive work
- HouseHolder_nv :- defines the Householder normal vector with characteristics specified by vector v. Implements Algorithm C.1
- binRandGen : generates non-i.i.d binary vectors with specific "densities"
- cummBinnProb : CDF calculator for binomial distributions
- cummPolyaProb : calculates majority vote ensemble accuracy as per P.E. distribution
- rand* : Generates various random projections
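To illustrate the kind of quantity a binomial CDF helper such as cummBinnProb computes, here is a minimal Python sketch. It is not the repository's MATLAB implementation: the function names `binom_cdf` and `majority_vote_accuracy` are hypothetical, and the assumption that ensemble members vote independently with equal accuracy belongs to this sketch only.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    k = min(k, n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def majority_vote_accuracy(n, p):
    """Probability that more than half of n voters are correct.

    Assumes (for this sketch only) that the voters are independent and
    each is correct with probability p. n must be odd to avoid ties.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    # The majority is wrong when at most n // 2 voters are correct.
    return 1.0 - binom_cdf(n // 2, n, p)
```

For example, `majority_vote_accuracy(3, 0.6)` evaluates to 0.648 (up to floating-point rounding): three independent voters that are each right 60% of the time are jointly right slightly more often than any one of them.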
Codes in this folder require the image, audio, and UCI-DOROTHEA datasets
- ImageWrapper : This code reproduces the figures in the section on the empirical corroboration for image datasets (Figures: )
- ImageWrapperStratified : This code reproduces the figures in the section on the empirical corroboration for image datasets with stratified sampling (Figures: )
- SparseWrapper : This code reproduces the figures in the section on the empirical corroboration on real-world sparse binary vectors (Figures: )
- AudioWrapper : This code reproduces the figures in the section on the empirical corroboration for audio dataset
- SynthBinWrapper : This code reproduces the figures in the section on the empirical corroboration for synthetic testcases. Note that the features are not IID (Figures: )
- SynthRandWrapper : This code empirically corroborates our theory when the features are not generated by a Bernoulli process.
Codes in this folder require the image and audio datasets, as well as L1-Magic
- l1-subspace : Small proof of concept showing the unsuitability of RS as a sensing matrix. Figure . Note: both RP and RS have a ~45% chance of failing to reconstruct the sparse signal
- cs-image : Code used to reconstruct an image from a small number of samples
- cs-audio : Code used to reconstruct audio from a small number of samples. Warning: do not listen to the audio reconstructions of low-sample signals on headphones. Volume should be kept at ~80% at all times to prevent speaker damage
- cs-audio-sup : L1-eq reconstruction of audio file, not used in the thesis. RP may fail to converge, causing code to fail sometimes
- cs-image-sup : L1-eq reconstruction of image file, not used in the thesis. RP may fail to converge, causing code to fail sometimes
- badFlipWrapper : Wrapper around badFlip_ for organizing the flipping probability and plotting the figures
- fpEnsExeriment : Simulates how flipping probability relates to ensembles, and simulates an ensemble of RS projections on the Bayes classifier
- ldaEnsembleRotationDataIndPlotWeightedOrderedLabelNoise_auto : Generates synthetic testcases used in Chapter 5
- ldaDataset_* : Runs experiments on UCI datasets
- cnn_imagenet_* : Runs experiments on the corresponding DNN
- process_results_* : ensembles the results of the PseudoSaccade views using majority vote
- process_results_borda : experiment with a borda count ensemble
- process_results_confMat : generates confusion matrix for additional analysis
- corrBase_tbl_7p5 : Calculates the diversity measure between the base and the pseudosaccade view
- corrSaccade_tbl_7p6 : Calculates the diversity measure between the pseudosaccade views
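The two ensembling rules mentioned above, majority vote and Borda count, can be sketched in a few lines of Python. This is an illustrative sketch only, not the code in process_results_*: the function names, the input layout (one top-1 label or one full ranking per view), and the tie-breaking rule are assumptions made here.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among per-view top-1 predictions.

    Ties are broken in favour of the label seen first (an assumption of
    this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def borda_count(rankings):
    """Return the label with the highest Borda score.

    Each ranking lists labels from most to least preferred; a label in
    position i of a ranking of length m earns m - 1 - i points.
    """
    scores = {}
    for ranking in rankings:
        m = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] = scores.get(label, 0) + (m - 1 - position)
    return max(scores, key=scores.get)


# Four hypothetical views, each ranking three classes.
views = [["a", "b", "c"], ["a", "b", "c"], ["b", "c", "a"], ["c", "b", "a"]]
top1 = [ranking[0] for ranking in views]
```

On this toy input the two rules disagree: `majority_vote(top1)` returns `"a"` (two first-place votes), while `borda_count(views)` returns `"b"` (scores a = 4, b = 5, c = 3), because Borda count also rewards consistently high second-place rankings.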
These are helper scripts for processing and training neural networks. Required modules include numpy, scipy, matplotlib, keras, tensorflow, and Foolbox
- adverseAttack - generates Foolbox adversarial examples; requires Foolbox
- ensExp* - various neural network experiments on the ensembles