
Overview

Idea: Get the most out of datasets that have very few training examples, where each example has a very large number of features. Examples include medical databases that hold gene activation measurements for many different genes but for only a few patients.

Method: We design a neural network architecture whose number of parameters is constant with respect to the number of features (unlike a typical linear classifier, which has one coefficient per feature). The basic idea is to use a linear classifier whose coefficient for each feature is generated by a single shared MLP. The MLP takes as input a representation of that feature, which is essentially a transformation of the set of values the feature takes across all the examples. We also experiment with more complex (deep) architectures.
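The method above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code in this repository: the function names are hypothetical, the feature representation is assumed to be the feature's standardized values across the training examples, and the MLP is assumed to have one tanh hidden layer. Training is omitted; the sketch only shows the forward pass and that the parameter count does not change with the number of features.

```python
import numpy as np

def init_params(n_examples, n_hidden, rng):
    """Parameters of the coefficient-generating MLP (hypothetical layout).

    No shape here depends on the number of features.
    """
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_examples, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
        "bias": np.zeros(1),  # bias of the final linear classifier
    }

def feature_representations(X_train):
    """One row per feature: its standardized values over all training examples.

    Assumption: standardization is the 'transformation' applied to each feature.
    """
    Xt = X_train.T  # shape (n_features, n_examples)
    mean = Xt.mean(axis=1, keepdims=True)
    std = Xt.std(axis=1, keepdims=True) + 1e-8
    return (Xt - mean) / std

def generate_coefficients(params, reps):
    """Apply the same MLP to every feature representation; one coefficient each."""
    hidden = np.tanh(reps @ params["W1"] + params["b1"])
    return (hidden @ params["W2"] + params["b2"]).ravel()  # shape (n_features,)

def predict_proba(params, X_train, X):
    """Linear classifier on X whose coefficients come from the MLP."""
    w = generate_coefficients(params, feature_representations(X_train))
    logits = X @ w + params["bias"]
    return 1.0 / (1.0 + np.exp(-logits))

def n_parameters(params):
    """Total number of scalar parameters."""
    return sum(p.size for p in params.values())

rng = np.random.default_rng(0)
n_examples, n_hidden = 20, 8
params = init_params(n_examples, n_hidden, rng)
for n_features in (100, 10_000):
    X_train = rng.normal(size=(n_examples, n_features))
    probs = predict_proba(params, X_train, X_train)
    # The same parameters serve both feature counts.
    print(n_features, probs.shape, n_parameters(params))
```

In this sketch the parameter count depends on the number of training examples and on the hidden size, both of which are small in the setting described above, while the number of features can grow freely.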

Datasets

NIPS 2003 feature selection challenge datasets: Arcene, Dorothea
AML/ALL Leukemia classification dataset

Details

See doc/README.pdf for detailed explanations.

Alexis211/transpose_features
