ValidPy

This is a Python tool to choose the best configuration and algorithm (between SVM and ANN) for your machine learning regression task.

It was developed to take part in the AA1 Cup 2014.

It uses the PyBrain implementation of ANN and the scikit-learn implementation of SVM.

Here you can find the documentation

Install

Clone this repository, then install it using pip (Linux)::

$ sudo pip install -e ./ValidPy

Dependencies

simplejson 3.3.1
NumPy 1.9.2
PyBrain 0.3
SciPy 0.13.3
matplotlib 1.3.1
scikit-learn 0.15.2

Quick start

This tool implements k-cross validation (k-fold cross-validation) for both ANN and SVM.

For all experiments you need a comma-separated (",") CSV file. The file must have 3 columns; each row is:
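As a rough illustration of what k-cross validation does, the sketch below splits row indices into k folds and yields one train/validation pair per fold. The function name `k_fold_indices` is hypothetical and not part of ValidPy; the tool's own splitting strategy (shuffling, fold sizes) may differ.

```python
def k_fold_indices(n_rows, k):
    """Yield (train, validation) index lists for each of the k folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, validation
        start += size


folds = list(k_fold_indices(n_rows=10, k=4))
for train, validation in folds:
    print(len(train), validation)
```

Each row ends up in exactly one validation fold, so every sample is used for validation once and for training k-1 times.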

id, output_x, output_y
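A quick sanity check of the data file before launching an experiment can save time. The sketch below only verifies that every row has the expected number of comma-separated columns; the helper `check_columns` and the sample rows are hypothetical and are not part of ValidPy.

```python
import csv
import io


def check_columns(lines, expected=3):
    """Return the 1-based numbers of rows whose column count is not `expected`."""
    reader = csv.reader(lines, delimiter=",")
    return [n for n, row in enumerate(reader, start=1) if len(row) != expected]


# Made-up sample in the id, output_x, output_y layout; row 3 is malformed.
sample = io.StringIO("1,0.52,-1.30\n2,0.47,-1.12\n3,0.61\n")
bad_rows = check_columns(sample)
print(bad_rows)
```

For a real file, pass an open file object instead of the in-memory sample, for example `open(path, newline="")`.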

ANN k-cross validation

To perform a k-cross validation over a file you need to create a configuration JSON like this::

{
  "grid":"true",
  "k":8,
  "parallel_process":4,
  "data_file":"absolute_path_to_data_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "hidden_layers":[1,2,3],
  "units":[15,25],
  "function":["sigmoid","gaussian"],
  "momentum":[0.0,0.9],
  "learning_rate":[0.01,0.05],
  "lr_decay":[1.0, 0.9999]
}
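Every list-valued parameter multiplies the number of configurations to evaluate. Assuming the tool tries the full Cartesian product of the listed values (an assumption; this README does not spell it out), the sketch below enumerates the combinations for the example above so you can estimate the cost before launching it.

```python
import itertools

# Grid parameters copied from the example configuration above.
grid = {
    "hidden_layers": [1, 2, 3],
    "units": [15, 25],
    "function": ["sigmoid", "gaussian"],
    "momentum": [0.0, 0.9],
    "learning_rate": [0.01, 0.05],
    "lr_decay": [1.0, 0.9999],
}
k = 8

names = list(grid)
# One dict per point of the Cartesian product (assumed search strategy).
combinations = [dict(zip(names, values))
                for values in itertools.product(*(grid[n] for n in names))]

print(len(combinations))      # number of parameter combinations
print(len(combinations) * k)  # number of training runs with k folds
```

Under that assumption the example grid yields 96 combinations, or 768 training runs with k set to 8.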

Then you have to run ann_kcross.sh in executable/ giving the path to the configuration JSON as parameter::

$ cd ./ValidPy/executable/
$ sh ann_kcross.sh path_to_config_JSON

The script produces a CSV file containing, for each combination of parameters, the average training time and the mean Euclidean distance (computed on the validation set outputs), both averaged over the k experiments. It also produces, for each combination, a folder with the details and models of the individual experiments.
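One way to read the reported score: within a single fold, the Euclidean distance between each predicted (output_x, output_y) pair and its target is averaged over the validation rows; those per-fold means are then averaged over the k folds. A minimal sketch of that reading, with a hypothetical helper name and made-up numbers:

```python
import math


def mean_euclidean_distance(predicted, target):
    """Average Euclidean distance between paired 2-D points."""
    distances = [math.hypot(px - tx, py - ty)
                 for (px, py), (tx, ty) in zip(predicted, target)]
    return sum(distances) / len(distances)


# Two made-up folds, each a (predictions, targets) pair.
folds = [
    ([(0.0, 0.0), (3.0, 4.0)], [(3.0, 4.0), (3.0, 4.0)]),  # distances 5 and 0
    ([(1.0, 1.0)], [(1.0, 2.0)]),                          # distance 1
]
per_fold = [mean_euclidean_distance(p, t) for p, t in folds]
score = sum(per_fold) / len(per_fold)
print(per_fold, score)
```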

SVM k-cross validation

To perform a k-cross validation over a file you need to create a configuration JSON like this::

{
  "grid":"true",
  "k":8,
  "parallel_process":4,
  "data_file":"absolute_path_to_data_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "kernel":["linear", "poly", "rbf", "sigmoid"],
  "C":[0.1, 1.0, 10, 100],
  "epsilon":[0.01,0.05, 0.1, 0.5, 1, 5],
  "degree":[3]
}
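In scikit-learn the degree parameter only affects the poly kernel and is ignored by the other kernels, so listing several degrees multiplies the work for linear, rbf, and sigmoid without changing their results. The sketch below counts the distinct SVM configurations in a grid. Whether ValidPy itself prunes such duplicates is not stated here; `distinct_svm_configs` is a hypothetical helper, and the degree list `[2, 3]` is chosen only for illustration.

```python
import itertools


def distinct_svm_configs(kernels, Cs, epsilons, degrees):
    """Enumerate the grid, dropping degree for non-polynomial kernels."""
    seen = set()
    for kernel, C, eps, degree in itertools.product(kernels, Cs, epsilons, degrees):
        # degree is only meaningful for the polynomial kernel.
        key = (kernel, C, eps, degree if kernel == "poly" else None)
        seen.add(key)
    return sorted(seen, key=str)


full = 4 * 4 * 6 * 2  # raw grid size if "degree" were [2, 3]
configs = distinct_svm_configs(
    ["linear", "poly", "rbf", "sigmoid"],
    [0.1, 1.0, 10, 100],
    [0.01, 0.05, 0.1, 0.5, 1, 5],
    [2, 3],
)
print(full, len(configs))
```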

Then you have to run svm_kcross.sh in executable/ giving the path to the configuration JSON as parameter::

$ cd ./ValidPy/executable/
$ sh svm_kcross.sh path_to_config_JSON

The script produces a CSV file containing, for each combination of parameters, the average training time and the mean Euclidean distance (computed on the validation set outputs), both averaged over the k experiments. It also produces, for each combination, a folder with the details and models of the individual experiments.

ANN vs SVM k-cross validation

To perform a k-cross validation over a file you need to create a configuration JSON like the one below. You can choose how many times to repeat the experiment by setting the experiments parameter::

{
  "experiments":4,
  "k":8,
  "parallel_process":4,
  "data_file":"absolute_path_to_data_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "ANN": {
    "hidden_layers":2,
    "units":25,
    "function":"sigmoid",
    "momentum":0.0,
    "learning_rate":0.05,
    "lr_decay":0.9999
  },
  "SVM": {
    "kernel":"rbf",
    "C":30,
    "epsilon":0.1,
    "degree":3
  }
}

Then you have to run ann_vs_svm_kcross.sh in executable/ giving the path to the configuration JSON as parameter::

$ cd ./ValidPy/executable/
$ sh ann_vs_svm_kcross.sh path_to_config_JSON

The script produces a CSV file containing, for each experiment, the average training time and the mean Euclidean distance (computed on the validation set outputs) averaged over the k folds, together with the overall averages of both quantities across all experiments. It also produces, for each experiment, a folder with that experiment's details and models.

ANN test

To perform a test you need to create a configuration JSON like this::

{
  "training_set":"absolute_path_to_training_set_file.csv",
  "test_set":"absolute_path_to_test_set_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "hidden_layers":2,
  "valid_prop":0.1,
  "units":25,
  "function":"sigmoid",
  "momentum":0.0,
  "learning_rate":0.05,
  "lr_decay":0.9999
}

Then you have to run ann_test.sh in executable/ giving the path to the configuration JSON as parameter::

$ cd ./ValidPy/executable/
$ sh ann_test.sh path_to_config_JSON

The script produces the experiment models and a txt file containing the training time and the average Euclidean distance over the test set outputs.
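The learning_rate and lr_decay values in the configuration interact: in PyBrain's BackpropTrainer the learning rate is multiplied by the decay factor after each training step. Assuming ValidPy passes these two values through to PyBrain unchanged (an assumption, not confirmed here), the effective learning rate after a given number of steps can be sketched as follows; `learning_rate_after` is a hypothetical helper.

```python
def learning_rate_after(steps, learning_rate=0.05, lr_decay=0.9999):
    """Learning rate after `steps` multiplicative decay updates."""
    return learning_rate * lr_decay ** steps


for steps in (0, 1000, 10000):
    print(steps, round(learning_rate_after(steps), 6))
```

A decay of 1.0 keeps the rate constant, while values slightly below 1.0 shrink it gradually over long training runs.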

SVM test

To perform a test you need to create a configuration JSON like this::

{
  "training_set":"absolute_path_to_training_set_file.csv",
  "test_set":"absolute_path_to_test_set_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "kernel":"rbf",
  "C":30,
  "epsilon":0.1,
  "degree":3
}

Then you have to run svm_test.sh in executable/ giving the path to the configuration JSON as parameter::

$ cd ./ValidPy/executable/
$ sh svm_test.sh path_to_config_JSON

The script produces the experiment models and a txt file containing the training time and the average Euclidean distance over the test set outputs.

SVM predict

To predict over a blind set you need a comma-separated (",") CSV file. The file must have 2 columns; each row is:

id, output_x

You have to create a configuration JSON like this::

{
  "training_set":"absolute_path_to_training_set_file.csv",
  "test_set":"absolute_path_to_test_set_file.csv",
  "out_folder":"absolute_path_output_folder",
  "out_file":"absolute_path_output_file.csv",
  "input_length": 10,
  "output_length": 2,
  "kernel":"rbf",
  "C":10,
  "epsilon":0.1,
  "degree":3
}
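Hand-written JSON is easy to break with a stray quote or comma. One option is to build the configuration as a Python dictionary and let the json module serialize it. The keys and placeholder values below are copied from the example above; the file name `svm_predict_config.json` is only a placeholder.

```python
import json
import os
import tempfile

config = {
    "training_set": "absolute_path_to_training_set_file.csv",
    "test_set": "absolute_path_to_test_set_file.csv",
    "out_folder": "absolute_path_output_folder",
    "out_file": "absolute_path_output_file.csv",
    "input_length": 10,
    "output_length": 2,
    "kernel": "rbf",
    "C": 10,
    "epsilon": 0.1,
    "degree": 3,
}

# Written to a temporary directory here; use any location you like.
config_path = os.path.join(tempfile.mkdtemp(), "svm_predict_config.json")
with open(config_path, "w") as handle:
    json.dump(config, handle, indent=2)

# Reading it back confirms the file is valid JSON.
with open(config_path) as handle:
    loaded = json.load(handle)
print(loaded == config)
```

The resulting path can then be passed to the scripts as the configuration JSON parameter.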

Then you have to run svm_train.sh in executable/ giving the path to the configuration JSON as parameter::

$ cd ./ValidPy/executable/
$ sh svm_train.sh path_to_config_JSON

The script produces one model for each output.

Then you have to run svm_predict.sh in executable/ giving the path to the configuration JSON as parameter::

$ cd ./ValidPy/executable/
$ sh svm_predict.sh path_to_config_JSON

The script produces a CSV file with 3 columns; each row is:

id, output_x, output_y

ANN predict

Not yet implemented.
