LoreDema/ValidPy

ValidPy

This is a Python tool to choose the best configuration and algorithm (between SVM and ANN) for your machine learning regression task.

It was developed to take part in the AA1 Cup 2014.

It uses the PyBrain implementation of ANNs and the scikit-learn implementation of SVMs.

Here you can find the documentation

Install

Clone this repository, then install it using pip (Linux)::

$ sudo pip install -e ./ValidPy

Dependencies

Quick start

This tool implements k-fold cross validation (k-cross validation) for both ANN and SVM.

All the experiments need a comma-separated (",") CSV file with 3 columns; each row is:

id, output_x, output_y
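As a rough illustration of what k-cross validation does with the rows of this file, the sketch below partitions row indices into k folds: each fold is used once as the validation set while the other k-1 folds form the training set. The function name `kfold_indices` is hypothetical and not part of ValidPy, and whether ValidPy shuffles the rows before splitting is not stated in this README:

```python
def kfold_indices(n_rows, k):
    """Yield (train, valid) index lists for k-fold cross validation.

    Rows are split into k contiguous folds whose sizes differ by at most one.
    """
    if not 1 < k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional row.
        size = base + (1 if fold < extra else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size


if __name__ == "__main__":
    for train, valid in kfold_indices(10, 3):
        print(len(train), valid)
```

With "k": 8, every row ends up in exactly one validation fold, so each configuration is trained 8 times.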

ANN k-cross validation

To perform k-cross validation on a file, create a configuration JSON like this::

{
  "grid":"true",
  "k":8,
  "parallel_process":4,
  "data_file":"absolute_path_to_data_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "hidden_layers":[1,2,3],
  "units":[15,25],
  "function":["sigmoid","gaussian"],
  "momentum":[0.0,0.9],
  "learning_rate":[0.01,0.05],
  "lr_decay":[1.0, 0.9999]
}
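With "grid" set to "true", the script evaluates each combination of the list-valued parameters, so the example above describes 3 × 2 × 2 × 2 × 2 × 2 = 96 network configurations, each trained k = 8 times (768 trainings in total). A minimal sketch of that enumeration, assuming a plain Cartesian product; `ann_grid` and `GRID_KEYS` are hypothetical names, not ValidPy code:

```python
import itertools
import json

# Hypothetical: the parameters that take lists in the ANN configuration.
GRID_KEYS = ["hidden_layers", "units", "function",
             "momentum", "learning_rate", "lr_decay"]


def ann_grid(config):
    """Return one dict per combination of the list-valued parameters."""
    values = [config[key] for key in GRID_KEYS]
    return [dict(zip(GRID_KEYS, combo)) for combo in itertools.product(*values)]


if __name__ == "__main__":
    config = json.loads("""{
      "grid": "true", "k": 8,
      "hidden_layers": [1, 2, 3], "units": [15, 25],
      "function": ["sigmoid", "gaussian"], "momentum": [0.0, 0.9],
      "learning_rate": [0.01, 0.05], "lr_decay": [1.0, 0.9999]
    }""")
    combos = ann_grid(config)
    print(len(combos), "combinations")
    print(combos[0])
```

Because the grid grows multiplicatively, adding one more value to any list can double the running time; "parallel_process" presumably spreads this work over several processes.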

Then run ann_kcross.sh in executable/, passing the path to the configuration JSON as a parameter::

$ cd ./ValidPy/executable/
$ sh ann_kcross.sh path_to_config_JSON

The script produces a CSV file containing, for each combination of parameters, the average training time and the mean Euclidean distance (computed on the validation set outputs), both averaged over the k experiments. It also produces, for each combination, a folder with the details and models of the individual experiments.
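The distance figure is presumably the mean Euclidean distance between target and predicted (output_x, output_y) vectors on each validation fold, then averaged again over the k folds. A minimal sketch under that assumption (Python 3.8+ for `math.dist`); both function names are hypothetical:

```python
import math


def mean_euclidean_distance(targets, outputs):
    """Average Euclidean distance between target and predicted output vectors."""
    if not targets or len(targets) != len(outputs):
        raise ValueError("need equally sized, non-empty lists")
    total = sum(math.dist(t, o) for t, o in zip(targets, outputs))
    return total / len(targets)


def kfold_score(fold_results):
    """Average the per-fold mean Euclidean distances over the k folds.

    fold_results is a list of (targets, outputs) pairs, one per fold.
    """
    scores = [mean_euclidean_distance(t, o) for t, o in fold_results]
    return sum(scores) / len(scores)
```

For example, targets [(0, 0), (1, 1)] against outputs [(3, 4), (1, 1)] give distances 5 and 0, so that fold scores 2.5.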

SVM k-cross validation

To perform k-cross validation on a file, create a configuration JSON like this::

{
  "grid":"true",
  "k":8,
  "parallel_process":4,
  "data_file":"absolute_path_to_data_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "kernel":["linear", "poly", "rbf", "sigmoid"],
  "C":[0.1, 1.0, 10, 100],
  "epsilon":[0.01,0.05, 0.1, 0.5, 1, 5],
  "degree":[3]
}
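This grid has 4 × 4 × 6 × 1 = 96 combinations. In scikit-learn's `SVR`, `degree` is only used by the "poly" kernel and ignored by the others, so listing several degrees would repeat identical linear, rbf and sigmoid runs. The sketch below enumerates the grid as keyword-argument dicts and, as a hypothetical refinement that this README does not attribute to ValidPy, drops those duplicates; `svm_grid` and `SVM_KEYS` are illustrative names:

```python
import itertools

# Hypothetical: the parameters that take lists in the SVM configuration.
SVM_KEYS = ["kernel", "C", "epsilon", "degree"]


def svm_grid(config, dedupe_degree=True):
    """Return one dict of SVR keyword arguments per parameter combination.

    When dedupe_degree is True, kernels other than "poly" appear once per
    (C, epsilon) pair and their dicts omit `degree` (SVR keeps its default).
    """
    seen = set()
    grid = []
    for combo in itertools.product(*(config[key] for key in SVM_KEYS)):
        params = dict(zip(SVM_KEYS, combo))
        if dedupe_degree and params["kernel"] != "poly":
            del params["degree"]
        signature = tuple(sorted(params.items()))
        if signature not in seen:
            seen.add(signature)
            grid.append(params)
    return grid
```

Each dict can be passed as `sklearn.svm.SVR(**params)`. With "degree": [2, 3] the deduplicated grid has 120 entries instead of 192.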

Then run svm_kcross.sh in executable/, passing the path to the configuration JSON as a parameter::

$ cd ./ValidPy/executable/
$ sh svm_kcross.sh path_to_config_JSON

The script produces a CSV file containing, for each combination of parameters, the average training time and the mean Euclidean distance (computed on the validation set outputs), both averaged over the k experiments. It also produces, for each combination, a folder with the details and models of the individual experiments.
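To pick the best configuration from a results file like this, one can take the row with the lowest average distance. The sketch assumes the CSV has a header row; the column name "avg_distance" is a placeholder, not necessarily what ValidPy writes, so check the actual header first. `best_configuration` is a hypothetical helper:

```python
import csv


def best_configuration(csv_file, score_column="avg_distance"):
    """Return the row (as a dict) with the lowest value in score_column.

    "avg_distance" is a placeholder column name: adapt it to the header
    of the results file the script actually produced.
    """
    rows = list(csv.DictReader(csv_file))
    if not rows:
        raise ValueError("results file has no data rows")
    return min(rows, key=lambda row: float(row[score_column]))


# Usage (the path is a placeholder):
# with open("results.csv", newline="") as f:
#     print(best_configuration(f))
```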

ANN vs SVM k-cross validation

To perform k-cross validation on a file, create a configuration JSON like this; the experiments parameter sets how many times the whole experiment is repeated::

{
  "experiments":4,
  "k":8,
  "parallel_process":4,
  "data_file":"absolute_path_to_data_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "ANN": {
    "hidden_layers":2,
    "units":25,
    "function":"sigmoid",
    "momentum":0.0,
    "learning_rate":0.05,
    "lr_decay":0.9999
  },
  "SVM": {
    "kernel":"rbf",
    "C":30,
    "epsilon":0.1,
    "degree":3
  }
}

Then run ann_vs_svm_kcross.sh in executable/, passing the path to the configuration JSON as a parameter::

$ cd ./ValidPy/executable/
$ sh ann_vs_svm_kcross.sh path_to_config_JSON

The script produces a CSV file containing, for each experiment, the average training time and the mean Euclidean distance averaged over the k folds, followed by the total average training time and the total average Euclidean distance over all experiments (distances are computed on the validation set outputs). It also produces, for each experiment, a folder with the details and models of that experiment.
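The "total average" values are presumably the per-experiment k-fold averages averaged once more over the `experiments` repetitions. A sketch of that aggregation for both models, choosing the one with the lower total average distance in line with the tool's goal of choosing between SVM and ANN; `compare` and the input layout are hypothetical:

```python
from statistics import mean


def compare(results):
    """Summarize repeated k-fold runs for each model.

    results maps a model name ("ANN", "SVM") to a list of
    (avg_training_time, avg_distance) pairs, one per repetition.
    Returns (summary, name of the model with the lowest total distance).
    """
    summary = {}
    for model, runs in results.items():
        summary[model] = {
            "total_avg_time": mean(t for t, _ in runs),
            "total_avg_distance": mean(d for _, d in runs),
        }
    best = min(summary, key=lambda m: summary[m]["total_avg_distance"])
    return summary, best
```

Repeating the experiment (here 4 times) reduces the chance that one lucky or unlucky run decides the comparison.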

ANN test

To perform a test you need to create a configuration JSON like this::

{
  "training_set":"absolute_path_to_training_set_file.csv",
  "test_set":"absolute_path_to_test_set_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "hidden_layers":2,
  "valid_prop":0.1,
  "units":25,
  "function":"sigmoid",
  "momentum":0.0,
  "learning_rate":0.05,
  "lr_decay":0.9999
}
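The README does not say how `valid_prop` is used. One plausible reading, stated here only as an assumption, is that it is the fraction of the training set held out for validation while the network trains (PyBrain's `BackpropTrainer.trainUntilConvergence` has a similar `validationProportion` argument). A sketch of such a hold-out split; `holdout_split` is a hypothetical helper:

```python
import random


def holdout_split(rows, valid_prop, seed=None):
    """Shuffle rows and return (train, valid), reserving about valid_prop
    of them for validation (at least one row)."""
    if not 0.0 < valid_prop < 1.0:
        raise ValueError("valid_prop must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, round(len(shuffled) * valid_prop))
    return shuffled[n_valid:], shuffled[:n_valid]
```

With "valid_prop": 0.1 and 100 training rows, 10 rows would be held out and 90 used for weight updates.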

Then run ann_test.sh in executable/, passing the path to the configuration JSON as a parameter::

$ cd ./ValidPy/executable/
$ sh ann_test.sh path_to_config_JSON

The script produces a txt file with the training time and the mean Euclidean distance over the test set outputs, together with the experiment models.

SVM test

To perform a test you need to create a configuration JSON like this::

{
  "training_set":"absolute_path_to_training_set_file.csv",
  "test_set":"absolute_path_to_test_set_file.csv",
  "out_folder":"absolute_path_output_folder",
  "input_length": 10,
  "output_length": 2,
  "kernel":"rbf",
  "C":30,
  "epsilon":0.1,
  "degree":3
}
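Because every script reads its parameters from a JSON file, a quick check before a long run can catch syntax errors (such as a stray quote) and missing keys. A standard-library sketch; `check_config` and `REQUIRED_SVM_TEST_KEYS` are hypothetical, and the key list simply mirrors the example above:

```python
import json

# Hypothetical: keys taken from the SVM test configuration example.
REQUIRED_SVM_TEST_KEYS = {"training_set", "test_set", "out_folder",
                          "input_length", "output_length",
                          "kernel", "C", "epsilon", "degree"}


def check_config(text, required=REQUIRED_SVM_TEST_KEYS):
    """Parse a JSON configuration and return it as a dict.

    Raises ValueError with a readable message on malformed JSON,
    a non-object top level, or missing keys.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"invalid JSON at line {err.lineno}, "
                         f"column {err.colno}: {err.msg}") from err
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    missing = sorted(required - set(config))
    if missing:
        raise ValueError(f"missing keys: {', '.join(missing)}")
    return config
```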

Then run svm_test.sh in executable/, passing the path to the configuration JSON as a parameter::

$ cd ./ValidPy/executable/
$ sh svm_test.sh path_to_config_JSON

The script produces a txt file with the training time and the mean Euclidean distance over the test set outputs, together with the experiment models.

SVM predict

To predict over a blind set you need a comma-separated (",") CSV file with 2 columns; each row is:

id, output_x

You have to create a configuration JSON like this::

{
  "training_set":"absolute_path_to_training_set_file.csv",
  "test_set":"absolute_path_to_test_set_file.csv",
  "out_folder":"absolute_path_output_folder",
  "out_file":"absolute_path_output_file.csv",
  "input_length": 10,
  "output_length": 2,
  "kernel":"rbf",
  "C":10,
  "epsilon":0.1,
  "degree":3
}

Then run svm_train.sh in executable/, passing the path to the configuration JSON as a parameter::

$ cd ./ValidPy/executable/
$ sh svm_train.sh path_to_config_JSON

The script produces one model for each output.
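One model per output is the usual pattern with scikit-learn's `SVR`, which predicts a single target value. A sketch of that pattern that works with any object exposing `fit(X, y)` and `predict(X)`; `fit_per_output` and `predict_per_output` are hypothetical names, and `SVR` appears only in the commented usage line:

```python
def fit_per_output(make_model, X, Y):
    """Fit one single-output model per column of Y.

    make_model: zero-argument callable returning an unfitted regressor.
    X: list of input vectors; Y: list of output vectors of equal length.
    """
    models = []
    for j in range(len(Y[0])):
        model = make_model()
        model.fit(X, [row[j] for row in Y])  # j-th output column only
        models.append(model)
    return models


def predict_per_output(models, X):
    """Recombine the per-output predictions into one vector per input row."""
    columns = [model.predict(X) for model in models]
    return [list(values) for values in zip(*columns)]


# Usage with scikit-learn (parameters taken from the JSON example above):
# from sklearn.svm import SVR
# models = fit_per_output(lambda: SVR(kernel="rbf", C=10, epsilon=0.1, degree=3), X, Y)
```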

Then run svm_predict.sh in executable/, passing the path to the configuration JSON as a parameter::

$ cd ./ValidPy/executable/
$ sh svm_predict.sh path_to_config_JSON

The script produces a CSV file with 3 columns; each row is:

id, output_x, output_y
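For reference, rows in this format can be produced with Python's `csv` module. The sketch assumes no header row (none is described above); `write_predictions` is a hypothetical helper, not ValidPy's own writer:

```python
import csv


def write_predictions(ids, predictions, out):
    """Write one "id, output_x, output_y" row per prediction (no header).

    out: a text file opened with newline="" (or any file-like object).
    """
    writer = csv.writer(out)
    for row_id, (x, y) in zip(ids, predictions):
        writer.writerow([row_id, x, y])


# Usage (the path is a placeholder):
# with open("predictions.csv", "w", newline="") as f:
#     write_predictions([1, 2], [(0.5, 1.5), (2.0, 3.0)], f)
```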

ANN predict

Not yet implemented.