This is a Python tool to choose the best configuration and algorithm (between SVM and ANN) for your machine learning regression task.
It was developed to take part in the AA1 Cup 2014.
It uses the PyBrain implementation of ANN and the scikit-learn implementation of SVM.
Here you can find the documentation
Clone this repository, then install it using pip (Linux)::
$ sudo pip install -e ./ValidPy
- simplejson 3.3.1
- NumPy 1.9.2
- PyBrain 0.3
- SciPy 0.13.3
- matplotlib 1.3.1
- scikit-learn 0.15.2
This tool implements k-fold cross-validation for both ANN and SVM.
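As a rough illustration of what k-fold cross-validation does with the data, the sketch below splits sample indices into k folds and uses each fold once as the validation set. The function name ``k_fold_indices`` and the shuffling seed are assumptions made for this example; the tool's own splitting code may differ.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds.

    Hypothetical helper: it only illustrates the idea of k-fold splitting.
    """
    rng = np.random.RandomState(seed)
    order = rng.permutation(n_samples)      # shuffle the sample indices
    folds = np.array_split(order, k)        # k folds of nearly equal size
    for i in range(k):
        valid = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, valid


# 20 samples split into k=8 folds, as in the "k":8 setting of the example configurations
splits = list(k_fold_indices(20, 8))
for train, valid in splits:
    print(len(train), len(valid))
```

Every sample ends up in exactly one validation fold, so each of the k experiments is validated on data it was not trained on.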
For all the experiments you need a comma-separated (",") CSV file. The file must have 3 columns; each row is:
id, output_x, output_y
To run k-fold cross-validation on a data file, create a JSON configuration file like this::
{
"grid":"true",
"k":8,
"parallel_process":4,
"data_file":"absolute_path_to_data_file.csv",
"out_folder":"absolute_path_output_folder",
"input_length": 10,
"output_length": 2,
"hidden_layers":[1,2,3],
"units":[15,25],
"function":["sigmoid","gaussian"],
"momentum":[0.0,0.9],
"learning_rate":[0.01,0.05],
"lr_decay":[1.0, 0.9999]
}
Then run ann_kcross.sh in executable/, passing the path to the JSON configuration file as a parameter::
$ cd ./ValidPy/executable/
$ sh ann_kcross.sh path_to_config_JSON
The script produces a CSV file containing, for each combination of parameters, the average training time and the mean over the k folds of the average Euclidean distance (computed on the validation set outputs). It also produces, for each combination, a folder with the details and models of the individual experiments.
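To get a feel for how many combinations a grid configuration implies, the list-valued keys can be expanded with ``itertools.product``. This is a minimal sketch, assuming every list-valued parameter is combined with every other; the helper ``expand_grid`` is hypothetical and not part of the tool.

```python
import itertools

# List-valued keys copied from the example ANN configuration.
grid = {
    "hidden_layers": [1, 2, 3],
    "units": [15, 25],
    "function": ["sigmoid", "gaussian"],
    "momentum": [0.0, 0.9],
    "learning_rate": [0.01, 0.05],
    "lr_decay": [1.0, 0.9999],
}


def expand_grid(grid):
    """Return one dict per combination of the listed values (hypothetical helper)."""
    keys = sorted(grid)
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


combinations = expand_grid(grid)
print(len(combinations))  # 3 * 2 * 2 * 2 * 2 * 2 = 96
```

With ``"k":8``, a full search over this grid would train 96 * 8 = 768 networks, which is worth keeping in mind when choosing ``parallel_process``.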
To run k-fold cross-validation on a data file, create a JSON configuration file like this::
{
"grid":"true",
"k":8,
"parallel_process":4,
"data_file":"absolute_path_to_data_file.csv",
"out_folder":"absolute_path_output_folder",
"input_length": 10,
"output_length": 2,
"kernel":["linear", "poly", "rbf", "sigmoid"],
"C":[0.1, 1.0, 10, 100],
"epsilon":[0.01,0.05, 0.1, 0.5, 1, 5],
"degree":[3]
}
Then run svm_kcross.sh in executable/, passing the path to the JSON configuration file as a parameter::
$ cd ./ValidPy/executable/
$ sh svm_kcross.sh path_to_config_JSON
The script produces a CSV file containing, for each combination of parameters, the average training time and the mean over the k folds of the average Euclidean distance (computed on the validation set outputs). It also produces, for each combination, a folder with the details and models of the individual experiments.
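The average Euclidean distance used as the error measure can be sketched with NumPy as follows. The function name ``mean_euclidean_distance`` is an assumption for this example, and the sample values are made up; the tool's own implementation may differ in detail.

```python
import numpy as np


def mean_euclidean_distance(targets, predictions):
    """Average, over all rows, of the Euclidean distance between target and prediction."""
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    distances = np.linalg.norm(targets - predictions, axis=1)  # one distance per row
    return float(np.mean(distances))


# Two samples with (output_x, output_y) pairs; the values are illustrative only.
targets = [[0.0, 0.0], [1.0, 1.0]]
predictions = [[3.0, 4.0], [1.0, 1.0]]
print(mean_euclidean_distance(targets, predictions))  # (5.0 + 0.0) / 2 = 2.5
```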
To compare ANN and SVM with k-fold cross-validation on a data file, create a JSON configuration file like the one below. The experiments parameter sets how many times the experiment is repeated::
{
"experiments":4,
"k":8,
"parallel_process":4,
"data_file":"absolute_path_to_data_file.csv",
"out_folder":"absolute_path_output_folder",
"input_length": 10,
"output_length": 2,
"ANN": {
"hidden_layers":2,
"units":25,
"function":"sigmoid",
"momentum":0.0,
"learning_rate":0.05,
"lr_decay":0.9999
},
"SVM": {
"kernel":"rbf",
"C":30,
"epsilon":0.1,
"degree":3
}
}
Then run ann_vs_svm_kcross.sh in executable/, passing the path to the JSON configuration file as a parameter::
$ cd ./ValidPy/executable/
$ sh ann_vs_svm_kcross.sh path_to_config_JSON
The script produces a CSV file containing, for each experiment, the average training time and the average Euclidean distance averaged over the k folds, plus the overall averages of both quantities across all experiments (distances are computed on the validation set outputs). It also produces, for each experiment, a folder with the experiment details and models.
To run a test, create a JSON configuration file like this::
{
"training_set":"absolute_path_to_training_set_file.csv",
"test_set":"absolute_path_to_test_set_file.csv",
"out_folder":"absolute_path_output_folder",
"input_length": 10,
"output_length": 2,
"hidden_layers":2,
"valid_prop":0.1,
"units":25,
"function":"sigmoid",
"momentum":0.0,
"learning_rate":0.05,
"lr_decay":0.9999
}
Then run ann_test.sh in executable/, passing the path to the JSON configuration file as a parameter::
$ cd ./ValidPy/executable/
$ sh ann_test.sh path_to_config_JSON
The script produces a txt file containing the training time and the average Euclidean distance over the test set outputs, along with the experiment models.
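Hand-written JSON is easy to get subtly wrong (a stray quote or a trailing comma is enough). One option is to generate the configuration from Python with the standard ``json`` module. The sketch below is only an illustration: the keys are copied from the example ANN test configuration, the paths are placeholders, and the file name ``ann_test_config.json`` is an assumption.

```python
import json
import os
import tempfile

# Keys copied from the example ANN test configuration; paths are placeholders.
config = {
    "training_set": "absolute_path_to_training_set_file.csv",
    "test_set": "absolute_path_to_test_set_file.csv",
    "out_folder": "absolute_path_output_folder",
    "input_length": 10,
    "output_length": 2,
    "hidden_layers": 2,
    "valid_prop": 0.1,
    "units": 25,
    "function": "sigmoid",
    "momentum": 0.0,
    "learning_rate": 0.05,
    "lr_decay": 0.9999,
}

# Write the file to a temporary directory (hypothetical location).
config_path = os.path.join(tempfile.mkdtemp(), "ann_test_config.json")
with open(config_path, "w") as handle:
    json.dump(config, handle, indent=4)

# Reading the file back confirms that it is valid JSON.
with open(config_path) as handle:
    loaded = json.load(handle)
print(loaded["units"])
```

The resulting path can then be passed to ann_test.sh as described above.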
To run a test, create a JSON configuration file like this::
{
"training_set":"absolute_path_to_training_set_file.csv",
"test_set":"absolute_path_to_test_set_file.csv",
"out_folder":"absolute_path_output_folder",
"input_length": 10,
"output_length": 2,
"kernel":"rbf",
"C":30,
"epsilon":0.1,
"degree":3
}
Then run svm_test.sh in executable/, passing the path to the JSON configuration file as a parameter::
$ cd ./ValidPy/executable/
$ sh svm_test.sh path_to_config_JSON
The script produces a txt file containing the training time and the average Euclidean distance over the test set outputs, along with the experiment models.
To predict over a blind set you need a comma-separated (",") CSV file. The file must have 2 columns; each row is:
id, output_x
Create a JSON configuration file like this::
{
"training_set":"absolute_path_to_training_set_file.csv",
"test_set":"absolute_path_to_test_set_file.csv",
"out_folder":"absolute_path_output_folder",
"out_file":"absolute_path_output_file.csv",
"input_length": 10,
"output_length": 2,
"kernel":"rbf",
"C":10,
"epsilon":0.1,
"degree":3
}
Then run svm_train.sh in executable/, passing the path to the JSON configuration file as a parameter::
$ cd ./ValidPy/executable/
$ sh svm_train.sh path_to_config_JSON
The script produces one model for each output.
Then run svm_predict.sh in executable/, passing the path to the same JSON configuration file as a parameter::
$ cd ./ValidPy/executable/
$ sh svm_predict.sh path_to_config_JSON
The script produces a CSV file with 3 columns; each row is:
id, output_x, output_y
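The prediction file can be loaded back with the standard ``csv`` module. This is a minimal sketch assuming the three-column layout above with no header row; the helper ``read_predictions`` is hypothetical and the sample rows are made up for illustration.

```python
import csv
import io

# Made-up rows, only to illustrate the "id, output_x, output_y" layout.
sample = io.StringIO("1,0.5,-1.25\n2,3.0,4.0\n")


def read_predictions(handle):
    """Map each id to its (output_x, output_y) pair (hypothetical helper)."""
    predictions = {}
    for row in csv.reader(handle, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        sample_id, out_x, out_y = row
        predictions[sample_id.strip()] = (float(out_x), float(out_y))
    return predictions


predictions = read_predictions(sample)
print(predictions["2"])  # (3.0, 4.0)
```

For a real run, replace the ``io.StringIO`` object with the file opened from the ``out_file`` path of the configuration.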
Not yet implemented.