The macchineimparanti project was created to encourage collaborative work on machine learning challenges, so that participants can share know-how and improve their skills in the spirit of open source.
macchineimparanti/impara
@author: Alessandro Ferrari <alessandroferrari87@gmail.com>

Impara implements a complete classification pipeline. Classification can be performed using an SVM with a radial basis function (RBF) kernel, a chi-squared kernel or a linear kernel. The pipeline supports feature scaling, PCA, model selection, random forest feature selection and recursive feature elimination. All these settings are specified in a configuration file. Type "python __main__.py -h" for more information.

First release. v0.1

Dependencies:

    Python
    numpy
    matplotlib
    scikit-learn
    pandas
    scipy

If you want to use sparse filtering features:

    Theano
    PyAutoDiff (https://github.com/LowinData/pyautodiff/tree/python2-ast)
    meta (https://github.com/numba/meta)

Examples of usage:

First, add the path of the library to the Python path. On Linux, just type in the terminal:

    >>PYTHONPATH=$PYTHONPATH:/path/to/folder/impara/source
    >>export PYTHONPATH

Move the terminal's current working directory to the library root folder:

    >>cd /path/to/folder/impara/source

Typing:

    >>python __main__.py -h

shows help about the command line parameters.
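The terminal steps above can also be scripted. The following is a minimal Python sketch, not part of Impara: the helper names build_env, build_command and run_impara are hypothetical, and it assumes the interpreter running the script is one that Impara supports (the Python 2 era dependencies listed above suggest checking this first). Only the "-h" and "--configuration_filename" options named in this README are used.

```python
import os
import subprocess
import sys


def build_env(source_dir, base_env=None):
    """Return a copy of the environment with source_dir appended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Mirrors: PYTHONPATH=$PYTHONPATH:/path/to/folder/impara/source
    env["PYTHONPATH"] = existing + os.pathsep + source_dir if existing else source_dir
    return env


def build_command(config_path=None, python=sys.executable):
    """Build the argument list for __main__.py: help, or a configuration file."""
    if config_path is None:
        return [python, "__main__.py", "-h"]
    return [python, "__main__.py", "--configuration_filename", config_path]


def run_impara(source_dir, config_path=None):
    """Run __main__.py from the source folder and return the process exit code."""
    return subprocess.call(
        build_command(config_path),
        cwd=source_dir,  # same effect as: cd /path/to/folder/impara/source
        env=build_env(source_dir),
    )
```

Calling run_impara("/path/to/folder/impara/source") would print the help text, while passing a second argument such as "/path/to/script_params.json" would start a run with that configuration file.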
Example of usage:

    python __main__.py --configuration_filename /path/to/script_params.json

Example of script_params.json (the comments must be removed from the deployed file):

    {
    "C": null,      #this has to be set to a float number if model selection
                    #is disabled, while if model selection is enabled,
                    #it can be left to null
    "gamma": null,  #this has to be set to a float number if model selection
                    #is disabled, while if model selection is enabled,
                    #it can be left to null
    "pca_flag": true,  #if true, pca is enabled, otherwise it is disabled
    "pca_variance_retain": 0.99,
    "scaling_flag": true,  #if true, features scaling is performed
    "C_list": null,      #list of C parameters to cross-validate.
                         #If not set, it is left to the default one.
    "gamma_list": null,  #list of gamma parameters to cross-validate.
                         #Only for rbf kernel. If not set, it is left to the default one.
    "svm_type": "rbf",   #selection among ('rbf','rbf_chi2','linear')
    "skip_model_selection": false,  #if true, model selection is skipped and
                                    #"C" and "gamma" are taken from this configuration file
    "n_iterations_ms": 6,  #number of iterations in model selection, for more reliable cross-validation
    "n_iterations_performance_estimation": 20,  #number of iterations, for more reliable performance estimation
    "sparse_filtering_flag": false,  #determines if sparse filtering is enabled
    "save_sf_fn": null,
    "load_sf_fn": null,
    "n_layers_sf": null,
    "n_iterations_sf": 1000,
    "n_features_sf": 50,
    "rf_features_selection_flag": true,    #if true, random forests features selection is performed
    "rfe_features_selection_flag": false,  #if true, recursive features elimination is enabled,
                                           #useful for cross-validating the number of pca components
    "n_iterations_rfe": 5,  #number of iterations of recursive features elimination, for more robust cross-validation
    "overnight_simulation": true,  #if true, no user interaction is needed during the training
    "show_precision_flag": true,  #show accurate precision statistics
    "show_accuracy_flag": true,   #show accurate accuracy statistics
    "show_recall_flag": true,     #show accurate recall statistics
    "show_f1score_flag": true,    #show accurate f1-score statistics
    "show_trerr_flag": false,     #show accurate training error statistics
    "show_cverr_flag": false,     #show accurate cross-validation error statistics
    "max_num_cpus": 4,  #-1 to leave it unset!
    "dataset_name": "generic",  #identification name for saving the model, useful for organizing data
    "test_set_flag": false,  #this does not exclude cross-validation; in this case the test set
                             #consists of samples without labels, useful in competitions
                             #where you have unlabeled data to submit
    "training_set_fn": "/path/to/training/set.csv",
    "target_set_fn": "/path/to/labels.csv",
    "test_set_fn": "/path/to/test/set.csv",  #this has to be specified only if there is unlabeled test set data to classify
    "predicted_set_fn": "/path/to/predicted/set.csv",  #this is written only if there is unlabeled test set data to classify
    "dest_path": "/destination/path"  #directory where to save produced data
    }

Licensing:

The materials, including the source code, are released under the GPL-3.0 license. You should have received a copy of the GNU General Public License along with these materials. If not, see <http://www.gnu.org/licenses/>.
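Since the annotated script_params.json example cannot be deployed with its comments, a small helper can strip them and check the rules the comments describe. The following is a minimal Python sketch, not part of Impara: the function names are hypothetical, it assumes every comment runs from a "#" outside a string to the end of its line, and the checks only restate what the example comments say (the allowed "svm_type" values, and "C" and "gamma" being required when model selection is skipped).

```python
import json


def strip_hash_comments(text):
    """Remove '#' comments that appear outside JSON string literals."""
    out = []
    in_string = escaped = in_comment = False
    for ch in text:
        if in_comment:
            if ch == "\n":  # a comment ends at the end of its line
                in_comment = False
                out.append(ch)
            continue
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == "#":
            in_comment = True
            continue
        if ch == '"':
            in_string = True
        out.append(ch)
    return "".join(out)


def load_annotated_params(text):
    """Parse an annotated configuration into a dict."""
    return json.loads(strip_hash_comments(text))


def check_params(params):
    """Return a list of problems, following the comments in the example."""
    problems = []
    if params.get("svm_type") not in ("rbf", "rbf_chi2", "linear"):
        problems.append("svm_type must be one of 'rbf', 'rbf_chi2', 'linear'")
    if params.get("skip_model_selection"):
        # The example asks for both values whenever model selection is
        # disabled; whether gamma matters for a linear kernel is not stated.
        for key in ("C", "gamma"):
            if not isinstance(params.get(key), (int, float)):
                problems.append(key + " must be a number when model selection is skipped")
    return problems
```

A deployable file could then be written with json.dump(load_annotated_params(text), open_file, indent=4) once check_params returns an empty list.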