GitHub - macchineimparanti/impara: The project macchineimparanti was created with the aim to encourage the collaborative works on machine learning challenges in order to share know how and improve the skills of the challenger, pursuing the open source spirit.

macchineimparanti / impara Public

Notifications You must be signed in to change notification settings
Fork 0
Star 3

The project macchineimparanti was created with the aim to encourage the collaborative works on machine learning challenges in order to share know how and improve the skills of the challenger, pursuing the open source spirit.

3 stars 0 forks Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
data		data
doc		doc
source		source
README		README
gpl-3.0.txt		gpl-3.0.txt

Repository files navigation

@author: Alessandro Ferrari <alessandroferrari87@gmail.com>

The project macchineimparanti was created with the aim to encourage the collaborative works on machine learning challenges in order to share know how and improve the skills of the challenger, pursuing the open source spirit. 

Impara realizes a complete pipeline of classification.
Classification can be perfomed using an SVM with radial basis functions as kernel, chi-squared kernels and linear kernel. 
The method allows to perform features scaling, pca, model selection, random forests features selection and recursive features elimination.
All these settings can be managed by specifying them in a configuration file. Type "python __main__.py -h" for more informations.

First release. v0.1

Dependencies:

Python
numpy
matplotlib
scikit-learn
panda
scipy

If you want to use sparse filtering features:

Theano
PyAutoDiff (https://github.com/LowinData/pyautodiff/tree/python2-ast)
meta (https://github.com/numba/meta)

Examples of usage:

First, add in the python path the path of the library. 
For linux just types in the terminal:
>>PYTHONPATH=$PYTHONPATH:/path/to/folder/impara/source
>>export PYTHONPATH

Move the terminal current working directory to the library root folder:
>>cd /path/to/folder/impara/source

By typing:
>>python __main__.py -h
you get help about the command line parameters inputs.

Example of usage:
python __main__.py --configuration_filename /path/to/script_params.json

Example of script_params.json (you cannot keep comments in the deployed one):

{
    "C":  null ,  #this has to be set to a float number if model selection 
    #is disabled, while if there is model selection enabled, 
    #it can be left to null
    
    "gamma":  null , #this has to be set to a float number if model selection 
    #is disabled, while if there is model selection enabled, 
    #it can be left to null
    
    "pca_flag": true, #if true, pca is enabled, otherwise is disabled
    
    "pca_variance_retain": 0.99,
    
    "scaling_flag": true, #if true, features scaling is performed
    
    "C_list": null, #list of C parameters to cross-validate. 
    #If not set, it is left to the default one.
    
    "gamma_list": null, #list of gamma parameters to cross-validate. 
    #Only for rbf kernel. If not set, it is left to the default one.
    
    "svm_type": "rbf",  #selection among ('rbf','rbf_chi2','linear')
    
    "skip_model_selection": false, #if true, there is model selection, 
    #"C" and "gamma" are determined by this configuration file
    
    "n_iterations_ms": 6, #number of iterations in model selection for having more reliable cross-validation
    
    "n_iterations_performance_estimation": 20, #number of iterations for having more reliable performances estimation
    
    "sparse_filtering_flag": false, #determines if sparse filtering is enabled
    
    "save_sf_fn": null,
    
    "load_sf_fn": null,
    
    "n_layers_sf": null,
    
    "n_iterations_sf": 1000,
    
    "n_features_sf": 50,
    
    "rf_features_selection_flag": true, #if true, random forests features selection is performed
    
    "rfe_features_selection_flag": false, #if true, recursive features elimination is enabled, useful for cross-validate number of pca components
    
    "n_iterations_rfe": 5, #number iterations recursive features elimination, for having more robust cross-validation
    
    "overnight_simulation": true, #if true, there is not need of user interaction during the training
    
    "show_precision_flag": true, #show accurate precision statistics
    
    "show_accuracy_flag": true, #show accurate accuracy statistics
    
    "show_recall_flag": true, #show accurate recall statistics
    
    "show_f1score_flag": true, #show accurate f1-score statistics
    
    "show_trerr_flag": false, #show accurate training error statistics
    
    "show_cverr_flag": false, #show accurate cross-validation error statistics
    
    "max_num_cpus": 4, #-1 to leave it unset!

    "dataset_name": "generic", #identification name for saving the model, useful for organizing data
    
    "test_set_flag": false,  #this does not exclude cross-validation, in this case test_set are samples without labels, useful in competitions where you have unlabeled data to submit.
    
    "training_set_fn": "/path/to/training/set.csv",
    
    "target_set_fn": "/path/to/labels.csv",
    
    "test_set_fn": "/path/to/test/set.csv", #this has to be specified only if there is test_set_unlabeled data to classify
    
    "predicted_set_fn": "/path/to/predicted/set.csv", #this is done only if there is test_set_unlabeled data to classify
    
    "dest_path": "/destination/path" #directory where to save produced data
}


Licensing:

The materials including the source codes are released with gpl-3.0 license. You should have received a copy of the GNU General Public License along with this tutorial.  If not, see <http://www.gnu.org/licenses/>.