terry07/semisup-learn

 
 

Semi-supervised learning frameworks for Python

This project contains Python implementations of semi-supervised learning methods, compatible with scikit-learn, including:

  • Contrastive Pessimistic Likelihood Estimation (CPLE) (based on, but not equivalent to, Loog, 2015), a "safe" framework applicable to any classifier that can yield prediction probabilities ("safe" meaning that a model trained on both labeled and unlabeled data should perform no worse than a model trained on the labeled data alone)

  • Self learning (self training), a naive semi-supervised learning framework applicable to any classifier: iteratively label the unlabeled instances using a trained classifier, then re-train the classifier on the resulting dataset (see e.g. http://pages.cs.wisc.edu/~jerryzhu/pub/sslicml07.pdf )

  • Semi-Supervised Support Vector Machine (S3VM), a simple scikit-learn compatible wrapper for the QN-S3VM code developed by Fabian Gieseke, Antti Airola, Tapio Pahikkala, and Oliver Kramer (see http://www.fabiangieseke.de/index.php/code/qns3vm )

The first method is a novel extension of Loog, 2015 to any discriminative classifier (the differences from the original CPLE are explained below). The last two methods are included for comparison only.
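As a rough illustration of the self-training idea described above, the loop can be sketched with plain scikit-learn. This is an illustrative toy, not the library's SelfLearningModel; the 0.9 confidence threshold and the use of -1 to mark unlabeled points are assumptions of this sketch:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data: only the first 50 points keep their labels;
# the rest are marked unlabeled with -1 (a common convention).
X, y = make_classification(n_samples=200, random_state=0)
y_partial = y.copy()
y_partial[50:] = -1

clf = LogisticRegression()
for _ in range(10):
    # Re-train on everything currently labeled (original + pseudo-labels).
    labeled = y_partial != -1
    clf.fit(X[labeled], y_partial[labeled])
    unlabeled_idx = np.where(~labeled)[0]
    if unlabeled_idx.size == 0:
        break
    # Pseudo-label only the confident predictions (threshold is arbitrary).
    proba = clf.predict_proba(X[unlabeled_idx])
    confident = proba.max(axis=1) > 0.9
    if not confident.any():
        break  # nothing confident left to label; stop
    y_partial[unlabeled_idx[confident]] = proba[confident].argmax(axis=1)
```

This makes the naivety explicit: early mistakes get locked in as pseudo-labels, which is exactly the failure mode the pessimistic CPLE formulation tries to guard against.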

The advantages of the CPLE framework over other semi-supervised learning approaches include:

  • it is a generally applicable framework (it works with any scikit-learn classifier that accepts per-sample weights),

  • it has low memory requirements (as opposed to, e.g., Label Spreading, which needs O(n^2) memory), and

  • it makes no assumptions beyond those made by the choice of classifier

The main disadvantage is high computational complexity. Note: this is an early-stage research project and a work in progress (it is by no means efficient or well tested)!
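The per-sample-weight requirement mentioned above can be checked programmatically before wrapping a classifier. The helper below (`supports_sample_weight` is a hypothetical name, not part of this library) inspects whether an estimator's fit() accepts per-sample weights:

```python
import inspect
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def supports_sample_weight(estimator):
    """Return True if the estimator's fit() accepts per-sample weights."""
    return "sample_weight" in inspect.signature(estimator.fit).parameters

print(supports_sample_weight(SVC()))                   # True: SVC.fit accepts sample_weight
print(supports_sample_weight(KNeighborsClassifier()))  # False: k-NN's fit does not
```

Classifiers failing this check (such as k-nearest neighbours) cannot be used as the base model for CPLE, since the framework works by reweighting individual training samples.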

If you need faster results, try the Self Learning framework, which is a naive approach but much faster:

from frameworks.SelfLearning import SelfLearningModel
from sklearn.svm import SVC  # any scikit-learn classifier works as the base model

basemodel = SVC(probability=True)  # probability estimates are used for self labelling
ssmodel = SelfLearningModel(basemodel)
ssmodel.fit(X, y)  # unlabeled entries of y are marked with -1

For details, consult the documentation.
