
Machine learning algorithm implementation

Machine learning algorithms implemented in Python 3, aiming at a clear, modular machine learning library that is easy to use and modify. Every algorithm is rewritten as a Class with the same clear interface, and common dataset Classes are provided that can be plugged into any algorithm. Since this is a simplified implementation, accuracy is not the main concern; the results should be taken as a baseline, and better accuracy can usually be obtained by tuning the training hyper-parameters.
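
As a quick illustration of the common interface, here is a minimal end-to-end sketch that trains the softmax regression classifier on the sklearn digits dataset and evaluates it on a held-out split (the SoftmaxReg calls follow the usage section below; the train/test split itself is only an example):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from core.softmax_reg_lib import SoftmaxReg

# load a small built-in sklearn dataset and split it into train/test parts
digits = load_digits()
train_feats, test_feats, train_labels, test_labels = train_test_split(
    digits.data, digits.target, test_size=0.3)

sm = SoftmaxReg(train_feats, train_labels)   # every model takes (feats, labels)
sm.train()                                   # fit the model
sm.evaluation(test_feats, test_labels)       # report accuracy on the held-out set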

update

  • 2019/07/08 add gbdt algorithm
  • 2019/07/06 add cart reg algorithm
  • 2019/07/04 add ada boost algorithm
  • 2019/07/02 add random forest algorithm
  • 2019/06/27 add cart algorithm
  • 2019/06/26 add naive bayes algorithm
  • 2019/06/25 add kdtree algorithm
  • 2019/06/21 add svm algorithm
  • 2019/06/15 add perceptron algorithm
  • 2019/06/14 add softmax regression algorithm
  • 2019/06/12 add logistic regression algorithm
  • 2019/06/10 add knn regression algorithm
  • 2019/06/03 reconstruct this repo

features

  • pure Python code is used to implement all the algorithms.
  • every algorithm is integrated as a Class, easy to use and modify.
  • every dataset is integrated as a Class, easy to use and modify.
  • all algorithms are validated on several datasets (mainly the datasets shipped with sklearn, including the digits dataset).
  • multi-class classification is supported by stacking a multi-class model wrapper on top of a two-class classifier.
  • training hyper-parameters can be modified: batch size, learning rate, model save and load (see the sketch after the usage section).
  • training process visualization: log text and loss curve generation.
  • detailed code explanation.

usage

  • prepare the main dataset: mnist (from kaggle); the other datasets are provided by sklearn or located in the ./dataset/simple/ folder.
python3 setup.sh
  • train (knn/kdtree do not need training)
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.train()
  • evaluate on a dataset (supported by all models)
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.load(path='./softmax_reg_weight_2019-5-1_150341.pkl')
sm.evaluation(test_feats, test_labels)
  • test a single sample (supported by all models)
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.load(path='./softmax_reg_weight_2019-5-1_150341.pkl')
sm.predict_single([-1, 8.5])
  • visualize the linear separating hyperplane (only supported by logistic_reg/perceptron)
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.train()
sm.vis_points_line()
  • visualize the prediction boundary (supported by all models)
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.train()
sm.vis_boundary()
  • save the model (supported by all models)
sm.save('save_folder_path')
  • load the model (supported by all models)
sm.load('model_path')
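
The features list above mentions changing the batch size and learning rate before training; the sketch below shows what that could look like. Note that the keyword names lr, batch_size and n_epoch are hypothetical placeholders used only for illustration and are not confirmed against the actual constructor, so check core/softmax_reg_lib.py for the real argument names:

from core.softmax_reg_lib import SoftmaxReg

# NOTE: lr / batch_size / n_epoch are assumed keyword names, for illustration only
sm = SoftmaxReg(feats, labels, lr=0.001, batch_size=64, n_epoch=500)
sm.train()
sm.save('save_folder_path')   # save the trained weights as shown above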

PART 1.1 knn classifier


feature:

  • no model weights
  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features.
    test code: test_knn.
    source code: knn_reg_lib.

PART 1.2 logistic regression classifier


feature:

  • with model weights of shape (n_feat+1, 1).
  • only supports two-class classification.
  • supports linearly separable features.
    test code: test_logistic_reg.
    source code: logistic_reg_lib.

PART 1.3 softmax regression classifier


feature:

  • with model weights of shape (n_feat+1, n_class) (see the sketch below).
  • supports two-class and multi-class classification.
  • supports linearly separable features.
    test code: test_softmax_reg.
    source code: softmax_reg_lib.
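
The weight shapes quoted in these sections include one extra row, presumably because the bias is folded into the weight matrix, i.e. each feature vector is augmented with a constant 1. Below is a small numpy sketch of how a (n_feat+1, n_class) softmax weight matrix would be used at prediction time; it illustrates the convention, not the repo's exact code:

import numpy as np

n_feat, n_class = 2, 3
w = np.random.randn(n_feat + 1, n_class)         # (n_feat+1, n_class) weight matrix

x = np.array([-1.0, 8.5])                        # one raw sample with n_feat features
x_aug = np.append(x, 1.0)                        # append 1 so the last row of w acts as the bias

scores = x_aug @ w                               # (n_class,) linear scores
probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax probabilities
pred = np.argmax(probs)                          # predicted class index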

PART 1.4 perceptron classifier


feature:

  • with model weights of shape (n_feat+1, 1).
  • only supports two-class classification.
  • supports linearly separable features.
    test code: test_perceptron.
    source code: perceptron_lib.

PART 1.5 svm classifier


feature:

  • with model weights.
  • only supports two-class classification.
  • supports linearly separable and nonlinearly separable features.
    test code: test_svm.
    source code: svm_lib.

PART 1.6 kdtree classifier


feature:

  • no model weights
  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features.
    test code: test_knn.
    source code: knn_reg_lib.

PART 1.7 naive bayes classifier


feature:

  • no model weights
  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features (but strongly restricted by the feature distribution).
  • supports continuous and discrete features (see the sketch below)
    test code: test_cart.
    source code: cart_lib.
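
For the continuous-features case, a typical approach is to model each feature with a per-class Gaussian and multiply the per-feature likelihoods under the conditional independence assumption. The sketch below shows that idea in plain numpy; it illustrates the general naive bayes technique rather than this repo's exact estimator:

import numpy as np

def gaussian_nb_predict(train_feats, train_labels, x):
    """Predict the class of sample x with a Gaussian naive bayes model."""
    classes = np.unique(train_labels)
    log_posts = []
    for c in classes:
        feats_c = train_feats[train_labels == c]
        prior = len(feats_c) / len(train_feats)            # P(y = c)
        mean, var = feats_c.mean(axis=0), feats_c.var(axis=0) + 1e-9
        # sum of per-feature log Gaussian likelihoods (independence assumption)
        log_like = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
        log_posts.append(np.log(prior) + log_like)
    return classes[np.argmax(log_posts)]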

PART 1.8 CART classifier


feature:

  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features.
  • supports continuous and discrete features (see the sketch below)
    test code: test_decision_tree.
    source code: decision_tree_lib.
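
For continuous features, a CART split is usually chosen by scanning candidate thresholds and keeping the one with the lowest weighted Gini impurity. A conceptual sketch of that search (a generic illustration, not the repo's exact code):

import numpy as np

def gini(labels):
    """Gini impurity of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature, labels):
    """Scan thresholds on one continuous feature, return the best binary split."""
    best_t, best_score = None, float('inf')
    for t in np.unique(feature):
        left, right = labels[feature <= t], labels[feature > t]
        if len(left) == 0 or len(right) == 0:
            continue
        # weighted Gini impurity of the two child nodes
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score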

PART 1.11 random forest classifier


feature:

  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features.
  • supports continuous and discrete features
    test code: test_random_forest.
    source code: random_forest_lib.

PART 1.12 ada boost classifier


feature:

  • only supports two-class classification.
  • supports linearly separable and nonlinearly separable features.
  • supports continuous and discrete features
    test code: test_ada_boost.
    source code: ada_boost_lib.

PART 1.13 gbdt classifier


feature:

  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features.
  • supports continuous and discrete features
    test code: test_gbdt.
    source code: gbdt_lib.
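
Conceptually, a gbdt classifier fits a sequence of regression trees, each one to the negative gradient (residual) of the loss left by the previous trees. The sketch below shows that idea for two-class classification with log loss, using sklearn's DecisionTreeRegressor as a stand-in for the repo's cart regressor; it illustrates the boosting technique, not this repo's implementation:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(feats, labels, n_trees=20, lr=0.1):
    """Fit a toy two-class GBDT with log loss; labels must be 0/1."""
    f = np.zeros(len(labels))              # current raw scores (log-odds)
    trees = []
    for _ in range(n_trees):
        prob = 1.0 / (1.0 + np.exp(-f))    # sigmoid of the current scores
        residual = labels - prob           # negative gradient of the log loss
        tree = DecisionTreeRegressor(max_depth=3).fit(feats, residual)
        f += lr * tree.predict(feats)      # add the new tree's contribution
        trees.append(tree)
    return trees

def gbdt_predict(trees, feats, lr=0.1):
    f = lr * sum(tree.predict(feats) for tree in trees)
    return (f > 0).astype(int)             # threshold the log-odds at 0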

PART 1.14 xgboost classifier


to be updated ...

PART 1.15 MLP classifier (BP network)


to be updated ...

PART 1.16 CNN classifier


to be updated ...

PART 1.17 OneVSOne model wrapper


feature:

  • a wrapper that turns a two-class classifier into a multi-class classifier; can be used on logistic_reg/svm/perceptron (a conceptual sketch follows below)
    test code: test_cart.
    source code: cart_lib.
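
The one-vs-one idea is to train one two-class model per pair of classes and let all of them vote at prediction time. A generic sketch is below; it assumes each two-class model follows the repo's interface (constructed with (feats, labels), trained with train(), with predict_single returning the predicted label), and the wrapper's real class name in this repo may differ:

import numpy as np
from itertools import combinations

def ovo_train(model_class, feats, labels):
    """Train one two-class model per pair of classes (feats/labels are numpy arrays)."""
    models = {}
    for a, b in combinations(np.unique(labels), 2):
        mask = (labels == a) | (labels == b)
        m = model_class(feats[mask], labels[mask])  # e.g. a two-class classifier
        m.train()
        models[(a, b)] = m
    return models

def ovo_predict_single(models, x):
    """Each pair model votes for one of its two classes; return the majority vote."""
    votes = [m.predict_single(x) for m in models.values()]
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]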

PART 2.1 linear reg/ridge reg/lasso reg


to be updated ...

PART 2.2 cart reg


to be updated ...
test code: test_decision_tree_regressor.
source code: decision_tree_lib.

PART 3.1 K-means


to be updated ...

PART 4.1 crf


to be updated ...

References:

  • Machine Learning in Action, Peter Harrington
  • Python Machine Learning Algorithm, Zhiyong Zhao
  • Statistical Learning Methods, Hang Li