
Machine learning algorithm implementation

Machine learning algorithms implemented in Python 3, aiming at a clear, modular machine learning library that is easy to use and modify. Every algorithm is rewritten as a Class with the same clear interface, and common dataset Classes are provided that can be plugged into any algorithm. Since this is a simplified implementation, accuracy is not the main concern; the results should be taken as a baseline, and better accuracy can usually be obtained by tuning the training hyper-parameters.
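
As a quick illustration of the common interface, here is a minimal end-to-end sketch that trains the softmax regression classifier on the sklearn digits dataset and evaluates it on a held-out split (the SoftmaxReg calls follow the usage section below; the train/test split itself is only an example):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from core.softmax_reg_lib import SoftmaxReg

# load a small built-in sklearn dataset and split it into train/test parts
digits = load_digits()
train_feats, test_feats, train_labels, test_labels = train_test_split(
    digits.data, digits.target, test_size=0.3)

sm = SoftmaxReg(train_feats, train_labels)   # every model takes (feats, labels)
sm.train()                                   # fit the model
sm.evaluation(test_feats, test_labels)       # report accuracy on the held-out set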

update

  • 2019/07/08 add gbdt algorithm
  • 2019/07/06 add cart reg algorithm
  • 2019/07/04 add ada boost algorithm
  • 2019/07/02 add random forest algorithm
  • 2019/06/27 add cart algorithm
  • 2019/06/26 add naive bayes algorithm
  • 2019/06/25 add kdtree algorithm
  • 2019/06/21 add svm algorithm
  • 2019/06/15 add perceptron algorithm
  • 2019/06/14 add softmax regression algorithm
  • 2019/06/12 add logistic regression algorithm
  • 2019/06/10 add knn regression algorithm
  • 2019/06/03 reconstruct this repo

features

  • pure Python code is used to implement all the algorithms.
  • every algorithm is integrated as a Class, easy to use and modify.
  • every dataset is integrated as a Class, easy to use and modify.
  • all algorithms are validated on several datasets (mainly the datasets shipped with sklearn, including the digits dataset).
  • multi-class classification is supported by stacking a multi-class model wrapper on top of a two-class classifier.
  • training hyper-parameters can be modified: batch size, learning rate, model save and load (see the sketch after the usage section).
  • training process visualization: log text and loss curve generation.
  • detailed code explanation.

usage

  • prepare the main dataset: mnist (from kaggle); the other datasets are provided by sklearn or located in the ./dataset/simple/ folder.
python3 setup.sh
  • train (knn/kdtree do not need training)
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.train()
  • evaluate on a dataset (supported by all models)
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.load(path='./softmax_reg_weight_2019-5-1_150341.pkl')
sm.evaluation(test_feats, test_labels)
  • test a single sample (supported by all models)
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.load(path='./softmax_reg_weight_2019-5-1_150341.pkl')
sm.predict_single([-1, 8.5])
  • visualize the linear separating hyperplane (only supported by logistic_reg/perceptron)
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.train()
sm.vis_points_line()
  • visualize the prediction boundary (supported by all models)
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.train()
sm.vis_boundary()
  • save the model (supported by all models)
sm.save('save_folder_path')
  • load the model (supported by all models)
sm.load('model_path')
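
The features list above mentions changing the batch size and learning rate before training; the sketch below shows what that could look like. Note that the keyword names lr, batch_size and n_epoch are hypothetical placeholders used only for illustration and are not confirmed against the actual constructor, so check core/softmax_reg_lib.py for the real argument names:

from core.softmax_reg_lib import SoftmaxReg

# NOTE: lr / batch_size / n_epoch are assumed keyword names, for illustration only
sm = SoftmaxReg(feats, labels, lr=0.001, batch_size=64, n_epoch=500)
sm.train()
sm.save('save_folder_path')   # save the trained weights as shown above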

PART 1.1 knn classifier


feature:

  • no model weights
  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features.
    test code: test_knn.
    source code: knn_reg_lib.

PART 1.2 logistic regression classifier


feature:

  • with model weights of shape (n_feat+1, 1).
  • only supports two-class classification.
  • supports linearly separable features.
    test code: test_logistic_reg.
    source code: logistic_reg_lib.

PART 1.3 softmax regression classifier


feature:

  • with model weights of shape (n_feat+1, n_class) (see the sketch below).
  • supports two-class and multi-class classification.
  • supports linearly separable features.
    test code: test_softmax_reg.
    source code: softmax_reg_lib.
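
The weight shapes quoted in these sections include one extra row, presumably because the bias is folded into the weight matrix, i.e. each feature vector is augmented with a constant 1. Below is a small numpy sketch of how a (n_feat+1, n_class) softmax weight matrix would be used at prediction time; it illustrates the convention, not the repo's exact code:

import numpy as np

n_feat, n_class = 2, 3
w = np.random.randn(n_feat + 1, n_class)         # (n_feat+1, n_class) weight matrix

x = np.array([-1.0, 8.5])                        # one raw sample with n_feat features
x_aug = np.append(x, 1.0)                        # append 1 so the last row of w acts as the bias

scores = x_aug @ w                               # (n_class,) linear scores
probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax probabilities
pred = np.argmax(probs)                          # predicted class index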

PART 1.4 perceptron classifier


feature:

  • with model weights of shape (n_feat+1, 1).
  • only supports two-class classification.
  • supports linearly separable features.
    test code: test_perceptron.
    source code: perceptron_lib.

PART 1.5 svm classifier


feature:

  • with model weights.
  • only supports two-class classification.
  • supports linearly separable and nonlinearly separable features.
    test code: test_svm.
    source code: svm_lib.

PART 1.6 kdtree classifier


feature:

  • no model weights
  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features.
    test code: test_knn.
    source code: knn_reg_lib.

PART 1.7 naive bayes classifier


feature:

  • no model weights
  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features (but strongly restricted by the feature distribution).
  • supports continuous and discrete features (see the sketch below)
    test code: test_cart.
    source code: cart_lib.
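
For the continuous-features case, a typical approach is to model each feature with a per-class Gaussian and multiply the per-feature likelihoods under the conditional independence assumption. The sketch below shows that idea in plain numpy; it illustrates the general naive bayes technique rather than this repo's exact estimator:

import numpy as np

def gaussian_nb_predict(train_feats, train_labels, x):
    """Predict the class of sample x with a Gaussian naive bayes model."""
    classes = np.unique(train_labels)
    log_posts = []
    for c in classes:
        feats_c = train_feats[train_labels == c]
        prior = len(feats_c) / len(train_feats)            # P(y = c)
        mean, var = feats_c.mean(axis=0), feats_c.var(axis=0) + 1e-9
        # sum of per-feature log Gaussian likelihoods (independence assumption)
        log_like = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
        log_posts.append(np.log(prior) + log_like)
    return classes[np.argmax(log_posts)]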

PART 1.8 CART classifier


feature:

  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features.
  • supports continuous and discrete features (see the sketch below)
    test code: test_decision_tree.
    source code: decision_tree_lib.
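
For continuous features, a CART split is usually chosen by scanning candidate thresholds and keeping the one with the lowest weighted Gini impurity. A conceptual sketch of that search (a generic illustration, not the repo's exact code):

import numpy as np

def gini(labels):
    """Gini impurity of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature, labels):
    """Scan thresholds on one continuous feature, return the best binary split."""
    best_t, best_score = None, float('inf')
    for t in np.unique(feature):
        left, right = labels[feature <= t], labels[feature > t]
        if len(left) == 0 or len(right) == 0:
            continue
        # weighted Gini impurity of the two child nodes
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score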

PART 1.11 random forest classifier


feature:

  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features.
  • supports continuous and discrete features
    test code: test_random_forest.
    source code: random_forest_lib.

PART 1.12 ada boost classifier


feature:

  • only supports two-class classification.
  • supports linearly separable and nonlinearly separable features.
  • supports continuous and discrete features
    test code: test_ada_boost.
    source code: ada_boost_lib.

PART 1.13 gbdt classifier


feature:

  • supports two-class and multi-class classification.
  • supports linearly separable and nonlinearly separable features.
  • supports continuous and discrete features
    test code: test_gbdt.
    source code: gbdt_lib.
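
Conceptually, a gbdt classifier fits a sequence of regression trees, each one to the negative gradient (residual) of the loss left by the previous trees. The sketch below shows that idea for two-class classification with log loss, using sklearn's DecisionTreeRegressor as a stand-in for the repo's cart regressor; it illustrates the boosting technique, not this repo's implementation:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(feats, labels, n_trees=20, lr=0.1):
    """Fit a toy two-class GBDT with log loss; labels must be 0/1."""
    f = np.zeros(len(labels))              # current raw scores (log-odds)
    trees = []
    for _ in range(n_trees):
        prob = 1.0 / (1.0 + np.exp(-f))    # sigmoid of the current scores
        residual = labels - prob           # negative gradient of the log loss
        tree = DecisionTreeRegressor(max_depth=3).fit(feats, residual)
        f += lr * tree.predict(feats)      # add the new tree's contribution
        trees.append(tree)
    return trees

def gbdt_predict(trees, feats, lr=0.1):
    f = lr * sum(tree.predict(feats) for tree in trees)
    return (f > 0).astype(int)             # threshold the log-odds at 0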

PART 1.14 xgboost classifier


to be updated ...

PART 1.15 MLP classifier (BP network)


to be updated ...

PART 1.16 CNN classifier


to be updated ...

PART 1.17 OneVSOne model wrapper


feature:

  • a wrapper that turns a two-class classifier into a multi-class classifier; can be used on logistic_reg/svm/perceptron (a conceptual sketch follows below)
    test code: test_cart.
    source code: cart_lib.
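
The one-vs-one idea is to train one two-class model per pair of classes and let all of them vote at prediction time. A generic sketch is below; it assumes each two-class model follows the repo's interface (constructed with (feats, labels), trained with train(), with predict_single returning the predicted label), and the wrapper's real class name in this repo may differ:

import numpy as np
from itertools import combinations

def ovo_train(model_class, feats, labels):
    """Train one two-class model per pair of classes (feats/labels are numpy arrays)."""
    models = {}
    for a, b in combinations(np.unique(labels), 2):
        mask = (labels == a) | (labels == b)
        m = model_class(feats[mask], labels[mask])  # e.g. a two-class classifier
        m.train()
        models[(a, b)] = m
    return models

def ovo_predict_single(models, x):
    """Each pair model votes for one of its two classes; return the majority vote."""
    votes = [m.predict_single(x) for m in models.values()]
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]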

PART 2.1 linear reg/ridge reg/lasso reg


to be updated ...

PART 2.2 cart reg


to be updated ...
test code: test_decision_tree_regressor.
source code: decision_tree_lib.

PART 3.1 K-means


to be updated ...

PART 4.1 crf


to be updated ...

References:

  • Machine Learning in Action, Peter Harrington
  • Python Machine Learning Algorithm, Zhiyong Zhao
  • Statistical Learning Methods, Hang Li