Some Algorithms for Imbalanced Data

Here you can find some algorithms for handling imbalanced data. Some of them assume that the target class is the small class, as is usually the case in imbalanced-data problems.

K-Clustering

The K-Clustering algorithm splits the big class into K separate clusters and trains K classifiers, one per cluster, each against the target data. A sample is classified as positive (small class) only if all the classifiers agree that it is positive.

The algorithm follows these steps:

Find K clusters in the large class. K is chosen so that each cluster contains roughly as many samples as the target class.
Train K classifiers, one per cluster, each on the data of its cluster against the small class.
In the prediction stage, show each sample to all K classifiers. A sample is predicted as positive only if all K classifiers agree; otherwise it is classified as negative.
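The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in KClustering.py: it assumes scikit-learn is available, uses KMeans as the clustering method (the description does not name one), takes labels 1 for the small class and 0 for the large class, and the class name KClusteringSketch is made up for this example. Note that KMeans does not guarantee equally sized clusters, so the "similar number of samples" condition only holds on average here.

```python
import numpy as np
from sklearn.base import clone
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression


class KClusteringSketch:
    """Illustrative sketch only; names and defaults are assumptions."""

    def __init__(self, base_estimator=None, random_state=0):
        # Logistic regression is the base estimator used in the README example.
        self.base_estimator = (
            LogisticRegression() if base_estimator is None else base_estimator
        )
        self.random_state = random_state

    def fit(self, X, y):
        # Assumption: y is binary, 1 = small (target) class, 0 = large class.
        X, y = np.asarray(X), np.asarray(y)
        X_small, X_large = X[y == 1], X[y == 0]

        # Step 1: choose K so an average cluster is about the size of the small class.
        k = max(1, round(len(X_large) / len(X_small)))
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=self.random_state)
        cluster_ids = kmeans.fit_predict(X_large)

        # Step 2: one classifier per cluster, trained against the whole small class.
        self.estimators_ = []
        for c in range(k):
            X_c = X_large[cluster_ids == c]
            X_train = np.vstack([X_c, X_small])
            y_train = np.concatenate(
                [np.zeros(len(X_c), dtype=int), np.ones(len(X_small), dtype=int)]
            )
            self.estimators_.append(clone(self.base_estimator).fit(X_train, y_train))
        return self

    def predict(self, X):
        # Step 3: positive only if every classifier votes positive.
        votes = np.array([est.predict(X) for est in self.estimators_])
        return votes.all(axis=0).astype(int)
```

Requiring a unanimous vote makes the ensemble conservative about the positive class: a single classifier that recognises the sample as belonging to its cluster is enough to reject it.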

The image (kclustering.png) shows an example where the large class is formed by three Gaussian distributions. The base estimator used in this example is logistic regression, and the three black lines show the decision boundary of each classifier.

Dagging

Dagging stands for Down-sampling for Bagging. This algorithm randomly splits the samples of the big class into K chunks and trains K classifiers, one per chunk, each against all the target samples. The final classification is the geometric mean of the classification probabilities of all the trained classifiers.
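The splitting and combining steps can be sketched with the standard library alone. This is a hedged illustration, not the code in DaggingClassifier.py: the function names are made up, `train` is a hypothetical callable that takes the negative and positive samples and returns a function giving the positive-class probability for a sample, and the sketch averages only the positive-class probability (whether the result is renormalised across classes is not stated in the description).

```python
import math
import random


def split_into_chunks(indices, k, seed=0):
    """Randomly partition indices into k chunks whose sizes differ by at most one."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def geometric_mean(probabilities):
    """Geometric mean of the probabilities returned by the K classifiers."""
    if any(p <= 0.0 for p in probabilities):
        return 0.0  # one zero probability drives the whole product to zero
    log_sum = sum(math.log(p) for p in probabilities)
    return math.exp(log_sum / len(probabilities))


def dagging_fit(large, small, k, train, seed=0):
    """Train one model per chunk of the large class against all small-class samples.

    `train(negatives, positives)` is a hypothetical callable supplied by the user.
    """
    chunks = split_into_chunks(range(len(large)), k, seed)
    return [train([large[i] for i in chunk], small) for chunk in chunks]


def dagging_predict_proba(models, x):
    """Combine the per-model positive-class probabilities for one sample."""
    return geometric_mean([model(x) for model in models])
```

Compared with an arithmetic mean, the geometric mean is pulled down more strongly by any single low probability, so one confident "negative" classifier weighs heavily on the combined score.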

(A more detailed description is in progress.)


tonbadal/imbalanced_data_algorithms
