k-means-constrained

K-means clustering implementation whereby a minimum and/or maximum size for each cluster can be specified.

This K-means implementation modifies the cluster assignment step (E in EM) by formulating it as a Minimum Cost Flow (MCF) linear network optimisation problem. This is then solved using a cost-scaling push-relabel algorithm and uses Google's Operations Research tools's SimpleMinCostFlow which is a fast C++ implementation.

This package is inspired by Bradley et al.. The original Minimum Cost Flow (MCF) network proposed by Bradley et al. has been modified so maximum cluster sizes can also be specified along with minimum cluster size.

The code is based on scikit-lean's KMeans and implements the same API with modifications.

Ref:

Example

>>> from k_means_constrained import KMeansConstrained
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
>>>                [4, 2], [4, 4], [4, 0]])
>>> clf = KMeansConstrained(
>>>     n_clusters=2,
>>>     size_min=2,
>>>     size_max=5,
>>>     random_state=0
>>> )
>>> clf.fit(X)
array([0, 0, 0, 1, 1, 1], dtype=int32)
>>> clf.cluster_centers_
array([[ 1.,  2.],
       [ 4.,  2.]])
>>> clf.predict([[0, 0], [4, 4]])
array([0, 1], dtype=int32)

Installation

Requires Google's OR-tools to be installed.

Currently tested with:

scikit-learn == 0.19.1
ortools == 6.7.4973

As this package uses internal scikit learn methods associated with k-means it may break with other versions of scikit-learn. This can be addressed in the future by importing the internal methods into this project.

TODO:

Documentation
Test with sparse X
Remove dependencies on internal scikit learn methods by importing them into this project

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
ect		ect
k_means_constrained		k_means_constrained
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
compile_cython.py		compile_cython.py
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ect

ect