Skip to content

LSH based high dimensional clustering for sets and points

Notifications You must be signed in to change notification settings

cjdd3b/lshhdc-names

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LSHHDC : Locality-Sensitive Hashing based High Dimensional Clustering

Locality-sensitive hashing

Unlike cryptographic hashing where the goal is to map objects to numbers with a low collision rate and high randomness, the goal of LSH is to map similar elements to similar keys with high probability.

An obvious use of this technique is clustering. From Rajamaran, "Mining of Massive Datasets":

A family F of functions is said to be (d1, d2, p1, p2)-sensitive if for every f in F:

  1. If d(x,y) ≤ d1, then the probability that f(x) = f(y) is at least p1.
  2. If d(x,y) ≥ d2, then the probability that f(x) = f(y) is at most p2.

Forked modifications

This package has been modified slightly by Chase Davis to accommodate the application of clustering short strings (in this case names). See simple code for that task in tests/test_names.py.

About

LSH based high dimensional clustering for sets and points

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%