Count me Maybe

Hey I just met you, This is crazy, But here's a number, So count me, maybe?

countmemaybe is a set of probabilistic data structures that can estimate the cardinality of a set and also support unions. The base set of operations are:

Cardinality
Union
Cardinality of the Union
Cardinality of the Intersection
Jaccard Index

We also implement a probabilistic quantile data-structure which creates a synopsis of the data in order to satisfy queries regarding the quantiles of the stream.

Insertions are quite quick! For both hyperloglog and kminvalues there is an average insertion time of 4us on a 1.8GHz i7 Macbook Air.

Below are some benchmarks of error rates. The set used for estimation are 160000 random 16bit integers and the benchmark was generated using the test_dve.py script. For some as of yet unknown reason, KMV with a 64bit hash function performs worse than a 32bit hash function. This is counterintuitive and requires more investigation. Furthermore, exact mean relative error rates vary from run to run by ~1%

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
countmemaybe		countmemaybe
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

countmemaybe

countmemaybe

tests

tests

.gitignore

.gitignore

.travis.yml

.travis.yml

README.md

README.md

setup.py

setup.py

Repository files navigation

Count me Maybe

About

Releases

Packages

Languages

darkseed/countmemaybe

Folders and files

Latest commit

History

Repository files navigation

Count me Maybe

About

Resources

Stars

Watchers

Forks

Languages