MULTIPROBE_LSH

Utility Package for Multiprobe LSH Families

Dependency List

Anaconda3 for Python 3.8
plyvel
sqlitedict
Flask 1.1.2+ (service demo only)

Dataset Creation

mkdir ./data
export DS_NAME=random10k
# number of data points
export DS_SIZE=10000
# number of reference points used as ground truth
# beware of this number:
# DS_SIZE*DS_EVAL_SIZE similarity comparisons will be computed
export DS_EVAL_SIZE=500
python make_datasets.py -dtype=r -data_dir=./data/$DS_NAME  -data_size=$DS_SIZE -eval_size=$DS_EVAL_SIZE

LSH Family Benchmark

Make sure experiment configuration files in ./expr_configs are correctly set. Suppose in ./expr_configs/default.json

{
  "lsh_id": 0,
  "params": {
    "vec_dim": 256,
    "vec_norm": 10,
    "L": [2,3,4,5,6,7,8,9,10],
    "K": [16,14,12],
    "F": [2,3,4,5,6,7,8]
    "bucket_limit": null
  },
  "data_path": "./data/random1M",
  "mapper_cache": "./tmp/random1M",
  "repeat": 1
}

mapper_cache is not a required field and if it absent all the signature mapping and buckets will be cached in memory. Unless you have at least 64GB local RAM, you have to specify the location for on-disk cache for 1M size datasets otherwise your OS kernel will kill your process for OOM. Run the main experiments by

python main_expr.py ./expr_configs/default.json

All the hyperparameters in lists will be crosslisted so 9 * 3 * 6=162 experiment trials will be performed. Here lsh_id has three options to decide which multiprobe LSH to use. The mapping is

0: DWiseHplaneLSH
1: HplaneFlipQLSH
3: MultiprobeCrossPolytopeLSH

For query log data (access will be available soon), specify the configuration json file as follows.

{
  "lsh_id": 0,
  "params": {
    "vec_dim": 256,
    "vec_norm": 10,
    "L": [2,3,4,5,6,7,8,9,10],
    "K": [16,14,12],
    "F": [2,3,4,5,6,7,8]
    "bucket_limit": null
  },
  "data_path": "./data/qlog1M",
  "mapper_cache": "./tmp/qlog1M",
  "repeat": 1
}

If reference data is not yet ready, do

export DS_EVAL_SIZE=100
export DS_NAME=qlog1M
python make_datasets.py -dtype=q -data_dir=./data/$DS_NAME -eval_size=$DS_EVAL_SIZE

Experiment reports will be dumped into ./report.tsv which is automatically created if not existing.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
expr_configs		expr_configs
multiprobe		multiprobe
service		service
LICENSE.md		LICENSE.md
README.md		README.md
main_expr.py		main_expr.py
make_datasets.py		make_datasets.py
make_service_demo.py		make_service_demo.py
pairwise_sampling.py		pairwise_sampling.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

expr_configs

expr_configs

multiprobe

multiprobe

service

service

LICENSE.md

LICENSE.md

README.md

README.md

main_expr.py

main_expr.py

make_datasets.py

make_datasets.py

make_service_demo.py

make_service_demo.py

pairwise_sampling.py

pairwise_sampling.py

Repository files navigation

MULTIPROBE_LSH

Dependency List

Dataset Creation

LSH Family Benchmark

About

Releases

Packages

Languages

License

vivi489/multiprobe_lsh

Folders and files

Latest commit

History

Repository files navigation

MULTIPROBE_LSH

Dependency List

Dataset Creation

LSH Family Benchmark

About

Resources

License

Stars

Watchers

Forks

Languages