docker-kgraph

This image contains some python code to run KGraph K nearest neighbour search over a simple HTTP server.

The input data should be in the format of comma separated CSV files. The first column of the data file should be some kind of IDs of the rows to be indexed. The rest of the columns should all be floating point numbers. The CSV data file should NOT have a header line. The data files must be indexed before becoming available for search.

index

Put your data file in a local directory that will be linked to the container volume /kgraph, and run indexer like this:

docker run  -v <local_directory>:/kgraph \
            -t huahaiy/kgraph  \
            index.py -f /kgraph/<data.file>

After indexing, check that a file called <data.file>.index is created in the same directory.

search

Search is running as a simple HTTP service on default port 8071. -p option allows one to change the port. Make sure to expose that port from container, like so:

docker run  -v <local_directory>:/kgraph \
            -p 8080:8080 \
            -it huahaiy/kgraph  \
            search.py -p 8080 \
                      -d <data.name1>,<data.name2>,<data.name3>
                      -f <data.file1>,<data.file2>,<data.file3>

As can be seen, multiple data sources can be specified for the searcher. -d gives the coma separated names assigned to each data files, which should correspond to the d= parameters in the search URL; -f are the corresponding data file names that were used during indexing.

The search URL is something like this:

http://localhost:8080/search?d=data.name1&k=10&q=0.2,0,3,0.3

Here we ask for 10 nearest neighbours of the vector [0.2, 0.3, 0.3] in data.name1. Make sure the length of the query vector equals to the data file column numbers minus one (for the IDs column).

If all goes well, the JSON response will be something like this

{'ids': [id1, id2, id3, ..., id10], 
 'dists': [0.003, 0.004, 0.007, ..., 0.012]}

Here we return both the IDs of the nearest neighbours and the corresponding distances to the query vector. Currently, only Euclidean distances are supported.

license

BSD, see LICENSE file

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
kgraph		kgraph
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
index.py		index.py
search.py		search.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kgraph

kgraph

.dockerignore

.dockerignore

.gitignore

.gitignore

Dockerfile

Dockerfile

LICENSE

LICENSE

README.md

README.md

index.py

index.py

search.py

search.py

Repository files navigation

docker-kgraph

index

search

license

About

Releases

Packages

Languages

License

huahaiy/docker-kgraph

Folders and files

Latest commit

History

Repository files navigation

docker-kgraph

index

search

license

About

Resources

License

Stars

Watchers

Forks

Languages