CSD: Discriminance with Conic Section for Improving Reverse k Nearest Neighbors Queries

Overview

The reverse k nearest neighbors (RkNN) requires to find every data point that has the query point as one of its k closest points. According to the characteristics of conic section, we propose a discriminance, named CSD (Conic Section Discriminance), to determine candidates whether belong to the RkNN set or not. With CSD, the vast majority of candidates can be verified with a computational complexity of O(1). Based on CSD, a novel RkNN algorithm CSD-RkNN is implemented. The comparative experiments are conducted between CSD-RkNN and other two state-of-the-art RkNN algorithms, SLICE and VR-RkNN. The experimental results indicate that the efficiency of CSD-RkNN is significantly higher than the other two algorithms.

Project structure

├── data/: real data set
│   └── us50000.txt
├── common/: common data structures, spatial indices and ploting functions
│   ├── __init__.py
│   ├── VoronoiDiagram.py
│   ├── Rtree.py
│   ├── KDtree.py
│   ├── VoRtree.py
│   └── VoKDtree.py
├── RkNN/: RkNN algorithms (including CSD-RkNN, SLICE and VR-RkNN)
│   ├── __init__.py
│   ├── CSD.py
│   ├── SLICE.py
│   └── VR.py
└── test/: test benchmarks
    └── benchmark.py

Usage

Generate the facility set and user set:

>>> import numpy as np
>>> from shapely.geometry import Point
>>> bounds = (0, 0, 10000, 10000)
>>> users = [(i, Point(np.random.uniform(bounds[0], bounds[2]), np.random.uniform(bounds[1], bounds[3])))
             for i in range(1000)]
>>> facilities = [(i, Point(np.random.uniform(bounds[0], bounds[2]), np.random.uniform(bounds[1], bounds[3])))
                  for i in range(1000)]

Index the facilities and users:

>>> from common.VoKDtree import VoKDtreeIndex
>>> user_index = VoKDtreeIndex(bounds[0], bounds[1], bounds[2], bounds[3], users)
>>> facility_index = VoKDtreeIndex(bounds[0], bounds[1], bounds[2], bounds[3], facilities)

Retrieve the Bi-RkNNs of the (q-1)-th facility form the user set:

>>> from RkNN.CSD import BiRkNN, MonoRkNN
>>> q, k = np.random.randint(0, len(facilities)), 10 
>>> bi_rknn = BiRkNN(q, k, facility_index, user_index)
>>> print bi_rknn
[(727, <shapely.geometry.point.Point object at 0x1037d1590>), 
 (175, <shapely.geometry.point.Point object at 0x103796950>), 
 ..., (13, <shapely.geometry.point.Point object at 0x10375f050>)]

Retrieve the Mono-RkNNs of the (q-1)-th facility form the facility set:

>>> mono_rknn = BiRkNN(q, k, facility_index, user_index)
>>> print mono_rknn
[(334, <shapely.geometry.point.Point object at 0x10380dfd0>),
 (906, <shapely.geometry.point.Point object at 0x103850110>),
 ..., (712, <shapely.geometry.point.Point object at 0x103835fd0>)]

Plot the result:

>>> import matplotlib.pyplot as plt
>>> from common import plot_points, plot_stars
>>> fig = plt.figure(figsize=(16, 8))
>>> bi_rknn_ax = fig.add_subplot(121)
>>> plot_points(bi_rknn_ax, facilities, 'gray', 4, 'Facilities')
>>> plot_points(bi_rknn_ax, users, 'blue', 4, 'Users')
>>> plot_stars(bi_rknn_ax,[facilities[q]],'green',20,'Query facility')
>>> plot_points(bi_rknn_ax, bi_rknn, 'red', 8, 'RkNNs')
>>> bi_rknn_ax.set_xlabel('Bi-RkNN')
>>> bi_rknn_ax.legend()
>>> mono_rknn_ax = fig.add_subplot(122)
>>> plot_points(mono_rknn_ax, facilities, 'gray', 4, 'Facilities')
>>> plot_stars(mono_rknn_ax,[facilities[q]],'green',20,'Query facility')
>>> plot_points(mono_rknn_ax, mono_rknn, 'red', 8, 'RkNNs')
>>> mono_rknn_ax.set_xlabel('Mono-RkNN')
>>> mono_rknn_ax.legend()
>>> plt.show()

Evaluation

If you want to evaluate the performance of CSD-RkNN against other algorithms (e.g., SLICE and VR-RkNN), we provide some benchmarks:

>>> from test import benchmark
>>> benchmark.Effect_of_data_size_on_BiRkNN(30) # evaluate the effect of data size on Bi-RkNN
>>> benchmark.Effect_of_data_size_on_MonoRkNN(30) # evaluate the effect of data size on Mono-RkNN
>>> benchmark.Effect_of_k_on_BiRkNN_in_real_distribution(30) # evaluate the effect of k on Bi-RkNN in real distribution
>>> benchmark.Effect_of_k_on_MonoRkNN_in_real_distribution(30) # evaluate the effect of k on Mono-RkNN in real distribution
>>> benchmark.Effect_of_k_on_BiRkNN_in_uniform_distribution(30) # evaluate the effect of k on Bi-RkNN in uniform distribution
>>> benchmark.Effect_of_k_on_MonoRkNN_in_uniform_distribution(30) # evaluate the effect of k on Mono-RkNN in uniform distribution
>>> benchmark.Effect_of_user_num_relative_to_facility_num(30) # evaluate the effect of the number of users relative to the number of facilities on Bi-RkNN
>>> benchmark.Effect_of_distribution(30) # evaluate the effect of data distribution on Bi-RkNN

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
RkNN		RkNN
common		common
data		data
images		images
test		test
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RkNN

RkNN

common

common

data

data

images

images

test

test

LICENSE

LICENSE

README.md

README.md

Repository files navigation

CSD: Discriminance with Conic Section for Improving Reverse k Nearest Neighbors Queries

Overview

Project structure

Usage

Evaluation

About

Releases

Packages

Languages

License

wanghaoccnu/CSD

Folders and files

Latest commit

History

Repository files navigation

CSD: Discriminance with Conic Section for Improving Reverse k Nearest Neighbors Queries

Overview

Project structure

Usage

Evaluation

About

Resources

License

Stars

Watchers

Forks

Languages