The reverse k nearest neighbors (RkNN) requires to find every data point that has the query point as one of its k closest points. According to the characteristics of conic section, we propose a discriminance, named CSD (Conic Section Discriminance), to determine candidates whether belong to the RkNN set or not. With CSD, the vast majority of candidates can be verified with a computational complexity of O(1). Based on CSD, a novel RkNN algorithm CSD-RkNN is implemented. The comparative experiments are conducted between CSD-RkNN and other two state-of-the-art RkNN algorithms, SLICE and VR-RkNN. The experimental results indicate that the efficiency of CSD-RkNN is significantly higher than the other two algorithms.
├── data/: real data set
│ └── us50000.txt
├── common/: common data structures, spatial indices and ploting functions
│ ├── __init__.py
│ ├── VoronoiDiagram.py
│ ├── Rtree.py
│ ├── KDtree.py
│ ├── VoRtree.py
│ └── VoKDtree.py
├── RkNN/: RkNN algorithms (including CSD-RkNN, SLICE and VR-RkNN)
│ ├── __init__.py
│ ├── CSD.py
│ ├── SLICE.py
│ └── VR.py
└── test/: test benchmarks
└── benchmark.py
Generate the facility set and user set:
>>> import numpy as np
>>> from shapely.geometry import Point
>>> bounds = (0, 0, 10000, 10000)
>>> users = [(i, Point(np.random.uniform(bounds[0], bounds[2]), np.random.uniform(bounds[1], bounds[3])))
for i in range(1000)]
>>> facilities = [(i, Point(np.random.uniform(bounds[0], bounds[2]), np.random.uniform(bounds[1], bounds[3])))
for i in range(1000)]
Index the facilities and users:
>>> from common.VoKDtree import VoKDtreeIndex
>>> user_index = VoKDtreeIndex(bounds[0], bounds[1], bounds[2], bounds[3], users)
>>> facility_index = VoKDtreeIndex(bounds[0], bounds[1], bounds[2], bounds[3], facilities)
Retrieve the Bi-RkNNs of the (q-1)-th facility form the user set:
>>> from RkNN.CSD import BiRkNN, MonoRkNN
>>> q, k = np.random.randint(0, len(facilities)), 10
>>> bi_rknn = BiRkNN(q, k, facility_index, user_index)
>>> print bi_rknn
[(727, <shapely.geometry.point.Point object at 0x1037d1590>),
(175, <shapely.geometry.point.Point object at 0x103796950>),
..., (13, <shapely.geometry.point.Point object at 0x10375f050>)]
Retrieve the Mono-RkNNs of the (q-1)-th facility form the facility set:
>>> mono_rknn = BiRkNN(q, k, facility_index, user_index)
>>> print mono_rknn
[(334, <shapely.geometry.point.Point object at 0x10380dfd0>),
(906, <shapely.geometry.point.Point object at 0x103850110>),
..., (712, <shapely.geometry.point.Point object at 0x103835fd0>)]
Plot the result:
>>> import matplotlib.pyplot as plt
>>> from common import plot_points, plot_stars
>>> fig = plt.figure(figsize=(16, 8))
>>> bi_rknn_ax = fig.add_subplot(121)
>>> plot_points(bi_rknn_ax, facilities, 'gray', 4, 'Facilities')
>>> plot_points(bi_rknn_ax, users, 'blue', 4, 'Users')
>>> plot_stars(bi_rknn_ax,[facilities[q]],'green',20,'Query facility')
>>> plot_points(bi_rknn_ax, bi_rknn, 'red', 8, 'RkNNs')
>>> bi_rknn_ax.set_xlabel('Bi-RkNN')
>>> bi_rknn_ax.legend()
>>> mono_rknn_ax = fig.add_subplot(122)
>>> plot_points(mono_rknn_ax, facilities, 'gray', 4, 'Facilities')
>>> plot_stars(mono_rknn_ax,[facilities[q]],'green',20,'Query facility')
>>> plot_points(mono_rknn_ax, mono_rknn, 'red', 8, 'RkNNs')
>>> mono_rknn_ax.set_xlabel('Mono-RkNN')
>>> mono_rknn_ax.legend()
>>> plt.show()
If you want to evaluate the performance of CSD-RkNN against other algorithms (e.g., SLICE and VR-RkNN), we provide some benchmarks:
>>> from test import benchmark
>>> benchmark.Effect_of_data_size_on_BiRkNN(30) # evaluate the effect of data size on Bi-RkNN
>>> benchmark.Effect_of_data_size_on_MonoRkNN(30) # evaluate the effect of data size on Mono-RkNN
>>> benchmark.Effect_of_k_on_BiRkNN_in_real_distribution(30) # evaluate the effect of k on Bi-RkNN in real distribution
>>> benchmark.Effect_of_k_on_MonoRkNN_in_real_distribution(30) # evaluate the effect of k on Mono-RkNN in real distribution
>>> benchmark.Effect_of_k_on_BiRkNN_in_uniform_distribution(30) # evaluate the effect of k on Bi-RkNN in uniform distribution
>>> benchmark.Effect_of_k_on_MonoRkNN_in_uniform_distribution(30) # evaluate the effect of k on Mono-RkNN in uniform distribution
>>> benchmark.Effect_of_user_num_relative_to_facility_num(30) # evaluate the effect of the number of users relative to the number of facilities on Bi-RkNN
>>> benchmark.Effect_of_distribution(30) # evaluate the effect of data distribution on Bi-RkNN