What is the most efficient way to search/index for nearest-neighbor and within-radius queries? #528

nawarhalabi · 2021-10-29T10:47:14Z


Hi,

I am trying to implement two functions:

1. Retrieve the hex id in an array that is nearest to a target hex id (excluding the target itself, of course)
2. Retrieve all hex ids in an array that lie within a given radius of a target hex id

Here are my ideas for both problems:

Nearest neighbor:
a. Use k_ring with an (exponentially) increasing radius around the target until the hex ids in the k_ring intersect the hex ids in the original data. The problem is that k_ring gets slower as the radius grows; a k-d tree or ball tree was much faster (roughly 10 times on average).
b. Use multiple indexes at different resolutions. The problem here is that storing every hex id at several resolutions for hundreds of millions of data points consumes a lot of disk space.
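A minimal sketch of idea (a), assuming the data cells are held in a Python set (so each candidate cell is a constant-time membership test rather than an array intersection) and that `disk(cell, k)` and `distance(a, b)` are supplied by the caller, for example h3-py's `k_ring` and `h3_distance` (v3) or `grid_disk` and `grid_distance` (v4). The square-grid stand-ins at the bottom are hypothetical and exist only so the sketch runs without h3.

```python
from typing import Callable, Hashable, Iterable, Optional, Set

Cell = Hashable


def nearest_cell(
    target: Cell,
    data_cells: Set[Cell],
    disk: Callable[[Cell, int], Iterable[Cell]],
    distance: Callable[[Cell, Cell], int],
    max_k: int = 1024,
) -> Optional[Cell]:
    """Nearest data cell to target by grid distance, excluding target itself.

    The radius doubles (1, 2, 4, ...) up to max_k. Once a disk contains a
    data cell, every cell within k steps has been examined, so the closest
    hit is the nearest neighbour in grid steps.
    """
    k = 1
    while True:
        k = min(k, max_k)
        hits = [c for c in disk(target, k) if c != target and c in data_cells]
        if hits:
            return min(hits, key=lambda c: distance(target, c))
        if k == max_k:
            return None  # nothing within max_k steps
        k *= 2


# Hypothetical square-grid stand-ins (Chebyshev distance) so the sketch runs
# without h3; swap in the h3 disk and distance functions for real cells.
def square_disk(cell, k):
    x, y = cell
    return [(x + dx, y + dy) for dx in range(-k, k + 1) for dy in range(-k, k + 1)]


def chebyshev(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


points = {(0, 0), (3, 1), (-5, 5), (10, 0)}
print(nearest_cell((0, 0), points, square_disk, chebyshev))  # (3, 1)
```

The result is the nearest cell by grid distance, which is not always the nearest by great-circle distance. Doubling keeps the number of disk calls logarithmic in the answer's distance, but the final disk can cover up to about four times the area that was strictly needed.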
Query within radius:
a. Use k_ring with a fixed radius. Same issue as above: k_ring takes on the order of hundreds of milliseconds for a radius of 50, for example, which is too slow.
b. Use multiple indexes. Same issue as above.
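The slowdown with radius follows from how many cells a disk contains: ring j around a hexagon holds 6j cells, so a disk of radius k has 1 + 6(1 + 2 + ... + k) = 3k(k + 1) + 1 cells, which is 7,651 cells for k = 50 (a few fewer near one of the 12 pentagons at each resolution). A minimal sketch, assuming the data cells are in a Python set and that `disk(cell, k)` and `distance(a, b)` are caller-supplied (for example h3-py's `k_ring`/`h3_distance` in v3 or `grid_disk`/`grid_distance` in v4): it enumerates the disk only when the disk is smaller than the data set, and otherwise checks each data cell's distance directly. The square-grid stand-ins are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Set

Cell = Hashable


def hex_disk_size(k: int) -> int:
    """Cells within k steps of a hexagon: 1 + 6 + 12 + ... + 6k = 3k(k+1) + 1."""
    return 3 * k * (k + 1) + 1


def within_radius(
    target: Cell,
    k: int,
    data_cells: Set[Cell],
    disk: Callable[[Cell, int], Iterable[Cell]],
    distance: Callable[[Cell, Cell], int],
) -> Set[Cell]:
    """All data cells within k grid steps of target (target included if present)."""
    if hex_disk_size(k) <= len(data_cells):
        # Disk is the smaller side: enumerate it and probe the set.
        return {c for c in disk(target, k) if c in data_cells}
    # Data is the smaller side: test each data cell's distance directly.
    return {c for c in data_cells if distance(target, c) <= k}


# Hypothetical square-grid stand-ins so the sketch runs without h3. The size
# estimate only picks a branch; both branches return the same set.
def square_disk(cell, k):
    x, y = cell
    return [(x + dx, y + dy) for dx in range(-k, k + 1) for dy in range(-k, k + 1)]


def chebyshev(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


print(hex_disk_size(50))  # 7651
print(sorted(within_radius((0, 0), 2, {(0, 0), (2, 2), (5, 0)}, square_disk, chebyshev)))
# [(0, 0), (2, 2)]
```

With real H3 cells the distance branch would need error handling, since H3 grid distance is undefined for some far-apart or pentagon-separated pairs.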

Idea: is a tree-like index that supports efficient prefix queries on strings an option for the hex ids?
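This looks feasible because H3 ids are hierarchical at the bit level. An H3 cell id is a 64-bit integer (the usual hex string converts with `int(h, 16)`); from the high bits down it holds 1 reserved bit, 4 mode bits, 3 reserved bits, 4 resolution bits, 7 base-cell bits and then one 3-bit digit per resolution from 1 to 15, with unused digits set to 7. All resolution-r descendants of a coarser cell therefore fall in one contiguous integer range, so a single sorted array (or a database B-tree index) of resolution-r ids can return the points under any coarser cell with two binary searches, which might replace the separate per-resolution indexes of idea (b). A minimal sketch, assuming every stored id is at the same resolution; `make_cell` is a hypothetical helper that packs the fields by hand so the example runs without the h3 library.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

H3_CELL_MODE = 1 << 59  # mode 1 (cell) lives in bits 59-62
RES_SHIFT = 52  # resolution lives in bits 52-55
RES_MASK = 0xF << RES_SHIFT
BASE_SHIFT = 45  # base cell lives in bits 45-51


def digit_shift(i: int) -> int:
    """Bit offset of the 3-bit digit for resolution i (1..15)."""
    return (15 - i) * 3


def make_cell(res: int, base_cell: int, digits: Sequence[int]) -> int:
    """Hypothetical helper: pack H3 cell fields into an integer id."""
    h = H3_CELL_MODE | (res << RES_SHIFT) | (base_cell << BASE_SHIFT)
    for i in range(1, 16):
        d = digits[i - 1] if i <= res else 7  # unused digits are all ones
        h |= d << digit_shift(i)
    return h


def descendant_range(parent: int, child_res: int) -> Tuple[int, int]:
    """Smallest and largest possible ids of parent's descendants at child_res.

    Assumes child_res >= the parent's resolution.
    """
    parent_res = (parent >> RES_SHIFT) & 0xF
    hi = (parent & ~RES_MASK) | (child_res << RES_SHIFT)  # extra digits stay 7
    lo = hi
    for i in range(parent_res + 1, child_res + 1):
        lo &= ~(0b111 << digit_shift(i))  # extra digits cleared to 0
    return lo, hi


def cells_under(sorted_ids: List[int], parent: int, child_res: int) -> List[int]:
    """Ids in sorted_ids (all at child_res) that descend from parent."""
    lo, hi = descendant_range(parent, child_res)
    return sorted_ids[bisect_left(sorted_ids, lo):bisect_right(sorted_ids, hi)]


ids = sorted([
    make_cell(3, 20, [1, 0, 4]),
    make_cell(3, 20, [1, 6, 6]),
    make_cell(3, 20, [2, 0, 0]),
    make_cell(3, 21, [1, 0, 0]),
])
print(len(cells_under(ids, make_cell(1, 20, [1]), 3)))  # 2
print(len(cells_under(ids, make_cell(0, 20, []), 3)))  # 3
```

For a within-radius query, one option would be a small k_ring at a coarser resolution, one range scan per coarse cell, and an exact distance filter on the candidates. Because H3 children do not nest exactly inside their parent's boundary, the coarse ring would likely need some margin (an extra coarse ring, say) to avoid missing points near the edges.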

Happy to hear your thoughts