This folder contains scripts that run all-nearest-neighbor searches in a number of libraries. For the most part, the scripts are very bare-bones; for example, they don't even output the results.
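To make the task concrete, here is a brute-force baseline in Python. It assumes that "all nearest neighbor search" means finding, for every point in a dataset, its nearest *other* point, and it assumes Euclidean distance. It is a sketch for illustration only, not code from any of the scripts; the function name `all_nearest_neighbors_brute` is hypothetical. Every library in the table below is, in effect, trying to beat this O(n²) double loop.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def all_nearest_neighbors_brute(points: List[Point]) -> List[Tuple[int, float]]:
    """Hypothetical helper: for each point, return (index of its nearest
    other point, Euclidean distance). Brute force, O(n^2) distance
    computations. Assumes at least two points."""
    result = []
    for i, p in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue  # a point is not its own neighbor
            d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_j, best_d = j, d
        result.append((best_j, best_d))
    return result
```

For example, on the points (0, 0), (1, 0) and (5, 5), the first two are each other's nearest neighbor at distance 1.0, and (5, 5) is paired with (1, 0) at distance √41.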
To run the scripts, you'll obviously first need to install the libraries.
The `/install` folder in this repo contains scripts for installing all of these libraries.
With all the libraries installed, just call the `runtest.sh` script with a single parameter: the dataset to test on.
The table below provides a brief description of the libraries compared against.
Library | Description |
---|---|
FLANN | The Fast Library for Approximate Nearest Neighbors. This C++ library is the standard method for nearest neighbor queries in Matlab/Octave and the OpenCV computer vision toolkit. |
Julia | A popular new language designed from the ground up for fast data processing. Julia supports fast nearest neighbor queries through the KDTrees.jl package. |
Langford's cover tree | A reference implementation of the cover tree data structure, written by John Langford. The implementation is in C, and the data structure is widely included in C/C++ machine learning libraries. |
MLPack | A C++ library for machine learning. MLPack was the first library to demonstrate the utility of generic programming in machine learning. The interface for nearest neighbor queries lets you use either a cover tree or a kdtree. |
R | A popular language for statisticians. Nearest neighbor queries are implemented in the FNN package, which provides bindings to the C++ ANN library for kdtrees. |
scikit-learn | The Python machine learning toolkit. The documentation is beginner-friendly, and the library is easy to learn. The interface for nearest neighbor queries lets you use either a ball tree or a kdtree to speed up the calculations. Both data structures are written in Cython. |
Weka | A Java data mining tool with a popular GUI frontend. In my tests, nearest neighbor queries in Weka were very slow and not remotely competitive with any of the libraries above. |
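The kdtree appears in several rows of the table (KDTrees.jl, MLPack, R's FNN via ANN, scikit-learn). The Python sketch below shows the basic idea behind kdtree all-nearest-neighbor search: build a tree by splitting on the median along cycling axes, then for each point descend toward its own cell and skip any subtree whose splitting plane is farther away than the best distance found so far. It is an illustration under stated assumptions (Euclidean distance, exact search, one independent single-tree query per point), not the implementation used by any of these libraries; libraries such as MLPack may instead use dual-tree traversals. All names (`KDNode`, `build_kdtree`, `search`, `all_nearest_neighbors_kdtree`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


class KDNode:
    """One node of a hypothetical kdtree: stores a point index and its split axis."""
    __slots__ = ("index", "axis", "left", "right")

    def __init__(self, index: int, axis: int,
                 left: "Optional[KDNode]", right: "Optional[KDNode]") -> None:
        self.index = index
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(points: List[Point], indices: List[int], depth: int = 0) -> Optional[KDNode]:
    """Split on the median along axis (depth mod dimension)."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(
        indices[mid],
        axis,
        build_kdtree(points, indices[:mid], depth + 1),      # coords <= median
        build_kdtree(points, indices[mid + 1:], depth + 1),  # coords >= median
    )


def search(node: Optional[KDNode], points: List[Point], query: int, best: List) -> None:
    """Single-tree search; best = [index, distance] is updated in place."""
    if node is None:
        return
    q, p = points[query], points[node.index]
    if node.index != query:  # a point is not its own neighbor
        d = math.dist(q, p)
        if d < best[1]:
            best[0], best[1] = node.index, d
    diff = q[node.axis] - p[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    search(near, points, query, best)
    # Prune: points across the splitting plane are at least |diff| away.
    if abs(diff) < best[1]:
        search(far, points, query, best)


def all_nearest_neighbors_kdtree(points: List[Point]) -> List[Tuple[int, float]]:
    """For each point, return (index of nearest other point, distance)."""
    root = build_kdtree(points, list(range(len(points))))
    result = []
    for i in range(len(points)):
        best = [-1, math.inf]
        search(root, points, i, best)
        result.append((best[0], best[1]))
    return result
```

When two neighbors are exactly tied, this sketch may report a different (equally close) index than a brute-force scan would, so results from different libraries are best compared by distance rather than by index.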