[Doc] Add KNN Benchmark Results (dmlc#3055)

* add knn benchmark result * update
XINyexun · Jun 28, 2021 · 789b03f · 789b03f
1 parent 81456af
commit 789b03f
Show file tree

Hide file tree

Showing 2 changed files with 130 additions and 0 deletions.
diff --git a/docs/source/api/python/knn_benchmark.rst b/docs/source/api/python/knn_benchmark.rst
@@ -0,0 +1,128 @@
+.. _knn_benchmark:
+
+Benchmark the performance of KNN algorithms
+===========================================
+
+In this doc, we benchmark the performance on multiple K-Nearest Neighbor algorithms implemented by :func:`dgl.knn_graph`.
+
+Given a dataset of ``N`` samples with ``D`` dimensions, the common use case of KNN algorithms in graph learning is to build a KNN graph by finding the ``K`` nearest neighbors for each of the ``N`` samples among the dataset.
+
+Empirically, the three parameters, ``N``, ``D``, and ``K``, all have impact on the computation cost. To benchmark the algorithms, we pick a few represensitive datasets to cover most common scenarios:
+
+* A synthetic dataset with mixed gaussian samples: ``N = 1000``, ``D = 3``.
+* A point cloud sample from ModelNet: ``N = 10000``, ``D = 3``.
+* Subsets of MNIST
+    - A small subset: ``N = 1000``, ``D = 784``
+    - A medium subset: ``N = 10000``, ``D = 784``
+    - A large subset: ``N = 50000``, ``D = 784``
+
+Some notes:
+
+* ``bruteforce-sharemem`` is an optimized implementation of ``bruteforce`` on GPU.
+* ``kd-tree`` is currently only implemented on CPU.
+* ``bruteforce-blas`` conducts matrix multiplication, thus is memory inefficient.
+* ``nn-descent`` is an approximate algorithm, and we also report the recall rate of its result.
+
+Results
+-------
+
+In this section, we show the runtime and recall rate (where applicable) for the algorithms under various scenarios.
+
+The experiments are run on an Amazon EC2 P3.2xlarge instance. This instance has 8 vCPUs with 61GB RAM, and one Tesla V100 GPU with 16GB RAM. In terms of the environment, we obtain the numbers with DGL==0.7.0(`64d0f3f <https://github.com/dmlc/dgl/commit/64d0f3f3554911ec06d015f1c9659180796adf9a>`_), PyTorch==1.8.1, CUDA==11.1 on Ubuntu 18.04.5 LTS.
+
+* **Mixed Gaussian:**
+
++---------------------+------------------+-------------------+------------------+------------------+
+| Model               | CPU                                  | GPU                                 |
+|                     +------------------+-------------------+------------------+------------------+
+|                     | K = 8            | K = 64            | K = 8            | K = 64           |
++=====================+==================+===================+==================+==================+
+| bruteforce-blas     | 0.010            | 0.011             | 0.002            | 0.003            |
++---------------------+------------------+-------------------+------------------+------------------+
+| kd-tree             | 0.004            | 0.006             | n/a              | n/a              |
++---------------------+------------------+-------------------+------------------+------------------+
+| bruteforce          | 0.004            | 0.006             | 0.126            | 0.009            |
++---------------------+------------------+-------------------+------------------+------------------+
+| bruteforce-sharemem | n/a              | n/a               | 0.002            | 0.003            |
++---------------------+------------------+-------------------+------------------+------------------+
+| nn-descent          | 0.014 (R: 0.985) | 0.148 (R: 1.000)  | 0.016 (R: 0.973) | 0.077 (R: 1.000) |
++---------------------+------------------+-------------------+------------------+------------------+
+
+* **Point Cloud**
+
++---------------------+------------------+-------------------+------------------+------------------+
+| Model               | CPU                                  | GPU                                 |
+|                     +------------------+-------------------+------------------+------------------+
+|                     | K = 8            | K = 64            | K = 8            | K = 64           |
++=====================+==================+===================+==================+==================+
+| bruteforce-blas     | 0.359            | 0.432             | 0.010            | 0.010            |
++---------------------+------------------+-------------------+------------------+------------------+
+| kd-tree             | 0.007            | 0.026             | n/a              | n/a              |
++---------------------+------------------+-------------------+------------------+------------------+
+| bruteforce          | 0.074            | 0.167             | 0.008            | 0.039            |
++---------------------+------------------+-------------------+------------------+------------------+
+| bruteforce-sharemem | n/a              | n/a               | 0.004            | 0.017            |
++---------------------+------------------+-------------------+------------------+------------------+
+| nn-descent          | 0.161 (R: 0.977) | 1.345 (R: 0.999)  | 0.086 (R: 0.966) | 0.445 (R: 0.999) |
++---------------------+------------------+-------------------+------------------+------------------+
+
+* **Small MNIST**
+
++---------------------+------------------+-------------------+------------------+------------------+
+| Model               | CPU                                  | GPU                                 |
+|                     +------------------+-------------------+------------------+------------------+
+|                     | K = 8            | K = 64            | K = 8            | K = 64           |
++=====================+==================+===================+==================+==================+
+| bruteforce-blas     | 0.014            | 0.015             | 0.002            | 0.002            |
++---------------------+------------------+-------------------+------------------+------------------+
+| kd-tree             | 0.179            | 0.182             | n/a              | n/a              |
++---------------------+------------------+-------------------+------------------+------------------+
+| bruteforce          | 0.173            | 0.228             | 0.123            | 0.170            |
++---------------------+------------------+-------------------+------------------+------------------+
+| bruteforce-sharemem | n/a              | n/a               | 0.045            | 0.054            |
++---------------------+------------------+-------------------+------------------+------------------+
+| nn-descent          | 0.060 (R: 0.878) | 1.077 (R: 0.999)  | 0.030 (R: 0.952) | 0.457 (R: 0.999) |
++---------------------+------------------+-------------------+------------------+------------------+
+
+* **Medium MNIST**
+
++---------------------+------------------+-------------------+------------------+------------------+
+| Model               | CPU                                  | GPU                                 |
+|                     +------------------+-------------------+------------------+------------------+
+|                     | K = 8            | K = 64            | K = 8            | K = 64           |
++=====================+==================+===================+==================+==================+
+| bruteforce-blas     | 0.897            | 0.970             | 0.019            | 0.023            |
++---------------------+------------------+-------------------+------------------+------------------+
+| kd-tree             | 18.902           | 18.928            | n/a              | n/a              |
++---------------------+------------------+-------------------+------------------+------------------+
+| bruteforce          | 14.495           | 17.652            | 2.058            | 2.588            |
++---------------------+------------------+-------------------+------------------+------------------+
+| bruteforce-sharemem | n/a              | n/a               | 2.257            | 2.524            |
++---------------------+------------------+-------------------+------------------+------------------+
+| nn-descent          | 0.804 (R: 0.755) | 14.108 (R: 0.999) | 0.158 (R: 0.900) | 1.794 (R: 0.999) |
++---------------------+------------------+-------------------+------------------+------------------+
+
+* **Large MNIST**
+
++---------------------+------------------+-------------------+------------------+------------------+
+| Model               | CPU                                  | GPU                                 |
+|                     +------------------+-------------------+------------------+------------------+
+|                     | K = 8            | K = 64            | K = 8            | K = 64           |
++=====================+==================+===================+==================+==================+
+| bruteforce-blas     | 21.829           | 22.135            | Out of Memory    | Out of Memory    |
++---------------------+------------------+-------------------+------------------+------------------+
+| kd-tree             | 542.688          | 573.379           | n/a              | n/a              |
++---------------------+------------------+-------------------+------------------+------------------+
+| bruteforce          | 373.823          | 432.963           | 10.317           | 12.639           |
++---------------------+------------------+-------------------+------------------+------------------+
+| bruteforce-sharemem | n/a              | n/a               | 53.133           | 58.419           |
++---------------------+------------------+-------------------+------------------+------------------+
+| nn-descent          | 4.995 (R: 0.658) | 75.487 (R: 0.999) | 1.478 (R: 0.860) | 15.698 (R: 0.999)| 
++---------------------+------------------+-------------------+------------------+------------------+
+
+Conclusion
+----------
+
+- As long as you have enough memory, ``bruteforce-blas`` is the default algorithm to go with.
+- Specifically, when ``D`` is small and the data is on CPU, ``kd-tree`` is the best algorithm.
+
diff --git a/python/dgl/transform.py b/python/dgl/transform.py
@@ -80,6 +80,8 @@ def knn_graph(x, k, algorithm='bruteforce-blas', dist='euclidean'):
     into a separate graph. DGL then composes the graphs into a large
     graph of multiple connected components.
 
+    See :doc:`the benchmark <../api/python/knn_benchmark>` for a complete benchmark result.
+
     Parameters
     ----------
     x : Tensor