[DOCS] Add training on CPU sections to docs (dmlc#3398)

ammuaj · Oct 14, 2021 · a47ab71
1 parent 1886306
commit a47ab71

Showing 4 changed files with 58 additions and 2 deletions.
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -200,12 +200,14 @@
                  '../../tutorials/large',
                  '../../tutorials/dist',
                  '../../tutorials/models',
-                 '../../tutorials/multi']  # path to find sources
+                 '../../tutorials/multi',
+                 '../../tutorials/cpu']  # path to find sources
 gallery_dirs = ['tutorials/blitz/',
                 'tutorials/large/',
                 'tutorials/dist/',
                 'tutorials/models/',
-                'tutorials/multi/']  # path to generate docs
+                'tutorials/multi/',
+                'tutorials/cpu']  # path to generate docs
 reference_url = {
     'dgl' : None,
     'numpy': 'http://docs.scipy.org/doc/numpy/',

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -25,6 +25,7 @@ Welcome to Deep Graph Library Tutorials and Documentation
    guide/index
    guide_cn/index
    tutorials/large/index
+   tutorials/cpu/index
    tutorials/multi/index
    tutorials/dist/index
    tutorials/models/index

diff --git a/tutorials/cpu/README.txt b/tutorials/cpu/README.txt
@@ -0,0 +1,2 @@
+Training on CPUs
+=========================
diff --git a/tutorials/cpu/cpu_best_practises.py b/tutorials/cpu/cpu_best_practises.py
@@ -0,0 +1,51 @@
+"""
+CPU Best Practices
+=====================================================
+
+This chapter focuses on best practices for environment setup
+to get the best performance during training and inference on the CPU.
+
+Intel
+`````````````````````````````
+
+Hyper-threading
+---------------------------
+
+For GNN workloads, the suggested default setting for best performance
+is to turn off hyper-threading.
+Hyper-threading can be disabled in the BIOS [#f1]_ or at the operating system level [#f2]_ [#f3]_ .
+
+
+OpenMP settings
+---------------------------
+
+During training on CPU, model training and data loading run simultaneously and share the available cores.
+The best OpenMP parallelization performance
+can be achieved by choosing suitable numbers of worker threads and dataloading workers.
+
+**GNU OpenMP**
+    Default best known method (BKM) for setting the number of OMP threads with the PyTorch backend:
+
+    ``OMP_NUM_THREADS`` = number of physical cores - ``num_workers``
+
+    The number of physical cores can be checked with ``lscpu`` ("Core(s) per socket" multiplied by "Socket(s)")
+    or, when hyper-threading is disabled, with the ``nproc`` command on the Linux command line.
+    Below is a simple bash script that sets the OMP threads and the ``pytorch`` backend dataloader workers:
+
+    .. code:: bash
+
+        cores=$(nproc)  # logical CPUs; equals physical cores when hyper-threading is off
+        num_workers=4   # dataloader worker processes
+        export OMP_NUM_THREADS=$((cores - num_workers))
+        python script.py --gpu -1 --num_workers=$num_workers
+
+    Depending on the dataset, model and CPU, the optimal number of dataloader workers and OpenMP threads may vary,
+    but it should be close to the general default advice presented above [#f4]_ .
+
+.. rubric:: Footnotes
+
+.. [#f1] https://www.intel.com/content/www/us/en/support/articles/000007645/boards-and-kits/desktop-boards.html
+.. [#f2] https://aws.amazon.com/blogs/compute/disabling-intel-hyper-threading-technology-on-amazon-linux/
+.. [#f3] https://aws.amazon.com/blogs/compute/disabling-intel-hyper-threading-technology-on-amazon-ec2-windows-instances/
+.. [#f4] https://software.intel.com/content/www/us/en/develop/articles/how-to-get-better-performance-on-pytorchcaffe2-with-intel-acceleration.html
+"""
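
The thread-count rule in the tutorial above (``OMP_NUM_THREADS`` = number of physical cores - ``num_workers``) can also be sketched in Python. This is only an illustration and not part of the commit: the helper names `omp_threads` and `training_env` are hypothetical, the physical core count is assumed to be supplied by the caller (for example, read from `lscpu`), and clamping the result to at least one thread is an added assumption that the tutorial does not state.

```python
import os


def omp_threads(physical_cores: int, num_workers: int) -> int:
    """Apply the tutorial's rule: physical cores minus dataloader workers.

    The result is clamped to at least 1 (an assumption of this sketch)
    so that a large num_workers never yields zero or negative threads.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    return max(1, physical_cores - num_workers)


def training_env(physical_cores: int, num_workers: int) -> dict:
    """Return a copy of the current environment with OMP_NUM_THREADS set."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(omp_threads(physical_cores, num_workers))
    return env
```

The returned dictionary could be passed as the `env` argument of `subprocess.run` when launching the training script, which mirrors the `export` in the bash example: the variable is in place before the training process starts.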