[DistGNN, Graph partitioning] Libra partition (dmlc#3376)
* added distgnn plus libra codebase

* Dist application codes

* added comments in partition code. changed the interface of partitioning call.

* updated readme

* create libra partitioning branch for the PR

* removed disgnn files for first PR

* updated kernel.cc

* added libra_partition.cc and moved libra code from kernel.cc to libra_partition.cc

* fixed lint error; merged libra2dgl.py and main_Libra.py to libra_partition.py; added graphsage/distgnn folder and partition script.

* removed libra2dgl.py

* fixed the lint error and cleaned the code.

* revisions due to PR comments. added distgnn/tools contains partitions routines

* update 2 PR revision I

* fixed errors; also improved the runtime by 10x.

* fixed minor lint error

* fixed some more lints

* PR revision II changed the interface of libra partition function

* rewrite docstring

Co-authored-by: Quan (Andy) Gan <[email protected]>
yuk12 and BarclayII authored Dec 15, 2021
1 parent 02880e9 commit 78e0dae
Showing 11 changed files with 1,378 additions and 1 deletion.
32 changes: 32 additions & 0 deletions examples/pytorch/graphsage/distgnn/README.md
@@ -0,0 +1,32 @@
## DistGNN vertex-cut based graph partitioning (using Libra)

### How to run graph partitioning
```python partition_graph.py --dataset <dataset> --num-parts <num_parts> --out-dir <output_location>```

Example: The following command-line creates 4 partitions of pubmed graph
``` python partition_graph.py --dataset pubmed --num-parts 4 --out-dir ./```

The output partitions are created under the output location in the Libra_result_\<dataset\>/ folder.
The *upcoming DistGNN* application can directly use these partitions for distributed training.

### How Libra partitioning works
Libra is a vertex-cut based graph partitioning method. It applies a greedy heuristic to assign each edge of the input graph to exactly one partition, producing each partition as a list of edges. The script ```libra_partition.py``` generates the Libra partitions and then converts the Libra output to the DGL/DistGNN input format.


Note: The current Libra implementation is sequential. Additional overhead comes from converting the partitioned graph into the DGL/DistGNN input format.
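The greedy edge-assignment idea described above can be sketched in plain Python. This is an illustrative toy under assumed semantics, not the actual Libra implementation (which lives in `libra_partition.cc`): each edge goes to the partition that already holds its endpoints when possible, subject to a capacity bound for balance.

```python
import math

def greedy_vertex_cut(edges, num_parts):
    """Toy greedy vertex-cut: assign each edge to exactly one partition,
    preferring partitions that already hold its endpoints (to limit vertex
    replication) and breaking ties toward the least-loaded partition.
    A capacity bound keeps per-partition edge counts balanced."""
    capacity = math.ceil(len(edges) / num_parts)
    part_of = {}                      # vertex -> set of partitions holding a copy
    load = [0] * num_parts            # edges assigned to each partition
    parts = [[] for _ in range(num_parts)]
    for u, v in edges:
        open_parts = [p for p in range(num_parts) if load[p] < capacity]
        best = max(open_parts, key=lambda p: (
            (p in part_of.get(u, ())) + (p in part_of.get(v, ())),  # locality first
            -load[p],                                               # then balance
        ))
        parts[best].append((u, v))
        load[best] += 1
        part_of.setdefault(u, set()).add(best)
        part_of.setdefault(v, set()).add(best)
    return parts
```

Each edge lands in exactly one partition (a vertex-cut replicates vertices, never edges), and the capacity bound prevents the locality preference from piling every edge onto one partition.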


### Expected partitioning timings
- Cora, Pubmed, Citeseer: < 10 sec (< 10 GB)
- Reddit: ~150 sec (~25 GB)
- OGBN-Products: ~200 sec (~30 GB)
- Proteins: ~1800 sec (format conversion from the public data dominates) (~100 GB)
- OGBN-Papers100M: ~2500 sec (~200 GB)


### Settings
Tested with:
- CentOS 7.6
- gcc v8.3.0
- PyTorch 1.7.1
- Python 3.7.10
62 changes: 62 additions & 0 deletions examples/pytorch/graphsage/distgnn/partition_graph.py
@@ -0,0 +1,62 @@
r"""
Copyright (c) 2021 Intel Corporation
\file Graph partitioning
\brief Calls Libra - Vertex-cut based graph partitioner for distributed training
\author Vasimuddin Md <[email protected]>,
Guixiang Ma <[email protected]>
Sanchit Misra <[email protected]>,
Ramanarayan Mohanty <[email protected]>,
Sasikanth Avancha <[email protected]>
Nesreen K. Ahmed <[email protected]>
"""


import os
import argparse
from load_graph import load_ogb
from dgl.data import load_data
from dgl.distgnn.partition import partition_graph
from dgl.distgnn.tools import load_proteins
from dgl.base import DGLError


if __name__ == "__main__":
argparser = argparse.ArgumentParser()
argparser.add_argument('--dataset', type=str, default='cora')
argparser.add_argument('--num-parts', type=int, default=2)
argparser.add_argument('--out-dir', type=str, default='./')
args = argparser.parse_args()

dataset = args.dataset
num_community = args.num_parts
out_dir = 'Libra_result_' + dataset ## "Libra_result_" prefix is mandatory
resultdir = os.path.join(args.out_dir, out_dir)

print("Input dataset for partitioning: ", dataset)
if args.dataset == 'ogbn-products':
print("Loading ogbn-products")
G, _ = load_ogb('ogbn-products')
elif args.dataset == 'ogbn-papers100M':
print("Loading ogbn-papers100M")
G, _ = load_ogb('ogbn-papers100M')
elif args.dataset == 'proteins':
G = load_proteins('proteins')
elif args.dataset == 'ogbn-arxiv':
print("Loading ogbn-arxiv")
G, _ = load_ogb('ogbn-arxiv')
else:
        try:
            G = load_data(args)[0]
        except Exception:
            raise DGLError("Error: Dataset {} not found!".format(dataset))

print("Done loading the graph.", flush=True)

partition_graph(num_community, G, resultdir)
35 changes: 35 additions & 0 deletions examples/pytorch/graphsage/experimental/README.md
@@ -1,3 +1,38 @@
## DistGNN vertex-cut based graph partitioning (using Libra)

### How to run graph partitioning
```python ../../../../python/dgl/distgnn/partition/main_Libra.py <dataset> <#partitions>```

Example: The following command-line creates 4 partitions of pubmed graph
```python ../../../../python/dgl/distgnn/partition/main_Libra.py pubmed 4```

The output partitions are created in the current directory in the Libra_result_\<dataset\>/ folder.
The *upcoming DistGNN* application can directly use these partitions for distributed training.

### How Libra partitioning works
Libra is a vertex-cut based graph partitioning method. It applies a greedy heuristic to assign each edge of the input graph to exactly one partition, producing each partition as a list of edges. The script ```main_Libra.py``` generates the Libra partitions and then converts the Libra output to the DGL/DistGNN input format.


Note: The current Libra implementation is sequential. Additional overhead comes from converting the partitioned graph into the DGL/DistGNN input format.
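A standard metric for judging the quality of a vertex-cut such as Libra's is the replication factor: the average number of partition copies per vertex. This helper is an illustrative sketch (the `replication_factor` function is not part of this commit); it assumes partitions are represented as lists of `(u, v)` edges, as described above.

```python
def replication_factor(parts, num_vertices):
    """Average number of partitions each vertex appears in, given
    partitions as lists of (u, v) edges. 1.0 means no vertex is cut;
    lower values mean less communication during distributed training."""
    copies = {}                      # vertex -> set of partition ids holding it
    for pid, edge_list in enumerate(parts):
        for u, v in edge_list:
            copies.setdefault(u, set()).add(pid)
            copies.setdefault(v, set()).add(pid)
    return sum(len(s) for s in copies.values()) / num_vertices
```

For example, splitting a 4-cycle's edges across two partitions as `[[(0, 1), (1, 2)], [(2, 3), (3, 0)]]` replicates vertices 0 and 2, giving a replication factor of 1.5.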


### Expected partitioning timings
- Cora, Pubmed, Citeseer: < 10 sec (< 10 GB)
- Reddit: ~1500 sec (~25 GB)
- OGBN-Products: ~2000 sec (~30 GB)
- Proteins: ~18000 sec (format conversion from the public data dominates) (~100 GB)
- OGBN-Papers100M: ~25000 sec (~200 GB)


### Settings
Tested with:
- CentOS 7.6
- gcc v8.3.0
- PyTorch 1.7.1
- Python 3.7.10



## Distributed training

This is an example of training GraphSage in a distributed fashion. Before training, please install some python libs by pip:
5 changes: 5 additions & 0 deletions python/dgl/distgnn/__init__.py
@@ -0,0 +1,5 @@
"""
This package contains DistGNN and Libra based graph partitioning tools.
"""
from . import partition
from . import tools
4 changes: 4 additions & 0 deletions python/dgl/distgnn/partition/__init__.py
@@ -0,0 +1,4 @@
"""
This package contains Libra graph partitioner.
"""
from .libra_partition import partition_graph
