[DistGNN, Graph partitioning] Libra partition (dmlc#3376)
* added distgnn plus libra codebase

* Dist application codes

* added comments in partition code. changed the interface of partitioning call.

* updated readme

* create libra partitioning branch for the PR

* removed disgnn files for first PR

* updated kernel.cc

* added libra_partition.cc and moved libra code from kernel.cc to libra_partition.cc

* fixed lint error; merged libra2dgl.py and main_Libra.py to libra_partition.py; added graphsage/distgnn folder and partition script.

* removed libra2dgl.py

* fixed the lint error and cleaned the code.

* revisions due to PR comments. added distgnn/tools contains partitions routines

* update 2 PR revision I

* fixed errors; also improved the runtime by 10x.

* fixed minor lint error

* fixed some more lints

* PR revision II changed the interface of libra partition function

* rewrite docstring

Co-authored-by: Quan (Andy) Gan <[email protected]>
yuk12 and BarclayII authored Dec 15, 2021
1 parent 02880e9 commit 78e0dae
Showing 11 changed files with 1,378 additions and 1 deletion.
32 changes: 32 additions & 0 deletions examples/pytorch/graphsage/distgnn/README.md
@@ -0,0 +1,32 @@
## DistGNN vertex-cut based graph partitioning (using Libra)

### How to run graph partitioning
```python partition_graph.py --dataset <dataset> --num-parts <num_parts> --out-dir <output_location>```

Example: The following command-line creates 4 partitions of pubmed graph
``` python partition_graph.py --dataset pubmed --num-parts 4 --out-dir ./```

The output partitions are created under the output location in the Libra_result_\<dataset\>/ folder.
The *upcoming DistGNN* application can directly use these partitions for distributed training.

### How Libra partitioning works
Libra is a vertex-cut based graph partitioning method. It applies a greedy heuristic to assign each edge of the input graph to exactly one partition, producing each partition as a list of edges. The script ```libra_partition.py``` generates the Libra partitions and then converts the Libra output to the DGL/DistGNN input format.


Note: The current Libra implementation is sequential. Additional overhead comes from converting the partitioned graph into the DGL/DistGNN input format.
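The greedy edge-assignment idea described above can be sketched in plain Python. This is an illustrative toy under assumed semantics, not the actual Libra implementation (which lives in `libra_partition.cc`): each edge goes to the partition that already holds its endpoints when possible, subject to a capacity bound for balance.

```python
import math

def greedy_vertex_cut(edges, num_parts):
    """Toy greedy vertex-cut: assign each edge to exactly one partition,
    preferring partitions that already hold its endpoints (to limit vertex
    replication) and breaking ties toward the least-loaded partition.
    A capacity bound keeps per-partition edge counts balanced."""
    capacity = math.ceil(len(edges) / num_parts)
    part_of = {}                      # vertex -> set of partitions holding a copy
    load = [0] * num_parts            # edges assigned to each partition
    parts = [[] for _ in range(num_parts)]
    for u, v in edges:
        open_parts = [p for p in range(num_parts) if load[p] < capacity]
        best = max(open_parts, key=lambda p: (
            (p in part_of.get(u, ())) + (p in part_of.get(v, ())),  # locality first
            -load[p],                                               # then balance
        ))
        parts[best].append((u, v))
        load[best] += 1
        part_of.setdefault(u, set()).add(best)
        part_of.setdefault(v, set()).add(best)
    return parts
```

Each edge lands in exactly one partition (a vertex-cut replicates vertices, never edges), and the capacity bound prevents the locality preference from piling every edge onto one partition.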


### Expected partitioning timings
- Cora, Pubmed, Citeseer: < 10 sec (< 10 GB)
- Reddit: ~150 sec (~25 GB)
- OGBN-Products: ~200 sec (~30 GB)
- Proteins: ~1800 sec (format conversion from the public data dominates) (~100 GB)
- OGBN-Papers100M: ~2500 sec (~200 GB)


### Settings
Tested with:
- CentOS 7.6
- gcc v8.3.0
- PyTorch 1.7.1
- Python 3.7.10
62 changes: 62 additions & 0 deletions examples/pytorch/graphsage/distgnn/partition_graph.py
@@ -0,0 +1,62 @@
r"""
Copyright (c) 2021 Intel Corporation
\file Graph partitioning
\brief Calls Libra - Vertex-cut based graph partitioner for distributed training
\author Vasimuddin Md <[email protected]>,
Guixiang Ma <[email protected]>
Sanchit Misra <[email protected]>,
Ramanarayan Mohanty <[email protected]>,
Sasikanth Avancha <[email protected]>
Nesreen K. Ahmed <[email protected]>
"""


import os
import argparse
from load_graph import load_ogb
from dgl.data import load_data
from dgl.distgnn.partition import partition_graph
from dgl.distgnn.tools import load_proteins
from dgl.base import DGLError


if __name__ == "__main__":
argparser = argparse.ArgumentParser()
argparser.add_argument('--dataset', type=str, default='cora')
argparser.add_argument('--num-parts', type=int, default=2)
argparser.add_argument('--out-dir', type=str, default='./')
args = argparser.parse_args()

dataset = args.dataset
num_community = args.num_parts
out_dir = 'Libra_result_' + dataset ## "Libra_result_" prefix is mandatory
resultdir = os.path.join(args.out_dir, out_dir)

print("Input dataset for partitioning: ", dataset)
if args.dataset == 'ogbn-products':
print("Loading ogbn-products")
G, _ = load_ogb('ogbn-products')
elif args.dataset == 'ogbn-papers100M':
print("Loading ogbn-papers100M")
G, _ = load_ogb('ogbn-papers100M')
elif args.dataset == 'proteins':
G = load_proteins('proteins')
elif args.dataset == 'ogbn-arxiv':
print("Loading ogbn-arxiv")
G, _ = load_ogb('ogbn-arxiv')
else:
        try:
            G = load_data(args)[0]
        except Exception:
            raise DGLError("Error: Dataset {} not found!".format(dataset))

print("Done loading the graph.", flush=True)

partition_graph(num_community, G, resultdir)
35 changes: 35 additions & 0 deletions examples/pytorch/graphsage/experimental/README.md
@@ -1,3 +1,38 @@
## DistGNN vertex-cut based graph partitioning (using Libra)

### How to run graph partitioning
```python ../../../../python/dgl/distgnn/partition/main_Libra.py <dataset> <#partitions>```

Example: The following command-line creates 4 partitions of pubmed graph
```python ../../../../python/dgl/distgnn/partition/main_Libra.py pubmed 4```

The output partitions are created in the current directory in the Libra_result_\<dataset\>/ folder.
The *upcoming DistGNN* application can directly use these partitions for distributed training.

### How Libra partitioning works
Libra is a vertex-cut based graph partitioning method. It applies a greedy heuristic to assign each edge of the input graph to exactly one partition, producing each partition as a list of edges. The script ```main_Libra.py``` generates the Libra partitions and then converts the Libra output to the DGL/DistGNN input format.


Note: The current Libra implementation is sequential. Additional overhead comes from converting the partitioned graph into the DGL/DistGNN input format.
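A standard metric for judging the quality of a vertex-cut such as Libra's is the replication factor: the average number of partition copies per vertex. This helper is an illustrative sketch (the `replication_factor` function is not part of this commit); it assumes partitions are represented as lists of `(u, v)` edges, as described above.

```python
def replication_factor(parts, num_vertices):
    """Average number of partitions each vertex appears in, given
    partitions as lists of (u, v) edges. 1.0 means no vertex is cut;
    lower values mean less communication during distributed training."""
    copies = {}                      # vertex -> set of partition ids holding it
    for pid, edge_list in enumerate(parts):
        for u, v in edge_list:
            copies.setdefault(u, set()).add(pid)
            copies.setdefault(v, set()).add(pid)
    return sum(len(s) for s in copies.values()) / num_vertices
```

For example, splitting a 4-cycle's edges across two partitions as `[[(0, 1), (1, 2)], [(2, 3), (3, 0)]]` replicates vertices 0 and 2, giving a replication factor of 1.5.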


### Expected partitioning timings
- Cora, Pubmed, Citeseer: < 10 sec (< 10 GB)
- Reddit: ~1500 sec (~25 GB)
- OGBN-Products: ~2000 sec (~30 GB)
- Proteins: ~18000 sec (format conversion from the public data dominates) (~100 GB)
- OGBN-Papers100M: ~25000 sec (~200 GB)


### Settings
Tested with:
- CentOS 7.6
- gcc v8.3.0
- PyTorch 1.7.1
- Python 3.7.10



## Distributed training

This is an example of training GraphSage in a distributed fashion. Before training, please install some python libs by pip:
5 changes: 5 additions & 0 deletions python/dgl/distgnn/__init__.py
@@ -0,0 +1,5 @@
"""
This package contains DistGNN and Libra based graph partitioning tools.
"""
from . import partition
from . import tools
4 changes: 4 additions & 0 deletions python/dgl/distgnn/partition/__init__.py
@@ -0,0 +1,4 @@
"""
This package contains Libra graph partitioner.
"""
from .libra_partition import partition_graph
