forked from dmlc/dgl
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[DistGNN, Graph partitioning] Libra partition (dmlc#3376)
* added distgnn plus libra codebase * Dist application codes * added comments in partition code. changed the interface of partitioning call. * updated readme * create libra partitioning branch for the PR * removed disgnn files for first PR * updated kernel.cc * added libra_partition.cc and moved libra code from kernel.cc to libra_partition.cc * fixed lint error; merged libra2dgl.py and main_Libra.py to libra_partition.py; added graphsage/distgnn folder and partition script. * removed libra2dgl.py * fixed the lint error and cleaned the code. * revisions due to PR comments. added distgnn/tools contains partitions routines * update 2 PR revision I * fixed errors; also improved the runtime by 10x. * fixed minor lint error * fixed some more lints * PR revision II changed the interface of libra partition function * rewrite docstring Co-authored-by: Quan (Andy) Gan <[email protected]>
- Loading branch information
Showing
11 changed files
with
1,378 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
## DistGNN vertex-cut based graph partitioning (using Libra) | ||
|
||
### How to run graph partitioning | ||
```python partition_graph.py --dataset <dataset> --num-parts <num_parts> --out-dir <output_location>``` | ||
|
||
Example: The following command-line creates 4 partitions of pubmed graph | ||
``` python partition_graph.py --dataset pubmed --num-parts 4 --out-dir ./``` | ||
|
||
The ouptut partitions are created in the current directory in Libra_result_\<dataset\>/ folder. | ||
The *upcoming DistGNN* application can directly use these partitions for distributed training. | ||
|
||
### How Libra partitioning works | ||
Libra is a vertex-cut based graph partitioning method. It applies greedy heuristics to uniquely distribute the input graph edges among the partitions. It generates the partitions as a list of edges. Script ```libra_partition.py``` after generates the Libra partitions and converts the Libra output to DGL/DistGNN input format. | ||
|
||
|
||
Note: Current Libra implementation is sequential. Extra overhead is paid due to the additional work of format conversion of the partitioned graph. | ||
|
||
|
||
### Expected partitioning timinigs | ||
Cora, Pubmed, Citeseer: < 10 sec (<10GB) | ||
Reddit: ~150 sec (~ 25GB) | ||
OGBN-Products: ~200 sec (~30GB) | ||
Proteins: 1800 sec (Format conversion from public data takes time) (~100GB) | ||
OGBN-Paper100M: 2500 sec (~200GB) | ||
|
||
|
||
### Settings | ||
Tested with: | ||
Cent OS 7.6 | ||
gcc v8.3.0 | ||
PyTorch 1.7.1 | ||
Python 3.7.10 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
r""" | ||
Copyright (c) 2021 Intel Corporation | ||
\file Graph partitioning | ||
\brief Calls Libra - Vertex-cut based graph partitioner for distirbuted training | ||
\author Vasimuddin Md <[email protected]>, | ||
Guixiang Ma <[email protected]> | ||
Sanchit Misra <[email protected]>, | ||
Ramanarayan Mohanty <[email protected]>, | ||
Sasikanth Avancha <[email protected]> | ||
Nesreen K. Ahmed <[email protected]> | ||
""" | ||
|
||
|
||
import os | ||
import sys | ||
import numpy as np | ||
import csv | ||
from statistics import mean | ||
import random | ||
import time | ||
import argparse | ||
from load_graph import load_ogb | ||
import dgl | ||
from dgl.data import load_data | ||
from dgl.distgnn.partition import partition_graph | ||
from dgl.distgnn.tools import load_proteins | ||
from dgl.base import DGLError | ||
|
||
|
||
if __name__ == "__main__": | ||
argparser = argparse.ArgumentParser() | ||
argparser.add_argument('--dataset', type=str, default='cora') | ||
argparser.add_argument('--num-parts', type=int, default=2) | ||
argparser.add_argument('--out-dir', type=str, default='./') | ||
args = argparser.parse_args() | ||
|
||
dataset = args.dataset | ||
num_community = args.num_parts | ||
out_dir = 'Libra_result_' + dataset ## "Libra_result_" prefix is mandatory | ||
resultdir = os.path.join(args.out_dir, out_dir) | ||
|
||
print("Input dataset for partitioning: ", dataset) | ||
if args.dataset == 'ogbn-products': | ||
print("Loading ogbn-products") | ||
G, _ = load_ogb('ogbn-products') | ||
elif args.dataset == 'ogbn-papers100M': | ||
print("Loading ogbn-papers100M") | ||
G, _ = load_ogb('ogbn-papers100M') | ||
elif args.dataset == 'proteins': | ||
G = load_proteins('proteins') | ||
elif args.dataset == 'ogbn-arxiv': | ||
print("Loading ogbn-arxiv") | ||
G, _ = load_ogb('ogbn-arxiv') | ||
else: | ||
try: | ||
G = load_data(args)[0] | ||
except: | ||
raise DGLError("Error: Dataset {} not found !!!".format(dataset)) | ||
|
||
print("Done loading the graph.", flush=True) | ||
|
||
partition_graph(num_community, G, resultdir) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
""" | ||
This package contains DistGNN and Libra based graph partitioning tools. | ||
""" | ||
from . import partition | ||
from . import tools |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
""" | ||
This package contains Libra graph partitioner. | ||
""" | ||
from .libra_partition import partition_graph |
Oops, something went wrong.