[Doc] New README (dmlc#1259)

* new readme * try resize * try center * more imgs * fix
ZSCDumin · Feb 12, 2020 · bdf1bb5 · bdf1bb5
1 parent 23893bb
commit bdf1bb5
Showing 1 changed file with 138 additions and 121 deletions.
diff --git a/README.md b/README.md
@@ -1,67 +1,154 @@
 # Deep Graph Library (DGL)
+
 [![Build Status](http://ci.dgl.ai:80/buildStatus/icon?job=DGL/master)](http://ci.dgl.ai:80/job/DGL/job/master/)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](./LICENSE)
 
-[Documentation](https://docs.dgl.ai) | [DGL at a glance](https://docs.dgl.ai/tutorials/basics/1_first.html#sphx-glr-tutorials-basics-1-first-py) |
-[Model Tutorials](https://docs.dgl.ai/tutorials/models/index.html) | [Discussion Forum](https://discuss.dgl.ai)
+[Documentation](https://docs.dgl.ai) | [DGL at a glance](https://docs.dgl.ai/tutorials/basics/1_first.html#sphx-glr-tutorials-basics-1-first-py) | [Model Tutorials](https://docs.dgl.ai/tutorials/models/index.html) | [Discussion Forum](https://discuss.dgl.ai)
 
-Model Zoos: [Chemistry](https://github.com/dmlc/dgl/tree/master/examples/pytorch/model_zoo) | [Citation Networks](https://github.com/dmlc/dgl/tree/master/examples/pytorch/model_zoo/citation_network)
 
-DGL is a Python package that interfaces between existing tensor libraries and data being expressed as
-graphs.
+DGL is an easy-to-use, high performance and scalable Python package for deep learning on graphs. DGL is framework agnostic, meaning if a deep graph model is a component of an end-to-end application, the rest of the logics can be implemented in any major frameworks, such as PyTorch, Apache MXNet or TensorFlow.
 
-It makes implementing graph neural networks (including Graph Convolution Networks, TreeLSTM, and many others) easy while
-maintaining high computation efficiency.
+<p align="center">
+  <img src="https://i.imgur.com/DwA1NbZ.png" alt="DGL v0.4 architecture" width="600">
+  <br>
+  <b>Figure</b>: DGL Overall Architecture
+</p>
 
-All model examples can be found [here](https://github.com/dmlc/dgl/tree/master/examples).
 
-A summary of part of the model accuracy and training speed with the Pytorch backend (on Amazon EC2 p3.2x instance (w/ V100 GPU)), as compared with the best open-source implementations:
+## Using DGL
 
-| Model                                                            | Reported <br> Accuracy | DGL <br> Accuracy | Author's training speed (epoch time)                                          | DGL speed (epoch time) | Improvement |
-| ---------------------------------------------------------------- | ---------------------- | ----------------- | ----------------------------------------------------------------------------- | ---------------------- | ----------- |
-| [GCN](https://arxiv.org/abs/1609.02907)                          | 81.5%                  | 81.0%             | [0.0051s (TF)](https://github.com/tkipf/gcn)                                  | 0.0031s                | 1.64x       |
-| [GAT](https://arxiv.org/abs/1710.10903)                          | 83.0%                  | 83.9%             | [0.0982s (TF)](https://github.com/PetarV-/GAT)                                | 0.0113s                | 8.69x       |
-| [SGC](https://arxiv.org/abs/1902.07153)                          | 81.0%                  | 81.9%             | n/a                                                                           | 0.0008s                | n/a         |
-| [TreeLSTM](http://arxiv.org/abs/1503.00075)                      | 51.0%                  | 51.72%            | [14.02s (DyNet)](https://github.com/clab/dynet/tree/master/examples/treelstm) | 3.18s                  | 4.3x        |
-| [R-GCN <br> (classification)](https://arxiv.org/abs/1703.06103)  | 73.23%                 | 73.53%            | [0.2853s (Theano)](https://github.com/tkipf/relational-gcn)                   | 0.0075s                | 38.2x       |
-| [R-GCN <br> (link prediction)](https://arxiv.org/abs/1703.06103) | 0.158                  | 0.151             | [2.204s (TF)](https://github.com/MichSchli/RelationPrediction)                | 0.453s                 | 4.86x       |
-| [JTNN](https://arxiv.org/abs/1802.04364)                         | 96.44%                 | 96.44%            | [1826s (Pytorch)](https://github.com/wengong-jin/icml18-jtnn)                 | 743s                   | 2.5x        |
-| [LGNN](https://arxiv.org/abs/1705.08415)                         | 94%                    | 94%               | n/a                                                                           | 1.45s                  | n/a         |
-| [DGMG](https://arxiv.org/pdf/1803.03324.pdf)                     | 84%                    | 90%               | n/a                                                                           | 238s                   | n/a         |
-| [GraphWriter](https://www.aclweb.org/anthology/N19-1238.pdf)     | 14.3(BLEU)             | 14.31(BLEU)       | [1970s (PyTorch)](https://github.com/rikdz/GraphWriter)                       | 1192s                  | 1.65x       |
+**A data scientist** may want to apply a pre-trained model to your data right away. For this you can use DGL's [Application packages, formally *Model Zoo*](https://github.com/dmlc/dgl/tree/master/apps). Application packages are developed for domain applications, as is the case for [DGL-LifeScience](https://github.com/dmlc/dgl/tree/master/apps/life_sci). We will soon add model zoo for knowledge graph embedding learning and recommender systems. Here's how you will use a pretrained model:
+```python
+from dgl.data.chem import Tox21, smiles_to_bigraph, CanonicalAtomFeaturizer
+from dgl import model_zoo
 
-With the MXNet/Gluon backend , we scaled a graph of 50M nodes and 150M edges on a P3.8xlarge instance, 
-with 160s per epoch, on SSE ([Stochastic Steady-state Embedding](https://www.cc.gatech.edu/~hdai8/pdf/equilibrium_embedding.pdf)), 
-a model similar to GCN. 
+dataset = Tox21(smiles_to_bigraph, CanonicalAtomFeaturizer())
+model = model_zoo.chem.load_pretrained('GCN_Tox21') # Pretrained model loaded
+model.eval()
 
+smiles, g, label, mask = dataset[0]
+feats = g.ndata.pop('h')
+label_pred = model(g, feats)
+```
 
-We are currently in Beta stage.  More features and improvements are coming.
+**Further reading**: DGL is released as a managed service on AWS SageMaker, see the medium posts for an easy trip to DGL on SageMaker([part1](https://medium.com/@julsimon/a-primer-on-graph-neural-networks-with-amazon-neptune-and-the-deep-graph-library-5ce64984a276) and [part2](https://medium.com/@julsimon/deep-graph-library-part-2-training-on-amazon-sagemaker-54d318dfc814)).
+
+**Researchers** can start from the growing list of [models implemented in DGL](https://github.com/dmlc/dgl/tree/master/examples). Developing new models does not mean that you have to start from scratch. Instead, you can reuse many [pre-built modules](https://docs.dgl.ai/api/python/nn.html). Here is how to get a standard two-layer graph convolutional model with a pre-built GraphConv module:
+```python
+from dgl.nn.pytorch import GraphConv
+import torch.nn.functional as F
+
+# build a two-layer GCN with ReLU as the activation in between
+class GCN(nn.Module):
+    def __init__(self, in_feats, h_feats, num_classes):
+        super(GCN, self).__init__()
+        self.gcn_layer1 = GraphConv(in_feats, h_feats)
+        self.gcn_layer2 = GraphConv(h_feats, num_classes)
+
+    def forward(self, graph, inputs):
+        h = self.gcn_layer1(graph, inputs)
+        h = F.relu(h)
+        h = self.gcn_layer2(graph, h)
+        return h
+```
+
+Next level down, you may want to innovate your own module. DGL offers a succinct message-passing interface (see tutorial [here](https://docs.dgl.ai/tutorials/basics/3_pagerank.html)). Here is how Graph Attention Network (GAT) is implemented ([complete codes](https://docs.dgl.ai/api/python/nn.pytorch.html#gatconv)). Of course, you can also find GAT as a module [GATConv](https://docs.dgl.ai/api/python/nn.pytorch.html#gatconv):
+```python
+import torch.nn as nn
+import torch.nn.functional as F
+
+# Define a GAT layer
+class GATLayer(nn.Module):
+    def __init__(self, in_feats, out_feats):
+        super(GATLayer, self).__init__()
+        self.linear_func = nn.Linear(in_feats, out_feats, bias=False)
+        self.attention_func = nn.Linear(2 * out_feats, 1, bias=False)
+
+    def edge_attention(self, edges):
+        concat_z = torch.cat([edges.src['z'], edges.dst['z']], dim=1)
+        src_e = self.attention_func(concat_z)
+        src_e = F.leaky_relu(src_e)
+        return {'e': src_e}
+
+    def message_func(self, edges):
+        return {'z': edges.src['z'], 'e':edges.data['e']}
+
+    def reduce_func(self, nodes):
+        a = F.softmax(nodes.mailbox['e'], dim=1)
+        h = torch.sum(a * nodes.mailbox['z'], dim=1)
+        return {'h': h}
+
+    def forward(self, graph, h):
+        z = self.linear_func(h)
+        graph.ndata['z'] = z
+        graph.apply_edges(self.edge_attention)
+        graph.update_all(self.message_func, self.reduce_func)
+        return graph.ndata.pop('h')
+```
+## Performance and Scalability
+
+**Microbenchmark on speed and memory usage**: While leaving tensor and autograd functions to backend frameworks (e.g. PyTorch, MXNet, and TensorFlow), DGL aggressively optimizes storage and computation with its own kernels. Here's a comparison to another pupular package -- PyG. The short story is that raw speed is similar, but DGL has much better memory management.
+
+
+| Dataset  |    Model     |                   Accuracy                   |                    Time <br> PyG &emsp;&emsp; DGL                    |           Memory <br> PyG &emsp;&emsp; DGL            |
+| -------- |:------------:|:--------------------------------------------:|:--------------------------------------------------------------------:|:-----------------------------------------------------:|
+| Cora     | GCN <br> GAT | 81.31 &plusmn; 0.88 <br> 83.98 &plusmn; 0.52 | <b>0.478</b> &emsp;&emsp; 0.666 <br> 1.608 &emsp;&emsp; <b>1.399</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.2 &emsp;&emsp; <b>1.1</b> |
+| CiteSeer | GCN <br> GAT | 70.98 &plusmn; 0.68 <br> 69.96 &plusmn; 0.53 | <b>0.490</b> &emsp;&emsp; 0.674 <br> 1.606 &emsp;&emsp; <b>1.399</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.3 &emsp;&emsp; <b>1.1</b> |
+| PubMed   | GCN <br> GAT | 79.00 &plusmn; 0.41 <br> 77.65 &plusmn; 0.32 | <b>0.491</b> &emsp;&emsp; 0.690 <br> 1.946 &emsp;&emsp; <b>1.393</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.6 &emsp;&emsp; <b>1.1</b> |
+| Reddit   |     GCN      |             93.46 &plusmn; 0.06              |                    *OOM*&emsp;&emsp; <b>28.6</b>                     |            *OOM* &emsp;&emsp; <b>11.7</b>             |
+| Reddit-S |     GCN      |                     N/A                      |                    29.12 &emsp;&emsp; <b>9.44</b>                    |             15.7 &emsp;&emsp; <b>3.6</b>              |
+
+Table: Training time(in seconds) for 200 epochs and memory consumption(GB)
+
+High memory utilization allows DGL to push the limit of single-GPU performance, as seen in below images.
+| <img src="https://i.imgur.com/CvXc9Uu.png" width="400"> | <img src="https://i.imgur.com/HnCfJyU.png" width="400"> |
+| -------- | -------- |
 
-## News
+**Scalability**: DGL has fully leveraged multiple GPUs in both one machine and clusters for increasing training speed, and has better performance than alternatives, as seen in below images.
 
-v0.4 has just been released! DGL now support **heterogeneous graphs**, and comes
-with a subpackage **DGL-KE** that computes embeddings for large knowledge graphs
-such as Freebase (1.9 billion triplets).
-See release note [here](https://github.com/dmlc/dgl/releases/tag/v0.4).
+<p align="center">
+  <img src="https://i.imgur.com/IGERtVX.png" width="600">
+</p>
 
-We presented DGL at [GTC 2019](https://www.nvidia.com/en-us/gtc/) as an
-instructor-led training session. Check out our slides and tutorial materials
-[here](https://github.com/dglai/DGL-GTC2019)!!!
+| <img src="https://i.imgur.com/BugYro2.png"> |  <img src="https://i.imgur.com/KQ4nVdX.png"> | 
+| :---------------------------------------: | -- |
 
-## System requirements
+
+**Further reading**: Detailed comparison of DGL and other Graph alternatives can be found [here](https://arxiv.org/abs/1909.01315).
+
+## DGL Models and Applications
+
+### DGL for research
+Overall there are 30+ models implemented by using DGL:
+- [PyTorch](https://github.com/dmlc/dgl/tree/master/examples/pytorch)
+- [MXNet](https://github.com/dmlc/dgl/tree/master/examples/mxnet)
+- [TensorFlow](https://github.com/dmlc/dgl/tree/master/examples/tensorflow)
+
+### DGL for domain applications
+- [DGL-LifeSci](https://github.com/dmlc/dgl/tree/master/apps/life_sci), previously DGL-Chem
+- [DGL-KE](https://github.com/dmlc/dgl/tree/master/apps/kg)
+- DGL-RecSys(coming soon)
+
+### DGL for NLP/CV problems
+- [LSTM](https://github.com/dmlc/dgl/tree/master/examples/pytorch/tree_lstm)
+- [GraphWriter](https://github.com/dmlc/dgl/tree/master/examples/pytorch/graphwriter)
+- [Capsule Network](https://github.com/dmlc/dgl/tree/master/examples/pytorch/capsule)
+
+We are currently in Beta stage.  More features and improvements are coming.
+
+
+## Installation
 
 DGL should work on
 
 * all Linux distributions no earlier than Ubuntu 16.04
 * macOS X
 * Windows 10
 
-DGL also requires Python 3.5 or later.  Python 2 support is coming.
+DGL requires Python 3.5 or later.
 
-Right now, DGL works on [PyTorch](https://pytorch.org) 0.4.1+ and [MXNet](https://mxnet.apache.org) nightly
-build.
+Right now, DGL works on [PyTorch](https://pytorch.org) 1.1.0+, [MXNet](https://mxnet.apache.org) nightly build, and [TensorFlow](https://tensorflow.org) 2.0+.
 
-## Installation
 
 ### Using anaconda
 
@@ -84,105 +171,35 @@ conda install -c dglteam dgl-cuda10.1  # CUDA 10.1
 | CUDA 10.0 | `pip install --pre dgl-cu100` | `pip install dgl-cu100` |
 | CUDA 10.1 | `pip install --pre dgl-cu101` | `pip install dgl-cu101` |
 
-### From source
+### Built from source code
 
 Refer to the guide [here](https://docs.dgl.ai/install/index.html#install-from-source).
 
-## How DGL looks like
 
-A graph can be constructed with feature tensors like this:
+## DGL Major Releases
 
-```python
-import dgl
-import torch as th
-
-g = dgl.DGLGraph()
-g.add_nodes(5)                          # add 5 nodes
-g.add_edges([0, 0, 0, 0], [1, 2, 3, 4]) # add 4 edges 0->1, 0->2, 0->3, 0->4
-g.ndata['h'] = th.randn(5, 3)           # assign one 3D vector to each node
-g.edata['h'] = th.randn(4, 4)           # assign one 4D vector to each edge
-```
+| Releases  | Date   | Features |
+|-----------|--------|-------------------------|
+| v0.4.2      | 01/24/2020 |  - Heterograph support <br> - TensorFlow support (experimental) <br> - MXNet GNN modules <br> | 
+| v0.3.1 | 08/23/2019 | - APIs for GNN modules <br> - Model zoo (DGL-Chem) <br> - New installation |
+| v0.2 | 03/09/2019 | - Graph sampling APIs <br> - Speed improvement |
+| v0.1 | 12/07/2018 | - Basic DGL APIs <br> - PyTorch and MXNet support <br> - GNN model examples and tutorials |
 
-This is *everything* to implement a single layer for Graph Convolutional Network on PyTorch:
-
-```python
-import dgl.function as fn
-import torch.nn as nn
-import torch.nn.functional as F
-from dgl import DGLGraph
-
-msg_func = fn.copy_src(src='h', out='m')
-reduce_func = fn.sum(msg='m', out='h')
-
-class GCNLayer(nn.Module):
-    def __init__(self, in_feats, out_feats):
-        super(GCNLayer, self).__init__()
-        self.linear = nn.Linear(in_feats, out_feats)
-
-    def apply(self, nodes):
-        return {'h': F.relu(self.linear(nodes.data['h']))}
-
-    def forward(self, g, feature):
-        g.ndata['h'] = feature
-        g.update_all(msg_func, reduce_func)
-        g.apply_nodes(func=self.apply)
-        return g.ndata.pop('h')
-```
-
-One can also customize how message and reduce function works.  The following code
-demonstrates a (simplified version of) Graph Attention Network (GAT) layer:
-
-```python
-def msg_func(edges):
-    return {'k': edges.src['k'], 'v': edges.src['v']}
-
-def reduce_func(nodes):
-    # nodes.data['q'] has the shape
-    #     (number_of_nodes, feature_dims)
-    # nodes.data['k'] and nodes.data['v'] have the shape
-    #     (number_of_nodes, number_of_incoming_messages, feature_dims)
-    # You only need to deal with the case where all nodes have the same number
-    # of incoming messages.
-    q = nodes.data['q'][:, None]
-    k = nodes.mailbox['k']
-    v = nodes.mailbox['v']
-    s = F.softmax((q * k).sum(-1), 1)[:, :, None]
-    return {'v': th.sum(s * v, 1)}
-
-class GATLayer(nn.Module):
-    def __init__(self, in_feats, out_feats):
-        super(GATLayer, self).__init__()
-        self.Q = nn.Linear(in_feats, out_feats)
-        self.K = nn.Linear(in_feats, out_feats)
-        self.V = nn.Linear(in_feats, out_feats)
-
-    def apply(self, nodes):
-        return {'v': F.relu(self.linear(nodes.data['v']))}
-
-    def forward(self, g, feature):
-        g.ndata['v'] = self.V(feature)
-        g.ndata['q'] = self.Q(feature)
-        g.ndata['k'] = self.K(feature)
-        g.update_all(msg_func, reduce_func)
-        g.apply_nodes(func=self.apply)
-        return g.ndata['v']
-```
-
-For the basics of coding with DGL, please see [DGL basics](https://docs.dgl.ai/tutorials/basics/index.html).
-
-For more realistic, end-to-end examples, please see [model tutorials](https://docs.dgl.ai/tutorials/models/index.html).
+## New to Deep Learning and Graph Deep Learning?
 
+Check out the open source book [*Dive into Deep Learning*](http://gluon.ai/).
 
-## New to Deep Learning?
+For those who are new to graph nerual network, please see the [basic of DGL](https://docs.dgl.ai/tutorials/basics/index.html).
 
-Check out the open source book [*Dive into Deep Learning*](http://gluon.ai/).
+For audience who are looking for more advanced, realistic, and end-to-end examples, please see [model tutorials](https://docs.dgl.ai/tutorials/models/index.html).
 
 
 ## Contributing
 
 Please let us know if you encounter a bug or have any suggestions by [filing an issue](https://github.com/dmlc/dgl/issues).
 
 We welcome all contributions from bug fixes to new features and extensions.
+
 We expect all contributions discussed in the issue tracker and going through PRs.  Please refer to our [contribution guide](https://docs.dgl.ai/contribute.html).
 
 ## Cite