[Doc] New README (dmlc#3178)
New readme structure
jermainewang authored Jul 23, 2021
1 parent 73c85b9 commit 0f25773
Showing 1 changed file with 38 additions and 185 deletions.
223 changes: 38 additions & 185 deletions README.md
@@ -8,9 +8,7 @@
[![Benchmark by ASV](http://img.shields.io/badge/benchmarked%20by-asv-green.svg?style=flat)](https://asv.dgl.ai/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](./LICENSE)

Documentation ([Latest](https://docs.dgl.ai/en/latest/) | [Stable](https://docs.dgl.ai)) | [DGL at a glance](https://docs.dgl.ai/tutorials/basics/1_first.html#sphx-glr-tutorials-basics-1-first-py) | [Model Tutorials](https://docs.dgl.ai/tutorials/models/index.html) | [Official Examples](examples/README.md) | [Discussion Forum](https://discuss.dgl.ai) | [Slack Channel](https://join.slack.com/t/deep-graph-library/shared_invite/zt-eb4ict1g-xcg3PhZAFAB8p6dtKuP6xQ)

**For a full list of official DGL examples, see [here](examples).**
[Website](https://www.dgl.ai) | [A Blitz Introduction to DGL](https://docs.dgl.ai/tutorials/blitz/index.html) | Documentation ([Latest](https://docs.dgl.ai/en/latest/) | [Stable](https://docs.dgl.ai)) | [Official Examples](examples/README.md) | [Discussion Forum](https://discuss.dgl.ai) | [Slack Channel](https://join.slack.com/t/deep-graph-library/shared_invite/zt-eb4ict1g-xcg3PhZAFAB8p6dtKuP6xQ)

DGL is an easy-to-use, high-performance and scalable Python package for deep learning on graphs. DGL is framework agnostic, meaning that if a deep graph model is a component of an end-to-end application, the rest of the logic can be implemented in any major framework, such as PyTorch, Apache MXNet or TensorFlow.

@@ -20,151 +18,66 @@ DGL is an easy-to-use, high performance and scalable Python package for deep lea
<b>Figure</b>: DGL Overall Architecture
</p>

## <img src="http://data.dgl.ai/asset/image/new.png" width="30">DGL News
**07/22/2021**: The new **v0.7.0 release** includes a number of system optimizations, new models, new features, enhancements and bug fixes. See our [release notes](https://github.com/dmlc/dgl/releases/tag/v0.7.0) for more details.

**02/26/2021**: The new **v0.6.0 release** includes distributed heterogeneous graph support, 13 more model examples, a Chinese translation of the user guide thanks to community support, and a new tutorial. See our [release notes](https://github.com/dmlc/dgl/releases/tag/v0.6.0) for more details.

**09/05/2020**: We invite you to participate in the survey [here](https://forms.gle/Ej3jHCocACmb49Gp8) to make DGL a better fit for your needs. Thanks!

## Using DGL
## Highlighted Features

**Data scientists** may want to apply a pre-trained model to their data right away. For this, you can use DGL's [application packages, formerly *Model Zoo*](https://github.com/dmlc/dgl/tree/master/apps). Application packages are developed for specific domains, as is the case for [DGL-LifeScience](https://github.com/awslabs/dgl-lifesci). We will soon add model zoos for knowledge graph embedding learning and recommender systems. Here is how to use a pretrained model:
```python
from dgllife.data import Tox21
from dgllife.model import load_pretrained
from dgllife.utils import smiles_to_bigraph, CanonicalAtomFeaturizer
### A GPU-ready graph library

dataset = Tox21(smiles_to_bigraph, CanonicalAtomFeaturizer())
model = load_pretrained('GCN_Tox21') # Pretrained model loaded
model.eval()
DGL provides a powerful graph object that can reside on either CPU or GPU. It bundles structural data together with features for better control. We provide a variety of functions for computing with graph objects, including efficient and customizable message passing primitives for Graph Neural Networks.

smiles, g, label, mask = dataset[0]
feats = g.ndata.pop('h')
label_pred = model(g, feats)
print(smiles) # CCOc1ccc2nc(S(N)(=O)=O)sc2c1
print(label_pred[:, mask != 0]) # Mask non-existing labels
# tensor([[ 1.4190, -0.1820, 1.2974, 1.4416, 0.6914,
# 2.0957, 0.5919, 0.7715, 1.7273, 0.2070]])
```
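
To make the idea of message passing mentioned above concrete, here is a minimal plain-Python sketch of one round of sum aggregation over a tiny directed graph. It is an illustration only, not DGL's implementation or API: the edge list, the feature values and the helper name `sum_aggregate` are all made up for this example. In DGL, the same kind of computation is expressed on a graph object and runs on optimized kernels.

```python
# Conceptual sketch of one message-passing step (sum aggregation).
# Plain Python only; `sum_aggregate` is a hypothetical helper, not a DGL function.

def sum_aggregate(num_nodes, edges, feats):
    """Each node sums the features sent along its incoming edges."""
    out = [0.0] * num_nodes
    for src, dst in edges:
        out[dst] += feats[src]  # the message from src is delivered to dst
    return out

# A 4-node directed graph: 0->1, 0->2, 1->2, 2->3
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
feats = [1.0, 2.0, 3.0, 4.0]  # one scalar feature per node

aggregated = sum_aggregate(4, edges, feats)
print(aggregated)  # [0.0, 1.0, 3.0, 3.0]
```

Node 0 has no incoming edges and therefore keeps a zero result, while node 2 receives messages from both node 0 and node 1.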
### Models, modules and benchmarks for GNN researchers

**Further reading**: DGL is released as a managed service on AWS SageMaker; see the Medium posts for an easy introduction to DGL on SageMaker ([part 1](https://medium.com/@julsimon/a-primer-on-graph-neural-networks-with-amazon-neptune-and-the-deep-graph-library-5ce64984a276) and [part 2](https://medium.com/@julsimon/deep-graph-library-part-2-training-on-amazon-sagemaker-54d318dfc814)).

**Researchers** can start from the growing list of [models implemented in DGL](https://github.com/dmlc/dgl/tree/master/examples). Developing new models does not mean that you have to start from scratch. Instead, you can reuse many [pre-built modules](https://docs.dgl.ai/api/python/nn.html). Here is how to get a standard two-layer graph convolutional model with a pre-built GraphConv module:
```python
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn.pytorch import GraphConv

# Build a two-layer GCN with ReLU as the activation in between
class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.gcn_layer1 = GraphConv(in_feats, h_feats)
        self.gcn_layer2 = GraphConv(h_feats, num_classes)

    def forward(self, graph, inputs):
        # graph: a DGL graph; inputs: node features with in_feats columns
        h = self.gcn_layer1(graph, inputs)
        h = F.relu(h)
        h = self.gcn_layer2(graph, h)
        return h
```
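
As a rough guide to what each graph convolution layer aggregates, the sketch below computes a symmetrically normalized neighbor sum in plain Python. It assumes the normalization of the original GCN formulation, where each message is scaled by the inverse square root of the product of the sender's out-degree and the receiver's in-degree, and it leaves out the learnable weight matrix, bias and self-loops. The helper `gcn_aggregate` and the example graph are hypothetical; this is not the code of the `GraphConv` module.

```python
import math

# Hypothetical helper illustrating GCN-style normalized aggregation.
def gcn_aggregate(num_nodes, edges, feats):
    """out[i] = sum over edges j->i of feats[j] / sqrt(out_deg[j] * in_deg[i])."""
    out_deg = [0] * num_nodes
    in_deg = [0] * num_nodes
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    out = [0.0] * num_nodes
    for src, dst in edges:
        # Both degrees are at least 1 here because this edge itself is counted.
        out[dst] += feats[src] / math.sqrt(out_deg[src] * in_deg[dst])
    return out

# Undirected path 0 - 1 - 2, stored as two directed edges per link.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
feats = [1.0, 2.0, 3.0]
print(gcn_aggregate(3, edges, feats))
```

The middle node has degree 2, so the messages it sends and receives are scaled down relative to a plain sum, which keeps feature magnitudes comparable across nodes of different degree.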
The field of graph deep learning is evolving rapidly, and many research ideas build on earlier work. To ease the process, DGL collects a rich set of [example implementations](https://github.com/dmlc/dgl/tree/master/examples) of popular GNN models covering a wide range of topics. Researchers can [search](https://www.dgl.ai/) for related models to draw new ideas from or to use as baselines for experiments. Moreover, DGL provides many state-of-the-art [GNN layers and modules](https://docs.dgl.ai/api/python/nn.html) for users to build new model architectures. DGL is one of the preferred platforms for many standard graph deep learning benchmarks, including [OGB](https://ogb.stanford.edu/) and [GNNBenchmarks](https://github.com/graphdeeplearning/benchmarking-gnns).

One level down, you may want to design your own module. DGL offers a succinct message-passing interface (see the tutorial [here](https://docs.dgl.ai/tutorials/basics/3_pagerank.html)). Here is how a Graph Attention Network (GAT) layer is implemented ([complete code](https://docs.dgl.ai/api/python/nn.pytorch.html#gatconv)). Of course, GAT is also available as a pre-built module, [GATConv](https://docs.dgl.ai/api/python/nn.pytorch.html#gatconv):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define a GAT layer
class GATLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(GATLayer, self).__init__()
        self.linear_func = nn.Linear(in_feats, out_feats, bias=False)
        self.attention_func = nn.Linear(2 * out_feats, 1, bias=False)

    def edge_attention(self, edges):
        # Unnormalized attention score computed from both endpoints of each edge
        concat_z = torch.cat([edges.src['z'], edges.dst['z']], dim=1)
        src_e = self.attention_func(concat_z)
        src_e = F.leaky_relu(src_e)
        return {'e': src_e}

    def message_func(self, edges):
        # Send the projected source feature and the edge score to the destination
        return {'z': edges.src['z'], 'e': edges.data['e']}

    def reduce_func(self, nodes):
        # Softmax over each node's incoming edges, then a weighted sum of messages
        a = F.softmax(nodes.mailbox['e'], dim=1)
        h = torch.sum(a * nodes.mailbox['z'], dim=1)
        return {'h': h}

    def forward(self, graph, h):
        z = self.linear_func(h)
        graph.ndata['z'] = z
        graph.apply_edges(self.edge_attention)
        graph.update_all(self.message_func, self.reduce_func)
        return graph.ndata.pop('h')
```
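
The `reduce_func` step above normalizes the attention scores of each node's incoming edges with a softmax and then takes the weighted sum of the incoming messages. The following plain-Python sketch works through that arithmetic for a single destination node with scalar messages. The scores and message values are invented for illustration, and `attention_reduce` is a hypothetical helper, not part of DGL.

```python
import math

# Hypothetical helper mirroring the softmax-then-weighted-sum reduction.
def attention_reduce(scores, messages):
    """Softmax over incoming-edge scores, then a weighted sum of the messages."""
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    h = sum(w * z for w, z in zip(weights, messages))
    return weights, h

# One destination node with three incoming edges.
scores = [0.0, 0.0, 0.0]      # equal scores give uniform attention
messages = [3.0, 6.0, 9.0]
weights, h = attention_reduce(scores, messages)
print(weights)  # three equal weights
print(h)        # with uniform attention this is the mean of the messages
```

When one score is much larger than the others, its weight approaches 1 and the output approaches that edge's message, which is how attention lets a node focus on a few neighbors.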
## Performance and Scalability
### Easy to learn and use

**Microbenchmark on speed and memory usage**: While leaving tensor and autograd functions to backend frameworks (e.g. PyTorch, MXNet, and TensorFlow), DGL aggressively optimizes storage and computation with its own kernels. Here's a comparison to another popular package -- PyTorch Geometric (PyG). The short story is that raw speed is similar, but DGL has much better memory management.
DGL provides plenty of learning materials for all kinds of users, from ML researchers to domain experts. The [Blitz Introduction to DGL](https://docs.dgl.ai/tutorials/blitz/index.html) is a 120-minute tour of the basics of graph machine learning. The [User Guide](https://docs.dgl.ai/guide/index.html) explains the concepts of graphs and the training methodology in more detail. All of them include DGL code snippets that are runnable and ready to be plugged into one’s own pipeline.

### Scalable and efficient

| Dataset | Model | Accuracy | Time <br> PyG &emsp;&emsp; DGL | Memory <br> PyG &emsp;&emsp; DGL |
| -------- |:------------:|:--------------------------------------------:|:--------------------------------------------------------------------:|:-----------------------------------------------------:|
| Cora | GCN <br> GAT | 81.31 &plusmn; 0.88 <br> 83.98 &plusmn; 0.52 | <b>0.478</b> &emsp;&emsp; 0.666 <br> 1.608 &emsp;&emsp; <b>1.399</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.2 &emsp;&emsp; <b>1.1</b> |
| CiteSeer | GCN <br> GAT | 70.98 &plusmn; 0.68 <br> 69.96 &plusmn; 0.53 | <b>0.490</b> &emsp;&emsp; 0.674 <br> 1.606 &emsp;&emsp; <b>1.399</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.3 &emsp;&emsp; <b>1.1</b> |
| PubMed | GCN <br> GAT | 79.00 &plusmn; 0.41 <br> 77.65 &plusmn; 0.32 | <b>0.491</b> &emsp;&emsp; 0.690 <br> 1.946 &emsp;&emsp; <b>1.393</b> | 1.1 &emsp;&emsp; 1.1 <br> 1.6 &emsp;&emsp; <b>1.1</b> |
| Reddit | GCN | 93.46 &plusmn; 0.06 | *OOM*&emsp;&emsp; <b>28.6</b> | *OOM* &emsp;&emsp; <b>11.7</b> |
| Reddit-S | GCN | N/A | 29.12 &emsp;&emsp; <b>9.44</b> | 15.7 &emsp;&emsp; <b>3.6</b> |
It is convenient to train models using DGL on large-scale graphs across multiple GPUs or multiple machines. DGL extensively optimizes the whole stack to reduce the overhead in communication, memory consumption and synchronization. As a result, DGL can easily scale to billion-scale graphs. See the [system performance note](https://docs.dgl.ai/performance.html) for a comparison with other tools.

Table: Training time (in seconds) for 200 epochs and memory consumption (GB)
## Get Started

Here is another comparison of DGL on TensorFlow backend with other TF-based GNN tools (training time in seconds for one epoch):
Users can install DGL from [pip and conda](https://www.dgl.ai/pages/start.html). Advanced users can follow the [instructions](https://docs.dgl.ai/install/index.html#install-from-source) to install from source.

| Dataset | Model | DGL | GraphNet | tf_geometric |
| ------- | ----- | --- | -------- | ------------ |
| Cora | GCN | 0.0148 | 0.0152 | 0.0192 |
| Reddit | GCN | 0.1095 | OOM | OOM |
| PubMed | GCN | 0.0156 | 0.0553 | 0.0185 |
| PPI | GCN | 0.09 | 0.16 | 0.21 |
| Cora | GAT | 0.0442 | n/a | 0.058 |
| PPI | GAT | 0.398 | n/a | 0.752 |
For absolute beginners, start with [the Blitz Introduction to DGL](https://docs.dgl.ai/tutorials/blitz/index.html). It covers the basic concepts of common graph machine learning tasks and gives a step-by-step guide to building Graph Neural Networks (GNNs) to solve them.

High memory utilization allows DGL to push the limit of single-GPU performance, as shown in the images below.
| <img src="http://data.dgl.ai/asset/image/DGLvsPyG-time1.png" width="400"> | <img src="http://data.dgl.ai/asset/image/DGLvsPyG-time2.png" width="400"> |
| -------- | -------- |
For users who already know the basics and wish to learn more:

**Scalability**: DGL fully leverages multiple GPUs, both within a single machine and across clusters, to increase training speed, and performs better than alternatives, as shown in the images below.
* Learn DGL by [example implementations](https://www.dgl.ai/) of popular GNN models.
* Read the [User Guide](https://docs.dgl.ai/guide/index.html) ([中文版链接](https://docs.dgl.ai/guide_cn/index.html)), which explains the concepts and usage of DGL in much more detail.
* Go through the tutorials for advanced features like [stochastic training of GNNs](https://docs.dgl.ai/tutorials/large/index.html), training on [multi-GPU](https://docs.dgl.ai/tutorials/multi/index.html) or [multi-machine](https://docs.dgl.ai/tutorials/dist/index.html).
* [Study classical papers](https://docs.dgl.ai/tutorials/models/index.html) on graph machine learning alongside DGL.
* Search for the usage of a specific API in the [API reference manual](https://docs.dgl.ai/api/python/index.html), which organizes all DGL APIs by their namespace.

<p align="center">
<img src="http://data.dgl.ai/asset/image/one-four-GPUs.png" width="600">
</p>
All the learning materials are available at our [documentation site](https://docs.dgl.ai/). If you are new to deep learning in general,
check out the open source book [Dive into Deep Learning](https://d2l.ai/).

| <img src="http://data.dgl.ai/asset/image/one-four-GPUs-DGLvsGraphVite.png"> | <img src="http://data.dgl.ai/asset/image/one-fourMachines.png"> |
| :---------------------------------------: | -- |

## Community

**Further reading**: A detailed comparison of DGL with alternative graph libraries can be found [here](https://arxiv.org/abs/1909.01315).
### Get connected

## DGL Models and Applications
We provide multiple channels to connect you with the community of DGL developers, users, and GNN academic researchers in general:

### DGL for research
Overall there are 30+ models implemented by using DGL:
- [PyTorch](https://github.com/dmlc/dgl/tree/master/examples/pytorch)
- [MXNet](https://github.com/dmlc/dgl/tree/master/examples/mxnet)
- [TensorFlow](https://github.com/dmlc/dgl/tree/master/examples/tensorflow)
* Our Slack channel, [click to join](https://join.slack.com/t/deep-graph-library/shared_invite/zt-eb4ict1g-xcg3PhZAFAB8p6dtKuP6xQ)
* Our discussion forum: https://discuss.dgl.ai/
* Our [Zhihu blog (in Chinese)](https://www.zhihu.com/column/c_1070749881013936128)
* Monthly GNN User Group online seminar ([event link](https://www.eventbrite.com/e/graph-neural-networks-user-group-tickets-137512275919?utm-medium=discovery&utm-campaign=social&utm-content=attendeeshare&aff=escb&utm-source=cp&utm-term=listing) | [past videos](https://www.youtube.com/channel/UCnmuSDY1pTlaFH1WRQElfTg))

### DGL for domain applications
- [DGL-LifeSci](https://github.com/awslabs/dgl-lifesci), previously DGL-Chem
- [DGL-KE](https://github.com/awslabs/dgl-ke)
- DGL-RecSys (coming soon)
Take the survey [here](https://forms.gle/Ej3jHCocACmb49Gp8) and leave any feedback to make DGL a better fit for your needs. Thanks!

### DGL for NLP/CV problems
- [TreeLSTM](https://github.com/dmlc/dgl/tree/master/examples/pytorch/tree_lstm)
- [GraphWriter](https://github.com/dmlc/dgl/tree/master/examples/pytorch/graphwriter)
- [Capsule Network](https://github.com/dmlc/dgl/tree/master/examples/pytorch/capsule)
### DGL-powered projects

We are currently in Beta stage. More features and improvements are coming.
* DGL-LifeSci: a DGL-based package for various applications in life science with graph neural networks. https://github.com/awslabs/dgl-lifesci
* DGL-KE: a high performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings. https://github.com/awslabs/dgl-ke
* Benchmarking GNN: https://github.com/graphdeeplearning/benchmarking-gnns
* OGB: a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. https://ogb.stanford.edu/
* Graph4NLP: an easy-to-use library for R&D at the intersection of Deep Learning on Graphs and Natural Language Processing. https://github.com/graph4ai/graph4nlp
* GNN-RecSys: https://github.com/je-dbl/GNN-RecSys
* Amazon Neptune ML: a new capability of Neptune that uses Graph Neural Networks (GNNs), a machine learning technique purpose-built for graphs, to make easy, fast, and more accurate predictions using graph data. https://aws.amazon.com/cn/neptune/machine-learning/

## Awesome Papers Using DGL
### Awesome Papers Using DGL

1. [**Benchmarking Graph Neural Networks**](https://arxiv.org/pdf/2003.00982.pdf), *Vijay Prakash Dwivedi, Chaitanya K. Joshi, Thomas Laurent, Yoshua Bengio, Xavier Bresson*

@@ -361,66 +274,6 @@ We are currently in Beta stage. More features and improvements are coming.

</details>

## Installation

DGL should work on

* all Linux distributions no earlier than Ubuntu 16.04
* macOS X
* Windows 10 (with [VC2015 Redistributable](https://www.microsoft.com/en-us/download/details.aspx?id=48145) Installed)

DGL requires Python 3.6 or later.

Right now, DGL works on [PyTorch](https://pytorch.org) 1.5.0+, [MXNet](https://mxnet.apache.org) 1.6+, and [TensorFlow](https://tensorflow.org) 2.3+.


### Using anaconda

```
conda install -c dglteam dgl # cpu version
conda install -c dglteam dgl-cuda9.2 # CUDA 9.2
conda install -c dglteam dgl-cuda10.1 # CUDA 10.1
conda install -c dglteam dgl-cuda10.2 # CUDA 10.2
conda install -c dglteam dgl-cuda11.0 # CUDA 11.0
conda install -c dglteam dgl-cuda11.1 # CUDA 11.1
```

### Using pip


| | Latest Nightly Build Version | Stable Version |
|-----------|-------------------------------|-------------------------|
| CPU | `pip install --pre dgl -f https://data.dgl.ai/wheels-test/repo.html` | `pip install dgl -f https://data.dgl.ai/wheels-test/repo.html` |
| CUDA 9.2 | `pip install --pre dgl-cu92 -f https://data.dgl.ai/wheels-test/repo.html` | `pip install dgl-cu92 -f https://data.dgl.ai/wheels-test/repo.html` |
| CUDA 10.1 | `pip install --pre dgl-cu101 -f https://data.dgl.ai/wheels-test/repo.html` | `pip install dgl-cu101 -f https://data.dgl.ai/wheels-test/repo.html` |
| CUDA 10.2 | `pip install --pre dgl-cu102 -f https://data.dgl.ai/wheels-test/repo.html` | `pip install dgl-cu102 -f https://data.dgl.ai/wheels-test/repo.html` |
| CUDA 11.0 | `pip install --pre dgl-cu110 -f https://data.dgl.ai/wheels-test/repo.html` | `pip install dgl-cu110 -f https://data.dgl.ai/wheels-test/repo.html` |
| CUDA 11.1 | `pip install --pre dgl-cu111 -f https://data.dgl.ai/wheels-test/repo.html` | `pip install dgl-cu111 -f https://data.dgl.ai/wheels-test/repo.html` |

### Built from source code

Refer to the guide [here](https://docs.dgl.ai/install/index.html#install-from-source).


## DGL Major Releases

| Releases | Date | Features |
|-----------|--------|-------------------------|
| v0.4.3 | 03/31/2020 | - TensorFlow support <br> - DGL-KE <br> - DGL-LifeSci <br> - Heterograph sampling APIs (experimental) |
| v0.4.2 | 01/24/2020 | - Heterograph support <br> - TensorFlow support (experimental) <br> - MXNet GNN modules <br> |
| v0.3.1 | 08/23/2019 | - APIs for GNN modules <br> - Model zoo (DGL-Chem) <br> - New installation |
| v0.2 | 03/09/2019 | - Graph sampling APIs <br> - Speed improvement |
| v0.1 | 12/07/2018 | - Basic DGL APIs <br> - PyTorch and MXNet support <br> - GNN model examples and tutorials |

## New to Deep Learning and Graph Deep Learning?

Check out the open source book [*Dive into Deep Learning*](https://d2l.ai/).

For those who are new to graph neural networks, please see the [basics of DGL](https://docs.dgl.ai/tutorials/basics/index.html).

For readers who are looking for more advanced, realistic, and end-to-end examples, please see the [model tutorials](https://docs.dgl.ai/tutorials/models/index.html).


## Contributing

Please let us know if you encounter a bug or have any suggestions by [filing an issue](https://github.com/dmlc/dgl/issues).