# DGL-Go

DGL-Go is a command line tool for users to get started with training, using and
studying Graph Neural Networks (GNNs). Data scientists can quickly apply GNNs
to their problems, whereas researchers will find it useful to customize their
experiments.

## Installation and getting started

DGL-Go requires DGL v0.8+, so please make sure DGL is up to date.
Install DGL-Go with `pip install dglgo`, then type `dgl` in your console:

```
Usage: dgl [OPTIONS] COMMAND [ARGS]...
Options:
  --help  Show this message and exit.
Commands:
  configure  Generate a configuration file
  export     Export a runnable python script
  recipe     Get example recipes
  train      Launch training
```

![img](./dglgo.png)

Using DGL-Go is as easy as three steps:

1. Use `dgl configure` to pick the task, dataset and model of interest. It generates
   a configuration file for later use. You could also use `dgl recipe get` to retrieve
   a configuration file we provide.
2. Use `dgl train` to launch training according to the configuration and see the results.
3. Use `dgl export` to generate a *self-contained, reproducible* Python script for advanced
   customization, or try the model on custom data stored in CSV format.

Next, we will walk through all these steps one by one.

## Training GraphSAGE for node classification on Cora

Let's use one of the most classical setups as an example: training a GraphSAGE model for
node classification on the Cora citation graph dataset.

### Step 1: `dgl configure`

First, use `dgl configure` to generate a YAML configuration file.

```
dgl configure nodepred --data cora --model sage --cfg cora_sage.yaml
```

Note that `nodepred` is the name of a DGL-Go *pipeline*. For now, you can think of
a pipeline as a training task: `nodepred` is for the node prediction task; other
options include `linkpred` for the link prediction task, etc. The command will
generate a configuration file `cora_sage.yaml` which includes:

* Options for the selected dataset (i.e., `cora` here).
* Model hyperparameters (e.g., number of layers, hidden size, etc.).
* Training hyperparameters (e.g., learning rate, loss function, etc.).

Different choices of task, model and dataset may expose very different options,
so DGL-Go also adds a comment in the file explaining what each option does.
At this point you can also change the options to explore tuning opportunities.

Below is the configuration file generated by the command above.
```yaml
version: 0.0.1
pipeline_name: nodepred
device: cpu
data:
  name: cora
  split_ratio: # Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
model:
  name: sage
  embed_size: -1 # The dimension of created embedding table. -1 means using original node embedding
  hidden_size: 16 # Hidden size.
  num_layers: 1 # Number of hidden layers.
  activation: relu # Activation function name under torch.nn.functional
  dropout: 0.5 # Dropout rate.
  aggregator_type: gcn # Aggregator type to use (``mean``, ``gcn``, ``pool``, ``lstm``).
general_pipeline:
  early_stop:
    patience: 20 # Steps before early stop
    checkpoint_path: checkpoint.pth # Early stop checkpoint model file path
  num_epochs: 200 # Number of training epochs
  eval_period: 5 # Interval epochs between evaluations
  optimizer:
    name: Adam
    lr: 0.01
    weight_decay: 0.0005
  loss: CrossEntropyLoss
  save_path: model.pth # Path to save the model
  num_runs: 1 # Number of experiments to run
```
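
If you would rather adjust these options programmatically than edit the YAML by hand, here
is a minimal sketch (assuming PyYAML is installed; the file name matches the one generated
above):

```python
import yaml  # PyYAML

# Load the configuration generated by `dgl configure`.
with open("cora_sage.yaml") as f:
    cfg = yaml.safe_load(f)

# Tweak a couple of hyperparameters before training.
cfg["model"]["hidden_size"] = 32
cfg["general_pipeline"]["optimizer"]["lr"] = 0.005

# Write the modified configuration back, then run `dgl train --cfg cora_sage.yaml`.
with open("cora_sage.yaml", "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```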

Apart from `dgl configure`, you can also get one of DGL-Go's built-in configuration files
(called *recipes*) using `dgl recipe`. There are two sub-commands. Running

```
dgl recipe list
```

will list the available recipes:
```
➜  dgl recipe list
===============================================================================
| Filename                       | Pipeline    | Dataset        |
===============================================================================
| linkpred_citation2_sage.yaml   | linkpred    | ogbl-citation2 |
| linkpred_collab_sage.yaml      | linkpred    | ogbl-collab    |
| nodepred_citeseer_sage.yaml    | nodepred    | citeseer       |
| nodepred_citeseer_gcn.yaml     | nodepred    | citeseer       |
| nodepred-ns_arxiv_gcn.yaml     | nodepred-ns | ogbn-arxiv     |
| nodepred_cora_gat.yaml         | nodepred    | cora           |
| nodepred_pubmed_sage.yaml      | nodepred    | pubmed         |
| linkpred_cora_sage.yaml        | linkpred    | cora           |
| nodepred_pubmed_gcn.yaml       | nodepred    | pubmed         |
| nodepred_pubmed_gat.yaml       | nodepred    | pubmed         |
| nodepred_cora_gcn.yaml         | nodepred    | cora           |
| nodepred_cora_sage.yaml        | nodepred    | cora           |
| nodepred_citeseer_gat.yaml     | nodepred    | citeseer       |
| nodepred-ns_product_sage.yaml  | nodepred-ns | ogbn-products  |
===============================================================================
```

Then use

```
dgl recipe get nodepred_cora_sage.yaml
```

to copy the YAML configuration file to your local folder.

### Step 2: `dgl train`

Simply running `dgl train --cfg cora_sage.yaml` will start the training process.

```log
...
Epoch 00190 | Loss 1.5225 | TrainAcc 0.9500 | ValAcc 0.6840
Epoch 00191 | Loss 1.5416 | TrainAcc 0.9357 | ValAcc 0.6840
Epoch 00192 | Loss 1.5391 | TrainAcc 0.9357 | ValAcc 0.6840
Epoch 00193 | Loss 1.5257 | TrainAcc 0.9643 | ValAcc 0.6840
Epoch 00194 | Loss 1.5196 | TrainAcc 0.9286 | ValAcc 0.6840
EarlyStopping counter: 12 out of 20
Epoch 00195 | Loss 1.4862 | TrainAcc 0.9643 | ValAcc 0.6760
Epoch 00196 | Loss 1.5142 | TrainAcc 0.9714 | ValAcc 0.6760
Epoch 00197 | Loss 1.5145 | TrainAcc 0.9714 | ValAcc 0.6760
Epoch 00198 | Loss 1.5174 | TrainAcc 0.9571 | ValAcc 0.6760
Epoch 00199 | Loss 1.5235 | TrainAcc 0.9714 | ValAcc 0.6760
Test Accuracy 0.7740
Accuracy across 1 runs: 0.774 ± 0.0
```
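
For reference, the `TrainAcc`, `ValAcc` and `Test Accuracy` numbers above are plain
multi-class accuracy. A minimal sketch of an equivalent metric (the `accuracy` helper used
by the exported script may be implemented differently):

```python
import torch

def accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of nodes whose predicted class matches the ground-truth label."""
    return (logits.argmax(dim=1) == labels).float().mean().item()
```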

That's all! Basically you only need two commands to train a graph neural network.

### Step 3: `dgl export` for more advanced customization

That's not everything yet. You may want to open the hood and dive into deeper
customization. DGL-Go can export a **self-contained, reproducible** Python
script for you to do anything you like.

Try `dgl export --cfg cora_sage.yaml --output script.py`,
and you'll get the script used to train the model. Here's a snippet of it:

```python
...

class GraphSAGE(nn.Module):
    def __init__(self,
                 data_info: dict,
                 embed_size: int = -1,
                 hidden_size: int = 16,
                 num_layers: int = 1,
                 activation: str = "relu",
                 dropout: float = 0.5,
                 aggregator_type: str = "gcn"):
        """GraphSAGE model
        Parameters
        ----------
        data_info : dict
            The information about the input dataset.
        embed_size : int
            The dimension of created embedding table. -1 means using original node embedding
        hidden_size : int
            Hidden size.
        num_layers : int
            Number of hidden layers.
        dropout : float
            Dropout rate.
        activation : str
            Activation function name under torch.nn.functional
        aggregator_type : str
            Aggregator type to use (``mean``, ``gcn``, ``pool``, ``lstm``).
        """
        super(GraphSAGE, self).__init__()
        self.data_info = data_info
        self.embed_size = embed_size
        if embed_size > 0:
            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
            in_size = embed_size
        else:
            in_size = data_info["in_size"]
        self.layers = nn.ModuleList()
        self.dropout = nn.Dropout(dropout)
        self.activation = getattr(nn.functional, activation)

        for i in range(num_layers):
            in_hidden = hidden_size if i > 0 else in_size
            out_hidden = hidden_size if i < num_layers - 1 else data_info["out_size"]
            self.layers.append(dgl.nn.SAGEConv(in_hidden, out_hidden, aggregator_type))

    def forward(self, graph, node_feat, edge_feat=None):
        if self.embed_size > 0:
            dgl_warning(
                "The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.",
                norepeat=True)
            h = self.embed.weight
        else:
            h = node_feat
        h = self.dropout(h)
        for l, layer in enumerate(self.layers):
            h = layer(graph, h, edge_feat)
            if l != len(self.layers) - 1:
                h = self.activation(h)
                h = self.dropout(h)
        return h

...

def train(cfg, pipeline_cfg, device, data, model, optimizer, loss_fcn):
    g = data[0]  # Only train on the first graph
    g = dgl.remove_self_loop(g)
    g = dgl.add_self_loop(g)
    g = g.to(device)

    node_feat = g.ndata.get('feat', None)
    edge_feat = g.edata.get('feat', None)
    label = g.ndata['label']
    train_mask, val_mask, test_mask = g.ndata['train_mask'].bool(
    ), g.ndata['val_mask'].bool(), g.ndata['test_mask'].bool()

    stopper = EarlyStopping(**pipeline_cfg['early_stop'])

    val_acc = 0.
    for epoch in range(pipeline_cfg['num_epochs']):
        model.train()
        logits = model(g, node_feat, edge_feat)
        loss = loss_fcn(logits[train_mask], label[train_mask])

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_acc = accuracy(logits[train_mask], label[train_mask])
        if epoch != 0 and epoch % pipeline_cfg['eval_period'] == 0:
            val_acc = accuracy(logits[val_mask], label[val_mask])

            if stopper.step(val_acc, model):
                break

        print("Epoch {:05d} | Loss {:.4f} | TrainAcc {:.4f} | ValAcc {:.4f}".
              format(epoch, loss.item(), train_acc, val_acc))

    stopper.load_checkpoint(model)

    model.eval()
    with torch.no_grad():
        logits = model(g, node_feat, edge_feat)
        test_acc = accuracy(logits[test_mask], label[test_mask])
    return test_acc


def main():
    cfg = {
        'version': '0.0.1',
        'device': 'cuda:0',
        'model': {
            'embed_size': -1,
            'hidden_size': 16,
            'num_layers': 2,
            'activation': 'relu',
            'dropout': 0.5,
            'aggregator_type': 'gcn'},
        'general_pipeline': {
            'early_stop': {
                'patience': 100,
                'checkpoint_path': 'checkpoint.pth'},
            'num_epochs': 200,
            'eval_period': 5,
            'optimizer': {
                'lr': 0.01,
                'weight_decay': 0.0005},
            'loss': 'CrossEntropyLoss',
            'save_path': 'model.pth',
            'num_runs': 10}}
    device = cfg['device']
    pipeline_cfg = cfg['general_pipeline']
    # load data
    data = AsNodePredDataset(CoraGraphDataset())
    # create model
    model_cfg = cfg["model"]
    cfg["model"]["data_info"] = {
        "in_size": model_cfg['embed_size'] if model_cfg['embed_size'] > 0 else data[0].ndata['feat'].shape[1],
        "out_size": data.num_classes,
        "num_nodes": data[0].num_nodes()
    }
    model = GraphSAGE(**cfg["model"])
    model = model.to(device)
    loss = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(
        model.parameters(),
        **pipeline_cfg["optimizer"])
    # train
    test_acc = train(cfg, pipeline_cfg, device, data, model, optimizer, loss)
    torch.save(model, pipeline_cfg["save_path"])
    return test_acc

...
```

You can see that everything is collected into one Python script, which includes the
entire `GraphSAGE` model definition, the data processing and the training loop. Simply running
`python script.py` will give you the *exact same* result as you've seen with `dgl train`.
At this point, you can change any part as you wish, such as plugging in your own GNN module,
changing the loss function, and so on.
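
For instance, here is a rough sketch of what plugging in your own module could look like: a
class with the same constructor and `forward` interface as the generated `GraphSAGE`, swapped
into `main()`. The `MyGCN` class below is purely illustrative and not part of the exported
script.

```python
import dgl.nn
import torch.nn as nn
import torch.nn.functional as F

class MyGCN(nn.Module):
    """A drop-in replacement exposing the same interface as the generated GraphSAGE class."""
    def __init__(self, data_info: dict, hidden_size: int = 16, dropout: float = 0.5, **kwargs):
        super().__init__()
        self.data_info = data_info
        self.dropout = nn.Dropout(dropout)
        # Two-layer GCN; input and output sizes come from data_info, as in the exported script.
        self.conv1 = dgl.nn.GraphConv(data_info["in_size"], hidden_size, activation=F.relu)
        self.conv2 = dgl.nn.GraphConv(hidden_size, data_info["out_size"])

    def forward(self, graph, node_feat, edge_feat=None):
        h = self.conv1(graph, self.dropout(node_feat))
        return self.conv2(graph, self.dropout(h))

# In main(), replace
#     model = GraphSAGE(**cfg["model"])
# with something like
#     model = MyGCN(data_info=cfg["model"]["data_info"], hidden_size=16)
```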

## Use DGL-Go on your own dataset

DGL-Go supports training a model on your own dataset via DGL's `CSVDataset`.

### Step 1: Prepare your CSV and metadata files

Follow the tutorial at [Loading data from CSV
files](https://docs.dgl.ai/en/latest/guide/data-loadcsv.html#guide-data-pipeline-loadcsv)
to prepare your dataset. Generally, the dataset folder should include:

* At least one CSV file for node data.
* At least one CSV file for edge data.
* A metadata file called `meta.yaml`.

### Step 2: `dgl configure` with the `--data csv` option

Run

```
dgl configure nodepred --data csv --model sage --cfg csv_sage.yaml
```

to generate the configuration file. You will see that the file includes a section like
the following:

```yaml
...
data:
  name: csv
  split_ratio: # Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
  data_path: ./ # meta.yaml, nodes.csv and edges.csv should be in this folder
...
```

Fill in the `data_path` option with the path to your dataset folder.

If your dataset does not come with a native split into training, validation and test sets,
you can set the split ratio in the `split_ratio` option, and DGL-Go will
generate a random split for you.
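
Under the hood, this corresponds roughly to the following data-loading code. This is only a
sketch based on DGL's `CSVDataset` and `AsNodePredDataset` wrappers; the folder name is an
example, and the exact arguments may differ between DGL versions.

```python
from dgl.data import CSVDataset, AsNodePredDataset

# Load the raw CSV dataset from the folder containing meta.yaml, nodes.csv and edges.csv.
raw = CSVDataset("./my_dataset")  # replace with your data_path

# Wrap it for node prediction; split_ratio creates random train/val/test masks
# when the dataset has no built-in split.
data = AsNodePredDataset(raw, split_ratio=[0.8, 0.1, 0.1])
g = data[0]
print(g.ndata["train_mask"].sum(), g.ndata["val_mask"].sum(), g.ndata["test_mask"].sum())
```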

### Step 3: `train` the model / `export` the script

Then you can proceed exactly as in the tutorial above: either train the model with
`dgl train --cfg csv_sage.yaml`, or use `dgl export --cfg csv_sage.yaml --output script.py`
to get the training script.

## FAQ

**Q: What are the available options for each command?**
A: You can use `--help` with all commands. For example, use `dgl --help` for the general
help message, `dgl configure --help` for the configuration options, and
`dgl configure nodepred --help` for the configuration options of the node prediction pipeline.

**Q: What exactly is nodepred/linkpred? How many pipelines are there?**
A: They are called DGL-Go pipelines. A pipeline represents the training methodology for
a certain task, so its naming convention is *<task_name>[-<method_name>]*. For example,
`nodepred` trains the selected GNN model for node classification using full-graph training,
while `nodepred-ns` trains the model for node classification using neighbor sampling.
The first release includes three training pipelines (`nodepred`, `nodepred-ns` and `linkpred`),
and you can expect more to come in the future. Use `dgl configure --help` to see
all the available pipelines.

**Q: How can I add my model to the official model recipe zoo?**
A: This is currently not supported. We will enable this feature soon. Please stay tuned!

**Q: After training a model on some dataset, how can I apply it to another one?**
A: The `save_path` option in the generated configuration file allows you to specify where
to save the model after training. You can then modify the script generated by `dgl export`
to load the model checkpoint and evaluate it on another dataset.
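
For the last point, here is a minimal sketch of reloading the saved checkpoint and evaluating
it on a different dataset. The dataset and file names below are only examples, and the new
dataset must have the same input feature size and number of classes as the one the model was
trained on.

```python
import torch
import dgl
from dgl.data import CiteseerGraphDataset, AsNodePredDataset

# The exported script saves the whole model object, so torch.load restores it directly.
model = torch.load("model.pth", map_location="cpu")
model.eval()

# Load another node-classification dataset and preprocess it the same way as in train().
data = AsNodePredDataset(CiteseerGraphDataset())
g = dgl.add_self_loop(dgl.remove_self_loop(data[0]))

with torch.no_grad():
    logits = model(g, g.ndata["feat"])
    test_mask = g.ndata["test_mask"].bool()
    pred = logits[test_mask].argmax(dim=1)
    acc = (pred == g.ndata["label"][test_mask]).float().mean().item()
print(f"Test accuracy: {acc:.4f}")
```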