[Distributed] Deprecate old DistEmbedding impl, use synchronized embedding impl (dmlc#3111)

* fix.

* fix.

* fix.

* fix.

* Fix test

* Deprecate old DistEmbedding impl, use synchronized embedding impl

* update doc

Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Da Zheng <[email protected]>
Co-authored-by: Jinjing Zhou <[email protected]>
5 people authored Jul 13, 2021
1 parent ee6bc95 commit d739076
Showing 18 changed files with 62 additions and 251 deletions.
4 changes: 2 additions & 2 deletions docs/source/api/python/dgl.distributed.rst
@@ -27,9 +27,9 @@ Distributed Tensor

Distributed Node Embedding
--------------------------
.. currentmodule:: dgl.distributed.nn.pytorch
.. currentmodule:: dgl.distributed

.. autoclass:: NodeEmbedding
.. autoclass:: DistEmbedding


Distributed embedding optimizer
12 changes: 6 additions & 6 deletions docs/source/guide/distributed-apis.rst
@@ -9,7 +9,7 @@ This section covers the distributed APIs used in the training script. DGL provides
data structures and various APIs for initialization, distributed sampling and workload split.
For distributed training/inference, DGL provides three distributed data structures:
:class:`~dgl.distributed.DistGraph` for distributed graphs, :class:`~dgl.distributed.DistTensor` for
distributed tensors and :class:`~dgl.distributed.nn.NodeEmbedding` for distributed learnable embeddings.
distributed tensors and :class:`~dgl.distributed.DistEmbedding` for distributed learnable embeddings.

Initialization of the DGL distributed module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -27,7 +27,7 @@ Typically, the initialization APIs should be invoked in the following order:
th.distributed.init_process_group(backend='gloo')
**Note**: If the training script contains user-defined functions (UDFs) that have to be invoked on
the servers (see the section of DistTensor and NodeEmbedding for more details), these UDFs have to
the servers (see the section of DistTensor and DistEmbedding for more details), these UDFs have to
be declared before :func:`~dgl.distributed.initialize`.
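
As a quick illustration of this ordering, here is a minimal sketch of a trainer script skeleton; the UDF name is a hypothetical placeholder, not part of the guide:

```python
import dgl
import torch as th

# Any UDF that must run on the servers (e.g. a custom sampling function)
# has to be declared before dgl.distributed.initialize() is called.
def my_sample_fn(seeds):  # hypothetical placeholder UDF
    pass

if __name__ == '__main__':
    # 1. Initialize DGL's distributed module from the cluster configuration.
    dgl.distributed.initialize('ip_config.txt')
    # 2. Then initialize PyTorch's distributed package for the trainers.
    th.distributed.init_process_group(backend='gloo')
```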

Distributed graph
@@ -153,10 +153,10 @@ computation operators, such as sum and mean.
when a machine runs multiple servers. This may result in data corruption. One way to avoid concurrent
writes to the same row of data is to run one server process on a machine.

Distributed NodeEmbedding
Distributed DistEmbedding
~~~~~~~~~~~~~~~~~~~~~~~~~

DGL provides :class:`~dgl.distributed.nn.NodeEmbedding` to support transductive models that require
DGL provides :class:`~dgl.distributed.DistEmbedding` to support transductive models that require
node embeddings. Creating distributed embeddings is very similar to creating distributed tensors.

.. code:: python
@@ -165,7 +165,7 @@
arr = th.zeros(shape, dtype=dtype)
arr.uniform_(-1, 1)
return arr
emb = dgl.distributed.nn.NodeEmbedding(g.number_of_nodes(), 10, init_func=initializer)
emb = dgl.distributed.DistEmbedding(g.number_of_nodes(), 10, init_func=initializer)
Internally, distributed embeddings are built on top of distributed tensors and thus have
very similar behaviors to distributed tensors. For example, when embeddings are created, they
@@ -192,7 +192,7 @@ the other for dense model parameters, as shown in the code below:
optimizer.step()
sparse_optimizer.step()
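Put together, the two-optimizer pattern described above might look like the following sketch; `g`, `model`, `dataloader`, `labels`, and `loss_fn` are assumed to be defined elsewhere in the training script, and the mini-batch unpacking is illustrative:

```python
import torch as th
import dgl

# Distributed learnable embeddings, updated by DGL's sparse optimizer.
emb = dgl.distributed.DistEmbedding(g.number_of_nodes(), 10, name='node_emb')
sparse_optimizer = dgl.distributed.optim.SparseAdagrad([emb], lr=0.06)
# Dense model parameters, updated by a regular PyTorch optimizer.
optimizer = th.optim.Adam(model.parameters(), lr=0.01)

for input_nodes, seeds, blocks in dataloader:
    feats = emb(input_nodes)        # look up embeddings of the input nodes
    loss = loss_fn(model(blocks, feats), labels[seeds])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                # update the dense parameters
    sparse_optimizer.step()         # push updates to the distributed embeddings
```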
**Note**: :class:`~dgl.distributed.nn.NodeEmbedding` is not a PyTorch nn module, so we cannot
**Note**: :class:`~dgl.distributed.DistEmbedding` is not a PyTorch nn module, so we cannot
get access to it from the parameters of a PyTorch nn module.

Distributed sampling
2 changes: 1 addition & 1 deletion docs/source/guide/distributed.rst
@@ -85,7 +85,7 @@ Specifically, DGL's distributed training has three types of interacting processes
generate mini-batches for training.
* Trainers contain multiple classes to interact with servers. A trainer has
:class:`~dgl.distributed.DistGraph` to get access to partitioned graph data and has
:class:`~dgl.distributed.nn.NodeEmbedding` and :class:`~dgl.distributed.DistTensor` to access
:class:`~dgl.distributed.DistEmbedding` and :class:`~dgl.distributed.DistTensor` to access
the node/edge features/embeddings. It has
:class:`~dgl.distributed.dist_dataloader.DistDataLoader` to
interact with samplers to get mini-batches.
10 changes: 5 additions & 5 deletions docs/source/guide_cn/distributed-apis.rst
@@ -8,7 +8,7 @@
This section covers the distributed APIs used in the training script. DGL provides three distributed data structures and various APIs for initialization, distributed sampling and workload split.
For distributed training/inference, DGL provides three distributed data structures: :class:`~dgl.distributed.DistGraph` for distributed graphs,
:class:`~dgl.distributed.DistTensor` for distributed tensors, and, for distributed learnable embeddings,
:class:`~dgl.distributed.nn.NodeEmbedding`.
:class:`~dgl.distributed.DistEmbedding`.

Initialization of the DGL distributed module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -24,7 +24,7 @@ Initialization of the DGL distributed module
dgl.distributed.initialize('ip_config.txt')
th.distributed.init_process_group(backend='gloo')
**Note**: If the training script contains user-defined functions (UDFs) that have to be invoked on the servers (see the DistTensor and NodeEmbedding sections below for details),
**Note**: If the training script contains user-defined functions (UDFs) that have to be invoked on the servers (see the DistTensor and DistEmbedding sections below for details),
these UDFs have to be declared before :func:`~dgl.distributed.initialize`.

Distributed graph
@@ -138,7 +138,7 @@ DGL provides distributed tensors with an interface similar to that of regular single-machine tensors for accessing
Distributed embedding
~~~~~~~~~~~~~~~~~~~~~

DGL provides :class:`~dgl.distributed.nn.NodeEmbedding` to support transductive models that require node embeddings.
DGL provides :class:`~dgl.distributed.DistEmbedding` to support transductive models that require node embeddings.
Creating distributed embeddings is very similar to creating distributed tensors.

.. code:: python
Expand All @@ -147,7 +147,7 @@ DGL提供 :class:`~dgl.distributed.nn.NodeEmbedding` 以支持需要节点嵌入
arr = th.zeros(shape, dtype=dtype)
arr.uniform_(-1, 1)
return arr
emb = dgl.distributed.nn.NodeEmbedding(g.number_of_nodes(), 10, init_func=initializer)
emb = dgl.distributed.DistEmbedding(g.number_of_nodes(), 10, init_func=initializer)
Internally, distributed embeddings are built on top of distributed tensors and thus behave very similarly to distributed tensors.
For example, when embeddings are created, DGL shards them and stores them on all machines in the cluster. Distributed embeddings can be uniquely identified by name.
@@ -169,7 +169,7 @@ DGL provides a sparse Adagrad optimizer, :class:`~dgl.distributed.SparseAdagrad`
optimizer.step()
sparse_optimizer.step()
**Note**: :class:`~dgl.distributed.nn.NodeEmbedding` is not a PyTorch nn module, so users cannot access it from the parameters of an nn module.
**Note**: :class:`~dgl.distributed.DistEmbedding` is not a PyTorch nn module, so users cannot access it from the parameters of an nn module.

Distributed sampling
~~~~~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion docs/source/guide_cn/distributed.rst
@@ -74,7 +74,7 @@ DGL implements a few distributed components to support distributed training; the figure below shows these components
These servers work together to serve the graph data to the trainers. Note that one machine may run multiple server processes simultaneously to parallelize computation and network communication.
* *Sampler processes* interact with the servers and sample nodes and edges to generate mini-batches for training.
* *Trainer processes* contain multiple classes to interact with the servers. They use :class:`~dgl.distributed.DistGraph` to access the partitioned graph data,
:class:`~dgl.distributed.nn.NodeEmbedding` and
:class:`~dgl.distributed.DistEmbedding` and
:class:`~dgl.distributed.DistTensor` to access the node/edge features/embeddings, and
:class:`~dgl.distributed.dist_dataloader.DistDataLoader` to interact with the samplers to get mini-batches.

4 changes: 2 additions & 2 deletions examples/pytorch/graphsage/experimental/README.md
@@ -164,7 +164,7 @@ python3 ~/workspace/dgl/tools/launch.py --workspace ~/workspace/dgl/examples/pyt
"python3 train_dist_transductive.py --graph_name ogb-product --ip_config ip_config.txt --batch_size 1000 --num_gpu 4 --eval_every 5"
```

To run supervised training in the transductive setting using DGL distributed NodeEmbedding
To run supervised training in the transductive setting using DGL distributed DistEmbedding
```bash
python3 ~/workspace/dgl/tools/launch.py --workspace ~/workspace/dgl/examples/pytorch/graphsage/experimental/ \
--num_trainers 4 \
@@ -188,7 +188,7 @@ python3 ~/workspace/dgl/tools/launch.py --workspace ~/workspace/dgl/examples/pyt
"python3 train_dist_unsupervised_transductive.py --graph_name ogb-product --ip_config ip_config.txt --num_epochs 3 --batch_size 1000 --num_gpus 4"
```

To run unsupervised training in the transductive setting using DGL distributed NodeEmbedding
To run unsupervised training in the transductive setting using DGL distributed DistEmbedding
```bash
python3 ~/workspace/dgl/tools/launch.py --workspace ~/workspace/dgl/examples/pytorch/graphsage/experimental/ \
--num_trainers 4 \
@@ -13,7 +13,7 @@
import dgl.function as fn
import dgl.nn.pytorch as dglnn
from dgl.distributed import DistDataLoader
from dgl.distributed.nn import NodeEmbedding
from dgl.distributed import DistEmbedding

import torch as th
import torch.nn as nn
@@ -91,7 +91,7 @@ def __init__(self, num_nodes, emb_size, dgl_sparse_emb=False, dev_id='cpu'):
self.emb_size = emb_size
self.dgl_sparse_emb = dgl_sparse_emb
if dgl_sparse_emb:
self.sparse_emb = NodeEmbedding(num_nodes, emb_size, name='sage', init_func=initializer)
self.sparse_emb = DistEmbedding(num_nodes, emb_size, name='sage', init_func=initializer)
else:
self.sparse_emb = th.nn.Embedding(num_nodes, emb_size, sparse=True)
nn.init.uniform_(self.sparse_emb.weight, -1.0, 1.0)
@@ -22,7 +22,6 @@
import torch.multiprocessing as mp
from dgl.distributed import DistDataLoader

from dgl.distributed.optim import SparseAdagrad
from train_dist_unsupervised import SAGE, NeighborSampler, PosNeighborSampler, CrossEntropyLoss, compute_acc
from train_dist_transductive import DistEmb, load_embs

6 changes: 3 additions & 3 deletions examples/pytorch/rgcn/experimental/README.md
@@ -126,7 +126,7 @@ We can get the performance score at the second epoch:
Val Acc 0.4323, Test Acc 0.4255, time: 128.0379
```

The command below launches the same distributed training job using DGL distributed NodeEmbedding
The command below launches the same distributed training job using DGL distributed DistEmbedding
```bash
python3 ~/workspace/dgl/tools/launch.py \
--workspace ~/workspace/dgl/examples/pytorch/rgcn/experimental/ \
Expand All @@ -135,7 +135,7 @@ python3 ~/workspace/dgl/tools/launch.py \
--num_samplers 4 \
--part_config data/ogbn-mag.json \
--ip_config ip_config.txt \
"python3 entity_classify_dist.py --graph-name ogbn-mag --dataset ogbn-mag --fanout='25,25' --batch-size 1024 --n-hidden 64 --lr 0.01 --eval-batch-size 1024 --low-mem --dropout 0.5 --use-self-loop --n-bases 2 --n-epochs 3 --layer-norm --ip-config ip_config.txt --sparse-embedding --sparse-lr 0.06 --num_gpus 1"
"python3 entity_classify_dist.py --graph-name ogbn-mag --dataset ogbn-mag --fanout='25,25' --batch-size 1024 --n-hidden 64 --lr 0.01 --eval-batch-size 1024 --low-mem --dropout 0.5 --use-self-loop --n-bases 2 --n-epochs 3 --layer-norm --ip-config ip_config.txt --sparse-embedding --sparse-lr 0.06 --num_gpus 1 --dgl-sparse"
```

We can get the performance score at the second epoch:
@@ -218,5 +218,5 @@ python3 partition_graph.py --dataset ogbn-mag --num_parts 1

### Step 2: run the training script
```bash
python3 entity_classify_dist.py --graph-name ogbn-mag --dataset ogbn-mag --fanout='25,25' --batch-size 512 --n-hidden 64 --lr 0.01 --eval-batch-size 128 --low-mem --dropout 0.5 --use-self-loop --n-bases 2 --n-epochs 3 --layer-norm --ip-config ip_config.txt --conf-path 'data/ogbn-mag.json' --standalone --sparse-embedding --sparse-lr 0.06 --node-feats
DGL_DIST_MODE=standalone python3 entity_classify_dist.py --graph-name ogbn-mag --dataset ogbn-mag --fanout='25,25' --batch-size 512 --n-hidden 64 --lr 0.01 --eval-batch-size 128 --low-mem --dropout 0.5 --use-self-loop --n-bases 2 --n-epochs 3 --layer-norm --ip-config ip_config.txt --conf-path 'data/ogbn-mag.json' --standalone --sparse-embedding --sparse-lr 0.06
```
9 changes: 6 additions & 3 deletions examples/pytorch/rgcn/experimental/entity_classify_dist.py
@@ -10,7 +10,7 @@
import itertools
import numpy as np
import time
import os
import os, gc
os.environ['DGLBACKEND']='pytorch'

import torch as th
@@ -162,7 +162,7 @@ def __init__(self,
# We only create embeddings for nodes without node features.
if feat_name not in g.nodes[ntype].data:
part_policy = g.get_node_partition_policy(ntype)
self.node_embeds[ntype] = dgl.distributed.nn.NodeEmbedding(g.number_of_nodes(ntype),
self.node_embeds[ntype] = dgl.distributed.DistEmbedding(g.number_of_nodes(ntype),
self.embed_size,
embed_name + '_' + ntype,
init_emb,
@@ -229,6 +229,7 @@ def evaluate(g, model, embed_layer, labels, eval_loader, test_loader, all_val_ni
global_results = dgl.distributed.DistTensor(labels.shape, th.long, 'results', persistent=True)

with th.no_grad():
th.cuda.empty_cache()
for sample_data in tqdm.tqdm(eval_loader):
seeds, blocks = sample_data
for block in blocks:
@@ -245,6 +246,7 @@
test_logits = []
test_seeds = []
with th.no_grad():
th.cuda.empty_cache()
for sample_data in tqdm.tqdm(test_loader):
seeds, blocks = sample_data
for block in blocks:
@@ -347,7 +349,7 @@ def run(args, device, data):
# Create DataLoader for constructing blocks
test_dataloader = DistDataLoader(
dataset=test_nid,
batch_size=args.batch_size,
batch_size=args.eval_batch_size,
collate_fn=test_sampler.sample_blocks,
shuffle=False,
drop_last=False)
@@ -486,6 +488,7 @@ def run(args, device, data):
np.sum(backward_t[-args.log_every:]), np.sum(update_t[-args.log_every:])))
start = time.time()

gc.collect()
print('[{}]Epoch Time(s): {:.4f}, sample: {:.4f}, data copy: {:.4f}, forward: {:.4f}, backward: {:.4f}, update: {:.4f}, #train: {}, #input: {}'.format(
g.rank(), np.sum(step_time), np.sum(sample_t), np.sum(feat_copy_t), np.sum(forward_t), np.sum(backward_t), np.sum(update_t), number_train, number_input))
epoch += 1
3 changes: 1 addition & 2 deletions python/dgl/distributed/__init__.py
@@ -18,8 +18,7 @@
from .dist_tensor import DistTensor
from .partition import partition_graph, load_partition, load_partition_book
from .graph_partition_book import GraphPartitionBook, PartitionPolicy
from .sparse_emb import SparseAdagrad, DistEmbedding
from . import nn
from .nn import *
from . import optim

from .rpc import *
2 changes: 1 addition & 1 deletion python/dgl/distributed/nn/pytorch/__init__.py
@@ -1,2 +1,2 @@
"""dgl distributed sparse optimizer for pytorch."""
from .sparse_emb import NodeEmbedding
from .sparse_emb import DistEmbedding
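
With the wildcard re-export in `python/dgl/distributed/__init__.py` above, the class is now reachable directly from the top-level distributed package. A small illustrative check (assuming the PyTorch backend and a DGL build that includes this commit):

```python
# Both import paths should resolve to the same class after this change.
from dgl.distributed import DistEmbedding
from dgl.distributed.nn.pytorch.sparse_emb import DistEmbedding as _Impl

assert DistEmbedding is _Impl
```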
8 changes: 4 additions & 4 deletions python/dgl/distributed/nn/pytorch/sparse_emb.py
@@ -5,7 +5,7 @@
from .... import utils
from ...dist_tensor import DistTensor

class NodeEmbedding:
class DistEmbedding:
'''Distributed node embeddings.
DGL provides a distributed embedding to support models that require learnable embeddings.
@@ -34,7 +34,7 @@ class NodeEmbedding:
The dimension size of embeddings.
name : str, optional
The name of the embeddings. The name can uniquely identify embeddings in a system
so that another NodeEmbedding object can refer to the same embeddings.
so that another DistEmbedding object can refer to the same embeddings.
init_func : callable, optional
The function to create the initial data. If the init function is not provided,
the values of the embeddings are initialized to zero.
@@ -49,7 +49,7 @@ class NodeEmbedding:
arr = th.zeros(shape, dtype=dtype)
arr.uniform_(-1, 1)
return arr
>>> emb = dgl.distributed.nn.NodeEmbedding(g.number_of_nodes(), 10, init_func=initializer)
>>> emb = dgl.distributed.DistEmbedding(g.number_of_nodes(), 10, init_func=initializer)
>>> optimizer = dgl.distributed.optim.SparseAdagrad([emb], lr=0.001)
>>> for blocks in dataloader:
... feats = emb(nids)
@@ -59,7 +59,7 @@ class NodeEmbedding:
Note
----
When a ``NodeEmbedding`` object is used while the deep learning framework is recording
When a ``DistEmbedding`` object is used while the deep learning framework is recording
the forward computation, users have to invoke
:py:meth:`~dgl.distributed.optim.SparseAdagrad.step` afterwards. Otherwise, there will be
a memory leak.