[Sampling] New sampling pipeline plus asynchronous prefetching (dmlc#3665)

* initial update

* more

* more

* multi-gpu example

* cluster gcn, finalize homogeneous

* more explanation

* fix

* bunch of fixes

* fix

* RGAT example and more fixes

* shadow-gnn sampler and some changes in unit test

* fix

* wth

* more fixes

* remove shadow+node/edge dataloader tests for possible ux changes

* lints

* add legacy dataloading import just in case

* fix

* update pylint for f-strings

* fix

* lint

* lint

* lint again

* cherry-picking commit fa9f494

* oops

* fix

* add sample_neighbors in dist_graph

* fix

* lint

* fix

* fix

* fix

* fix tutorial

* fix

* fix

* fix

* fix warning

* remove debug

* add get_foo_storage apis

* lint
BarclayII authored Jan 30, 2022
1 parent 5152a87 commit 701b4fc
Showing 62 changed files with 4,011 additions and 1,487 deletions.
2 changes: 2 additions & 0 deletions docs/source/api/python/dgl.dataloading.rst
@@ -17,6 +17,8 @@ and an ``EdgeDataLoader`` for edge/link prediction task.
.. autoclass:: NodeDataLoader
.. autoclass:: EdgeDataLoader
.. autoclass:: GraphDataLoader
.. autoclass:: DistNodeDataLoader
.. autoclass:: DistEdgeDataLoader

.. _api-dataloading-neighbor-sampling:

26 changes: 13 additions & 13 deletions docs/source/guide/distributed-apis.rst
@@ -202,20 +202,20 @@ DGL provides two levels of APIs for sampling nodes and edges to generate mini-ba
(see the section of mini-batch training). The low-level APIs require users to write code
to explicitly define how a layer of nodes are sampled (e.g., using :func:`dgl.sampling.sample_neighbors` ).
The high-level sampling APIs implement a few popular sampling algorithms for node classification
and link prediction tasks (e.g., :class:`~dgl.dataloading.pytorch.NodeDataloader` and
:class:`~dgl.dataloading.pytorch.EdgeDataloader` ).
and link prediction tasks (e.g., :class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader` ).

The distributed sampling module follows the same design and provides two levels of sampling APIs.
For the lower-level sampling API, it provides :func:`~dgl.distributed.sample_neighbors` for
distributed neighborhood sampling on :class:`~dgl.distributed.DistGraph`. In addition, DGL provides
a distributed Dataloader (:class:`~dgl.distributed.DistDataLoader` ) for distributed sampling.
The distributed Dataloader has the same interface as Pytorch DataLoader except that users cannot
a distributed DataLoader (:class:`~dgl.distributed.DistDataLoader` ) for distributed sampling.
The distributed DataLoader has the same interface as Pytorch DataLoader except that users cannot
specify the number of worker processes when creating a dataloader. The worker processes are created
in :func:`dgl.distributed.initialize`.

**Note**: When running :func:`dgl.distributed.sample_neighbors` on :class:`~dgl.distributed.DistGraph`,
the sampler cannot run in Pytorch Dataloader with multiple worker processes. The main reason is that
Pytorch Dataloader creates new sampling worker processes in every epoch, which leads to creating and
the sampler cannot run in Pytorch DataLoader with multiple worker processes. The main reason is that
Pytorch DataLoader creates new sampling worker processes in every epoch, which leads to creating and
destroying :class:`~dgl.distributed.DistGraph` objects many times.
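
For orientation, here is a minimal sketch of the setup this note assumes — ``dgl.distributed.initialize`` spawns the sampler worker processes before the ``DistGraph`` and any dataloaders are created. The ``ip_config.txt`` path and graph name below are illustrative, not part of this commit.

.. code:: python

    import dgl
    import torch.distributed as dist

    dgl.distributed.initialize('ip_config.txt')     # spawns the sampler worker processes
    dist.init_process_group(backend='gloo')         # trainer process group
    g = dgl.distributed.DistGraph('ogbn-products')  # graph name from the partition config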

When using the low-level API, the sampling code is similar to single-process sampling. The only
@@ -240,17 +240,17 @@ difference is that users need to use :func:`dgl.distributed.sample_neighbors` an
    for batch in dataloader:
        ...
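
The collapsed hunk above hides the body of the low-level example. As a rough sketch (not part of this diff), the sampling function handed to :class:`~dgl.distributed.DistDataLoader` typically looks like the following, assuming ``torch`` and ``dgl`` are imported, ``g`` is the :class:`~dgl.distributed.DistGraph`, and fanouts of ``[10, 25]``; the helper name is illustrative.

.. code:: python

    def sample_blocks(seeds):
        seeds = torch.LongTensor(seeds)
        blocks = []
        for fanout in [10, 25]:
            # Sample a fixed number of in-neighbors for each seed node on the DistGraph.
            frontier = dgl.distributed.sample_neighbors(g, seeds, fanout)
            # Convert the sampled frontier into a block (message flow graph).
            block = dgl.to_block(frontier, seeds)
            seeds = block.srcdata[dgl.NID]
            blocks.insert(0, block)
        return blocks
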
The same high-level sampling APIs (:class:`~dgl.dataloading.pytorch.NodeDataloader` and
:class:`~dgl.dataloading.pytorch.EdgeDataloader` ) work for both :class:`~dgl.DGLGraph`
and :class:`~dgl.distributed.DistGraph`. When using :class:`~dgl.dataloading.pytorch.NodeDataloader`
and :class:`~dgl.dataloading.pytorch.EdgeDataloader`, the distributed sampling code is exactly
the same as single-process sampling.
The high-level sampling APIs (:class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader`) have distributed counterparts
(:class:`~dgl.dataloading.pytorch.DistNodeDataLoader` and
:class:`~dgl.dataloading.pytorch.DistEdgeDataLoader`). Otherwise the code is exactly the
same as for single-process sampling.

.. code:: python

    sampler = dgl.sampling.MultiLayerNeighborSampler([10, 25])
    dataloader = dgl.sampling.NodeDataLoader(g, train_nid, sampler,
                                             batch_size=batch_size, shuffle=True)
    dataloader = dgl.sampling.DistNodeDataLoader(g, train_nid, sampler,
                                                 batch_size=batch_size, shuffle=True)
    for batch in dataloader:
        ...
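
As a usage note (not part of this diff): in the released API the loader classes live under ``dgl.dataloading``, and a node dataloader yields ``(input_nodes, output_nodes, blocks)`` triples. A hedged end-to-end sketch, assuming ``g`` and ``train_nid`` as above:

.. code:: python

    sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 25])
    dataloader = dgl.dataloading.DistNodeDataLoader(
        g, train_nid, sampler, batch_size=1024, shuffle=True, drop_last=False)
    for input_nodes, output_nodes, blocks in dataloader:
        ...  # blocks are the per-layer message flow graphs
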
20 changes: 10 additions & 10 deletions docs/source/guide_cn/distributed-apis.rst
@@ -177,9 +177,9 @@ DGL provides a sparse Adagrad optimizer :class:`~dgl.distributed.SparseAdagr
DGL provides two levels of APIs for sampling nodes and edges to generate mini-batch training data (see the chapter on mini-batch training).
The low-level APIs require users to write code that explicitly defines how a layer of nodes is sampled (e.g., using :func:`dgl.sampling.sample_neighbors`).
The high-level sampling APIs implement several popular sampling algorithms for node classification and link prediction tasks (e.g.,
:class:`~dgl.dataloading.pytorch.NodeDataloader` and
:class:`~dgl.dataloading.pytorch.EdgeDataloader` ).
:class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader` ).

The distributed sampling module follows the same design and also provides two levels of sampling APIs. For the low-level sampling API, it provides
distributed neighbor sampling on :class:`~dgl.distributed.DistGraph` via
@@ -188,7 +188,7 @@ DGL provides two levels of APIs for sampling nodes and edges to generate mini-
The distributed data loader has the same interface as the PyTorch DataLoader. Its worker processes are created in :func:`dgl.distributed.initialize`.

**Note**: When running :func:`dgl.distributed.sample_neighbors` on :class:`~dgl.distributed.DistGraph`,
the sampler cannot run in a PyTorch Dataloader with multiple worker processes. The main reason is that the PyTorch Dataloader creates new sampling worker processes in every epoch,
the sampler cannot run in a PyTorch DataLoader with multiple worker processes. The main reason is that the PyTorch DataLoader creates new sampling worker processes in every epoch,
which leads to the :class:`~dgl.distributed.DistGraph` objects being created and destroyed many times.

When using the low-level API, the sampling code is similar to single-process sampling. The only difference is that users need to use
@@ -214,19 +214,19 @@ DGL provides two levels of APIs for sampling nodes and edges to generate mini-
    for batch in dataloader:
        ...
Both :class:`~dgl.DGLGraph` and :class:`~dgl.distributed.DistGraph` can use the same high-level sampling APIs
(:class:`~dgl.dataloading.pytorch.NodeDataloader` and
:class:`~dgl.dataloading.pytorch.EdgeDataloader`). When using
:class:`~dgl.dataloading.pytorch.NodeDataloader` and
:class:`~dgl.dataloading.pytorch.EdgeDataloader`, the distributed sampling code is exactly the same as single-process sampling.
:class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader` have distributed versions,
:class:`~dgl.dataloading.pytorch.DistNodeDataLoader` and
:class:`~dgl.dataloading.pytorch.DistEdgeDataLoader`. When they are used,
the distributed sampling code is almost exactly the same as single-process sampling.

.. code:: python

    sampler = dgl.sampling.MultiLayerNeighborSampler([10, 25])
    dataloader = dgl.sampling.NodeDataLoader(g, train_nid, sampler,
                                             batch_size=batch_size, shuffle=True)
    dataloader = dgl.sampling.DistNodeDataLoader(g, train_nid, sampler,
                                                 batch_size=batch_size, shuffle=True)
    for batch in dataloader:
        ...
10 changes: 5 additions & 5 deletions docs/source/guide_ko/distributed-apis.rst
@@ -132,8 +132,8 @@ DGL ... transductive models that require node embeddings (transductive models
Distributed Sampling
~~~~~~~~~~~~~~~~~~~~

DGL provides two levels of APIs for sampling nodes and edges to generate mini-batches (see the mini-batch training section). The low-level API requires users to write code that explicitly defines how a layer of nodes is sampled (for example, using :func:`dgl.sampling.sample_neighbors`). The high-level API implements several popular sampling algorithms used for node classification and link prediction (e.g., :class:`~dgl.dataloading.pytorch.NodeDataloader` and
:class:`~dgl.dataloading.pytorch.EdgeDataloader`).
DGL provides two levels of APIs for sampling nodes and edges to generate mini-batches (see the mini-batch training section). The low-level API requires users to write code that explicitly defines how a layer of nodes is sampled (for example, using :func:`dgl.sampling.sample_neighbors`). The high-level API implements several popular sampling algorithms used for node classification and link prediction (e.g., :class:`~dgl.dataloading.pytorch.NodeDataLoader` and
:class:`~dgl.dataloading.pytorch.EdgeDataLoader`).

The distributed sampling module follows the same design and provides two levels of sampling APIs. For the low-level sampling API, :func:`~dgl.distributed.sample_neighbors` performs distributed neighbor sampling on a :class:`~dgl.distributed.DistGraph`. In addition, DGL provides a distributed data loader, :class:`~dgl.distributed.DistDataLoader`, for distributed sampling. The distributed DataLoader has the same interface as the PyTorch DataLoader, except that users cannot specify the number of worker processes when creating the data loader. The worker processes are created in :func:`dgl.distributed.initialize`.

@@ -159,13 +159,13 @@ When using the low-level API, the sampling code is similar to single-process samp
    for batch in dataloader:
        ...
The same high-level sampling APIs (:class:`~dgl.dataloading.pytorch.NodeDataloader` and :class:`~dgl.dataloading.pytorch.EdgeDataloader`) work for both :class:`~dgl.DGLGraph` and :class:`~dgl.distributed.DistGraph`. When using :class:`~dgl.dataloading.pytorch.NodeDataloader` and :class:`~dgl.dataloading.pytorch.EdgeDataloader`, the distributed sampling code is exactly the same as the single-process sampling code.
The same high-level sampling APIs (:class:`~dgl.dataloading.pytorch.NodeDataLoader` and :class:`~dgl.dataloading.pytorch.EdgeDataLoader`) work for both :class:`~dgl.DGLGraph` and :class:`~dgl.distributed.DistGraph`. When using :class:`~dgl.dataloading.pytorch.NodeDataLoader` and :class:`~dgl.dataloading.pytorch.EdgeDataLoader`, the distributed sampling code is exactly the same as the single-process sampling code.

.. code:: python

    sampler = dgl.sampling.MultiLayerNeighborSampler([10, 25])
    dataloader = dgl.sampling.NodeDataLoader(g, train_nid, sampler,
                                             batch_size=batch_size, shuffle=True)
    dataloader = dgl.sampling.DistNodeDataLoader(g, train_nid, sampler,
                                                 batch_size=batch_size, shuffle=True)
    for batch in dataloader:
        ...
88 changes: 88 additions & 0 deletions examples/pytorch/__temporary__/cluster_gcn/cluster_gcn.py
@@ -0,0 +1,88 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchmetrics.functional as MF
import dgl
import dgl.nn as dglnn
import time
import numpy as np
from ogb.nodeproppred import DglNodePropPredDataset

USE_WRAPPER = True

class SAGE(nn.Module):
    def __init__(self, in_feats, n_hidden, n_classes):
        super().__init__()
        self.layers = nn.ModuleList()
        self.layers.append(dglnn.SAGEConv(in_feats, n_hidden, 'mean'))
        self.layers.append(dglnn.SAGEConv(n_hidden, n_hidden, 'mean'))
        self.layers.append(dglnn.SAGEConv(n_hidden, n_classes, 'mean'))
        self.dropout = nn.Dropout(0.5)

    def forward(self, sg, x):
        h = x
        for l, layer in enumerate(self.layers):
            h = layer(sg, h)
            if l != len(self.layers) - 1:
                h = F.relu(h)
                h = self.dropout(h)
        return h

dataset = DglNodePropPredDataset('ogbn-products')
graph, labels = dataset[0]
graph.ndata['label'] = labels
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
graph.ndata['train_mask'] = torch.zeros(graph.num_nodes(), dtype=torch.bool).index_fill_(0, train_idx, True)
graph.ndata['valid_mask'] = torch.zeros(graph.num_nodes(), dtype=torch.bool).index_fill_(0, valid_idx, True)
graph.ndata['test_mask'] = torch.zeros(graph.num_nodes(), dtype=torch.bool).index_fill_(0, test_idx, True)

model = SAGE(graph.ndata['feat'].shape[1], 256, dataset.num_classes).cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=5e-4)

if USE_WRAPPER:
    import dglnew
    graph.create_formats_()
    graph = dglnew.graph.wrapper.DGLGraphStorage(graph)

num_partitions = 1000
sampler = dgl.dataloading.ClusterGCNSampler(
    graph, num_partitions,
    prefetch_node_feats=['feat', 'label', 'train_mask', 'valid_mask', 'test_mask'])
# DataLoader for generic dataloading with a graph, a set of indices (any indices, like
# partition IDs here), and a graph sampler.
# NodeDataLoader and EdgeDataLoader are simply special cases of DataLoader where the
# indices are guaranteed to be node and edge IDs.
dataloader = dgl.dataloading.DataLoader(
    graph,
    torch.arange(num_partitions),
    sampler,
    device='cuda',
    batch_size=100,
    shuffle=True,
    drop_last=False,
    pin_memory=True,
    num_workers=8,
    persistent_workers=True,
    use_prefetch_thread=True)  # TBD: could probably remove this argument

durations = []
for _ in range(10):
t0 = time.time()
for it, sg in enumerate(dataloader):
x = sg.ndata['feat']
y = sg.ndata['label'][:, 0]
m = sg.ndata['train_mask']
y_hat = model(sg, x)
loss = F.cross_entropy(y_hat[m], y[m])
opt.zero_grad()
loss.backward()
opt.step()
if it % 20 == 0:
acc = MF.accuracy(y_hat[m], y[m])
mem = torch.cuda.max_memory_allocated() / 1000000
print('Loss', loss.item(), 'Acc', acc.item(), 'GPU Mem', mem, 'MB')
tt = time.time()
print(tt - t0)
durations.append(tt - t0)
print(np.mean(durations[4:]), np.std(durations[4:]))
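
The file above only runs training. A minimal validation pass over the same partition dataloader could look like the sketch below; it is not part of the commit and simply reuses the ``valid_mask`` feature that the sampler already prefetches.

.. code:: python

    # Hedged sketch: average accuracy over partitions that contain validation nodes.
    model.eval()
    with torch.no_grad():
        accs = []
        for sg in dataloader:
            m = sg.ndata['valid_mask']
            if m.any():
                y_hat = model(sg, sg.ndata['feat'])
                accs.append(MF.accuracy(y_hat[m], sg.ndata['label'][:, 0][m]))
        print('Validation acc (partition average):', torch.stack(accs).mean().item())
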
2 changes: 2 additions & 0 deletions examples/pytorch/__temporary__/dglnew/__init__.py
@@ -0,0 +1,2 @@
from . import graph
from . import storages
3 changes: 3 additions & 0 deletions examples/pytorch/__temporary__/dglnew/graph/__init__.py
@@ -0,0 +1,3 @@
from .graph import *
from .other_feature import *
from .wrapper import *
150 changes: 150 additions & 0 deletions examples/pytorch/__temporary__/dglnew/graph/graph.py
@@ -0,0 +1,150 @@
class GraphStorage(object):
    def get_node_storage(self, key, ntype=None):
        pass

    def get_edge_storage(self, key, etype=None):
        pass

    # Required for checking whether a single dict is allowed for ndata and edata.
    @property
    def ntypes(self):
        pass

    @property
    def canonical_etypes(self):
        pass

    def etypes(self):
        return [etype[1] for etype in self.canonical_etypes]

    def sample_neighbors(self, seed_nodes, fanout, edge_dir='in', prob=None,
                         exclude_edges=None, replace=False, output_device=None):
        """Return a DGLGraph which is a subgraph induced by sampling neighboring edges of
        the given nodes.

        See ``dgl.sampling.sample_neighbors`` for detailed semantics.

        Parameters
        ----------
        seed_nodes : Tensor or dict[str, Tensor]
            Node IDs to sample neighbors from.
            This argument can take a single ID tensor or a dictionary of node types and ID tensors.
            If a single tensor is given, the graph must only have one type of nodes.
        fanout : int or dict[etype, int]
            The number of edges to be sampled for each node on each edge type.
            This argument can take a single int or a dictionary of edge types and ints.
            If a single int is given, DGL will sample this number of edges for each node for
            every edge type.
            If -1 is given for a single edge type, all the neighboring edges with that edge
            type will be selected.
        prob : str, optional
            Feature name used as the (unnormalized) probabilities associated with each
            neighboring edge of a node. The feature must have only one element for each
            edge.
            The features must be non-negative floats, and the sum of the features of
            inbound/outbound edges for every node must be positive (though they don't have
            to sum up to one). Otherwise, the result will be undefined.
            If :attr:`prob` is not None, GPU sampling is not supported.
        exclude_edges : tensor or dict
            Edge IDs to exclude during sampling neighbors for the seed nodes.
            This argument can take a single ID tensor or a dictionary of edge types and ID tensors.
            If a single tensor is given, the graph must only have one type of nodes.
        replace : bool, optional
            If True, sample with replacement.
        output_device : Framework-specific device context object, optional
            The output device. Default is the same as the input graph.

        Returns
        -------
        DGLGraph
            A sampled subgraph with the same nodes as the original graph, but only the sampled
            neighboring edges. The induced edge IDs will be in ``edata[dgl.EID]``.
        """
        pass

    # Required in Cluster-GCN
    def subgraph(self, nodes, relabel_nodes=False, output_device=None):
        """Return a subgraph induced on given nodes.

        This has the same semantics as ``dgl.node_subgraph``.

        Parameters
        ----------
        nodes : nodes or dict[str, nodes]
            The nodes to form the subgraph. The allowed nodes formats are:

            * Int Tensor: Each element is a node ID. The tensor must have the same device type
              and ID data type as the graph's.
            * iterable[int]: Each element is a node ID.
            * Bool Tensor: Each :math:`i^{th}` element is a bool flag indicating whether
              node :math:`i` is in the subgraph.

            If the graph is homogeneous, one can directly pass the above formats.
            Otherwise, the argument must be a dictionary with keys being node types
            and values being the node IDs in the above formats.
        relabel_nodes : bool, optional
            If True, the extracted subgraph will only have the nodes in the specified node set
            and it will relabel the nodes in order.
        output_device : Framework-specific device context object, optional
            The output device. Default is the same as the input graph.

        Returns
        -------
        DGLGraph
            The subgraph.
        """
        pass

    # Required in Link Prediction
    def edge_subgraph(self, edges, relabel_nodes=False, output_device=None):
        """Return a subgraph induced on given edges.

        This has the same semantics as ``dgl.edge_subgraph``.

        Parameters
        ----------
        edges : edges or dict[(str, str, str), edges]
            The edges to form the subgraph. The allowed edges formats are:

            * Int Tensor: Each element is an edge ID. The tensor must have the same device type
              and ID data type as the graph's.
            * iterable[int]: Each element is an edge ID.
            * Bool Tensor: Each :math:`i^{th}` element is a bool flag indicating whether
              edge :math:`i` is in the subgraph.

            If the graph is homogeneous, one can directly pass the above formats.
            Otherwise, the argument must be a dictionary with keys being edge types
            and values being the edge IDs in the above formats.
        relabel_nodes : bool, optional
            If True, the extracted subgraph will only have the nodes in the specified node set
            and it will relabel the nodes in order.
        output_device : Framework-specific device context object, optional
            The output device. Default is the same as the input graph.

        Returns
        -------
        DGLGraph
            The subgraph.
        """
        pass

    # Required in Link Prediction negative sampler
    def find_edges(self, edges, etype=None, output_device=None):
        """Return the source and destination node IDs given the edge IDs within the given edge type.
        """
        pass

    # Required in Link Prediction negative sampler
    def num_nodes(self, ntype):
        """Return the number of nodes for the given node type."""
        pass

    def global_uniform_negative_sampling(self, num_samples, exclude_self_loops=True,
                                         replace=False, etype=None):
        """Per source negative sampling as in ``dgl.dataloading.GlobalUniform``"""
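
To make the interface above concrete, here is a hedged sketch of a storage that simply delegates to an in-memory ``DGLGraph``, in the spirit of the ``dglnew.graph.wrapper.DGLGraphStorage`` used by the examples. The class name and delegation details below are illustrative, not taken from this commit.

.. code:: python

    import dgl

    class InMemoryGraphStorage(object):
        # Illustrative GraphStorage backed by a regular in-memory DGLGraph.
        def __init__(self, g):
            self.g = g

        @property
        def ntypes(self):
            return self.g.ntypes

        @property
        def canonical_etypes(self):
            return self.g.canonical_etypes

        def sample_neighbors(self, seed_nodes, fanout, edge_dir='in', prob=None,
                             exclude_edges=None, replace=False, output_device=None):
            # Delegate to the built-in neighbor sampler.
            return dgl.sampling.sample_neighbors(
                self.g, seed_nodes, fanout, edge_dir=edge_dir, prob=prob,
                exclude_edges=exclude_edges, replace=replace)

        def subgraph(self, nodes, relabel_nodes=False, output_device=None):
            # Note: dgl.node_subgraph always relabels; a full implementation
            # would honor the relabel_nodes flag.
            return dgl.node_subgraph(self.g, nodes)
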
