[Doc] Chinese User Guide chapter 1 - 4 (dmlc#2351)
* [Feature] Add full graph training with dgl built-in dataset.

* [Bug] fix model to cuda.

* [Feature] Add test loss and accuracy

* [Fix] Add random

* [Bug] Fix batch norm error

* [Doc] Test with CN in Sphinx

* [Doc] Remove the test CN docs.

* [Feature] Add input embedding layer

* [Doc] fill readme with new performance results

* [Doc] Add Chinese User Guide, graph and 1.5

* Update README.md

* [Fix] Temporary remove compgcn

* [Doc] Add CN user guide chapter2

* [Test] Tuning format

* [Test] Section headers

* [Fix] Fix format errors

* [Doc] Add CN-EN EN-CN links

* [Doc] Copyedit chapter2

* [Doc] Remove EN in 2.1

* [Doc] Remove EN in chapter 2

* [Doc] Copyedit first 2 sections

* [Doc] copyedited chapter 2 CN

* [Doc] Add chapter 3 raw texts

* [Doc] Add chapter 3 preface and 3.1

* [Doc] Add chapter 3.2 and 3.3

* [Doc] Remove EN parts

* [Doc] Copyediting 3.1

* [Doc] Copyediting 3.2 and 3.3

* [Doc] Proofreading 3.1 and 3.2

* [Doc] Proofreading 3.2 and 3.3

* [Doc] Add chapter 4 CN raw text.

* [Clean] Remove codes in other branches

* [Doc] Start to copyedit chapter 4 preface

* [Doc] copyedit CN section 4.1

* [Doc] Remove EN in User Guide Chapter 4

* [Doc] Copyedit chapter 4.1

* [Doc] copyedit cn chapter 4.2, 4.3, 4.4, and 4.5.

* [Doc] Fix errors in EN user guide graph feature and heterograph

* [Doc] 2nd round copyediting with Murph's comments

* [Doc] 3rd round copyediting with Murph's comments

* [Sync] synchronize with the dgl master

* [Doc] edited after Minjie's comments, 1st round

* update cub

Co-authored-by: Minjie Wang <[email protected]>
2 people authored and BarclayII committed Nov 27, 2020
1 parent 9e4138a commit 2db8ccb
Showing 34 changed files with 1,233 additions and 16 deletions.
2 changes: 2 additions & 0 deletions docs/source/guide/data-dataset.rst
@@ -3,6 +3,8 @@
4.1 DGLDataset class
--------------------

:ref:`(中文版) <guide_cn-data-pipeline-dataset>`

:class:`~dgl.data.DGLDataset` is the base class for processing, loading and saving
graph datasets defined in :ref:`apidata`. It implements the basic pipeline
for processing graph data. The following flow chart shows how the
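The pipeline the flow chart of :class:`~dgl.data.DGLDataset` describes — check for a cached copy first, otherwise process and save — can be sketched without DGL at all. Every name below (``ToyDataset``, the pickle cache, the toy graph dicts) is illustrative, not part of the DGL API:

```python
import os
import pickle
import tempfile

class ToyDataset:
    """Illustrative stand-in for the DGLDataset pipeline:
    has_cache() -> load(), otherwise process() -> save()."""

    def __init__(self, save_dir):
        self.save_path = os.path.join(save_dir, 'toy.pkl')
        if self.has_cache():
            self.load()
        else:
            self.process()
            self.save()

    def has_cache(self):
        # check whether processed data already exists on disk
        return os.path.exists(self.save_path)

    def process(self):
        # stand-in for real graph processing
        self.graphs = [{'num_nodes': 3}, {'num_nodes': 5}]

    def save(self):
        with open(self.save_path, 'wb') as f:
            pickle.dump(self.graphs, f)

    def load(self):
        with open(self.save_path, 'rb') as f:
            self.graphs = pickle.load(f)

    def __getitem__(self, idx):
        return self.graphs[idx]

    def __len__(self):
        return len(self.graphs)

with tempfile.TemporaryDirectory() as d:
    ds1 = ToyDataset(d)   # no cache yet: processes and saves
    ds2 = ToyDataset(d)   # second construction loads from the cache
    assert len(ds2) == 2 and ds2[1]['num_nodes'] == 5
```

The second construction never calls ``process()``, which is exactly the time-saving behavior the real class provides.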
2 changes: 2 additions & 0 deletions docs/source/guide/data-download.rst
@@ -3,6 +3,8 @@
4.2 Download raw data (optional)
--------------------------------

:ref:`(中文版) <guide_cn-data-pipeline-download>`

If a dataset is already in local disk, make sure it’s in directory
``raw_dir``. If one wants to run the code anywhere without bothering to
download and move data to the right directory, one can do it
4 changes: 3 additions & 1 deletion docs/source/guide/data-loadogb.rst
@@ -3,6 +3,8 @@
4.5 Loading OGB datasets using ``ogb`` package
----------------------------------------------

:ref:`(中文版) <guide_cn-data-pipeline-loadogb>`

`Open Graph Benchmark (OGB) <https://ogb.stanford.edu/docs/home/>`__ is
a collection of benchmark datasets. The official OGB package
`ogb <https://github.com/snap-stanford/ogb>`__ provides APIs for
@@ -61,7 +63,7 @@ there is only one graph object in this kind of dataset.
valid_label = dataset.labels[split_idx['valid']]
test_label = dataset.labels[split_idx['test']]
-*Link Property Prediction* datasets also contain one graph per dataset:
+*Link Property Prediction* datasets also contain one graph per dataset.

.. code::
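The split-index pattern in the excerpt above (``dataset.labels[split_idx['valid']]``) boils down to indexing with a dict of index lists. A minimal pure-Python stand-in — the toy ``labels`` and ``split_idx`` values are made up for illustration, standing in for ``dataset.labels`` and the dict returned by ``get_idx_split()``:

```python
# Toy stand-ins for dataset.labels and the split dict from get_idx_split();
# all values here are invented for illustration.
labels = [0, 1, 0, 1, 1, 0]
split_idx = {'train': [0, 1, 2], 'valid': [3, 4], 'test': [5]}

valid_label = [labels[i] for i in split_idx['valid']]
test_label = [labels[i] for i in split_idx['test']]
assert valid_label == [1, 1]
assert test_label == [0]
```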
2 changes: 2 additions & 0 deletions docs/source/guide/data-process.rst
@@ -3,6 +3,8 @@
4.3 Process data
----------------

:ref:`(中文版) <guide_cn-data-pipeline-process>`

One can implement the data processing code in function ``process()``, and it
assumes that the raw data is located in ``self.raw_dir`` already. There
are typically three types of tasks in machine learning on graphs: graph
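For node classification, ``process()`` typically turns the raw data into a graph plus train/validation/test masks. A framework-free sketch of the mask-building step (the helper name and the split ratios are illustrative, not prescribed by DGL):

```python
import random

def build_split_masks(num_nodes, train_ratio=0.6, val_ratio=0.2, seed=0):
    """Boolean masks marking a train/val/test node split, of the kind
    process() often attaches to the graph as node data."""
    ids = list(range(num_nodes))
    random.Random(seed).shuffle(ids)
    n_train = int(num_nodes * train_ratio)
    n_val = int(num_nodes * val_ratio)
    train_mask = [False] * num_nodes
    val_mask = [False] * num_nodes
    test_mask = [False] * num_nodes
    for i in ids[:n_train]:
        train_mask[i] = True
    for i in ids[n_train:n_train + n_val]:
        val_mask[i] = True
    for i in ids[n_train + n_val:]:
        test_mask[i] = True
    return train_mask, val_mask, test_mask

tr, va, te = build_split_masks(10)
assert sum(tr) == 6 and sum(va) == 2 and sum(te) == 2
# every node belongs to exactly one split
assert all(tr[i] + va[i] + te[i] == 1 for i in range(10))
```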
9 changes: 3 additions & 6 deletions docs/source/guide/data-savenload.rst
@@ -3,6 +3,8 @@
4.4 Save and load data
----------------------

:ref:`(中文版) <guide_cn-data-pipeline-savenload>`

DGL recommends implementing saving and loading functions to cache the
processed data in local disk. This saves a lot of data processing time
in most cases. DGL provides four functions to make things simple:
@@ -44,9 +46,4 @@ dataset information.
Note that there are cases not suitable to save processed data. For
example, in the builtin dataset :class:`~dgl.data.GDELTDataset`,
the processed data is quite large, so it’s more effective to process
-each data example in ``__getitem__(idx)``.
-
-.. code::
-
-    print(split_edge['valid'].keys())
-    print(split_edge['test'].keys())
+each data example in ``__getitem__(idx)``.
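The four helpers this section builds on are :func:`dgl.save_graphs`, :func:`dgl.load_graphs`, :func:`dgl.data.utils.save_info` and :func:`dgl.data.utils.load_info`. The ``GDELTDataset``-style alternative the note describes — skip caching and process each example inside ``__getitem__(idx)`` — can be sketched in plain Python (``LazyDataset`` and its fields are illustrative, not a DGL class):

```python
class LazyDataset:
    """When processed data is too large to cache on disk, keep cheap raw
    records and process one example per __getitem__ call."""
    def __init__(self, raw_items):
        self.raw_items = raw_items

    def _process_one(self, raw):
        # stand-in for expensive per-example processing
        return {'num_nodes': raw, 'degree_sum': raw * 2}

    def __getitem__(self, idx):
        # nothing is precomputed: work happens on access
        return self._process_one(self.raw_items[idx])

    def __len__(self):
        return len(self.raw_items)

ds = LazyDataset([3, 7, 4])
assert ds[1] == {'num_nodes': 7, 'degree_sum': 14}
assert len(ds) == 3
```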
2 changes: 2 additions & 0 deletions docs/source/guide/data.rst
@@ -3,6 +3,8 @@
Chapter 4: Graph Data Pipeline
==============================

:ref:`(中文版) <guide_cn-data-pipeline>`

DGL implements many commonly used graph datasets in :ref:`apidata`. They
follow a standard pipeline defined in class :class:`dgl.data.DGLDataset`. DGL highly
recommends processing graph data into a :class:`dgl.data.DGLDataset` subclass, as the
1 change: 1 addition & 0 deletions docs/source/guide/graph-feature.rst
@@ -61,4 +61,5 @@ For weighted graphs, one can store the weights as an edge feature as below.
ndata_schemes={}
edata_schemes={'w' : Scheme(shape=(,), dtype=torch.float32)})
See APIs: :py:attr:`~dgl.DGLGraph.ndata`, :py:attr:`~dgl.DGLGraph.edata`.
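The ``w`` edge feature in the schemes above is simply one value per edge, aligned with the graph's edge list. In pure-Python terms (toy edges and weights, standing in for what ``g.edata['w']`` stores):

```python
# One weight per edge, aligned with the edge list -- the pure-Python picture
# of the 'w' edge feature printed in the schemes above.
edges = [(0, 1), (1, 2), (2, 0)]     # (src, dst) pairs
edata = {'w': [0.3, 0.7, 1.1]}       # plays the role of g.edata['w']
assert len(edata['w']) == len(edges)

weight_of = dict(zip(edges, edata['w']))
assert weight_of[(1, 2)] == 0.7
```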
2 changes: 2 additions & 0 deletions docs/source/guide/message-api.rst
@@ -3,6 +3,8 @@
2.1 Built-in Functions and Message Passing APIs
-----------------------------------------------

:ref:`(中文版) <guide_cn-message-passing-api>`

In DGL, **message function** takes a single argument ``edges``,
which is an :class:`~dgl.udf.EdgeBatch` instance. During message passing,
DGL generates it internally to represent a batch of edges. It has three
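A message function thus reads per-edge views of source, destination and edge features through the ``src``, ``dst`` and ``data`` members. A pure-Python stand-in for those three members — not the real :class:`~dgl.udf.EdgeBatch`, whose members are accessed the same way but hold tensors:

```python
class EdgeBatchSketch:
    """Pure-Python stand-in for the three members of dgl.udf.EdgeBatch."""
    def __init__(self, src, dst, data):
        self.src = src    # features of each edge's source node
        self.dst = dst    # features of each edge's destination node
        self.data = data  # features of the edge itself

def message_func(edges):
    # analogue of a UDF like: lambda edges: {'m': edges.src['h'] + edges.data['w']}
    return {'m': [h + w for h, w in zip(edges.src['h'], edges.data['w'])]}

batch = EdgeBatchSketch(src={'h': [1.0, 2.0]},
                        dst={'h': [0.5, 0.5]},
                        data={'w': [10.0, 20.0]})
msgs = message_func(batch)
assert msgs['m'] == [11.0, 22.0]
```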
2 changes: 2 additions & 0 deletions docs/source/guide/message-edge.rst
@@ -3,6 +3,8 @@
2.4 Apply Edge Weight In Message Passing
----------------------------------------

:ref:`(中文版) <guide_cn-message-passing-edge>`

A commonly seen practice in GNN modeling is to apply edge weight on the
message before message aggregation, for example, in
`GAT <https://arxiv.org/pdf/1710.10903.pdf>`__ and some `GCN
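The weight-then-aggregate pattern — in DGL typically ``update_all(fn.u_mul_e('h', 'w', 'm'), fn.sum('m', 'h'))`` — reduces to a weighted sum over each node's in-edges. A framework-free sketch with a toy graph (function name and data are illustrative):

```python
def weighted_aggregate(edges, src_h, edge_w, num_nodes):
    """Sum of (source feature * edge weight) over each node's in-edges."""
    out = [0.0] * num_nodes
    for (u, v), h, w in zip(edges, src_h, edge_w):
        # the message on edge (u, v) is the source feature scaled by the weight
        out[v] += h * w
    return out

edges = [(0, 2), (1, 2), (0, 1)]   # (src, dst) pairs
src_h = [1.0, 3.0, 1.0]            # source-node feature per edge
edge_w = [0.5, 2.0, 4.0]           # weight per edge
assert weighted_aggregate(edges, src_h, edge_w, 3) == [0.0, 4.0, 6.5]
```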
2 changes: 2 additions & 0 deletions docs/source/guide/message-efficient.rst
@@ -3,6 +3,8 @@
2.2 Writing Efficient Message Passing Code
------------------------------------------

:ref:`(中文版) <guide_cn-message-passing-efficient>`

DGL optimizes memory consumption and computing speed for message
passing. The optimization includes:

2 changes: 2 additions & 0 deletions docs/source/guide/message-heterograph.rst
@@ -3,6 +3,8 @@
2.5 Message Passing on Heterogeneous Graph
------------------------------------------

:ref:`(中文版) <guide_cn-message-passing-heterograph>`

Heterogeneous graphs (:ref:`guide-graph-heterogeneous`), or
heterographs for short, are graphs that contain different types of nodes
and edges. The different types of nodes and edges tend to have different
2 changes: 2 additions & 0 deletions docs/source/guide/message-part.rst
@@ -3,6 +3,8 @@
2.3 Apply Message Passing On Part Of The Graph
----------------------------------------------

:ref:`(中文版) <guide_cn-message-passing-part>`

If one only wants to update part of the nodes in the graph, the practice
is to create a subgraph by providing the IDs for the nodes to
include in the update, then call :meth:`~dgl.DGLGraph.update_all` on the
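The recipe — take the induced subgraph on the chosen node IDs, run :meth:`~dgl.DGLGraph.update_all` there, and write the results back — can be sketched in plain Python with sum aggregation (toy graph; the helper is illustrative, not the DGL API):

```python
def update_on_subgraph(edges, feat, nids):
    """Update only the nodes in `nids`: build the induced subgraph,
    sum-aggregate source features there, and write back those nodes."""
    keep = set(nids)
    # induced subgraph keeps only edges with both endpoints in the subset
    sub_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    agg = {v: 0.0 for v in keep}
    for u, v in sub_edges:
        agg[v] += feat[u]
    new_feat = list(feat)
    for v in keep:
        new_feat[v] = agg[v]   # nodes outside `nids` are untouched
    return new_feat

edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
feat = [1.0, 2.0, 3.0]
# update only nodes 0 and 1: edge (0, 1) is the only induced edge
assert update_on_subgraph(edges, feat, [0, 1]) == [0.0, 1.0, 3.0]
```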
2 changes: 2 additions & 0 deletions docs/source/guide/message.rst
@@ -3,6 +3,8 @@
Chapter 2: Message Passing
==========================

:ref:`(中文版) <guide_cn-message-passing>`

Message Passing Paradigm
------------------------

2 changes: 2 additions & 0 deletions docs/source/guide/nn-construction.rst
@@ -3,6 +3,8 @@
3.1 DGL NN Module Construction Function
---------------------------------------

:ref:`(中文版) <guide_cn-nn-construction>`

The construction function performs the following steps:

1. Set options.
10 changes: 5 additions & 5 deletions docs/source/guide/nn-forward.rst
@@ -3,6 +3,8 @@
3.2 DGL NN Module Forward Function
----------------------------------

:ref:`(中文版) <guide_cn-nn-forward>`

In NN module, ``forward()`` function does the actual message passing and
computation. Compared with PyTorch’s NN module which usually takes
tensors as the parameters, DGL NN module takes an additional parameter
@@ -60,7 +62,7 @@ The math formulas for SAGEConv are:
One needs to specify the source node feature ``feat_src`` and destination
node feature ``feat_dst`` according to the graph type.
-:meth:``~dgl.utils.expand_as_pair`` is a function that specifies the graph
+:meth:`~dgl.utils.expand_as_pair` is a function that specifies the graph
type and expand ``feat`` into ``feat_src`` and ``feat_dst``.
The detail of this function is shown below.

@@ -95,9 +97,7 @@ element will be the destination node feature.

In mini-batch training, the computing is applied on a subgraph sampled
based on a bunch of destination nodes. The subgraph is called as
-``block`` in DGL. After message passing, only those destination nodes
-will be updated since they have the same neighborhood as the one they
-have in the original full graph. In the block creation phase,
+``block`` in DGL. In the block creation phase,
``dst nodes`` are in the front of the node list. One can find the
``feat_dst`` by the index ``[0:g.number_of_dst_nodes()]``.
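Putting the two conventions together — a plain ``feat`` expands into ``(feat_src, feat_dst)``, and in a block the destination nodes occupy the front of the node list — here is a simplified sketch of the logic behind :meth:`~dgl.utils.expand_as_pair` (the real function inspects the graph rather than taking an explicit node count):

```python
def expand_as_pair_sketch(feat, num_dst_nodes=None):
    """Simplified logic behind dgl.utils.expand_as_pair."""
    if isinstance(feat, tuple):
        # heterogeneous case: the caller already passed (feat_src, feat_dst)
        return feat
    if num_dst_nodes is not None:
        # block (mini-batch) case: destination nodes sit at the front of
        # the node list, so feat_dst is the prefix feat[0:num_dst_nodes]
        return feat, feat[:num_dst_nodes]
    # homogeneous case: every node acts as both source and destination
    return feat, feat

feat = [10, 11, 12, 13]
src, dst = expand_as_pair_sketch(feat, num_dst_nodes=2)
assert src == [10, 11, 12, 13]
assert dst == [10, 11]
assert expand_as_pair_sketch(([1], [2])) == ([1], [2])
```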

@@ -120,7 +120,7 @@ Message passing and reducing
elif self._aggre_type == 'gcn':
check_eq_shape(feat)
graph.srcdata['h'] = feat_src
-graph.dstdata['h'] = feat_dst # same as above if homogeneous
+graph.dstdata['h'] = feat_dst
graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'neigh'))
# divide in_degrees
degs = graph.in_degrees().to(feat_dst)
6 changes: 4 additions & 2 deletions docs/source/guide/nn-heterograph.rst
@@ -1,12 +1,14 @@
.. _guide-nn-heterograph:

3.3 Heterogeneous GraphConv Module
-----------------------------------
+------------------------------------

:ref:`(中文版) <guide_cn-nn-heterograph>`

:class:`~dgl.nn.pytorch.HeteroGraphConv`
is a module-level encapsulation to run DGL NN module on heterogeneous
graphs. The implementation logic is the same as message passing level API
-:meth:`~dgl.DGLGraph.multi_update_all`:
+:meth:`~dgl.DGLGraph.multi_update_all`, including:

- DGL NN module within each relation :math:`r`.
- Reduction that merges the results on the same node type from multiple
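The two steps — one NN module per relation, then a reduction merging the results that land on the same destination node type — can be sketched in plain Python. Lambdas stand in for real per-relation NN modules, and ``sum`` plays the cross-relation reduction; none of these names are the DGL API:

```python
def hetero_conv_sketch(mods, rel_inputs, aggregate=sum):
    """Run one module per relation, then merge results landing on the
    same destination node type (the reduction step)."""
    per_dst_type = {}
    for (src_type, rel, dst_type), feat in rel_inputs.items():
        out = mods[rel](feat)          # NN module for this relation
        per_dst_type.setdefault(dst_type, []).append(out)
    # elementwise reduction across relations sharing a destination type
    return {nt: [aggregate(vals) for vals in zip(*outs)]
            for nt, outs in per_dst_type.items()}

# lambdas stand in for real per-relation NN modules
mods = {'follows': lambda f: [x * 2 for x in f],
        'plays':   lambda f: [x + 1 for x in f]}
rel_inputs = {('user', 'follows', 'user'): [1.0, 2.0],
              ('user', 'plays', 'user'):   [3.0, 4.0]}
out = hetero_conv_sketch(mods, rel_inputs)
assert out == {'user': [6.0, 9.0]}
```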
2 changes: 2 additions & 0 deletions docs/source/guide/nn.rst
@@ -3,6 +3,8 @@
Chapter 3: Building GNN Modules
===============================

:ref:`(中文版) <guide_cn-nn>`

DGL NN module consists of building blocks for GNN models. An NN module inherits
from `Pytorch’s NN Module <https://pytorch.org/docs/1.2.0/_modules/torch/nn/modules/module.html>`__, `MXNet Gluon’s NN Block <http://mxnet.incubator.apache.org/versions/1.6/api/python/docs/api/gluon/nn/index.html>`__ and `TensorFlow’s Keras
Layer <https://www.tensorflow.org/api_docs/python/tf/keras/layers>`__, depending on the DNN framework backend in use. In a DGL NN
89 changes: 89 additions & 0 deletions docs/source/guide_cn/data-dataset.rst
@@ -0,0 +1,89 @@
.. _guide_cn-data-pipeline-dataset:

4.1 The DGLDataset class
------------------------

:ref:`(English Version) <guide-data-pipeline-dataset>`

:class:`~dgl.data.DGLDataset` is the base class for processing, loading and saving
graph datasets defined in :ref:`apidata`. It implements the basic pipeline for
processing graph data. The following flow chart shows how the pipeline works.

.. figure:: https://data.dgl.ai/asset/image/userguide_data_flow.png
   :align: center

   Flow chart of the graph data processing pipeline defined in the DGLDataset class.

The following example defines a class called ``MyDataset``, which inherits from
:class:`dgl.data.DGLDataset`, for processing a graph dataset located on a remote
server or on local disk.

.. code::

    from dgl.data import DGLDataset

    class MyDataset(DGLDataset):
        """Template for a customized graph dataset in DGL.

        Parameters
        ----------
        url : str
            URL to download the raw dataset.
        raw_dir : str
            Directory that stores the downloaded data, or where the data
            is already stored. Default: ~/.dgl/
        save_dir : str
            Directory to save the processed dataset. Default: the value of raw_dir
        force_reload : bool
            Whether to reload the dataset. Default: False
        verbose : bool
            Whether to print progress information.
        """
        def __init__(self,
                     url=None,
                     raw_dir=None,
                     save_dir=None,
                     force_reload=False,
                     verbose=False):
            super(MyDataset, self).__init__(name='dataset_name',
                                            url=url,
                                            raw_dir=raw_dir,
                                            save_dir=save_dir,
                                            force_reload=force_reload,
                                            verbose=verbose)

        def download(self):
            # download raw data to local disk
            pass

        def process(self):
            # process raw data into graphs, labels, and dataset split masks
            pass

        def __getitem__(self, idx):
            # get one example by index
            pass

        def __len__(self):
            # number of data examples
            pass

        def save(self):
            # save processed data to `self.save_path`
            pass

        def load(self):
            # load processed data from `self.save_path`
            pass

        def has_cache(self):
            # check whether there is processed data in `self.save_path`
            pass

:class:`~dgl.data.DGLDataset` has the abstract functions ``process()``,
``__getitem__(idx)`` and ``__len__()``, which subclasses must implement. DGL also
recommends implementing the saving and loading functions, since they can save a
large amount of time for processing large datasets, and there are several APIs
that make this simple (see :ref:`guide_cn-data-pipeline-savenload`).

Note that the purpose of :class:`~dgl.data.DGLDataset` is to provide a standard and
convenient way to load graph data. One can store the graphs, features, labels and
masks of a dataset, as well as basic information such as the number of classes or
labels. Operations such as sampling, partitioning or feature normalization are
better done outside the :class:`~dgl.data.DGLDataset` subclass.

The rest of this chapter shows the best practices to implement these functions.
50 changes: 50 additions & 0 deletions docs/source/guide_cn/data-download.rst
@@ -0,0 +1,50 @@
.. _guide_cn-data-pipeline-download:

4.2 Download raw data (optional)
--------------------------------

:ref:`(English Version) <guide-data-pipeline-download>`

If the dataset is already on local disk, make sure it is in the directory ``raw_dir``.
If one wants to run the code anywhere without bothering to download the data and move
it to the right directory, one can do so automatically by implementing the function
``download()``.

If the dataset is a zip file, make the class inherit from
:class:`dgl.data.DGLBuiltinDataset`, which handles the extraction of zip files.
Otherwise, implement ``download()`` as in :class:`~dgl.data.QM7bDataset`:

.. code::

    import os
    from dgl.data.utils import download

    def download(self):
        # path to store the file
        file_path = os.path.join(self.raw_dir, self.name + '.mat')
        # download the file
        download(self.url, path=file_path)

The code above downloads a .mat file to the directory ``self.raw_dir``. If the file
is a .gz, .tar, .tar.gz or .tgz file, use the :func:`~dgl.data.utils.extract_archive`
function to extract it. The following code shows how to download a .gz file in
:class:`~dgl.data.BitcoinOTCDataset`:

.. code::

    from dgl.data.utils import download, check_sha1

    def download(self):
        # path to store the file
        # make sure to use the same suffix as in the original file name
        gz_file_path = os.path.join(self.raw_dir, self.name + '.csv.gz')
        # download the file
        download(self.url, path=gz_file_path)
        # check SHA-1
        if not check_sha1(gz_file_path, self._sha1_str):
            raise UserWarning('File {} is downloaded but the content hash does not match.'
                              'The repo may be outdated or download may be incomplete. '
                              'Otherwise you can create an issue for it.'.format(self.name + '.csv.gz'))
        # extract the file to the directory self.name under self.raw_dir
        self._extract_gz(gz_file_path, self.raw_path)

The code above extracts the file into the directory ``self.name`` under
``self.raw_dir``. If the class inherits from :class:`dgl.data.DGLBuiltinDataset`
to handle a zip file, it will also extract the file into the directory ``self.name``.

Optionally, one can check the SHA-1 string of the downloaded file as in the example
above, in case the author changed the file on the remote server.