[Doc] Chinese User Guide chapter 1 - 4 (dmlc#2351)
* [Feature] Add full graph training with dgl built-in dataset.

* [Bug] fix model to cuda.

* [Feature] Add test loss and accuracy

* [Fix] Add random

* [Bug] Fix batch norm error

* [Doc] Test with CN in Sphinx

* [Doc] Remove the test CN docs.

* [Feature] Add input embedding layer

* [Doc] fill readme with new performance results

* [Doc] Add Chinese User Guide, graph and 1.5

* Update README.md

* [Fix] Temporary remove compgcn

* [Doc] Add CN user guide chapter2

* [Test] Tuning format

* [Test] Section headers

* [Fix] Fix format errors

* [Doc] Add CN-EN EN-CN links

* [Doc] Copyedit chapter2

* [Doc] Remove EN in 2.1

* [Doc] Remove EN in chapter 2

* [Doc] Copyedit first 2 sections

* [Doc] copyedited chapter 2 CN

* [Doc] Add chapter 3 raw texts

* [Doc] Add chapter 3 preface and 3.1

* [Doc] Add chapter 3.2 and 3.3

* [Doc] Remove EN parts

* [Doc] Copyediting 3.1

* [Doc] Copyediting 3.2 and 3.3

* [Doc] Proofreading 3.1 and 3.2

* [Doc] Proofreading 3.2 and 3.3

* [Doc] Add chapter 4 CN raw text.

* [Clean] Remove codes in other branches

* [Doc] Start to copyedit chapter 4 preface

* [Doc] copyedit CN section 4.1

* [Doc] Remove EN in User Guide Chapter 4

* [Doc] Copyedit chapter 4.1

* [Doc] copyedit cn chapter 4.2, 4.3, 4.4, and 4.5.

* [Doc] Fix errors in EN user guide graph feature and heterograph

* [Doc] 2nd round copyediting with Murph's comments

* [Doc] 3rd round copyediting with Murph's comments

* [Sync] synchronize with the dgl master

* [Doc] edited after Minjie's comments, 1st round

* update cub

Co-authored-by: Minjie Wang <[email protected]>
2 people authored and BarclayII committed Nov 27, 2020
1 parent 9e4138a commit 2db8ccb
Showing 34 changed files with 1,233 additions and 16 deletions.
2 changes: 2 additions & 0 deletions docs/source/guide/data-dataset.rst
@@ -3,6 +3,8 @@
4.1 DGLDataset class
--------------------

:ref:`(中文版) <guide_cn-data-pipeline-dataset>`

:class:`~dgl.data.DGLDataset` is the base class for processing, loading and saving
graph datasets defined in :ref:`apidata`. It implements the basic pipeline
for processing graph data. The following flow chart shows how the
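The pipeline the flow chart of :class:`~dgl.data.DGLDataset` describes — check for a cached copy first, otherwise process and save — can be sketched without DGL at all. Every name below (``ToyDataset``, the pickle cache, the toy graph dicts) is illustrative, not part of the DGL API:

```python
import os
import pickle
import tempfile

class ToyDataset:
    """Illustrative stand-in for the DGLDataset pipeline:
    has_cache() -> load(), otherwise process() -> save()."""

    def __init__(self, save_dir):
        self.save_path = os.path.join(save_dir, 'toy.pkl')
        if self.has_cache():
            self.load()
        else:
            self.process()
            self.save()

    def has_cache(self):
        # check whether processed data already exists on disk
        return os.path.exists(self.save_path)

    def process(self):
        # stand-in for real graph processing
        self.graphs = [{'num_nodes': 3}, {'num_nodes': 5}]

    def save(self):
        with open(self.save_path, 'wb') as f:
            pickle.dump(self.graphs, f)

    def load(self):
        with open(self.save_path, 'rb') as f:
            self.graphs = pickle.load(f)

    def __getitem__(self, idx):
        return self.graphs[idx]

    def __len__(self):
        return len(self.graphs)

with tempfile.TemporaryDirectory() as d:
    ds1 = ToyDataset(d)   # no cache yet: processes and saves
    ds2 = ToyDataset(d)   # second construction loads from the cache
    assert len(ds2) == 2 and ds2[1]['num_nodes'] == 5
```

The second construction never calls ``process()``, which is exactly the time-saving behavior the real class provides.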
2 changes: 2 additions & 0 deletions docs/source/guide/data-download.rst
@@ -3,6 +3,8 @@
4.2 Download raw data (optional)
--------------------------------

:ref:`(中文版) <guide_cn-data-pipeline-download>`

If a dataset is already in local disk, make sure it’s in directory
``raw_dir``. If one wants to run the code anywhere without bothering to
download and move data to the right directory, one can do it
4 changes: 3 additions & 1 deletion docs/source/guide/data-loadogb.rst
@@ -3,6 +3,8 @@
4.5 Loading OGB datasets using ``ogb`` package
----------------------------------------------

:ref:`(中文版) <guide_cn-data-pipeline-loadogb>`

`Open Graph Benchmark (OGB) <https://ogb.stanford.edu/docs/home/>`__ is
a collection of benchmark datasets. The official OGB package
`ogb <https://github.com/snap-stanford/ogb>`__ provides APIs for
@@ -61,7 +63,7 @@ there is only one graph object in this kind of dataset.
valid_label = dataset.labels[split_idx['valid']]
test_label = dataset.labels[split_idx['test']]
-*Link Property Prediction* datasets also contain one graph per dataset:
+*Link Property Prediction* datasets also contain one graph per dataset.

.. code::
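The split-index pattern in the excerpt above (``dataset.labels[split_idx['valid']]``) boils down to indexing with a dict of index lists. A minimal pure-Python stand-in — the toy ``labels`` and ``split_idx`` values are made up for illustration, standing in for ``dataset.labels`` and the dict returned by ``get_idx_split()``:

```python
# Toy stand-ins for dataset.labels and the split dict from get_idx_split();
# all values here are invented for illustration.
labels = [0, 1, 0, 1, 1, 0]
split_idx = {'train': [0, 1, 2], 'valid': [3, 4], 'test': [5]}

valid_label = [labels[i] for i in split_idx['valid']]
test_label = [labels[i] for i in split_idx['test']]
assert valid_label == [1, 1]
assert test_label == [0]
```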
2 changes: 2 additions & 0 deletions docs/source/guide/data-process.rst
@@ -3,6 +3,8 @@
4.3 Process data
----------------

:ref:`(中文版) <guide_cn-data-pipeline-process>`

One can implement the data processing code in function ``process()``, and it
assumes that the raw data is located in ``self.raw_dir`` already. There
are typically three types of tasks in machine learning on graphs: graph
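For node classification, ``process()`` typically turns the raw data into a graph plus train/validation/test masks. A framework-free sketch of the mask-building step (the helper name and the split ratios are illustrative, not prescribed by DGL):

```python
import random

def build_split_masks(num_nodes, train_ratio=0.6, val_ratio=0.2, seed=0):
    """Boolean masks marking a train/val/test node split, of the kind
    process() often attaches to the graph as node data."""
    ids = list(range(num_nodes))
    random.Random(seed).shuffle(ids)
    n_train = int(num_nodes * train_ratio)
    n_val = int(num_nodes * val_ratio)
    train_mask = [False] * num_nodes
    val_mask = [False] * num_nodes
    test_mask = [False] * num_nodes
    for i in ids[:n_train]:
        train_mask[i] = True
    for i in ids[n_train:n_train + n_val]:
        val_mask[i] = True
    for i in ids[n_train + n_val:]:
        test_mask[i] = True
    return train_mask, val_mask, test_mask

tr, va, te = build_split_masks(10)
assert sum(tr) == 6 and sum(va) == 2 and sum(te) == 2
# every node belongs to exactly one split
assert all(tr[i] + va[i] + te[i] == 1 for i in range(10))
```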
9 changes: 3 additions & 6 deletions docs/source/guide/data-savenload.rst
@@ -3,6 +3,8 @@
4.4 Save and load data
----------------------

:ref:`(中文版) <guide_cn-data-pipeline-savenload>`

DGL recommends implementing saving and loading functions to cache the
processed data in local disk. This saves a lot of data processing time
in most cases. DGL provides four functions to make things simple:
@@ -44,9 +46,4 @@ dataset information.
Note that there are cases not suitable to save processed data. For
example, in the builtin dataset :class:`~dgl.data.GDELTDataset`,
the processed data is quite large, so it’s more effective to process
-each data example in ``__getitem__(idx)``.
-
-.. code::
-
-    print(split_edge['valid'].keys())
-    print(split_edge['test'].keys())
+each data example in ``__getitem__(idx)``.
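The four helpers this section builds on are :func:`dgl.save_graphs`, :func:`dgl.load_graphs`, :func:`dgl.data.utils.save_info` and :func:`dgl.data.utils.load_info`. The ``GDELTDataset``-style alternative the note describes — skip caching and process each example inside ``__getitem__(idx)`` — can be sketched in plain Python (``LazyDataset`` and its fields are illustrative, not a DGL class):

```python
class LazyDataset:
    """When processed data is too large to cache on disk, keep cheap raw
    records and process one example per __getitem__ call."""
    def __init__(self, raw_items):
        self.raw_items = raw_items

    def _process_one(self, raw):
        # stand-in for expensive per-example processing
        return {'num_nodes': raw, 'degree_sum': raw * 2}

    def __getitem__(self, idx):
        # nothing is precomputed: work happens on access
        return self._process_one(self.raw_items[idx])

    def __len__(self):
        return len(self.raw_items)

ds = LazyDataset([3, 7, 4])
assert ds[1] == {'num_nodes': 7, 'degree_sum': 14}
assert len(ds) == 3
```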
2 changes: 2 additions & 0 deletions docs/source/guide/data.rst
@@ -3,6 +3,8 @@
Chapter 4: Graph Data Pipeline
==============================

:ref:`(中文版) <guide_cn-data-pipeline>`

DGL implements many commonly used graph datasets in :ref:`apidata`. They
follow a standard pipeline defined in class :class:`dgl.data.DGLDataset`. DGL highly
recommends processing graph data into a :class:`dgl.data.DGLDataset` subclass, as the
1 change: 1 addition & 0 deletions docs/source/guide/graph-feature.rst
@@ -61,4 +61,5 @@ For weighted graphs, one can store the weights as an edge feature as below.
ndata_schemes={}
edata_schemes={'w' : Scheme(shape=(,), dtype=torch.float32)})
See APIs: :py:attr:`~dgl.DGLGraph.ndata`, :py:attr:`~dgl.DGLGraph.edata`.
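The ``w`` edge feature in the schemes above is simply one value per edge, aligned with the graph's edge list. In pure-Python terms (toy edges and weights, standing in for what ``g.edata['w']`` stores):

```python
# One weight per edge, aligned with the edge list -- the pure-Python picture
# of the 'w' edge feature printed in the schemes above.
edges = [(0, 1), (1, 2), (2, 0)]     # (src, dst) pairs
edata = {'w': [0.3, 0.7, 1.1]}       # plays the role of g.edata['w']
assert len(edata['w']) == len(edges)

weight_of = dict(zip(edges, edata['w']))
assert weight_of[(1, 2)] == 0.7
```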
2 changes: 2 additions & 0 deletions docs/source/guide/message-api.rst
@@ -3,6 +3,8 @@
2.1 Built-in Functions and Message Passing APIs
-----------------------------------------------

:ref:`(中文版) <guide_cn-message-passing-api>`

In DGL, **message function** takes a single argument ``edges``,
which is an :class:`~dgl.udf.EdgeBatch` instance. During message passing,
DGL generates it internally to represent a batch of edges. It has three
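A message function thus reads per-edge views of source, destination and edge features through the ``src``, ``dst`` and ``data`` members. A pure-Python stand-in for those three members — not the real :class:`~dgl.udf.EdgeBatch`, whose members are accessed the same way but hold tensors:

```python
class EdgeBatchSketch:
    """Pure-Python stand-in for the three members of dgl.udf.EdgeBatch."""
    def __init__(self, src, dst, data):
        self.src = src    # features of each edge's source node
        self.dst = dst    # features of each edge's destination node
        self.data = data  # features of the edge itself

def message_func(edges):
    # analogue of a UDF like: lambda edges: {'m': edges.src['h'] + edges.data['w']}
    return {'m': [h + w for h, w in zip(edges.src['h'], edges.data['w'])]}

batch = EdgeBatchSketch(src={'h': [1.0, 2.0]},
                        dst={'h': [0.5, 0.5]},
                        data={'w': [10.0, 20.0]})
msgs = message_func(batch)
assert msgs['m'] == [11.0, 22.0]
```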
2 changes: 2 additions & 0 deletions docs/source/guide/message-edge.rst
@@ -3,6 +3,8 @@
2.4 Apply Edge Weight In Message Passing
----------------------------------------

:ref:`(中文版) <guide_cn-message-passing-edge>`

A commonly seen practice in GNN modeling is to apply edge weight on the
message before message aggregation, for example, in
`GAT <https://arxiv.org/pdf/1710.10903.pdf>`__ and some `GCN
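The weight-then-aggregate pattern — in DGL typically ``update_all(fn.u_mul_e('h', 'w', 'm'), fn.sum('m', 'h'))`` — reduces to a weighted sum over each node's in-edges. A framework-free sketch with a toy graph (function name and data are illustrative):

```python
def weighted_aggregate(edges, src_h, edge_w, num_nodes):
    """Sum of (source feature * edge weight) over each node's in-edges."""
    out = [0.0] * num_nodes
    for (u, v), h, w in zip(edges, src_h, edge_w):
        # the message on edge (u, v) is the source feature scaled by the weight
        out[v] += h * w
    return out

edges = [(0, 2), (1, 2), (0, 1)]   # (src, dst) pairs
src_h = [1.0, 3.0, 1.0]            # source-node feature per edge
edge_w = [0.5, 2.0, 4.0]           # weight per edge
assert weighted_aggregate(edges, src_h, edge_w, 3) == [0.0, 4.0, 6.5]
```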
2 changes: 2 additions & 0 deletions docs/source/guide/message-efficient.rst
@@ -3,6 +3,8 @@
2.2 Writing Efficient Message Passing Code
------------------------------------------

:ref:`(中文版) <guide_cn-message-passing-efficient>`

DGL optimizes memory consumption and computing speed for message
passing. The optimization includes:

2 changes: 2 additions & 0 deletions docs/source/guide/message-heterograph.rst
@@ -3,6 +3,8 @@
2.5 Message Passing on Heterogeneous Graph
------------------------------------------

:ref:`(中文版) <guide_cn-message-passing-heterograph>`

Heterogeneous graphs (:ref:`guide-graph-heterogeneous`), or
heterographs for short, are graphs that contain different types of nodes
and edges. The different types of nodes and edges tend to have different
2 changes: 2 additions & 0 deletions docs/source/guide/message-part.rst
@@ -3,6 +3,8 @@
2.3 Apply Message Passing On Part Of The Graph
----------------------------------------------

:ref:`(中文版) <guide_cn-message-passing-part>`

If one only wants to update part of the nodes in the graph, the practice
is to create a subgraph by providing the IDs for the nodes to
include in the update, then call :meth:`~dgl.DGLGraph.update_all` on the
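The recipe — take the induced subgraph on the chosen node IDs, run :meth:`~dgl.DGLGraph.update_all` there, and write the results back — can be sketched in plain Python with sum aggregation (toy graph; the helper is illustrative, not the DGL API):

```python
def update_on_subgraph(edges, feat, nids):
    """Update only the nodes in `nids`: build the induced subgraph,
    sum-aggregate source features there, and write back those nodes."""
    keep = set(nids)
    # induced subgraph keeps only edges with both endpoints in the subset
    sub_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    agg = {v: 0.0 for v in keep}
    for u, v in sub_edges:
        agg[v] += feat[u]
    new_feat = list(feat)
    for v in keep:
        new_feat[v] = agg[v]   # nodes outside `nids` are untouched
    return new_feat

edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
feat = [1.0, 2.0, 3.0]
# update only nodes 0 and 1: edge (0, 1) is the only induced edge
assert update_on_subgraph(edges, feat, [0, 1]) == [0.0, 1.0, 3.0]
```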
2 changes: 2 additions & 0 deletions docs/source/guide/message.rst
@@ -3,6 +3,8 @@
Chapter 2: Message Passing
==========================

:ref:`(中文版) <guide_cn-message-passing>`

Message Passing Paradigm
------------------------

2 changes: 2 additions & 0 deletions docs/source/guide/nn-construction.rst
@@ -3,6 +3,8 @@
3.1 DGL NN Module Construction Function
---------------------------------------

:ref:`(中文版) <guide_cn-nn-construction>`

The construction function performs the following steps:

1. Set options.
10 changes: 5 additions & 5 deletions docs/source/guide/nn-forward.rst
@@ -3,6 +3,8 @@
3.2 DGL NN Module Forward Function
----------------------------------

:ref:`(中文版) <guide_cn-nn-forward>`

In NN module, ``forward()`` function does the actual message passing and
computation. Compared with PyTorch’s NN module which usually takes
tensors as the parameters, DGL NN module takes an additional parameter
@@ -60,7 +62,7 @@ The math formulas for SAGEConv are:
One needs to specify the source node feature ``feat_src`` and destination
node feature ``feat_dst`` according to the graph type.
-:meth:``~dgl.utils.expand_as_pair`` is a function that specifies the graph
+:meth:`~dgl.utils.expand_as_pair` is a function that specifies the graph
type and expand ``feat`` into ``feat_src`` and ``feat_dst``.
The detail of this function is shown below.

@@ -95,9 +97,7 @@ element will be the destination node feature.

In mini-batch training, the computing is applied on a subgraph sampled
based on a bunch of destination nodes. The subgraph is called as
-``block`` in DGL. After message passing, only those destination nodes
-will be updated since they have the same neighborhood as the one they
-have in the original full graph. In the block creation phase,
+``block`` in DGL. In the block creation phase,
``dst nodes`` are in the front of the node list. One can find the
``feat_dst`` by the index ``[0:g.number_of_dst_nodes()]``.
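Putting the two conventions together — a plain ``feat`` expands into ``(feat_src, feat_dst)``, and in a block the destination nodes occupy the front of the node list — here is a simplified sketch of the logic behind :meth:`~dgl.utils.expand_as_pair` (the real function inspects the graph rather than taking an explicit node count):

```python
def expand_as_pair_sketch(feat, num_dst_nodes=None):
    """Simplified logic behind dgl.utils.expand_as_pair."""
    if isinstance(feat, tuple):
        # heterogeneous case: the caller already passed (feat_src, feat_dst)
        return feat
    if num_dst_nodes is not None:
        # block (mini-batch) case: destination nodes sit at the front of
        # the node list, so feat_dst is the prefix feat[0:num_dst_nodes]
        return feat, feat[:num_dst_nodes]
    # homogeneous case: every node acts as both source and destination
    return feat, feat

feat = [10, 11, 12, 13]
src, dst = expand_as_pair_sketch(feat, num_dst_nodes=2)
assert src == [10, 11, 12, 13]
assert dst == [10, 11]
assert expand_as_pair_sketch(([1], [2])) == ([1], [2])
```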

@@ -120,7 +120,7 @@ Message passing and reducing
elif self._aggre_type == 'gcn':
check_eq_shape(feat)
graph.srcdata['h'] = feat_src
-graph.dstdata['h'] = feat_dst # same as above if homogeneous
+graph.dstdata['h'] = feat_dst
graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'neigh'))
# divide in_degrees
degs = graph.in_degrees().to(feat_dst)
6 changes: 4 additions & 2 deletions docs/source/guide/nn-heterograph.rst
@@ -1,12 +1,14 @@
.. _guide-nn-heterograph:

3.3 Heterogeneous GraphConv Module
-----------------------------------
+------------------------------------

:ref:`(中文版) <guide_cn-nn-heterograph>`

:class:`~dgl.nn.pytorch.HeteroGraphConv`
is a module-level encapsulation to run DGL NN module on heterogeneous
graphs. The implementation logic is the same as message passing level API
-:meth:`~dgl.DGLGraph.multi_update_all`:
+:meth:`~dgl.DGLGraph.multi_update_all`, including:

- DGL NN module within each relation :math:`r`.
- Reduction that merges the results on the same node type from multiple
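The two steps — one NN module per relation, then a reduction merging the results that land on the same destination node type — can be sketched in plain Python. Lambdas stand in for real per-relation NN modules, and ``sum`` plays the cross-relation reduction; none of these names are the DGL API:

```python
def hetero_conv_sketch(mods, rel_inputs, aggregate=sum):
    """Run one module per relation, then merge results landing on the
    same destination node type (the reduction step)."""
    per_dst_type = {}
    for (src_type, rel, dst_type), feat in rel_inputs.items():
        out = mods[rel](feat)          # NN module for this relation
        per_dst_type.setdefault(dst_type, []).append(out)
    # elementwise reduction across relations sharing a destination type
    return {nt: [aggregate(vals) for vals in zip(*outs)]
            for nt, outs in per_dst_type.items()}

# lambdas stand in for real per-relation NN modules
mods = {'follows': lambda f: [x * 2 for x in f],
        'plays':   lambda f: [x + 1 for x in f]}
rel_inputs = {('user', 'follows', 'user'): [1.0, 2.0],
              ('user', 'plays', 'user'):   [3.0, 4.0]}
out = hetero_conv_sketch(mods, rel_inputs)
assert out == {'user': [6.0, 9.0]}
```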
2 changes: 2 additions & 0 deletions docs/source/guide/nn.rst
@@ -3,6 +3,8 @@
Chapter 3: Building GNN Modules
===============================

:ref:`(中文版) <guide_cn-nn>`

DGL NN module consists of building blocks for GNN models. An NN module inherits
from `Pytorch’s NN Module <https://pytorch.org/docs/1.2.0/_modules/torch/nn/modules/module.html>`__, `MXNet Gluon’s NN Block <http://mxnet.incubator.apache.org/versions/1.6/api/python/docs/api/gluon/nn/index.html>`__ and `TensorFlow’s Keras
Layer <https://www.tensorflow.org/api_docs/python/tf/keras/layers>`__, depending on the DNN framework backend in use. In a DGL NN
89 changes: 89 additions & 0 deletions docs/source/guide_cn/data-dataset.rst
@@ -0,0 +1,89 @@
.. _guide_cn-data-pipeline-dataset:

4.1 The DGLDataset class
------------------------

:ref:`(English Version) <guide-data-pipeline-dataset>`

:class:`~dgl.data.DGLDataset` is the base class for processing, loading and saving
graph datasets defined in :ref:`apidata`. It implements the basic pipeline for
processing graph data. The following flow chart shows how the pipeline works.

.. figure:: https://data.dgl.ai/asset/image/userguide_data_flow.png
   :align: center

   Flow chart of the graph data processing pipeline defined in the DGLDataset class.

The following example defines a class called ``MyDataset``, which inherits from
:class:`dgl.data.DGLDataset`, for processing a graph dataset located on a remote
server or on local disk.

.. code::

    from dgl.data import DGLDataset

    class MyDataset(DGLDataset):
        """Template for a customized graph dataset in DGL.

        Parameters
        ----------
        url : str
            URL to download the raw dataset.
        raw_dir : str
            Directory that stores the downloaded data, or where the data
            is already stored. Default: ~/.dgl/
        save_dir : str
            Directory to save the processed dataset. Default: the value of raw_dir
        force_reload : bool
            Whether to reload the dataset. Default: False
        verbose : bool
            Whether to print progress information.
        """
        def __init__(self,
                     url=None,
                     raw_dir=None,
                     save_dir=None,
                     force_reload=False,
                     verbose=False):
            super(MyDataset, self).__init__(name='dataset_name',
                                            url=url,
                                            raw_dir=raw_dir,
                                            save_dir=save_dir,
                                            force_reload=force_reload,
                                            verbose=verbose)

        def download(self):
            # download raw data to local disk
            pass

        def process(self):
            # process raw data into graphs, labels, and dataset split masks
            pass

        def __getitem__(self, idx):
            # get one example by index
            pass

        def __len__(self):
            # number of data examples
            pass

        def save(self):
            # save processed data to `self.save_path`
            pass

        def load(self):
            # load processed data from `self.save_path`
            pass

        def has_cache(self):
            # check whether there is processed data in `self.save_path`
            pass

:class:`~dgl.data.DGLDataset` has the abstract functions ``process()``,
``__getitem__(idx)`` and ``__len__()``, which subclasses must implement. DGL also
recommends implementing the saving and loading functions, since they can save a
large amount of time for processing large datasets, and there are several APIs
that make this simple (see :ref:`guide_cn-data-pipeline-savenload`).

Note that the purpose of :class:`~dgl.data.DGLDataset` is to provide a standard and
convenient way to load graph data. One can store the graphs, features, labels and
masks of a dataset, as well as basic information such as the number of classes or
labels. Operations such as sampling, partitioning or feature normalization are
better done outside the :class:`~dgl.data.DGLDataset` subclass.

The rest of this chapter shows the best practices to implement these functions.
50 changes: 50 additions & 0 deletions docs/source/guide_cn/data-download.rst
@@ -0,0 +1,50 @@
.. _guide_cn-data-pipeline-download:

4.2 Download raw data (optional)
--------------------------------

:ref:`(English Version) <guide-data-pipeline-download>`

If the dataset is already on local disk, make sure it is in the directory ``raw_dir``.
If one wants to run the code anywhere without bothering to download the data and move
it to the right directory, one can do so automatically by implementing the function
``download()``.

If the dataset is a zip file, make the class inherit from
:class:`dgl.data.DGLBuiltinDataset`, which handles the extraction of zip files.
Otherwise, implement ``download()`` as in :class:`~dgl.data.QM7bDataset`:

.. code::

    import os
    from dgl.data.utils import download

    def download(self):
        # path to store the file
        file_path = os.path.join(self.raw_dir, self.name + '.mat')
        # download the file
        download(self.url, path=file_path)

The code above downloads a .mat file to the directory ``self.raw_dir``. If the file
is a .gz, .tar, .tar.gz or .tgz file, use the :func:`~dgl.data.utils.extract_archive`
function to extract it. The following code shows how to download a .gz file in
:class:`~dgl.data.BitcoinOTCDataset`:

.. code::

    from dgl.data.utils import download, check_sha1

    def download(self):
        # path to store the file
        # make sure to use the same suffix as in the original file name
        gz_file_path = os.path.join(self.raw_dir, self.name + '.csv.gz')
        # download the file
        download(self.url, path=gz_file_path)
        # check SHA-1
        if not check_sha1(gz_file_path, self._sha1_str):
            raise UserWarning('File {} is downloaded but the content hash does not match.'
                              'The repo may be outdated or download may be incomplete. '
                              'Otherwise you can create an issue for it.'.format(self.name + '.csv.gz'))
        # extract the file to the directory self.name under self.raw_dir
        self._extract_gz(gz_file_path, self.raw_path)

The code above extracts the file into the directory ``self.name`` under
``self.raw_dir``. If the class inherits from :class:`dgl.data.DGLBuiltinDataset`
to handle a zip file, it will also extract the file into the directory ``self.name``.

Optionally, one can check the SHA-1 string of the downloaded file as in the example
above, in case the author changed the file on the remote server.