fix sa api docs (PaddlePaddle#166) (PaddlePaddle#167)

* fix sa api docs
* update
* update
* update
doublePKU · Mar 5, 2020 · f0f434f
1 parent ae3f71b
commit f0f434f
Showing 7 changed files with 32 additions and 41 deletions.
diff --git a/docs/en/api_en/index_en.rst b/docs/en/api_en/index_en.rst
@@ -16,5 +16,5 @@ API Documents
    paddleslim.nas.rst
    paddleslim.nas.one_shot.rst
    paddleslim.pantheon.rst
-   search_space_en.rst
+   search_space_en.md
    table_latency_en.md
diff --git a/docs/en/api_en/search_space_en.rst → docs/en/api_en/search_space_en.md b/docs/en/api_en/search_space_en.rst → docs/en/api_en/search_space_en.md
@@ -1,11 +1,9 @@
-search space
-========
+# search space
 A search space is a concept used in neural architecture search: it is a collection of model architectures. The purpose of SANAS is to find a model in that space with fewer FLOPs, lower latency, or higher precision.
 
-search space which paddleslim.nas provided
--------
+## search space which paddleslim.nas provided
 
-Based on origin model architecture:
+#### Based on origin model architecture:
 1. MobileNetV2Space<br>
 &emsp; For MobileNetV2's architecture, see: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/mobilenet_v2.py#L29), [paper](https://arxiv.org/abs/1801.04381)
 
@@ -16,7 +14,7 @@ Based on origin model architecture:
 &emsp; For ResNetSpace's architecture, see: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/resnet.py#L30), [paper](https://arxiv.org/pdf/1512.03385.pdf)
 
 
-Based on block from different model:
+#### Based on block from different model:
 1. MobileNetV1BlockSpace<br>
 &emsp; For MobileNetV1Block's architecture, see: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/mobilenet_v1.py#L173)
 
@@ -33,15 +31,13 @@ Based on block from different model:
 &emsp; For InceptionCBlock's architecture, see: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/inception_v4.py#L291)
 
 
-How to use search space
---------
+## How to use search space
 1. To use a space based on an original model architecture, only the name of the search space is needed. For example, to use the original MobileNetV2 as the search space, set the configs of class SANAS to [('MobileNetV2Space')].
 2. To use a block-based search space provided by paddleslim.nas:<br>
   2.1 Use `input_size`, `output_size` and `block_num` to construct the search space. For example, set the configs of class SANAS to [('MobileNetV2BlockSpace', {'input_size': 224, 'output_size': 32, 'block_num': 10})].<br>
   2.2 Use `block_mask` to construct the search space. For example, set the configs of class SANAS to [('MobileNetV2BlockSpace', {'block_mask': [0, 1, 1, 1, 1, 0, 1, 0]})].
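
The three `configs` forms listed above can be written out as plain Python values. This is a minimal sketch that only builds and inspects the values; it does not import paddleslim, and the helper `normalize_configs` is a hypothetical name introduced here for illustration, not part of the library.

```python
# The three `configs` values quoted in the documentation above.
# Note: ('MobileNetV2Space') is a plain string in Python, not a one-element
# tuple, because there is no trailing comma inside the parentheses.
origin_space_configs = [('MobileNetV2Space')]

block_space_by_size = [('MobileNetV2BlockSpace',
                        {'input_size': 224, 'output_size': 32, 'block_num': 10})]

block_space_by_mask = [('MobileNetV2BlockSpace',
                        {'block_mask': [0, 1, 1, 1, 1, 0, 1, 0]})]


def normalize_configs(configs):
    """Hypothetical helper: return (name, kwargs) pairs for every entry.

    A bare name without arguments is paired with an empty dict.
    """
    pairs = []
    for entry in configs:
        if isinstance(entry, str):
            pairs.append((entry, {}))
        else:
            name, kwargs = entry
            pairs.append((name, dict(kwargs)))
    return pairs


for configs in (origin_space_configs, block_space_by_size, block_space_by_mask):
    print(normalize_configs(configs))
```

Any of these lists would then be passed as the `configs` argument of class SANAS, as the text above describes.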
 
-How to write yourself search space
---------
+## How to write yourself search space
 To write your own search space, inherit from the base class SearchSpaceBase and override the following functions:<br>
 &emsp; 1. `init_tokens`: returns the initial tokens you want to start from. Each token is an index into the corresponding search list; for example, tokens=[0, 3, 5] means the channel numbers of the current model architecture are [8, 40, 128].
 &emsp; 2. `range_table`: returns the range of each token in tokens.
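
The relationship between tokens and the search list can be sketched as below. This is an illustration under stated assumptions, not paddleslim's implementation: `SearchSpaceBase` here is a local stand-in so the code runs on its own, the candidate list `channel_choices` is hypothetical (chosen so that indices 0, 3 and 5 give 8, 40 and 128), the `channels` helper is invented for the example, and `range_table` is read as returning the exclusive upper bound of each token.

```python
class SearchSpaceBase:
    """Local stand-in for the real base class, so the sketch runs without paddleslim."""


class ChannelSearchSpace(SearchSpaceBase):
    """Hypothetical space that picks one channel count per layer."""

    def __init__(self):
        # Assumed search list; indices 0, 3, 5 map to 8, 40, 128.
        self.channel_choices = [8, 16, 32, 40, 64, 128, 256, 512]
        self.num_layers = 3

    def init_tokens(self):
        """Initial tokens: one index into channel_choices per layer."""
        return [0, 3, 5]

    def range_table(self):
        """Exclusive upper bound of each token (assumed reading of 'range')."""
        return [len(self.channel_choices)] * self.num_layers

    def channels(self, tokens=None):
        """Decode tokens into channel counts, rejecting out-of-range tokens."""
        if tokens is None:
            tokens = self.init_tokens()
        for token, upper in zip(tokens, self.range_table()):
            if not 0 <= token < upper:
                raise ValueError(f"token {token} outside [0, {upper})")
        return [self.channel_choices[token] for token in tokens]


space = ChannelSearchSpace()
decoded = space.channels()
print(decoded)
```

Keeping the range check next to the decoding step makes it easy to see why both functions are needed: `init_tokens` fixes a starting point, and `range_table` bounds how far each token may move during the search.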

diff --git a/docs/zh_cn/algo/algo.md b/docs/zh_cn/algo/algo.md
@@ -14,14 +14,14 @@
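 In recent years, fixed-point quantization, which represents neural network weights and activations with fewer bits (such as 8-bit, 3-bit, or 2-bit), has been shown to be effective. Its advantages include lower memory bandwidth, lower power consumption, lower compute resource usage, and lower model storage requirements.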
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/quan_table_0.png" height=258 width=600 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/quan_table_0.png" height=258 width=600 hspace='10'/> <br />
 <strong>Table 1: Cost comparison of different types of operations</strong>
 </p>
 
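 As Table 1 shows, low-precision fixed-point operations require several orders of magnitude less hardware area and energy than high-precision floating-point operations. Fixed-point quantization can bring 4x model compression, a 4x improvement in memory bandwidth, and more efficient cache utilization (on many hardware devices, memory access is the main source of energy consumption). In addition, computation is faster, typically with a 2x-3x performance improvement. As Table 2 shows, in many scenarios fixed-point quantization causes no loss of accuracy. Fixed-point quantization is also extremely important for neural network inference on embedded devices.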
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/quan_table_1.png" height=155 width=500 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/quan_table_1.png" height=155 width=500 hspace='10'/> <br />
 <strong>Table 2: Model accuracy before and after quantization</strong>
 </p>
 
@@ -45,7 +45,7 @@ $q = scale * r + b$
 The forward pass uses simulated quantization, described as follows:
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/quan_forward.png" height=433 width=335 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/quan_forward.png" height=433 width=335 hspace='10'/> <br />
 <strong>Figure 1: Forward pass of simulated-quantization training</strong>
 </p>
 
@@ -69,7 +69,7 @@ $$
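 The formula above shows that the dequantization operation can be moved before the `GEMM`, that is, $Xq$ and $Wq$ are dequantized first and the `GEMM` is performed afterwards. The forward-pass workflow can therefore also be expressed as follows: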
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/quan_fwd_1.png" height=435 width=341 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/quan_fwd_1.png" height=435 width=341 hspace='10'/> <br />
 <strong>Figure 2: Equivalent workflow of the forward pass of simulated-quantization training</strong>
 </p>
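
The equivalence between dequantizing after the `GEMM` and dequantizing before it can be checked numerically. The sketch below assumes symmetric max-abs quantization with a zero offset (that is, $q = scale * r + b$ with $b = 0$ and `scale = (2^(bits-1) - 1) / max|r|`); that scale definition and all function names are assumptions made for the example, not taken from PaddleSlim.

```python
def quantize(matrix, bits=8):
    """Symmetric max-abs quantization: q = round(scale * r), offset b = 0 (assumed)."""
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(value) for row in matrix for value in row)
    scale = levels / max_abs
    return [[round(scale * value) for value in row] for row in matrix], scale


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scale_matrix(matrix, factor):
    """Multiply every element by a scalar factor."""
    return [[value * factor for value in row] for row in matrix]


X = [[0.5, -1.0], [0.25, 0.75]]
W = [[1.5, -0.5], [0.2, 0.8]]
Xq, sx = quantize(X)
Wq, sw = quantize(W)

# Path 1: integer GEMM first, then dequantize the product.
gemm_then_dequant = scale_matrix(matmul(Xq, Wq), 1.0 / (sx * sw))

# Path 2: dequantize Xq and Wq first, then run the GEMM on floats.
dequant_then_gemm = matmul(scale_matrix(Xq, 1.0 / sx), scale_matrix(Wq, 1.0 / sw))

# The two paths agree up to floating-point rounding.
max_diff = max(abs(p - q)
               for row_p, row_q in zip(gemm_then_dequant, dequant_then_gemm)
               for p, q in zip(row_p, row_q))
print(max_diff)
```

Both paths differ from the unquantized product `matmul(X, W)` by the rounding error introduced in `quantize`; the point of the check is only that the order of dequantization and `GEMM` does not matter.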
 
@@ -79,7 +79,7 @@ $$
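 As Figure 3 shows, the gradients needed for the weight update can be computed from the quantized weights and the quantized activations. All inputs and outputs of the backward pass are 32-bit floating-point data. Note that the gradient update must be applied to the original weights: the computed gradients are added to the original weights, not to the quantized or dequantized weights.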
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/quan_bwd.png" height=300 width=650 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/quan_bwd.png" height=300 width=650 hspace='10'/> <br />
 <strong>Figure 3: Backward pass and weight update of simulated-quantization training</strong>
 </p>
 
@@ -127,7 +127,7 @@ $$ Vt = (1 - k) * V + k * V_{t-1} $$
 
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/pruning_0.png" height=200 width=600 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/pruning_0.png" height=200 width=600 hspace='10'/> <br />
 <strong>Figure 4</strong>
 </p>
 
@@ -139,7 +139,7 @@ $$ Vt = (1 - k) * V + k * V_{t-1} $$
 Subtracting the deleted row: greedy pruning
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/pruning_1.png" height=200 width=450 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/pruning_1.png" height=200 width=450 hspace='10'/> <br />
 <strong>Figure 5</strong>
 </p>
 
@@ -149,7 +149,7 @@ $$ Vt = (1 - k) * V + k * V_{t-1} $$
 
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/pruning_2.png" height=240 width=600 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/pruning_2.png" height=240 width=600 hspace='10'/> <br />
 <strong>Figure 6</strong>
 </p>
 
@@ -176,7 +176,7 @@ $$ Vt = (1 - k) * V + k * V_{t-1} $$
 #### Understanding sensitivity
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/pruning_3.png" height=200 width=400 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/pruning_3.png" height=200 width=400 hspace='10'/> <br />
 <strong>Figure 7</strong>
 </p>
 
@@ -189,7 +189,7 @@ $$ Vt = (1 - k) * V + k * V_{t-1} $$
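 Given an overall pruning ratio for the model from the user, we find a set of valid pruning ratios that satisfies the requirement by moving the solid black line in **Figure 5**.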
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/pruning_4.png" height=200 width=400 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/pruning_4.png" height=200 width=400 hspace='10'/> <br />
 <strong>Figure 8</strong>
 </p>
 
@@ -206,12 +206,11 @@ $$ Vt = (1 - k) * V + k * V_{t-1} $$
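    In general, the more parameters a model has and the more complex its structure, the better its performance, but the more redundant its parameters and the greater its computation and resource consumption. Model distillation extracts the useful information from a complex network and transfers it to a smaller network. Our toolkit supports two distillation methods.
     The first is the traditional distillation method (reference paper: [Distilling the Knowledge in a Neural Network](https://arxiv.org/pdf/1503.02531.pdf))
    It uses a complex network as the teacher model to supervise the training of a student model with fewer parameters and less computation. The teacher can be one or more pre-trained, high-performance models. The student is trained with two objectives: one is the original objective function, the cross entropy between the class probabilities output by the student and the label, denoted hard target; the other is the cross entropy between the class probabilities output by the student and those output by the teacher, denoted soft target. The two losses are weighted to obtain the final training loss, and together they supervise the training of the student model.
-   The second is the FSP-based distillation method (reference paper: [A Gift from Knowledge Distillation:
-Fast Optimization, Network Minimization and Transfer Learning](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yim_A_Gift_From_CVPR_2017_paper.pdf))
+   The second is the FSP-based distillation method (reference paper: [A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yim_A_Gift_From_CVPR_2017_paper.pdf))
    Whereas the traditional method fits the small model directly to the large model's output, this method fits the small model to the transformation relationships between features of different layers of the large model. It uses an FSP matrix (the inner product of features) to represent the relationship between features of different layers. The large and small models each produce several FSP matrices across their layers, and an L2 loss then pushes the FSP matrices of the small model's layers to match those of the corresponding layers of the large model, as shown in the figure below. An intuitive way to see the advantage: if distillation is likened to a teacher (large model) teaching a student (small model) to solve a problem, traditional distillation tells the small model the answer directly, while learning the FSP matrices lets the small model learn the intermediate process and method of solving the problem, so it learns more information.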
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/distillation_0.png" height=300 width=600 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/distillation_0.png" height=300 width=600 hspace='10'/> <br />
 <strong>Figure 9</strong>
 </p>
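
The first (traditional) distillation objective described above, a weighted sum of a hard-target and a soft-target cross entropy, can be sketched in a few lines. The weighting parameter `alpha`, the `temperature` parameter and all function names are assumptions introduced for this example; the document only states that the two losses are weighted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature (assumed parameter)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """Cross entropy H(target, predicted) over one sample."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=1.0):
    """Weighted sum of the hard-target and soft-target losses.

    alpha weights the hard target and (1 - alpha) the soft target; this
    particular weighting scheme is an assumption made for the sketch.
    """
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    hard = cross_entropy(one_hot, softmax(student_logits))
    soft = cross_entropy(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1.0 - alpha) * soft


student = [2.0, 0.5, 0.1]
teacher = [3.0, 0.2, 0.1]
loss = distillation_loss(student, teacher, label=0)
print(loss)
```

With `alpha=1.0` the loss reduces to the ordinary supervised cross entropy, which makes it easy to compare a distilled student against a baseline trained on labels only.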
 
@@ -258,7 +257,7 @@ e^{\frac{(r_k-r)}{T_k}} & r_k < r\\
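 Because the goal is to find models that run fast on mobile devices, we follow the Linear Bottlenecks and Inverted residuals structures of MobileNetV2 and search the specific parameters of each Inverted residuals block, including kernel size, channel expansion factor, number of repeats, and number of channels, as shown in Figure 10: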
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/develop/docs/docs/images/algo/light-nas-block.png" height=300 width=600 hspace='10'/> <br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleSlim/release/1.0.1/docs/images/algo/light-nas-block.png" height=300 width=600 hspace='10'/> <br />
 <strong>Figure 10</strong>
 </p>
 

diff --git a/docs/zh_cn/api_cn/index.rst b/docs/zh_cn/api_cn/index.rst
@@ -16,5 +16,5 @@ API文档
    prune_api.rst
    quantization_api.rst
    single_distiller_api.rst
-   search_space.rst
+   search_space.md
    table_latency.md
diff --git a/docs/zh_cn/api_cn/nas_api.rst b/docs/zh_cn/api_cn/nas_api.rst
@@ -5,7 +5,7 @@ SA-NAS
 ----------------------
 
 
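-Configure the search space through parameters. For more on using search spaces, see: [search_space](../search_space.md)
+Configure the search space through parameters. For more on using search spaces, see: `search_space <https://paddlepaddle.github.io/PaddleSlim/api_cn/search_space.html>`_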
 
 **Parameters:**
 
@@ -119,7 +119,7 @@ SANAS（Simulated Annealing Neural Architecture Search）是基于模拟退火
       sanas.reward(float(score))
    
    
-   .. py:methd:: tokens2arch(tokens)
+   .. py:method:: tokens2arch(tokens)
 
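    Get the actual model architecture from a set of tokens; typically used to convert the best tokens found by the search into a model architecture for final training. tokens is a list; the tokens are mapped onto the search space and converted into the corresponding network architecture, and each set of tokens corresponds to exactly one network architecture.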
 

diff --git a/docs/zh_cn/api_cn/search_space.rst → docs/zh_cn/api_cn/search_space.md b/docs/zh_cn/api_cn/search_space.rst → docs/zh_cn/api_cn/search_space.md
@@ -1,11 +1,9 @@
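-Search space
-=========
+# Search space
 A search space is a concept in neural architecture search. It is a collection of model architectures. SANAS uses the idea of simulated annealing to search the space for a relatively small model architecture or a relatively accurate one.
 
-Search spaces provided by paddleslim.nas
---------
+## Search spaces provided by paddleslim.nas
 
-Building a search space from an original model architecture:
+#### Building a search space from an original model architecture: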
 
 1. MobileNetV2Space<br>
 &emsp; For the MobileNetV2 network architecture, see: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/mobilenet_v2.py#L29), [paper](https://arxiv.org/abs/1801.04381)
@@ -17,7 +15,7 @@ paddleslim.nas 提供的搜索空间
 &emsp; For the ResNetSpace network architecture, see: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/resnet.py#L30), [paper](https://arxiv.org/pdf/1512.03385.pdf)
 
 
-Building a search space from the blocks of a model:
+#### Building a search space from the blocks of a model:
 1. MobileNetV1BlockSpace<br>
 &emsp; For the MobileNetV1Block structure, see: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/mobilenet_v1.py#L173)
 
@@ -34,17 +32,15 @@ paddleslim.nas 提供的搜索空间
 &emsp; For the InceptionCBlock structure, see: [code](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/models/inception_v4.py#L291)
 
 
-Search space usage examples
---------
+## Search space usage examples
 
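 1. To build a search space from an original model architecture provided by paddleslim, only the name of the search space needs to be specified. For example, to search with the original MobileNetV2 search space, set the configs passed to SANAS to [('MobileNetV2Space')].
 2. To build a search space from the block search spaces provided by paddleslim:<br>
   2.1 Use `input_size`, `output_size` and `block_num` to construct the search space. For example, the configs passed to SANAS can be set to [('MobileNetV2BlockSpace', {'input_size': 224, 'output_size': 32, 'block_num': 10})].<br>
   2.2 Use `block_mask` to construct the search space. For example, the configs passed to SANAS can be set to [('MobileNetV2BlockSpace', {'block_mask': [0, 1, 1, 1, 1, 0, 1, 0]})].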
 
 
-Custom search space
---------
+## Custom search space
 
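 A custom search space class must inherit from the search space base class and override the following parts:<br>
 &emsp; 1. The initial tokens (function `init_tokens`): this can be set to whatever list of tokens you want. Each number in the tokens list is an index into the corresponding search list. For example, in this example, tokens=[0, 3, 5] means the channel numbers found for the current model architecture are [8, 40, 128].<br>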

diff --git a/paddleslim/nas/sa_nas.py b/paddleslim/nas/sa_nas.py
@@ -190,10 +190,10 @@ def __init__(self,
             self._iter = 0
 
     def _get_host_ip(self):
-        if os.name == 'posix':
-            return socket.gethostbyname('localhost')
-        else:
+        try:
             return socket.gethostbyname(socket.gethostname())
+        except:
+            return socket.gethostbyname('localhost')
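
The fallback introduced in this hunk can be tried on its own. The sketch below is a standalone variant, not the code in `sa_nas.py`: it is written as a free function named `get_host_ip` (a name chosen for the example) and catches `OSError` instead of using a bare `except:`, on the assumption that name-resolution failures are the only errors worth falling back on (`socket.gaierror` and `socket.herror` are both subclasses of `OSError`).

```python
import socket


def get_host_ip():
    """Return this host's IPv4 address, falling back to the localhost address.

    Mirrors the try/fallback order of the hunk above, but only swallows
    OSError (which covers socket.gaierror and socket.herror) so that
    unrelated errors such as KeyboardInterrupt still propagate.
    """
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:
        return socket.gethostbyname('localhost')


print(get_host_ip())
```

Narrowing the exception type is a design choice rather than a behavior change for the common case: when the hostname resolves, both versions return the same address.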
 
     def tokens2arch(self, tokens):
         """