fix release note (PaddlePaddle#4140)
TCChenlong authored Dec 9, 2021
1 parent e6e675c commit 7c7f6f8
Showing 2 changed files with 10 additions and 29 deletions.
10 changes: 1 addition & 9 deletions docs/release_note_cn.md
@@ -9,7 +9,6 @@
- 新增 `paddle.device.cuda.graphs.CUDAGraph` API,支持NVIDIA的[CUDA Graph](https://developer.nvidia.com/blog/cuda-graphs/)功能,注意目前该API还处于实验阶段,尚未稳定。
- 修复了基础API、Tensor 索引中的已知问题。


## 2. 训练框架(含分布式)

### (1)新功能
@@ -21,7 +20,6 @@
- 新增``paddle.incubate.graph_send_recv`` API,主要应用于图学习领域,目的是为了减少在消息传递过程中带来的中间变量显存或内存的损耗,包含 SUM、MEAN、MIN、MAX 共四种更新模式。([#37205](https://github.com/PaddlePaddle/Paddle/pull/37205))
- 新增`paddle.incubate.operators.ResNetUnit` API,用于 ResNet 网络里的卷积、批归一化、shortcut/bottleneck操作融合。([#37109](https://github.com/PaddlePaddle/Paddle/pull/37109))


### (2)功能优化

#### API
@@ -37,19 +35,15 @@
#### 分布式训练
- 异构参数服务器完善任意次切图能力,增加流水线训练功能,提升训练吞吐。([#37446](https://github.com/PaddlePaddle/Paddle/pull/37446))


#### 其他

- 针对 `paddle.scatter```index`` 越界导致 core dump 的问题,加强了越界检查,并完善对应的报错信息。([#37431](https://github.com/PaddlePaddle/Paddle/pull/37431))


### (3)性能优化

- 优化 `paddle.top_k`,根据 ``k`` 的大小和 ``input_width`` 大小进行选择不同的实现方案,当 k>=75% input_width 时选择 cub 实现,否则选择手写 kernel 实现。([#37325](https://github.com/PaddlePaddle/Paddle/pull/37325))
- 优化`paddle.fluid.optimizer.LarsMomentumOptimizer`,通过 optimizer 算子融合 + [CUDA Cooperative Groups](https://developer.nvidia.com/blog/cooperative-groups/)的方式提高OP性能。([#37109](https://github.com/PaddlePaddle/Paddle/pull/37109))



### (4)问题修复

#### API
@@ -68,7 +62,6 @@
- 修复一维`Tensor`在使用省略号(...)索引时维度检测异常报错的问题。([#37192](https://github.com/PaddlePaddle/Paddle/pull/37192))
- 修复`Tensor`索引赋值(`setitem`)梯度属性无法传播的问题,详见[issue](https://github.com/PaddlePaddle/Paddle/issues/36902)。([#37028](https://github.com/PaddlePaddle/Paddle/pull/37028))


#### IR(Intermediate Representation)

- 动态图转静态图
@@ -85,15 +78,14 @@

- 修复动态图 inplace 操作的问题:对一个非叶子节点进行 inplace 操作后,立即执行 backward,该节点及更前的节点的梯度计算错误。([#37420](https://github.com/PaddlePaddle/Paddle/pull/37420))


## 4. 部署方向(Paddle Inference)

### (1)问题修复

- 在明确关闭日志的情况下,进一步去除冗余的调试日志。([#37212](https://github.com/PaddlePaddle/Paddle/pull/37212))
- 修复内存/显存优化策略,避免因不当的内存/显存优化导致预测结果有误或崩溃。([#37324](https://github.com/PaddlePaddle/Paddle/pull/37324), [#37123](https://github.com/PaddlePaddle/Paddle/pull/37123))
- 修复 Transformer 模型的 MultiHead 结构中融合后 QkvToContextPluginDynamicscale 的 scale 计算错误问题,这是由于 cuda 函数的 block 和 thread 设置错误引起的。([#37096](https://github.com/PaddlePaddle/Paddle/pull/37096))
-- 将所有的推理OP在in8量化的功能中注册:解决因历史原因有些推理OP没有在int8量化中注册的问题。([#37266](https://github.com/PaddlePaddle/Paddle/pull/37266))
+- 将所有的推理OP在int8量化的功能中注册:解决因历史原因有些推理OP没有在int8量化中注册的问题。([#37266](https://github.com/PaddlePaddle/Paddle/pull/37266))


# 2.2.0 Release Note
29 changes: 9 additions & 20 deletions docs/release_note_en.md
@@ -6,10 +6,9 @@
This version fixed some function and performance issues of PaddlePaddle 2.2.0, and optimized some functions. The highlights are as follows:

- Add ``paddle.linalg.triangular_solve`` to calculate linear equations with triangular coefficient matrices.
-- Add `paddle.device.cuda.graphs.CUDAGraph` API that supports the [CUDA Graph](https://developer.nvidia.com/blog/cuda-graphs/) function of NVIDIA. Note that the API is still experimental and not yet stable.
+- Add `paddle.device.cuda.graphs.CUDAGraph` API that supports the [CUDA Graph](https://developer.nvidia.com/blog/cuda-graphs/) function of NVIDIA. Note that this API is still experimental and not yet stable.
- Fix known issues of basic API and Tensor index.
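
The highlights above mention `paddle.linalg.triangular_solve`, which solves linear systems whose coefficient matrix is triangular. As a rough illustration of what that means (plain Python, not Paddle's implementation; `solve_upper_triangular` is a hypothetical helper covering only the upper-triangular case with one right-hand side), back substitution can be sketched as:

```python
def solve_upper_triangular(a, b):
    """Solve a @ x = b for x, assuming `a` is a square upper-triangular
    matrix (list of rows) with a non-zero diagonal."""
    n = len(b)
    x = [0.0] * n
    # Work from the last row upward: row i only involves x[i], ..., x[n-1],
    # and everything after x[i] has already been computed.
    for i in range(n - 1, -1, -1):
        known = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - known) / a[i][i]
    return x


# 2*x0 + 1*x1 = 5 and 4*x1 = 8
print(solve_upper_triangular([[2.0, 1.0], [0.0, 4.0]], [5.0, 8.0]))
```

Because each row introduces only one new unknown, the solve needs no factorization step, which is what makes a dedicated triangular solver worthwhile.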


## 2. Training Framework(Distributed Included)

### (1)New Functions
@@ -18,10 +17,9 @@ This version fixed some function and performance issues of PaddlePaddle 2.2.0, a

- Add ``paddle.linalg.triangular_solve`` API to calculate linear equations with triangular coefficient matrices. ([#36714](https://github.com/PaddlePaddle/Paddle/pull/36714))
- Add `paddle.device.cuda.graphs.CUDAGraph` API that supports the [CUDA Graph](https://developer.nvidia.com/blog/cuda-graphs/) function of NVIDIA by capturing all GPU calculations into a single CUDA Graph and calling them for later use, which not only cuts the extra overhead but also improves the runtime performance. Note that the API is still experimental and not yet stable. ([#37109](https://github.com/PaddlePaddle/Paddle/pull/37109))
-- Add``paddle.incubate.graph_send_recv`` API for image learning to reduce the loss of intermediate variables in memory or video memory during message passing. It contains four update modes, namely, SUM, MEAN, MIN, and MAX. ([#37205](https://github.com/PaddlePaddle/Paddle/pull/37205))
+- Add``paddle.incubate.graph_send_recv`` API for graph learning to reduce the loss of intermediate variables in memory or video memory during message passing. It contains four update modes, namely, SUM, MEAN, MIN, and MAX. ([#37205](https://github.com/PaddlePaddle/Paddle/pull/37205))
- Add `paddle.incubate.operators.ResNetUnit` API to integrate the convolution, batch normalization, and shortcut/bottleneck operation in the ResNet network. ([#37109](https://github.com/PaddlePaddle/Paddle/pull/37109))
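
The `graph_send_recv` entry above describes message passing with four update modes. The following plain-Python sketch shows only the semantics as read from that description: each edge sends the source node's feature row to the destination node, and each destination reduces what it receives. The function name `graph_send_recv_sketch`, the list-of-lists layout, and the choice to return zeros for nodes that receive nothing are assumptions for illustration. Unlike the fused operator, this sketch does materialize the intermediate messages, so it demonstrates the result, not the memory saving.

```python
def graph_send_recv_sketch(x, src_index, dst_index, pool_type="sum"):
    """Gather rows of `x` at `src_index` and reduce them per `dst_index`."""
    reducers = {
        "sum": sum,
        "mean": lambda values: sum(values) / len(values),
        "min": min,
        "max": max,
    }
    reduce_fn = reducers[pool_type]
    num_nodes, width = len(x), len(x[0])

    # inbox[d] collects every feature row sent to node d.
    inbox = [[] for _ in range(num_nodes)]
    for src, dst in zip(src_index, dst_index):
        inbox[dst].append(x[src])

    out = []
    for messages in inbox:
        if not messages:
            out.append([0] * width)  # assumption: no messages -> zeros
        else:
            # zip(*messages) iterates column by column over the received rows.
            out.append([reduce_fn(column) for column in zip(*messages)])
    return out


features = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]
print(graph_send_recv_sketch(features, [0, 1, 2, 0], [1, 2, 1, 0], "sum"))
```

In this example node 1 receives the rows of nodes 0 and 2, so its `sum` output is their element-wise sum.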


### (2)Function Optimization

#### API
@@ -31,32 +29,28 @@ This version fixed some function and performance issues of PaddlePaddle 2.2.0, a
#### IR(Intermediate Representation)

- Dynamic Graph to Static Graph
-- When adopting`@paddle.jit.to_static` to decorate single function, `train()、eval()` functions are provided to support the swtich to `train、eval` mode. ([#37383](https://github.com/PaddlePaddle/Paddle/pull/37383))

+- When adopting`@paddle.jit.to_static` to decorate single function, `train()、eval()` functions are provided to support the switch to `train、eval` mode. ([#37383](https://github.com/PaddlePaddle/Paddle/pull/37383))

#### Distributed Training

- Optimize the ability of arbitrary cutting and add pipeline training in the heterogeneous parameter server, which enhance training throughput.([#37446](https://github.com/PaddlePaddle/Paddle/pull/37446))


#### Others

-- Enhance the out-of-bounds check for the ``index`` of ``paddle.scatter` that causes core dump, and improve the corresponding error message. ([#37431](https://github.com/PaddlePaddle/Paddle/pull/37431))

+- Enhance the out-of-bounds check for the ``index`` of ``paddle.scatter` that causes core dump, and improve the corresponding error reporting message. ([#37431](https://github.com/PaddlePaddle/Paddle/pull/37431))

### (3)Performance Optimization

- Optimize `paddle.top_k` by enabling it to choose different implementations according to the size of ``k`` and ``input_width``: cub implementation when k>=75% input_width, otherwise the handwritten kernel implementation.([#37325](https://github.com/PaddlePaddle/Paddle/pull/37325))
- Optimize `paddle.fluid.optimizer.LarsMomentumOptimizer` to improve OP performance by integrating optimizer operator and [CUDA Cooperative Groups](https://developer.nvidia.com/blog/cooperative-groups/). ([#37109](https://github.com/PaddlePaddle/Paddle/pull/37109))
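
The `paddle.top_k` entry above states a selection rule based on `k` and `input_width`. A minimal sketch of that rule follows; the function name and the returned labels are hypothetical, and the real dispatch happens inside the operator, not in Python.

```python
def choose_top_k_backend(k, input_width):
    """Return which implementation the stated rule would pick."""
    # k >= 75% of input_width, written with integers to avoid float rounding.
    if 4 * k >= 3 * input_width:
        return "cub"
    return "handwritten_kernel"


print(choose_top_k_backend(75, 100))
print(choose_top_k_backend(10, 100))
```

The rule suggests that when nearly the whole row is requested, the cub implementation was found to be the better choice, while a small `k` favors the handwritten kernel.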



### (4)Bug Fixes

#### API

-- Fix the calculation error of `paddle.nn.ELU` and `paddle.nn.functional.elu` when alpha<0;the error report of`paddle.nn.functional.elu_`when alpha<0 due to its objection to such scenario. ([#37437](https://github.com/PaddlePaddle/Paddle/pull/37437))
-- Fix the problem of `out_of_range` when the `paddle.slice` is revesely excuted. ([#37584](https://github.com/PaddlePaddle/Paddle/pull/37584))
+- Fix the calculation error of `paddle.nn.ELU` and `paddle.nn.functional.elu` when alpha<0;please note the inplace version:`paddle.nn.functional.elu_` will raise error when alpha<0. ([#37437]
+- (https://github.com/PaddlePaddle/Paddle/pull/37437))
+- Fix the problem of `out_of_range` when the `paddle.slice` is reversely executed. ([#37584](https://github.com/PaddlePaddle/Paddle/pull/37584))
- `paddle.shape` doesn't support backward, explicitly set ``stop_gradient`` to ``True``. ([#37412](https://github.com/PaddlePaddle/Paddle/pull/37412))
- `paddle.arange` doesn't support backward, explicitly set ``stop_gradient`` to ``True``.([#37486](https://github.com/PaddlePaddle/Paddle/pull/37486))
- `paddle.shard_index` reports an error if the last dimension of the input data is not 1. ([#37421](https://github.com/PaddlePaddle/Paddle/pull/37421))
@@ -70,7 +64,6 @@ This version fixed some function and performance issues of PaddlePaddle 2.2.0, a
- Fix the issue that one-dimensional `Tensor` reports an exception error of dimension detection when using ellipsis(...) indexing. ([#37192](https://github.com/PaddlePaddle/Paddle/pull/37192))
- Fix the issue that the gradient attribute of`Tensor` cannot be spread during indexing and assignment (`setitem`), see [issue](https://github.com/PaddlePaddle/Paddle/issues/36902) for details. ([#37028](https://github.com/PaddlePaddle/Paddle/pull/37028))
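
The ELU entry in the API fixes above concerns negative `alpha`. As a minimal sketch in plain Python (not Paddle's kernel; `elu_sketch` is a hypothetical name), the standard ELU definition shows why this case is delicate: with `alpha < 0`, a negative input produces a positive output, so the sign of the output no longer reveals the sign of the input. That may be why the in-place variant, which overwrites its input, rejects `alpha < 0`; this reading is an assumption and is not stated in the release note.

```python
import math


def elu_sketch(x, alpha=1.0):
    """Standard ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


# With the usual alpha > 0, a negative input stays negative.
print(elu_sketch(-1.0, alpha=1.0))
# With alpha < 0, the same negative input maps to a positive output.
print(elu_sketch(-1.0, alpha=-1.0))
```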


#### IR(Intermediate Representation)

- Dynamic Graph to Static Graph
@@ -82,22 +75,18 @@ This version fixed some function and performance issues of PaddlePaddle 2.2.0, a
- `fleet.load_model`: Fix the unavailable API loaded by the model in parameter server mode.([#37461](https://github.com/PaddlePaddle/Paddle/pull/37461))
- `fleet.save_inference_model`: Fix the issue that the model does not pull parameters from the server side before saving dense parameters in parameter server mode. ([#37461](https://github.com/PaddlePaddle/Paddle/pull/37461))


#### Others

- Fix the problem of inplace operation of dynamic graph: after performing inplace operation on a non-leaf node, followed by immediate execution of backward, the gradient of this node and the nodes before is calculated incorrectly. ([#37420](https://github.com/PaddlePaddle/Paddle/pull/37420))




## 4. Paddle Inference

### (1)Bug Fixes

-- Further removal of redundant debug logs in the case of clear log closure.([#37212](https://github.com/PaddlePaddle/Paddle/pull/37212))
+- Further removal of redundant debug logs in the case of clear log disable.([#37212](https://github.com/PaddlePaddle/Paddle/pull/37212))
- Fix memory/video memory optimization policies to avoid incorrect prediction results or crashes due to improper memory/video memory optimization. ([#37324](https://github.com/PaddlePaddle/Paddle/pull/37324), [#37123](https://github.com/PaddlePaddle/Paddle/pull/37123))
- Fix the scale calculation error in the MultiHead structure of Transformer model after integrating QkvToContextPluginDynamicscale, which is caused by wrong block and thread settings of cuda function. ([#37096](https://github.com/PaddlePaddle/Paddle/pull/37096))
-- Register all inference OPs in the function of in8 quantization: Solve the issues that some inference OPs are not registered in int8 quantization due to historical reasons. ([#37266](https://github.com/PaddlePaddle/Paddle/pull/37266
+- Register all inference OPs in the function of int8 quantization: Solve the issues that some inference OPs are not registered in int8 quantization due to historical reasons. ([#37266](https://github.com/PaddlePaddle/Paddle/pull/37266))

# 2.2.0 Release Note
