Update legacy links to latest repo (PaddlePaddle#5749)
YangQun1 authored Mar 22, 2023
1 parent 3f9af5f commit 68b54ca
Showing 18 changed files with 34 additions and 34 deletions.
6 changes: 3 additions & 3 deletions docs/design/algorithm/parameter_average.md
@@ -51,15 +51,15 @@ In the new design, we propose to create a new operation for averaging parameter

The ParameterAverageOptimizer op can be like any other operator with its own CPU/GPU implementation either using Eigen or separate CPU and GPU kernels. As the initial implementation, we can implement the kernel using Eigen following the abstraction pattern implemented for [Operators](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/rmsprop_op.h). We also want to support the case when the Trainer/Optimizer runs on the GPU while ParameterAverageOptimizer runs on a CPU.

- The idea of building an op for averaging is in sync with the refactored PaddlePaddle philosophy of using operators to represent any computation unit. The way the op will be added to the computation graph will be decided by the [layer functions](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/modules/python_api.md#layer-function) in Python API.
+ The idea of building an op for averaging is in sync with the refactored PaddlePaddle philosophy of using operators to represent any computation unit. The way the op will be added to the computation graph will be decided by the [layer functions](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/modules/python_api.md#layer-function) in Python API.

### Python API implementation for ParameterAverageOptimizer

Based on Polyak and Juditsky (1992), we can generalize the averaging of updates to any optimizer. The input to the op would be the following:
- Any optimizer (RMSProp, AdaGrad, etc.)
- A window size. The op keeps accumulating updated parameter values over a window of N batches and takes their average. When the window is full, the averaged value is moved to a buffer to avoid loss of precision (a minimal sketch of this windowed averaging is given below).
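
As a rough illustration, the windowed accumulation could look like the NumPy sketch below; `ParameterAverager` is a hypothetical helper, not the actual Paddle operator.

```python
import numpy as np

class ParameterAverager:
    """Hypothetical sketch of window-based parameter averaging."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = []    # parameter snapshots in the current window
        self.buffer = []    # one averaged value per completed window

    def update(self, param):
        # Accumulate the updated parameter value after every mini-batch.
        self.window.append(np.asarray(param, dtype=np.float64))
        if len(self.window) == self.window_size:
            # Move the averaged value to a buffer when the window is full,
            # which limits the precision loss of a very long running sum.
            self.buffer.append(np.mean(self.window, axis=0))
            self.window = []

    def average(self):
        # Average over all completed windows (and any partial window).
        chunks = list(self.buffer)
        if self.window:
            chunks.append(np.mean(self.window, axis=0))
        return np.mean(chunks, axis=0)
```

A trainer would call `update()` once per batch and swap `average()` in for the raw parameter at inference time.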

- Using the ParameterAverageOptimizer op, any user can add the operation to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support averaging. As per the PaddlePaddle [Python API design](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/modules/python_api.md), the layer functions are responsible for creating operators, operator parameters and variables. Since ParameterAverageOptimizer will be an operator, it makes sense to create it in the layer functions.
+ Using the ParameterAverageOptimizer op, any user can add the operation to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support averaging. As per the PaddlePaddle [Python API design](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/modules/python_api.md), the layer functions are responsible for creating operators, operator parameters and variables. Since ParameterAverageOptimizer will be an operator, it makes sense to create it in the layer functions.
We will have a wrapper written in Python that supports this functionality and implements the actual core computation in C++, as we have done for other [Optimizers](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/rmsprop_op.cc).

#### Creation of the ParameterAverageOptimizer operator
@@ -71,4 +71,4 @@ The proposal is to add the op immediately while building the computation graph.

#### High-level API

- In PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/modules/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide parameter average functionality in layer functions.
+ In PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/modules/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide parameter average functionality in layer functions.
4 changes: 2 additions & 2 deletions docs/design/concepts/block.md
@@ -113,7 +113,7 @@ if (cond) {

```
- An equivalent PaddlePaddle program from the design doc of the [IfElseOp operator](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/execution/if_else_op.md) is as follows:
+ An equivalent PaddlePaddle program from the design doc of the [IfElseOp operator](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/execution/if_else_op.md) is as follows:
```python
import paddle as pd
@@ -140,7 +140,7 @@ The difference is that variables in the C++ program contain scalar values, where

### Blocks with `for` and `RNNOp`

- The following RNN model in PaddlePaddle from the [RNN design doc](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/dynamic_rnn/rnn_design_en.md):
+ The following RNN model in PaddlePaddle from the [RNN design doc](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/dynamic_rnn/rnn_design_en.md):

```python
x = sequence([10, 20, 30]) # shape=[None, 1]
2 changes: 1 addition & 1 deletion docs/design/concepts/executor.md
@@ -1,7 +1,7 @@
# Executor Design Doc

## Motivation
- In [fluid](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/motivation/fluid.md), we encourage the user to use deep learning programming paradigms to describe the training process. When the user-written Python program is executed, it will first create a protobuf message
+ In [fluid](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/motivation/fluid.md), we encourage the user to use deep learning programming paradigms to describe the training process. When the user-written Python program is executed, it will first create a protobuf message
[`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).

The executor runs the `ProgramDesc` like an interpreter. `ProgramDesc` contains the intrinsics (operators in this case) and the variables which will be used; the executor explicitly executes the stored precompiled code.
4 changes: 2 additions & 2 deletions docs/design/concepts/program.md
@@ -4,7 +4,7 @@

A PaddlePaddle program consists of two parts -- the first generates a `ProgramDesc` protobuf message that describes the program, and the second runs this message using a C++ class `Executor`.

- A simple example PaddlePaddle program can be found in [graph.md](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/others/graph.md):
+ A simple example PaddlePaddle program can be found in [graph.md](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/others/graph.md):

```python
x = layer.data("images")
@@ -22,7 +22,7 @@ The first five lines of the following PaddlePaddle program generates, or, compil
The basic structure of a PaddlePaddle program is a set of nested blocks, as in a C++ or Java program.

- program: some nested blocks
- - [block](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/block.md):
+ - [block](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/concepts/block.md):
- some local variable definitions, and
- a sequence of operators

6 changes: 3 additions & 3 deletions docs/design/data_type/float16.md
@@ -96,7 +96,7 @@ float half_to_float(float16 h);
which provide one-to-one conversion between float32 and float16. These two functions will use different conversion routines based on the current hardware. CUDA/ARM intrinsics will be used when the corresponding hardware is available. If the hardware or compiler level does not support float32 to float16 conversion, software emulation will be performed to do the conversion.

## float16 inference
- In Fluid, a neural network is represented as a protobuf message called [ProgramDesc](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/program.md), whose Python wrapper is a [Program](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/modules/python_api.md#program). The basic structure of a program is some nested [blocks](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/modules/python_api.md#block), where each block consists of some [variable](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/modules/python_api.md#variable) definitions and a sequence of [operators](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/modules/python_api.md#operator). An [executor](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/executor.md) will run a given program desc by executing the sequence of operators in the entrance block of the program one by one.
+ In Fluid, a neural network is represented as a protobuf message called [ProgramDesc](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/concepts/program.md), whose Python wrapper is a [Program](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/modules/python_api.md#program). The basic structure of a program is some nested [blocks](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/modules/python_api.md#block), where each block consists of some [variable](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/modules/python_api.md#variable) definitions and a sequence of [operators](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/modules/python_api.md#operator). An [executor](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/concepts/executor.md) will run a given program desc by executing the sequence of operators in the entrance block of the program one by one.
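
For orientation, a minimal sketch of this program/executor split using the classic `paddle.fluid` 1.x API (names may differ in newer releases) could be:

```python
import numpy as np
import paddle.fluid as fluid

# Building the network appends variables and operators to the default program's blocks.
x = fluid.layers.data(name="x", shape=[4], dtype="float32")
y = fluid.layers.fc(input=x, size=2, act="softmax")

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())            # initialize parameter variables

# The executor walks the operators of the program's entrance block one by one.
out, = exe.run(fluid.default_main_program(),
               feed={"x": np.random.rand(1, 4).astype("float32")},
               fetch_list=[y])
print(out.shape)
```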

### Operator level requirement
Each operator has many kernels for different data types, devices, and library types. The operator will select the appropriate kernel to run based on, among other things, the data type of the input variables. By default, every Fluid operator has a float data type kernel that takes float variables as input and generates float output.
@@ -108,7 +108,7 @@ The same principle applies if we want a program to run in float16 mode. We provi
So the preliminary requirement for float16 inference is to add a float16 kernel to the operators that are needed in a specific kind of program. For example, float16 inference on an image classification neural network like VGG or ResNet typically requires the following operators to have float16 kernels: convolution, pooling, multiplication, addition, batch norm, dropout, relu, and softmax. Please refer to [new_op_en](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/dev/new_op_en.md) for details of how to add new kernels to an operator.

### Variable level requirement
- Operators including convolution and multiplication (used in fully-connected layers) take as input not only the variables generated by the preceding operators but also [parameter](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/modules/python_api.md#parameter) variables, which contain the trained weights to apply to the input data. These weights are obtained in the Fluid training process and are by default of float data type.
+ Operators including convolution and multiplication (used in fully-connected layers) take as input not only the variables generated by the preceding operators but also [parameter](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/modules/python_api.md#parameter) variables, which contain the trained weights to apply to the input data. These weights are obtained in the Fluid training process and are by default of float data type.

When these operators are running in float16 mode, the float16 kernel requires those parameter variables to contain weights of Fluid float16 data type. Thus, we need a convenient way to convert the original float weights to float16 weights.
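
As a back-of-the-envelope illustration (not the transpiler's actual code), converting saved float32 weights to float16 is essentially a cast:

```python
import numpy as np

w32 = np.random.rand(128, 64).astype(np.float32)   # trained float32 weights
w16 = w32.astype(np.float16)                        # float16 copy for float16 kernels

# float16 keeps roughly 3 decimal digits, so the cast is lossy but usually
# acceptable for inference.
print(np.abs(w32 - w16.astype(np.float32)).max())
```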

@@ -137,7 +137,7 @@ This problem can be solved by introducing a type-casting operator which takes an
### float16 transpiler
With all the above requirements in mind, we designed a float16 inference transpiler that can transpile a float32 mode inference program desc into a float16 mode one.

- Given a float inference program and the corresponding variables of float32 weights in the [scope](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/scope.md),
+ Given a float inference program and the corresponding variables of float32 weights in the [scope](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/concepts/scope.md),
this transpiler mainly does the following modifications:

1. Insert cast operators at the beginning of the program so that the input float data will be converted to the float16 data type before being fed to subsequent operators, which then invoke the float16 kernels.
6 changes: 3 additions & 3 deletions docs/design/dist_train/distributed_architecture.md
@@ -54,7 +54,7 @@ The user can not directly specify the parameter update rule for the parameter se

This could be fixed by making the parameter server also run an IR, which can be different from the trainer side.
For a detailed explanation, refer to this document -
- [Design Doc: Parameter Server](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/dist_train/parameter_server.md)
+ [Design Doc: Parameter Server](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/dist_train/parameter_server.md)

## Distributed Training Architecture

@@ -97,9 +97,9 @@ The code above is a typical local training program, the "Training Program" is bu
`fluid.layer.fc`. The training is done by calling `Executor.run`
iteratively.

- For more details, the implementation of IR is [Program](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/program.md), and `ProgramDesc` is the protobuf type.
+ For more details, the implementation of IR is [Program](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/concepts/program.md), and `ProgramDesc` is the protobuf type.

- [Executor](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/executor.md) simply runs the `ProgramDesc`. For local training you generally use
+ [Executor](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/concepts/executor.md) simply runs the `ProgramDesc`. For local training you generally use
`Executor` to run the program locally. For any kind of distributed training, you can use
`RemoteExecutor` to specify the desired distributed training method with some optional arguments.

2 changes: 1 addition & 1 deletion docs/design/dist_train/distributed_lookup_table_design.md
@@ -55,7 +55,7 @@ operator: ![lookup table training](./src/lookup_table_training.png)

### Solution: Distributed storage

- 1. Paddle uses [SelectedRows](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/modules/selected_rows.md) as the storage format for the lookup table. The lookup table parameter will be split across multiple machines according to the hash of the feature ID, and the input data will also be split and sent to the corresponding machine to prefetch the parameter.
+ 1. Paddle uses [SelectedRows](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/modules/selected_rows.md) as the storage format for the lookup table. The lookup table parameter will be split across multiple machines according to the hash of the feature ID, and the input data will also be split and sent to the corresponding machine to prefetch the parameter.

1. For common parameters, the trainer will fetch the whole parameter for training, but for the big lookup table, the trainer cannot store the whole parameter. Because the input features are very sparse, only a few parameters are needed each time, so we use `prefetch_op` to prefetch just the needed rows to the trainer (a toy sketch of this sharded prefetch is given below).
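
A toy sketch of this sharded prefetch (hypothetical helper names; Paddle's real `prefetch_op` and RPC layer look different) is:

```python
def shard_of(feature_id, num_pservers):
    # The lookup table is partitioned across parameter servers by feature-ID hash.
    return hash(feature_id) % num_pservers

def prefetch(feature_ids, num_pservers, fetch_rows):
    """Group the sparse feature IDs by shard and fetch only those table rows."""
    by_shard = {}
    for fid in set(feature_ids):
        by_shard.setdefault(shard_of(fid, num_pservers), []).append(fid)
    rows = {}
    for shard, ids in by_shard.items():
        rows.update(fetch_rows(shard, ids))   # one RPC per parameter server touched
    return rows
```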

2 changes: 1 addition & 1 deletion docs/design/dist_train/parameter_server.md
@@ -65,7 +65,7 @@ For embedding layers, the gradient may have many rows containing only 0 when tra
if the gradient uses a dense tensor to do parameter optimization,
it could consume unnecessary memory, slow down the calculations, and waste
bandwidth during distributed training.
- In Fluid, we introduce [SelectedRows](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/modules/selected_rows.md) to represent a list of rows containing
+ In Fluid, we introduce [SelectedRows](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/modules/selected_rows.md) to represent a list of rows containing
non-zero gradient data. So when we do parameter optimization both locally and remotely,
we only need to send those non-zero rows to the optimizer operators:
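
Conceptually (a Python sketch, not the actual C++ definition), a `SelectedRows` value pairs the indices of the touched rows with a dense tensor holding only those rows:

```python
import numpy as np

class SelectedRows:
    """Sketch: sparse gradient storing only the non-zero embedding rows."""

    def __init__(self, rows, values, height):
        self.rows = rows        # indices of the non-zero rows in the full parameter
        self.values = values    # dense tensor of shape [len(rows), width]
        self.height = height    # number of rows in the full (dense) parameter

grad = SelectedRows(rows=[3, 907, 4211],
                    values=np.random.rand(3, 64).astype(np.float32),
                    height=100000)
```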

2 changes: 1 addition & 1 deletion docs/design/dynamic_rnn/rnn.md
@@ -49,7 +49,7 @@ or copy the memory value of the previous step to the current ex-memory variable.
### Usage in Python
- For more information on Block, please refer to the [design doc](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/block.md).
+ For more information on Block, please refer to the [design doc](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/concepts/block.md).
We can define an RNN's step-net using a Block:
6 changes: 3 additions & 3 deletions docs/design/modules/python_api.md
@@ -36,7 +36,7 @@ Please be aware that these Python classes need to maintain some construction-tim

### Program

- A `ProgramDesc` describes a [DL program](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/program.md), which is composed of an array of `BlockDesc`s. The `BlockDesc`s in a `ProgramDesc` can have a tree-like hierarchical structure. However, the `ProgramDesc` only stores a flattened array of `BlockDesc`s. A `BlockDesc` refers to its parent block by its index in the array. For example, operators in the step block of an RNN operator need to be able to access variables in its ancestor blocks.
+ A `ProgramDesc` describes a [DL program](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/concepts/program.md), which is composed of an array of `BlockDesc`s. The `BlockDesc`s in a `ProgramDesc` can have a tree-like hierarchical structure. However, the `ProgramDesc` only stores a flattened array of `BlockDesc`s. A `BlockDesc` refers to its parent block by its index in the array. For example, operators in the step block of an RNN operator need to be able to access variables in its ancestor blocks.

Whenever we create a block, we need to set its parent block to the current block; hence, the Python class `Program` needs to maintain a data member `current_block`.

@@ -70,7 +70,7 @@ class Program(objects):

### Block

- A [Block](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/block.md) includes
+ A [Block](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/concepts/block.md) includes

1. a map from variable names to an instance of the Python `Variable` class, and
1. a list of `Operator` instances (a minimal sketch of such a class is given below).
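
A minimal sketch of such a Block class (simplified and hypothetical, not the exact code from the design doc) is:

```python
class Block:
    """Sketch: a block owns its variables and an ordered list of operators."""

    def __init__(self, program, parent_idx):
        self.program = program
        self.parent_idx = parent_idx   # index of the parent block in the flat array
        self.vars = {}                 # variable name -> Variable instance
        self.ops = []                  # Operator instances, in execution order

    def create_var(self, name, var):
        self.vars[name] = var
        return var

    def append_op(self, op):
        self.ops.append(op)
        return op
```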
Expand Down Expand Up @@ -322,4 +322,4 @@ executor.run(fetch_list=[hidden.param, hidden.param.grad], ...)

## Optimizer

- [Optimizer Design Doc](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/modules/optimizer.md)
+ [Optimizer Design Doc](https://github.com/PaddlePaddle/docs/blob/develop/docs/design/modules/optimizer.md)