- ---
- source: https://github.com/d2l-ai/d2l-en/blob/master/chapter_preliminaries/autograd.md
- commit: 9e55a9c
- ---
-
# Automatic Differentiation
:label:`sec_autograd`

@@ -24,49 +19,42 @@ Automatic differentiation enables the system to subsequently backpropagate gradients.
Here, *backpropagate* simply means to trace through the computational graph,
filling in the partial derivatives with respect to each parameter.

- ```{.python .input}
- from mxnet import autograd, np, npx
- npx.set_np()
- ```
-
- ```{.python .input}
- #@tab pytorch
- import torch
- ```
-
- ```{.python .input}
- #@tab tensorflow
- import tensorflow as tf
- ```

## A Simple Example

As a toy example, say that we are interested
- in differentiating the function
+ in (**differentiating the function
$y = 2\mathbf{x}^{\top}\mathbf{x}$
- with respect to the column vector $\mathbf{x}$.
+ with respect to the column vector $\mathbf{x}$.**)
To start, let us create the variable `x` and assign it an initial value.

```{.python .input}
+ from mxnet import autograd, np, npx
+ npx.set_np()
+
x = np.arange(4.0)
x
```

```{.python .input}
#@tab pytorch
+ import torch
+
x = torch.arange(4.0)
x
```

```{.python .input}
#@tab tensorflow
+ import tensorflow as tf
+
x = tf.range(4, dtype=tf.float32)
x
```

- Before we even calculate the gradient
+ [**Before we even calculate the gradient
of $y$ with respect to $\mathbf{x}$,
- we will need a place to store it.
+ we will need a place to store it.**]
It is important that we do not allocate new memory
every time we take a derivative with respect to a parameter
because we will often update the same parameters
@@ -95,7 +83,7 @@ x.grad # The default value is None
x = tf.Variable(x)
```
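Only the tail of the TensorFlow block is visible in this hunk; as a minimal PyTorch sketch of the same step (assuming the `x` created above), gradient storage can be enabled in place so that repeated calls to `backward` reuse it:

```python
import torch

x = torch.arange(4.0)
# Mark x so that autograd records operations on it and writes the
# gradient into x.grad instead of allocating new memory each time.
x.requires_grad_(True)
print(x.grad)  # No gradient has been computed yet, so this is None
```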

- Now let us calculate $y$.
+ (**Now let us calculate $y$.**)

```{.python .input}
# Place our code inside an `autograd.record` scope to build the computational
@@ -122,8 +110,8 @@
Since `x` is a vector of length 4,
an inner product of `x` and `x` is performed,
yielding the scalar output that we assign to `y`.
- Next, we can automatically calculate the gradient of `y`
- with respect to each component of `x`
+ Next, [**we can automatically calculate the gradient of `y`
+ with respect to each component of `x`**]
by calling the function for backpropagation and printing the gradient.
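The framework blocks for this step are mostly cut away by the hunk boundaries below; as a hedged sketch in PyTorch (assuming `x` was created with `requires_grad` enabled as above), and noting that $\nabla_{\mathbf{x}}\, 2\mathbf{x}^{\top}\mathbf{x} = 4\mathbf{x}$:

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = 2 * torch.dot(x, x)  # y = 2 * x^T x, a scalar
y.backward()             # backpropagate to fill in x.grad
print(x.grad)            # tensor([ 0.,  4.,  8., 12.]), i.e. 4 * x
```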

```{.python .input}
@@ -143,8 +131,8 @@ x_grad = t.gradient(y, x)
x_grad
```

- The gradient of the function $y = 2\mathbf{x}^{\top}\mathbf{x}$
- with respect to $\mathbf{x}$ should be $4\mathbf{x}$.
+ (**The gradient of the function $y = 2\mathbf{x}^{\top}\mathbf{x}$
+ with respect to $\mathbf{x}$ should be $4\mathbf{x}$.**)
Let us quickly verify that our desired gradient was calculated correctly.

```{.python .input}
@@ -161,7 +149,7 @@ x.grad == 4 * x
x_grad == 4 * x
```

- Now let us calculate another function of `x`.
+ [**Now let us calculate another function of `x`.**]

```{.python .input}
with autograd.record():
@@ -172,9 +160,9 @@ x.grad # Overwritten by the newly calculated gradient

```{.python .input}
#@tab pytorch
- # PyTorch accumulates the gradient in default, we need to clear the previous
+ # PyTorch accumulates gradients by default; we need to clear the previous
# values
- x.grad.zero_()
+ x.grad.zero_()
y = x.sum()
y.backward()
x.grad
@@ -196,13 +184,13 @@ For higher-order and higher-dimensional `y` and `x`,
the differentiation result could be a high-order tensor.

However, while these more exotic objects do show up
- in advanced machine learning (including in deep learning),
- more often when we are calling backward on a vector,
+ in advanced machine learning (including [**in deep learning**]),
+ more often (**when we are calling backward on a vector,**)
we are trying to calculate the derivatives of the loss functions
for each constituent of a *batch* of training examples.
- Here, our intent is not to calculate the differentiation matrix
- but rather the sum of the partial derivatives
- computed individually for each example in the batch.
+ Here, (**our intent is**) not to calculate the differentiation matrix
+ but rather (**the sum of the partial derivatives
+ computed individually for each example**) in the batch.
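The framework blocks that follow are truncated by the hunk boundaries; a minimal PyTorch sketch of this pattern, assuming we reduce the vector `y = x * x` to a scalar by summing before calling `backward`:

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = x * x          # a vector: one value per element of x
# Summing first gives a scalar whose gradient is the sum of the
# per-element partial derivatives, here simply 2 * x.
y.sum().backward()
print(x.grad)      # tensor([0., 2., 4., 6.])
```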

```{.python .input}
# When we invoke `backward` on a vector-valued variable `y` (function of `x`),
@@ -236,8 +224,8 @@ t.gradient(y, x) # Same as `y = tf.reduce_sum(x * x)`

## Detaching Computation

- Sometimes, we wish to move some calculations
- outside of the recorded computational graph.
+ Sometimes, we wish to [**move some calculations
+ outside of the recorded computational graph.**]
For example, say that `y` was calculated as a function of `x`,
and that subsequently `z` was calculated as a function of both `y` and `x`.
Now, imagine that we wanted to calculate
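The rest of this example falls outside the visible hunks; as a hedged PyTorch sketch (assuming, for illustration, that we want the gradient of `z = u * x` with respect to `x` while treating the detached copy `u` of `y` as a constant):

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = x * x
u = y.detach()      # same values as y, but cut out of the graph
z = u * x
z.sum().backward()
print(x.grad == u)  # dz/dx is u, not 3 * x**2, because y was detached
```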
@@ -309,10 +297,10 @@ t.gradient(y, x) == 2 * x
## Computing the Gradient of Python Control Flow

One benefit of using automatic differentiation
- is that even if building the computational graph of a function
- required passing through a maze of Python control flow
+ is that [**even if**] building the computational graph of (**a function
+ required passing through a maze of Python control flow**)
(e.g., conditionals, loops, and arbitrary function calls),
- we can still calculate the gradient of the resulting variable.
+ (**we can still calculate the gradient of the resulting variable.**)
In the following snippet, note that
the number of iterations of the `while` loop
and the evaluation of the `if` statement
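The snippet itself lies beyond the visible hunks; a minimal PyTorch sketch of such a function (the body of `f` below is illustrative, not necessarily the chapter's exact code) could be:

```python
import torch

def f(a):
    # Both the number of loop iterations and the branch taken
    # depend on the value of the input a.
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

a = torch.randn(size=(), requires_grad=True)
d = f(a)
d.backward()
# f is piecewise linear in its input, so d = k * a for some constant k
# determined by the control flow; hence the gradient equals d / a.
print(a.grad == d / a)
```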