forked from microsoft/ai-edu
Commit
* fix formula bug
* jku
* uwye
* hduye
* Update Level7_BiRnn_MNIST.py
* fix bug
* huyy
* dkdkie
* finish bi-rnn
Showing 113 changed files with 1,090 additions and 1,272 deletions.
<!--Copyright © Microsoft Corporation. All rights reserved.
Licensed under the [License](https://github.com/Microsoft/ai-edu/blob/master/LICENSE.md)-->

## 1.1 Derivatives of Basic Functions

### 1.1.1 Basic Functions and Their Derivatives

|No.|Function|Derivative|Remarks|
|---|---|---|---|
|1|$y=c$|$y'=0$||
|2|$y=x^a$|$y'=ax^{a-1}$||
|3|$y=\log_a x$|$y'=\frac{1}{x}\log_a e=\frac{1}{x\ln a}$||
|4|$y=\ln x$|$y'=\frac{1}{x}$||
|5|$y=a^x$|$y'=a^x\ln a$||
|6|$y=e^x$|$y'=e^x$||
|7|$y=e^{-x}$|$y'=-e^{-x}$||
|8|$y=\sin(x)$|$y'=\cos(x)$|sine|
|9|$y=\cos(x)$|$y'=-\sin(x)$|cosine|
|10|$y=\tan(x)$|$y'=\sec^2(x)=\frac{1}{\cos^2 x}$|tangent|
|11|$y=\cot(x)$|$y'=-\csc^2(x)$|cotangent|
|12|$y=\arcsin(x)$|$y'=\frac{1}{\sqrt{1-x^2}}$|inverse sine|
|13|$y=\arccos(x)$|$y'=-\frac{1}{\sqrt{1-x^2}}$|inverse cosine|
|14|$y=\arctan(x)$|$y'=\frac{1}{1+x^2}$|inverse tangent|
|15|$y=\operatorname{arccot}(x)$|$y'=-\frac{1}{1+x^2}$|inverse cotangent|
|16|$y=\sinh(x)=(e^x-e^{-x})/2$|$y'=\cosh(x)$|hyperbolic sine|
|17|$y=\cosh(x)=(e^x+e^{-x})/2$|$y'=\sinh(x)$|hyperbolic cosine|
|18|$y=\tanh(x)=(e^x-e^{-x})/(e^x+e^{-x})$|$y'=\operatorname{sech}^2(x)=1-\tanh^2(x)$|hyperbolic tangent|
|19|$y=\coth(x)=(e^x+e^{-x})/(e^x-e^{-x})$|$y'=-\operatorname{csch}^2(x)$|hyperbolic cotangent|
|20|$y=\operatorname{sech}(x)=2/(e^x+e^{-x})$|$y'=-\operatorname{sech}(x)\tanh(x)$|hyperbolic secant|
|21|$y=\operatorname{csch}(x)=2/(e^x-e^{-x})$|$y'=-\operatorname{csch}(x)\coth(x)$|hyperbolic cosecant|
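
As a quick sanity check, here is a minimal NumPy sketch (the test point and the handful of entries checked are chosen purely for illustration) that compares a few rows of the table with central finite differences:

```python
import numpy as np

def numeric_derivative(f, x, h=1e-5):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
checks = {
    "d/dx ln(x) = 1/x":             (np.log,  lambda t: 1 / t),
    "d/dx e^x = e^x":               (np.exp,  np.exp),
    "d/dx sin(x) = cos(x)":         (np.sin,  np.cos),
    "d/dx tanh(x) = 1 - tanh^2(x)": (np.tanh, lambda t: 1 - np.tanh(t) ** 2),
}
for name, (f, fprime) in checks.items():
    print(f"{name}: numeric={numeric_derivative(f, x):.6f}, analytic={fprime(x):.6f}")
```

The numeric and analytic columns should agree up to finite-difference error.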

### 1.1.2 Arithmetic Rules for Derivatives

$$[u(x) + v(x)]' = u'(x) + v'(x) \tag{30}$$
$$[u(x) - v(x)]' = u'(x) - v'(x) \tag{31}$$
$$[u(x) \cdot v(x)]' = u'(x)v(x) + v'(x)u(x) \tag{32}$$
$$[\frac{u(x)}{v(x)}]'=\frac{u'(x)v(x)-v'(x)u(x)}{v^2(x)} \tag{33}$$
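
A minimal sketch of the product and quotient rules (32) and (33), assuming the example functions $u(x)=x^2$ and $v(x)=\sin x$ and comparing against central finite differences:

```python
import numpy as np

u  = lambda x: x ** 2        # example u(x)
du = lambda x: 2 * x
v  = lambda x: np.sin(x)     # example v(x)
dv = lambda x: np.cos(x)

def numeric(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3
# Product rule (32): (uv)' = u'v + v'u
print(numeric(lambda t: u(t) * v(t), x), du(x) * v(x) + dv(x) * u(x))
# Quotient rule (33): (u/v)' = (u'v - v'u) / v^2
print(numeric(lambda t: u(t) / v(t), x), (du(x) * v(x) - dv(x) * u(x)) / v(x) ** 2)
```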

### 1.1.3 Partial Derivatives

For $Z=f(x,y)$, the partial derivative of $Z$ with respect to $x$ can be understood as differentiating $Z$ with respect to $x$ alone while treating $y$ as a constant:

$$Z'_x=f'_x(x,y)=\frac{\partial{Z}}{\partial{x}} \tag{40}$$

Likewise, the partial derivative of $Z$ with respect to $y$ treats $x$ as a constant and differentiates $Z$ with respect to $y$ alone:

$$Z'_y=f'_y(x,y)=\frac{\partial{Z}}{\partial{y}} \tag{41}$$

For a function of two variables, the geometric meaning of the partial derivative is this: for any fixed value $y=y_0$, slice the surface of the function at $y=y_0$ to obtain the curve $Z = f(x, y_0)$; the ordinary derivative of that curve is the partial derivative of $Z$ with respect to $x$. Slicing at $x=x_0$ in the same way gives the partial derivative of $Z$ with respect to $y$.
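
For a concrete illustration, take the assumed example surface $Z=f(x,y)=x^2y+\sin(y)$; slicing at $y=y_0$ or $x=x_0$ and differencing along the remaining variable reproduces the analytic partial derivatives:

```python
import numpy as np

def f(x, y):
    # Example surface Z = f(x, y) = x^2 * y + sin(y)
    return x ** 2 * y + np.sin(y)

x0, y0, h = 1.5, 0.8, 1e-5
dZ_dx_numeric = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)   # slice at y = y0
dZ_dy_numeric = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)   # slice at x = x0
print(dZ_dx_numeric, 2 * x0 * y0)              # analytic dZ/dx = 2xy
print(dZ_dy_numeric, x0 ** 2 + np.cos(y0))     # analytic dZ/dy = x^2 + cos(y)
```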

### 1.1.4 Derivatives of Composite Functions (the Chain Rule)

- If $y=f(u)$ and $u=g(x)$, then:

$$y'_x = f'(u) \cdot u'(x) = y'_u \cdot u'_x=\frac{dy}{du} \cdot \frac{du}{dx} \tag{50}$$

- If $y=f(u)$, $u=g(v)$, and $v=h(x)$, then:

$$
\frac{dy}{dx}=f'(u) \cdot g'(v) \cdot h'(x)=\frac{dy}{du} \cdot \frac{du}{dv} \cdot \frac{dv}{dx} \tag{51}
$$

- If $Z=f(U,V)$ becomes a composite function of $x$ and $y$, $Z=f[g(x,y),h(x,y)]$, through the intermediate variables $U = g(x,y)$ and $V=h(x,y)$, then (see the sketch after these formulas):

$$
\frac{\partial{Z}}{\partial{x}}=\frac{\partial{Z}}{\partial{U}} \cdot \frac{\partial{U}}{\partial{x}} + \frac{\partial{Z}}{\partial{V}} \cdot \frac{\partial{V}}{\partial{x}} \tag{52}
$$

$$
\frac{\partial{Z}}{\partial{y}}=\frac{\partial{Z}}{\partial{U}} \cdot \frac{\partial{U}}{\partial{y}} + \frac{\partial{Z}}{\partial{V}} \cdot \frac{\partial{V}}{\partial{y}}
$$
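
A small numerical check of equation (52), assuming the example intermediate functions $U=x+y$, $V=xy$ and outer function $Z=U^2+\sin V$:

```python
import numpy as np

U = lambda x, y: x + y
V = lambda x, y: x * y
Z = lambda u, v: u ** 2 + np.sin(v)
composite = lambda x, y: Z(U(x, y), V(x, y))

x0, y0, h = 0.6, 1.1, 1e-5
# Chain rule (52): dZ/dx = dZ/dU * dU/dx + dZ/dV * dV/dx
chain   = 2 * U(x0, y0) * 1 + np.cos(V(x0, y0)) * y0
numeric = (composite(x0 + h, y0) - composite(x0 - h, y0)) / (2 * h)
print(chain, numeric)   # should agree up to finite-difference error
```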

### 1.1.5 Matrix Derivatives

If $A$, $B$, and $X$ are all matrices, then:

$$
B\frac{\partial{(AX)}}{\partial{X}} = A^TB \tag{60}
$$

$$
B\frac{\partial{(XA)}}{\partial{X}} = BA^T \tag{61}
$$

$$
\frac{\partial{(X^TA)}}{\partial{X}} = \frac{\partial{(A^TX)}}{\partial{X}}=A \tag{62}
$$

$$
\frac{\partial{(A^TXB)}}{\partial{X}} = AB^T \tag{63}
$$

$$
\frac{\partial{(A^TX^TB)}}{\partial{X}} = BA^T, \quad {dX^TAX \over dX} = (A+A^T)X \tag{64}
$$

$${dX^T \over dX} = I, \quad {dX \over dX^T} = I, \quad {dX^TX \over dX}=2X \tag{65}$$

$${du \over dX^T} = ({du^T \over dX})^T$$

$${du^Tv \over dx} = {du^T \over dx}v + {dv^T \over dx}u^T, \quad {duv^T \over dx} = {du \over dx}v^T + u{dv^T \over dx} \tag{66}$$

$${dAB \over dX} = {dA \over dX}B + A{dB \over dX} \tag{67}$$

$${du^TXv \over dX}=uv^T, \quad {du^TX^TXu \over dX}=2Xuu^T \tag{68}$$

$${d[(Xu-v)^T(Xu-v)] \over dX}=2(Xu-v)u^T \tag{69}$$
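
The sketch below spot-checks identities (62), (65), and (69) with small random matrices and vectors (the shapes are chosen arbitrarily; the gradients are taken element-wise with central differences and follow the same layout convention as the formulas above):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_numeric(f, X, h=1e-6):
    # Element-wise central differences of a scalar-valued function of X
    G = np.zeros_like(X)
    it = np.nditer(X, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += h
        Xm[idx] -= h
        G[idx] = (f(Xp) - f(Xm)) / (2 * h)
    return G

# (62): d(X^T A)/dX = A, with X and A column vectors
A  = rng.normal(size=(4, 1))
Xv = rng.normal(size=(4, 1))
print(np.allclose(grad_numeric(lambda X: (X.T @ A).item(), Xv), A, atol=1e-5))

# (65): d(X^T X)/dX = 2X
print(np.allclose(grad_numeric(lambda X: (X.T @ X).item(), Xv), 2 * Xv, atol=1e-5))

# (69): d[(Xu-v)^T(Xu-v)]/dX = 2(Xu-v)u^T, with X a matrix
Xm = rng.normal(size=(3, 4))
u, v = rng.normal(size=(4, 1)), rng.normal(size=(3, 1))
f = lambda X: ((X @ u - v).T @ (X @ u - v)).item()
print(np.allclose(grad_numeric(f, Xm), 2 * (Xm @ u - v) @ u.T, atol=1e-4))
```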

### 1.1.6 Definition of the Derivative of a Scalar with Respect to a Matrix

Suppose $y$ is a scalar, $X$ is an $N \times M$ matrix, and $y=f(X)$ for some function $f$. Let us see how $df$ should be computed.

First, the definition:

$$
df = \sum_j^M\sum_i^N \frac{\partial{f}}{\partial{x_{ij}}}dx_{ij}
$$

Next we introduce the trace of a matrix, which is simply the sum of its diagonal elements:

$$
tr(X) = \sum_i x_{ii}
$$

With the trace in hand, can the differential above be expressed as a trace?

$$
\frac{\partial{f}}{\partial{X}} =
\begin{pmatrix}
\frac{\partial{f}}{\partial{x_{11}}} & \frac{\partial{f}}{\partial{x_{12}}} & \dots & \frac{\partial{f}}{\partial{x_{1M}}} \\
\frac{\partial{f}}{\partial{x_{21}}} & \frac{\partial{f}}{\partial{x_{22}}} & \dots & \frac{\partial{f}}{\partial{x_{2M}}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial{f}}{\partial{x_{N1}}} & \frac{\partial{f}}{\partial{x_{N2}}} & \dots & \frac{\partial{f}}{\partial{x_{NM}}}
\end{pmatrix} \tag{90}
$$

$$
dX =
\begin{pmatrix}
dx_{11} & d{x_{12}} & \dots & d{x_{1M}} \\
d{x_{21}} & d{x_{22}} & \dots & d{x_{2M}} \\
\vdots & \vdots & \ddots & \vdots \\
d{x_{N1}} & d{x_{N2}} & \dots & d{x_{NM}}
\end{pmatrix} \tag{91}
$$

Consider the diagonal elements of the product of the transpose of matrix $(90)$ with matrix $(91)$:

$$
((\frac{\partial f}{\partial X})^T dX)_{jj}=\sum_i^N \frac{\partial f}{\partial x_{ij}} dx_{ij}
$$

Therefore,

$$
tr({(\frac{\partial{f}}{\partial{X}})}^TdX) = \sum_j^M\sum_i^N\frac{\partial{f}}{\partial{x_{ij}}}dx_{ij} = df = tr(df) \tag{92}
$$

The last equality holds because $df$ is a scalar, and the trace of a scalar is the scalar itself.
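
To make equation (92) concrete, the following sketch takes an assumed example function $f(X)=\sum_{ij} x_{ij}^2$, perturbs $X$ by a small random $dX$, and compares $f(X+dX)-f(X)$ with $tr((\partial f/\partial X)^T dX)$:

```python
import numpy as np

rng = np.random.default_rng(1)
X  = rng.normal(size=(3, 4))
dX = 1e-6 * rng.normal(size=(3, 4))     # a small perturbation

f      = lambda X: np.sum(X ** 2)       # example scalar function of a matrix
grad_f = lambda X: 2 * X                # its gradient matrix, same shape as X

df_true  = f(X + dX) - f(X)
df_trace = np.trace(grad_f(X).T @ dX)   # right-hand side of equation (92)
print(df_true, df_trace)                # agree to first order in dX
```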

### 1.1.7 Some Properties of the Matrix Trace and Matrix Differentials

Here we list some properties of matrix traces and differentials for reference in the derivations that follow. Readers in a hurry may simply take them as given results.

$$
d(X + Y) = dX + dY \tag{93}
$$
$$
d(XY) = (dX)Y + X(dY)\tag{94}
$$
$$
dX^T = {(dX)}^T \tag{95}
$$
$$
d(tr(X)) = tr(dX) \tag{96}
$$
$$
d(X \odot Y) = dX \odot Y + X \odot dY \tag{97}
$$
$$
d(f(X)) = f^{'}(X) \odot dX \tag{98}
$$
$$
tr(XY) = tr(YX) \tag{99}
$$
$$
tr(A^T (B \odot C)) = tr((A \odot B)^T C) \tag{100}
$$

The proofs of these properties are all similar; we take equation (94) as an example. Let

$$
Z = XY
$$

Then an arbitrary element of $Z$ is

$$
z_{ij} = \sum_k x_{ik}y_{kj}
$$
$$
dz_{ij} = \sum_k d(x_{ik}y_{kj})
$$
$$
= \sum_k (dx_{ik}) y_{kj} + \sum_k x_{ik} (dy_{kj})
$$
$$
=((dX)Y)_{ij} + (X(dY))_{ij}
$$

So every element of $dZ$ equals the corresponding element of $(dX)Y + X(dY)$, which establishes equation (94).
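
A quick first-order numerical illustration of properties (94) and (99), using small random matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
X, Y   = rng.normal(size=(3, 4)), rng.normal(size=(4, 5))
dX, dY = 1e-6 * rng.normal(size=(3, 4)), 1e-6 * rng.normal(size=(4, 5))

# (94): d(XY) = (dX)Y + X(dY), up to the second-order term dX @ dY
lhs = (X + dX) @ (Y + dY) - X @ Y
rhs = dX @ Y + X @ dY
print(np.max(np.abs(lhs - rhs)))        # tiny: this is exactly dX @ dY

# (99): tr(XY) = tr(YX) for compatible shapes
A, B = rng.normal(size=(3, 5)), rng.normal(size=(5, 3))
print(np.trace(A @ B), np.trace(B @ A))
```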

### References

[矩阵求导术](https://zhuanlan.zhihu.com/p/24709748)
B-教学案例与实践/B6-神经网络基本原理简明教程/Appendix/14.1-神经网络反向传播四大公式.md (172 additions, 0 deletions)
<!--Copyright © Microsoft Corporation. All rights reserved.
Licensed under the [License](https://github.com/Microsoft/ai-edu/blob/master/LICENSE.md)-->

## 1.2 Derivation of the Four Backpropagation Formulas

The four famous backpropagation formulas are:

$$\delta^{L} = \nabla_{a}C \odot \sigma'(Z^L) \tag{80}$$
$$\delta^{l} = ((W^{l + 1})^T\delta^{l+1})\odot\sigma'(Z^l) \tag{81}$$
$$\frac{\partial{C}}{\partial{b_j^l}} = \delta_j^l \tag{82}$$
$$\frac{\partial{C}}{\partial{w_{jk}^{l}}} = a_k^{l-1}\delta_j^l \tag{83}$$

### 1.2.1 An Intuitive Look at the Four Backpropagation Formulas

Below we use a simple fully connected network with two neurons per layer to explain these four formulas intuitively.

<img src="../Images/14/bp.png" />

Figure 14-

The inputs and outputs of each node are labeled as in the figure. Using MSE as the loss function, the computations in this graph are:

$$e_{o1} = \frac{1}{2}(y-a_1^3)^2$$
$$a_1^3 = sigmoid(z_1^3)$$
$$z_1^3 = (w_{11}^2 \cdot a_1^2 + w_{12}^2 \cdot a_2^2 + b_1^3)$$
$$a_1^2 = sigmoid(z_1^2)$$
$$z_1^2 = (w_{11}^1 \cdot a_1^1 + w_{12}^1 \cdot a_2^1 + b_1^2)$$

Following the gradient-descent principle used in backpropagation, we take gradients of the loss as follows:

$$\frac{\partial{e_{o1}}}{\partial{w_{11}^2}} = \frac{\partial{e_{o1}}}{\partial{a_{1}^3}}\frac{\partial{a_{1}^3}}{\partial{z_{1}^3}}\frac{\partial{z_{1}^3}}{\partial{w_{11}^2}}=\frac{\partial{e_{o1}}}{\partial{a_{1}^3}}\frac{\partial{a_{1}^3}}{\partial{z_{1}^3}}a_{1}^2$$

$$\frac{\partial{e_{o1}}}{\partial{w_{12}^2}} = \frac{\partial{e_{o1}}}{\partial{a_{1}^3}}\frac{\partial{a_{1}^3}}{\partial{z_{1}^3}}\frac{\partial{z_{1}^3}}{\partial{w_{12}^2}}=\frac{\partial{e_{o1}}}{\partial{a_{1}^3}}\frac{\partial{a_{1}^3}}{\partial{z_{1}^3}}a_{2}^2$$

$$\frac{\partial{e_{o1}}}{\partial{w_{11}^1}} = \frac{\partial{e_{o1}}}{\partial{a_{1}^3}}\frac{\partial{a_{1}^3}}{\partial{z_{1}^3}}\frac{\partial{z_{1}^3}}{\partial{a_{1}^2}}\frac{\partial{a_{1}^2}}{\partial{z_{1}^2}}\frac{\partial{z_{1}^2}}{\partial{w_{11}^1}} = \frac{\partial{e_{o1}}}{\partial{a_{1}^3}}\frac{\partial{a_{1}^3}}{\partial{z_{1}^3}}\frac{\partial{z_{1}^3}}{\partial{a_{1}^2}}\frac{\partial{a_{1}^2}}{\partial{z_{1}^2}}a_1^1$$

$$=\frac{\partial{e_{o1}}}{\partial{a_{1}^3}}\frac{\partial{a_{1}^3}}{\partial{z_{1}^3}}w_{11}^2\frac{\partial{a_{1}^2}}{\partial{z_{1}^2}}a_1^1$$

$$\frac{\partial{e_{o1}}}{\partial{w_{12}^1}} = \frac{\partial{e_{o1}}}{\partial{a_{1}^3}}\frac{\partial{a_{1}^3}}{\partial{z_{1}^3}}\frac{\partial{z_{1}^3}}{\partial{a_{1}^2}}\frac{\partial{a_{1}^2}}{\partial{z_{1}^2}}\frac{\partial{z_{1}^2}}{\partial{w_{12}^1}} = \frac{\partial{e_{o1}}}{\partial{a_{1}^3}}\frac{\partial{a_{1}^3}}{\partial{z_{1}^3}}\frac{\partial{z_{1}^3}}{\partial{a_{1}^2}}\frac{\partial{a_{1}^2}}{\partial{z_{1}^2}}a_2^1$$

$$=\frac{\partial{e_{o1}}}{\partial{a_{1}^3}}\frac{\partial{a_{1}^3}}{\partial{z_{1}^3}}w_{11}^2\frac{\partial{a_{1}^2}}{\partial{z_{1}^2}}a_2^1$$

In the expressions above, $\frac{\partial{a}}{\partial{z}}$ is the derivative of the activation function, i.e. the $\sigma^{'}(z)$ term. Notice that the common factor $\frac{\partial{e_{o1}}}{\partial{a_{1}^3}}\frac{\partial{a_{1}^3}}{\partial{z_{1}^3}}$ appears in every partial derivative. Recording it with the symbol $\delta$ and writing it in matrix form:

$$\delta^L = [\frac{\partial{e_{o1}}}{\partial{a_{i}^L}}\frac{\partial{a_{i}^L}}{\partial{z_{i}^L}}] = \nabla_{a}C\odot\sigma^{'}(Z^L)$$

Here $[a_i]$ denotes a matrix whose elements are the $a_i$, $\nabla_{a}C$ is the gradient of the loss $C$ with respect to $a$, and $\odot$ is the element-wise (Hadamard) product, i.e. multiplication of corresponding entries.

From the derivation above, we obtain the recurrence for the $\delta$ matrices:

$$\delta^{L-1} = (W^L)^T[\frac{\partial{e_{o1}}}{\partial{a_{i}^L}}\frac{\partial{a_{i}^L}}{\partial{z_{i}^L}}]\odot\sigma^{'}(Z^{L - 1})$$

So during backpropagation we only need to propagate $\delta^l$ layer by layer using this recurrence.
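
The following sketch applies formulas (80) through (83) to a toy two-layer sigmoid network with MSE loss (all sizes, random values, and variable names are assumed for illustration) and checks one weight gradient against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid  = lambda z: 1 / (1 + np.exp(-z))
dsigmoid = lambda z: sigmoid(z) * (1 - sigmoid(z))

# Toy network: 2 inputs -> 2 hidden -> 1 output, loss C = 0.5*(a3 - y)^2
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=(1, 1))
a1, y  = rng.normal(size=(2, 1)), np.array([[1.0]])

def forward(W1, b1, W2, b2, a1):
    z2 = W1 @ a1 + b1; a2 = sigmoid(z2)
    z3 = W2 @ a2 + b2; a3 = sigmoid(z3)
    return z2, a2, z3, a3

z2, a2, z3, a3 = forward(W1, b1, W2, b2, a1)

delta3 = (a3 - y) * dsigmoid(z3)              # (80): nabla_a C ⊙ sigma'(z^L)
delta2 = (W2.T @ delta3) * dsigmoid(z2)       # (81): ((W^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)
dC_db2, dC_db1 = delta3, delta2               # (82): dC/db^l = delta^l
dC_dW2, dC_dW1 = delta3 @ a2.T, delta2 @ a1.T # (83): dC/dW^l = delta^l (a^{l-1})^T

# Finite-difference check on one weight, W1[0, 1]
h = 1e-6
Wp, Wm = W1.copy(), W1.copy()
Wp[0, 1] += h; Wm[0, 1] -= h
loss = lambda W: 0.5 * ((forward(W, b1, W2, b2, a1)[3] - y) ** 2).item()
print(dC_dW1[0, 1], (loss(Wp) - loss(Wm)) / (2 * h))
```

The two printed values should agree up to finite-difference error.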
This is an intuitive result, but the derivation above is not rigorous. Below we derive the formulas from stricter mathematical definitions, building on the definitions and properties introduced earlier.
### 1.2.2 Proofs of the Formulas Used in Neural Networks

+ First, a general case: given $f = A^TXB$ with $A,B$ constant vectors, we want $\frac{\partial{f}}{\partial{X}}$. The derivation is as follows.

By property (94),

$$
df = d(A^TXB) = d(A^TX)B + A^TX(dB) = d(A^TX)B + 0 = d(A^T)XB+A^T(dX)B = A^T(dX)B
$$

Since $df$ is a scalar, it equals its own trace; using property (99) as well:

$$
df = tr(df) = tr(A^T(dX)B) = tr(BA^TdX)
$$

By equation (92),

$$
tr(df) = tr({(\frac{\partial{f}}{\partial{X}})}^TdX)
$$

so we obtain:

$$
(\frac{\partial{f}}{\partial{X}})^T = BA^T
$$
$$
\frac{\partial{f}}{\partial{X}} = AB^T \tag{101}
$$
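
A numerical check of result (101), using random $A$, $B$, and $X$ of arbitrary compatible shapes:

```python
import numpy as np

rng = np.random.default_rng(4)
A, B = rng.normal(size=(3, 1)), rng.normal(size=(4, 1))
X    = rng.normal(size=(3, 4))

def grad_numeric(f, X, h=1e-6):
    # Element-wise central differences of a scalar function of the matrix X
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Xp, Xm = X.copy(), X.copy()
            Xp[i, j] += h; Xm[i, j] -= h
            G[i, j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

f = lambda X: (A.T @ X @ B).item()      # scalar f = A^T X B
print(np.allclose(grad_numeric(f, X), A @ B.T, atol=1e-5))   # equation (101)
```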

+ Now consider a fully connected layer:

$$ Y = WX + B$$

Take a single output element of the layer:

$$ y = wX + b$$

Here $w$ is one row of the weight matrix, of size $1 \times M$; $X$ is a vector of size $M \times 1$; and $y$ is a scalar. Inserting an identity matrix of size 1 leaves the expression unchanged:

$$ y = (w^T)^TXI + b$$

Applying result (101) gives

$$ \frac{\partial{y}}{\partial{X}} = w^TI^T = w^T$$

Therefore, in the four backpropagation formulas, when the error $\delta$ received from the layer above is propagated further, the chain rule gives

$$\delta^{L-1} = (W^L)^T \delta^L \odot \sigma^{'}(Z^{L - 1})$$

Similarly, viewing $y=wX+b$ as

$$ y = IwX + b $$

and applying result (101) again, we get:

$$ \frac{\partial{y}}{\partial{w}} = X^T$$
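
A small sketch of the two gradients just derived for a single linear output $y=wX+b$, with random data and element-wise finite differences:

```python
import numpy as np

rng = np.random.default_rng(5)
M = 4
w = rng.normal(size=(1, M))      # one row of the weight matrix, 1 x M
X = rng.normal(size=(M, 1))      # input column vector, M x 1
b = rng.normal()
y = lambda w, X: (w @ X).item() + b

def grad_numeric(f, V, h=1e-6):
    # Element-wise central differences of a scalar function of V
    G = np.zeros_like(V)
    it = np.nditer(V, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        Vp, Vm = V.copy(), V.copy()
        Vp[idx] += h; Vm[idx] -= h
        G[idx] = (f(Vp) - f(Vm)) / (2 * h)
    return G

print(np.allclose(grad_numeric(lambda X_: y(w, X_), X), w.T))   # dy/dX = w^T
print(np.allclose(grad_numeric(lambda w_: y(w_, X), w), X.T))   # dy/dw = X^T
```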

+ The case of a loss computed with softmax and cross entropy:

$$ l = - Y^Tlog(softmax(Z))$$

Here $Y$ is the data label and $Z$ is the network's predicted output; both have dimension $N \times 1$. The softmax turns $Z$ into probabilities. We want $\frac{\partial{l}}{\partial{Z}}$; the derivation is as follows.

$$
softmax(Z) = \frac{exp(Z)}{\boldsymbol{1}^Texp(Z)}
$$

where $\boldsymbol{1}$ is an all-ones vector of dimension $N \times 1$. Substituting the softmax expression into the loss function,

$$
dl = -Y^T d(log(softmax(Z)))\\
= -Y^T d (log\frac{exp(Z)}{\boldsymbol{1}^Texp(Z)}) \\
= -Y^T dZ + Y^T \boldsymbol{1}d(log(\boldsymbol{1}^Texp(Z))) \tag{102}
$$

Next we simplify the second half of (102). Since $\boldsymbol{1}^Texp(Z)$ is a scalar, and using property (98) for $d(exp(Z)) = exp(Z) \odot dZ$:

$$
d(log(\boldsymbol{1}^Texp(Z))) = \frac{d(\boldsymbol{1}^Texp(Z))}{\boldsymbol{1}^Texp(Z)}
= \frac{\boldsymbol{1}^T(exp(Z)\odot dZ)}{\boldsymbol{1}^Texp(Z)}
$$

Using property (100), we obtain

$$
tr(Y^T \boldsymbol{1}\frac{\boldsymbol{1}^T(exp(Z)\odot dZ)}{\boldsymbol{1}^Texp(Z)}) =
tr(Y^T \boldsymbol{1}\frac{(\boldsymbol{1} \odot exp(Z))^T dZ}{\boldsymbol{1}^Texp(Z)})
$$
$$ =
tr(Y^T \boldsymbol{1}\frac{exp(Z)^T dZ}{\boldsymbol{1}^Texp(Z)}) = tr(Y^T \boldsymbol{1} \, softmax(Z)^TdZ) \tag{103}
$$

Substituting (103) into (102) and taking the trace of both sides:

$$
dl = tr(dl) = tr(-Y^T dZ + Y^T\boldsymbol{1} \, softmax(Z)^TdZ) = tr((\frac{\partial{l}}{\partial{Z}})^TdZ)
$$

In a classification problem, only one entry of the label is 1, so $Y^T\boldsymbol{1} = 1$, and therefore

$$
\frac{\partial{l}}{\partial{Z}} = softmax(Z) - Y
$$

This is exactly the formula used to compute the backpropagated error at the loss function.
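
The sketch below uses a random logit vector $Z$ and a one-hot label $Y$ (both assumed for illustration) to check that $softmax(Z)-Y$ matches the finite-difference gradient of $l=-Y^T\log(softmax(Z))$:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 5
Z = rng.normal(size=(N, 1))            # logits
Y = np.zeros((N, 1)); Y[2, 0] = 1.0    # one-hot label

def softmax(z):
    e = np.exp(z - z.max())            # shift for numerical stability
    return e / e.sum()

loss = lambda z: -(Y * np.log(softmax(z))).sum()

grad_analytic = softmax(Z) - Y         # result derived above

# Finite-difference gradient
h, grad_numeric = 1e-6, np.zeros_like(Z)
for i in range(N):
    Zp, Zm = Z.copy(), Z.copy()
    Zp[i, 0] += h; Zm[i, 0] -= h
    grad_numeric[i, 0] = (loss(Zp) - loss(Zm)) / (2 * h)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))
```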

### References

[矩阵求导术](https://zhuanlan.zhihu.com/p/24709748)

Binary files changed:

- Added (+30.4 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/deeprnn_pm25_fitting_result_24_1.png
- Added (+30.8 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/deeprnn_pm25_fitting_result_24_2.png
- Added (+31.5 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/deeprnn_pm25_fitting_result_24_4.png
- Added (+30.8 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/deeprnn_pm25_fitting_result_24_8.png
- Removed (-33 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/deeprnn_pm25_fitting_result_48_1.png
- Removed (-33 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/deeprnn_pm25_fitting_result_48_2.png
- Removed (-33 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/deeprnn_pm25_fitting_result_48_4.png
- Removed (-33 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/deeprnn_pm25_fitting_result_48_8.png
- Modified (-37 Bytes): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/name_classifier_best_result.png
- Modified (-48 Bytes): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/name_classifier_last_result.png
- Modified (+19.7 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/name_classifier_loss.png
- Removed (-20.3 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/pm25_classifier_result_72_1.png
- Removed (-19.6 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/pm25_classifier_result_72_2.png
- Removed (-19.8 KB): B-教学案例与实践/B6-神经网络基本原理简明教程/Images/19/pm25_classifier_result_72_8.png

Several other binary files and non-rendered diffs are not shown.