1 parent eb8bea8 commit a574b7b
chapter_deep-learning-basics/weight-decay.md
@@ -13,7 +13,7 @@ $$\ell(w_1, w_2, b) = \frac{1}{n} \sum_{i=1}^n \frac{1}{2}\left(x_1^{(i)} w_1 +
as an example, where $w_1, w_2$ are the weight parameters, $b$ is the bias parameter, the input of sample $i$ is $x_1^{(i)}, x_2^{(i)}$, its label is $y^{(i)}$, and the number of samples is $n$. Writing the weight parameters as the vector $\boldsymbol{w} = [w_1, w_2]$, the new loss function with the $L_2$ norm penalty term is
-$$\ell(w_1, w_2, b) + \frac{\lambda}{2n} \|\boldsymbol{w}\|^2,$$
+$$\ell(w_1, w_2, b) + \frac{\lambda}{2} \|\boldsymbol{w}\|^2,$$
where the hyperparameter $\lambda > 0$. The penalty term is minimized when all the weight parameters are 0. When $\lambda$ is large, the penalty term carries more weight in the loss function, which usually drives the elements of the learned weight parameters closer to 0. When $\lambda$ is set to 0, the penalty term has no effect at all. Expanding the squared $L_2$ norm $\|\boldsymbol{w}\|^2$ above gives $w_1^2 + w_2^2$. With the $L_2$ norm penalty term in place, in minibatch stochastic gradient descent we change the update rule for the weights $w_1$ and $w_2$ from the ["Linear Regression"](linear-regression.md) section to
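As a worked sketch of what the corrected penalty implies (assuming a learning rate $\eta$ and a minibatch $\mathcal{B}$, neither of which is defined in this excerpt): differentiating $\frac{\lambda}{2}\|\boldsymbol{w}\|^2$ with respect to $w_1$ adds $\lambda w_1$ to the gradient, so the minibatch SGD step takes the form

$$w_1 \leftarrow (1 - \eta\lambda)\, w_1 - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} x_1^{(i)} \left(x_1^{(i)} w_1 + x_2^{(i)} w_2 + b - y^{(i)}\right),$$

and analogously for $w_2$. The multiplicative factor $(1 - \eta\lambda)$ shrinking the weight at every step is why this penalty is called weight decay.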