<a name="nn.Criterions"/>
# Criterions #

[Criterions](#nn.Criterion) are helpful to train a neural network. Given an input and a
target, they compute a gradient according to a given loss function. The basic
`forward`/`backward` usage pattern is the same for all of them and is sketched after the list below.

 * Classification criterions:
   * [BCECriterion](#nn.BCECriterion): binary cross-entropy (two-class version of [ClassNLLCriterion](#nn.ClassNLLCriterion));
   * [ClassNLLCriterion](#nn.ClassNLLCriterion): negative log-likelihood for [LogSoftMax](transfer.md#nn.LogSoftMax) (multi-class);
   * [CrossEntropyCriterion](#nn.CrossEntropyCriterion): combines [LogSoftMax](transfer.md#nn.LogSoftMax) and [ClassNLLCriterion](#nn.ClassNLLCriterion);
   * [MarginCriterion](#nn.MarginCriterion): two-class margin-based loss;
   * [MultiMarginCriterion](#nn.MultiMarginCriterion): multi-class margin-based loss;
   * [MultiLabelMarginCriterion](#nn.MultiLabelMarginCriterion): multi-class multi-classification margin-based loss;
 * Regression criterions:
   * [AbsCriterion](#nn.AbsCriterion): measures the mean absolute value of the element-wise difference between input and target;
   * [MSECriterion](#nn.MSECriterion): mean squared error (a classic);
   * [DistKLDivCriterion](#nn.DistKLDivCriterion): Kullback–Leibler divergence (for fitting continuous probability distributions);
 * Embedding criterions (measuring whether two inputs are similar or dissimilar):
   * [HingeEmbeddingCriterion](#nn.HingeEmbeddingCriterion): takes a distance as input;
   * [L1HingeEmbeddingCriterion](#nn.L1HingeEmbeddingCriterion): L1 distance between two inputs;
   * [CosineEmbeddingCriterion](#nn.CosineEmbeddingCriterion): cosine distance between two inputs;
 * Miscellaneous criterions:
   * [MultiCriterion](#nn.MultiCriterion): a weighted sum of other criterions;
   * [MarginRankingCriterion](#nn.MarginRankingCriterion): ranks two inputs.

Criterions are [serializable](https://github.com/torch/torch7/blob/master/doc/file.md#serialization-methods).

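All criterions share the same interface, inherited from [Criterion](#nn.Criterion). A minimal sketch of the common pattern (the criterion and tensor values here are arbitrary choices for illustration):

```lua
require 'nn'

criterion = nn.MSECriterion()   -- any criterion listed above
input = torch.randn(10)         -- e.g. the output of some module
target = torch.randn(10)        -- what that output should have been

err = criterion:forward(input, target)        -- a scalar loss
gradInput = criterion:backward(input, target) -- d(loss)/d(input), same size as input
```

`gradInput` is then typically fed into the `backward()` of the module that produced `input`.
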
<a name="nn.Criterion"/>
## Criterion ##

<a name="nn.AbsCriterion"/>
## AbsCriterion ##

```lua
criterion = nn.AbsCriterion()
```

Creates a criterion that
measures the mean absolute value of the element-wise difference between input `x`
and target `y`:

```lua
loss(x,y) = 1/n \sum |x_i-y_i|
```

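For instance, a quick check of the formula on concrete values (a sketch; the numbers are arbitrary):

```lua
criterion = nn.AbsCriterion()
input = torch.Tensor{1, 2, 3}
target = torch.Tensor{1, 3, 5}
-- (|1-1| + |2-3| + |3-5|) / 3 = 1
err = criterion:forward(input, target)        -- 1
gradInput = criterion:backward(input, target) -- sign(x - y) / n
```
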
<a name="nn.CrossEntropyCriterion"/>
## CrossEntropyCriterion ##

```lua
criterion = nn.CrossEntropyCriterion(weights)
```

This criterion combines [LogSoftMax](#nn.LogSoftMax) and
[ClassNLLCriterion](#nn.ClassNLLCriterion) in one single class.

It is useful to train a classification problem with `n` classes. If
provided, the optional argument `weights` should be a 1D Tensor assigning
weight to each of the classes.

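A minimal training sketch (the layer sizes, values, and learning rate are arbitrary choices for illustration):

```lua
model = nn.Linear(10, 3)               -- outputs raw scores for 3 classes
criterion = nn.CrossEntropyCriterion() -- applies LogSoftMax internally

input = torch.randn(10)
target = 2                             -- target class index, here class 2 of 3

output = model:forward(input)
err = criterion:forward(output, target)
model:zeroGradParameters()
model:backward(input, criterion:backward(output, target))
model:updateParameters(0.01)
```
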
<a name="nn.DistKLDivCriterion"/>
## DistKLDivCriterion ##

```lua
criterion = nn.DistKLDivCriterion()
```

The [Kullback–Leibler divergence](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) criterion.
KL divergence is a useful distance
measure for continuous distributions and is often useful when performing
direct regression over the space of (discretely sampled) continuous output
distributions. As with ClassNLLCriterion, the `input` given through a
`forward()` is expected to contain _log-probabilities_; however, unlike
ClassNLLCriterion, `input` is not restricted to a 1D or 2D vector (as the criterion is applied element-wise).

This criterion expects a `target` tensor of the same size as the `input`
tensor when calling [forward(input, target)](#nn.CriterionForward) and
[backward(input, target)](#nn.CriterionBackward).

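A minimal sketch (the distributions here are arbitrary):

```lua
criterion = nn.DistKLDivCriterion()

input = nn.LogSoftMax():forward(torch.randn(5)) -- log-probabilities
target = torch.Tensor{0.1, 0.2, 0.4, 0.2, 0.1}  -- a probability distribution of the same size

err = criterion:forward(input, target)
gradInput = criterion:backward(input, target)
```
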
<a name="nn.MultiMarginCriterion"/>
## MultiMarginCriterion ##

```lua
criterion = nn.MultiMarginCriterion(p)
```

Creates a criterion that optimizes a multi-class classification hinge loss
(margin-based loss) between input `x` (a 1D Tensor) and output `y` (a target class index):

```lua
loss(x,y) = forward(x,y) = sum_i(max(0, 1 - (x[y] - x[i]))^p) / x:size(1)
```

where `i = 1` to `x:size(1)` and `i ~= y`.
Note that this criterion also works with 2D inputs and 1D targets.

This criterion is especially useful for classification when used in conjunction with a module ending in the following output layer:

```lua
mlp = nn.Sequential()
mlp:add(nn.Euclidean(n,m)) -- outputs a vector of distances
mlp:add(nn.MulConstant(-1)) -- distance to similarity
```

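A sketch of how that output layer and the criterion fit together (the sizes, target class, and learning rate are arbitrary choices for illustration):

```lua
n, m = 10, 5                -- 10 input features, 5 classes
mlp = nn.Sequential()
mlp:add(nn.Euclidean(n, m)) -- distances to m prototypes
mlp:add(nn.MulConstant(-1)) -- negate, so a smaller distance gives a larger score
criterion = nn.MultiMarginCriterion()

x = torch.randn(n)
y = 3                       -- target class index
err = criterion:forward(mlp:forward(x), y)
mlp:zeroGradParameters()
mlp:backward(x, criterion:backward(mlp.output, y))
mlp:updateParameters(0.01)
```
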
<a name="nn.MultiLabelMarginCriterion"/>
## MultiLabelMarginCriterion ##

```lua
criterion = nn.MultiLabelMarginCriterion()
```

Creates a criterion that optimizes a multi-class multi-classification hinge loss
(margin-based loss) between input `x` (a 1D Tensor) and output `y` (a 1D Tensor of target class indices):

```lua
loss(x,y) = forward(x,y) = sum_ij(max(0, 1 - (x[y[j]] - x[i]))) / x:size(1)
```

where `i = 1` to `x:size(1)`, `j = 1` to `y:size(1)`, `y[j] ~= 0`, and `i ~= y[j]` for all `i` and `j`.
Note that this criterion also works with 2D inputs and targets.

`y` and `x` must have the same size. The criterion only considers the first non-zero `y[j]` targets, stopping at the first zero.
This allows different samples to have variable numbers of target classes:

```lua
criterion = nn.MultiLabelMarginCriterion()
input = torch.randn(2, 4)
target = torch.Tensor{{1, 3, 0, 0}, {4, 0, 0, 0}} -- zero values are ignored
criterion:forward(input, target)
```

<a name="nn.MSECriterion"/>
## MSECriterion ##