Commit 8492652

Criterion doc++
1 parent 0916842 commit 8492652

5 files changed (+103, -55 lines)

CosineEmbeddingCriterion.lua (+2, -2)

@@ -2,7 +2,7 @@ local CosineEmbeddingCriterion, parent = torch.class('nn.CosineEmbeddingCriterio
 
 function CosineEmbeddingCriterion:__init(margin)
    parent.__init(self)
-   margin=margin or 0
+   margin = margin or 0
    self.margin = margin
    self.gradInput = {torch.Tensor(), torch.Tensor()}
 end
@@ -15,7 +15,7 @@ function CosineEmbeddingCriterion:updateOutput(input,y)
    self.w32 = input2:dot(input2)
    self.w3 = math.sqrt(self.w32)
    self.output = self.w1/self.w2/self.w3
-   if y==-1 then
+   if y == -1 then
       self.output = math.max(0, self.output - self.margin);
    else
       self.output = 1 - self.output
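
A minimal usage sketch of `nn.CosineEmbeddingCriterion` (illustrative only, not part of the commit), assuming the non-batch API used in the hunk above, where the input is a table of two tensors and `y` is `1` or `-1`:

```lua
require 'nn'

local crit = nn.CosineEmbeddingCriterion(0.1)  -- margin only matters for dissimilar pairs
local x1, x2 = torch.randn(5), torch.randn(5)

print(crit:forward({x1, x2},  1))  -- 1 - cos(x1, x2) for a similar pair
print(crit:forward({x1, x2}, -1))  -- max(0, cos(x1, x2) - margin) for a dissimilar pair

local grads = crit:backward({x1, x2}, 1)  -- table with one gradient tensor per input
```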

HingeEmbeddingCriterion.lua (+6, -6)

@@ -2,24 +2,24 @@ local HingeEmbeddingCriterion, parent = torch.class('nn.HingeEmbeddingCriterion'
 
 function HingeEmbeddingCriterion:__init(margin)
    parent.__init(self)
-   margin=margin or 1
+   margin = margin or 1
    self.margin = margin
    self.gradInput = torch.Tensor(1)
 end
 
 function HingeEmbeddingCriterion:updateOutput(input,y)
-   self.output=input[1]
-   if y==-1 then
-   self.output = math.max(0,self.margin - self.output);
+   self.output = input[1]
+   if y == -1 then
+      self.output = math.max(0,self.margin - self.output);
    end
    return self.output
 end
 
 function HingeEmbeddingCriterion:updateGradInput(input, y)
-   self.gradInput[1]=y
+   self.gradInput[1] = y
    local dist = input[1]
    if y == -1 and dist > self.margin then
-      self.gradInput[1]=0;
+      self.gradInput[1] = 0;
    end
    return self.gradInput
 end
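
A usage sketch of `nn.HingeEmbeddingCriterion` consistent with the code above (illustrative only, not from the commit); the input holds a precomputed distance, e.g. the output of `nn.PairwiseDistance`:

```lua
require 'nn'

local crit = nn.HingeEmbeddingCriterion(1)  -- margin = 1
local dist = torch.Tensor{0.4}              -- e.g. produced by nn.PairwiseDistance

print(crit:forward(dist,  1))  -- similar pair: loss = distance = 0.4
print(crit:forward(dist, -1))  -- dissimilar pair: loss = max(0, margin - distance) = 0.6
```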

L1HingeEmbeddingCriterion.lua (+2, -2)

@@ -2,14 +2,14 @@ local L1HingeEmbeddingCriterion, parent = torch.class('nn.L1HingeEmbeddingCriter
 
 function L1HingeEmbeddingCriterion:__init(margin)
    parent.__init(self)
-   margin=margin or 1
+   margin = margin or 1
    self.margin = margin
    self.gradInput = {torch.Tensor(), torch.Tensor()}
 end
 
 function L1HingeEmbeddingCriterion:updateOutput(input,y)
    self.output=input[1]:dist(input[2],1);
-   if y==-1 then
+   if y == -1 then
      self.output = math.max(0,self.margin - self.output);
    end
    return self.output
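
For reference (an illustrative sketch, not part of the commit), `nn.L1HingeEmbeddingCriterion` takes the pair of tensors directly and computes the L1 distance internally:

```lua
require 'nn'

local crit = nn.L1HingeEmbeddingCriterion(1)
local x1 = torch.Tensor{1, 2, 3}
local x2 = torch.Tensor{1, 2, 2.5}

print(crit:forward({x1, x2},  1))  -- ||x1 - x2||_1 = 0.5
print(crit:forward({x1, x2}, -1))  -- max(0, margin - ||x1 - x2||_1) = 0.5
```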

doc/criterion.md (+62, -44)

@@ -1,15 +1,27 @@
 <a name="nn.Criterions"/>
 # Criterions #
 
-Criterions are helpful to train a neural network. Given an input and a
-target, they compute a gradient according to a given loss
-function. [AbsCriterion](#nn.AbsCriterion) and
-[MSECriterion](#nn.MSECriterion) are perfect for regression problems, while
-[ClassNLLCriterion](#nn.ClassNLLCriterion) or
-[CrossEntropyCriterion](#nn.CrossEntropyCriterion) are the criteria of
-choice when dealing with classification.
-
-Criterions are [serializable](https://github.com/torch/torch7/blob/master/doc/file.md#serialization-methods).
+[Criterions](#nn.Criterion) are helpful to train a neural network. Given an input and a
+target, they compute a gradient according to a given loss function.
+
+* Classification criterions :
+  * [BCECriterion](#nn.BCECriterion) : binary cross-entropy (two-class version of [ClassNLLCriterion](#nn.ClassNLLCriterion));
+  * [ClassNLLCriterion](#nn.ClassNLLCriterion) : negative log-likelihood for [LogSoftMax](transfer.md#nn.LogSoftMax) (multi-class);
+  * [CrossEntropyCriterion](#nn.CrossEntropyCriterion) : combines [LogSoftMax](transfer.md#nn.LogSoftMax) and [ClassNLLCriterion](#nn.ClassNLLCriterion);
+  * [MarginCriterion](#nn.MarginCriterion) : two-class margin-based loss;
+  * [MultiMarginCriterion](#nn.MultiMarginCriterion) : multi-class margin-based loss;
+  * [MultiLabelMarginCriterion](#nn.MultiLabelMarginCriterion) : multi-class multi-classification margin-based loss;
+* Regression criterions :
+  * [AbsCriterion](#nn.AbsCriterion) : measures the mean absolute value of the element-wise difference between input and target;
+  * [MSECriterion](#nn.MSECriterion) : mean square error (a classic);
+  * [DistKLDivCriterion](#nn.DistKLDivCriterion) : Kullback–Leibler divergence (for fitting continuous probability distributions);
+* Embedding criterions (measuring whether two inputs are similar or dissimilar):
+  * [HingeEmbeddingCriterion](#nn.HingeEmbeddingCriterion) : takes a distance as input;
+  * [L1HingeEmbeddingCriterion](#nn.L1HingeEmbeddingCriterion) : L1 distance between two inputs;
+  * [CosineEmbeddingCriterion](#nn.CosineEmbeddingCriterion) : cosine distance between two inputs;
+* Miscellaneous criterions :
+  * [MultiCriterion](#nn.MultiCriterion) : a weighted sum of other criterions;
+  * [MarginRankingCriterion](#nn.MarginRankingCriterion) : ranks two inputs;
 
 <a name="nn.Criterion"/>
 ## Criterion ##
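
As background for the list added above (a hedged sketch, not part of the diff): every criterion listed follows the same `forward`/`backward` contract, so a training step looks the same regardless of which one is chosen.

```lua
require 'nn'

local mlp = nn.Sequential():add(nn.Linear(10, 1))
local criterion = nn.MSECriterion()            -- any criterion from the list plugs in the same way
local input, target = torch.randn(10), torch.randn(1)

local output = mlp:forward(input)
local loss = criterion:forward(output, target)          -- scalar loss value
local gradOutput = criterion:backward(output, target)   -- dloss/doutput
mlp:zeroGradParameters()
mlp:backward(input, gradOutput)
mlp:updateParameters(0.01)                              -- plain SGD step
```
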
@@ -55,8 +67,8 @@ criterion = nn.AbsCriterion()
 ```
 
 Creates a criterion that
-measures the mean absolute value between `n` elements in the input `x`
-and output `y`:
+measures the mean absolute value of the element-wise difference between input `x`
+and target `y`:
 
 ```lua
 loss(x,y) = 1/n \sum |x_i-y_i|
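
A small numeric check of the formula above (illustrative, assuming the default `sizeAverage = true` so the sum is divided by `n`):

```lua
require 'nn'

local crit = nn.AbsCriterion()
local x = torch.Tensor{1, 2, 3}
local y = torch.Tensor{1, 0, 5}
print(crit:forward(x, y))  -- (|0| + |2| + |-2|) / 3 = 4/3
```
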
@@ -130,7 +142,7 @@ criterion = nn.CrossEntropyCriterion(weights)
 ```
 
 This criterion combines [LogSoftMax](#nn.LogSoftMax) and
-[CrossEntropyCriterion](#nn.CrossEntropyCriterion) in one single class.
+[ClassNLLCriterion](#nn.ClassNLLCriterion) in one single class.
 
 It is useful to train a classication problem with `n` classes. If
 provided, the optional argument `weights` should be a 1D Tensor assigning
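
A usage sketch (illustrative only, not from the commit): because the `LogSoftMax` is built in, the criterion is fed raw scores together with an integer class target, as for `ClassNLLCriterion`.

```lua
require 'nn'

local crit = nn.CrossEntropyCriterion()
local scores = torch.randn(5)   -- raw, un-normalized scores for 5 classes
print(crit:forward(scores, 3))  -- negative log-likelihood of class 3 after the internal LogSoftMax
```
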
@@ -162,12 +174,13 @@ loss(x, class) = forward(x, class) = weights[class]*( -x[class] + log( \sum_j e^
 criterion = nn.DistKLDivCriterion()
 ```
 
-Kullback–Leibler divergence criterion. KL divergence is a useful distance
-measure for continuous distributions and is often useful when performance
+The [Kullback–Leibler divergence](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) criterion.
+KL divergence is a useful distance
+measure for continuous distributions and is often useful when performing
 direct regression over the space of (discretely sampled) continuous output
 distributions. As with ClassNLLCriterion, the `input` given through a
 `forward()` is expected to contain _log-probabilities_, however unlike
-ClassNLLCriterion, `input` is not restricted to a 1D vector.
+ClassNLLCriterion, `input` is not restricted to a 1D or 2D vector (as the criterion is applied element-wise).
 
 This criterion expect a `target` tensor of the same size as the `input`
 tensor when calling [forward(input, target)](#nn.CriterionForward) and
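
A usage sketch for the section above (illustrative): the input holds log-probabilities, the target holds probabilities of the same size.

```lua
require 'nn'

local crit = nn.DistKLDivCriterion()
local input = nn.LogSoftMax():forward(torch.randn(5))  -- log-probabilities
local target = torch.Tensor{0.1, 0.2, 0.3, 0.2, 0.2}   -- a probability distribution of the same size
print(crit:forward(input, target))
```
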
@@ -275,7 +288,40 @@ Creates a criterion that optimizes a multi-class classification hinge loss (marg
 ```lua
 loss(x,y) = forward(x,y) = sum_i(max(0, 1 - (x[y] - x[i]))^p) / x:size(1)
 ```
-where i = 1 to x:size(1) and i ~= y
+where `i = 1` to `x:size(1)` and `i ~= y`.
+Note that this criterion also works with 2D inputs and 1D targets.
+
+This criterion is especially useful for classification when used in conjunction with a module ending in the following output layer:
+```lua
+mlp = nn.Sequential()
+mlp:add(nn.Euclidean(n,m)) -- outputs a vector of distances
+mlp:add(nn.MulConstant(-1)) -- distance to similarity
+```
+
+<a name="nn.MultiLabelMarginCriterion"/>
+## MultiLabelMarginCriterion ##
+
+```lua
+criterion = nn.MultiLabelMarginCriterion()
+```
+
+Creates a criterion that optimizes a multi-class multi-classification hinge loss
+(margin-based loss) between input `x` (a 1D Tensor) and output `y` (which is a 1D Tensor of target class indices) :
+
+```lua
+loss(x,y) = forward(x,y) = sum_ij(max(0, 1 - (x[y[j]] - x[i]))) / x:size(1)
+```
+where `i = 1` to `x:size(1)`, `j = 1` to `y:size(1)`, `y[j] ~= 0`, and `i ~= y[j]` for all `i` and `j`.
+Note that this criterion also works with 2D inputs and targets.
+
+`y` and `x` must have the same size. The criterion only considers the first non-zero `y[j]` targets.
+This allows for different samples to have a variable number of target classes:
+```lua
+criterion = nn.MultiLabelMarginCriterion()
+input = torch.randn(2,4)
+target = torch.Tensor{{1,3,0,0},{4,0,0,0}} -- zero-values are ignored
+criterion:forward(input, target)
+```
 
 <a name="nn.MSECriterion"/>
 ## MSECriterion ##
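
For the `MultiMarginCriterion` formula introduced at the top of this hunk, a small worked example (illustrative, assuming the default `p = 1` and a scalar class-index target, as for `ClassNLLCriterion`):

```lua
require 'nn'

local crit = nn.MultiMarginCriterion()
local x = torch.Tensor{0.1, 0.8, 0.3, -0.2}  -- scores for 4 classes
-- per-class hinge terms for target class 2: max(0, 1 - (0.8 - x[i])) for i ~= 2
-- i=1: 0.3, i=3: 0.5, i=4: 0.0  =>  (0.3 + 0.5 + 0.0) / 4 = 0.2
print(crit:forward(x, 2))
```
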
@@ -534,31 +580,3 @@ for i=1,100 do
 end
 ```
 
-<a name="nn.L1Penalty"/>
-## L1Penalty ##
-
-```lua
-penalty = nn.L1Penalty(L1weight, sizeAverage)
-```
-
-L1Penalty is an inline module that in it's FPROP copies the input Tensor directly to the output, and computes an L1 loss of the latent state (input) and stores it in the module's `loss` field. During BPROP: `gradInput = gradOutput + gradLoss`.
-
-This module can be used in autoencoder architectures to apply L1 losses to internal latent state without having to use Identity and parallel containers to carry the internal code to an output criterion.
-
-Example (sparse autoencoder, note: decoder should be normalized):
-
-```lua
-encoder = nn.Sequential()
-encoder:add(nn.Linear(3, 128))
-encoder:add(nn.Threshold())
-decoder = nn.Linear(128,3)
-
-autoencoder = nn.Sequential()
-autoencoder:add(encoder)
-autoencoder:add(nn.L1Penalty(l1weight))
-autoencoder:add(decoder)
-
-criterion = nn.MSECriterion() -- To measure reconstruction error
--- ...
-```
-
doc/simple.md (+31, -1)

@@ -34,7 +34,8 @@ and providing affine transformations :
 * [Identity](#nn.Identity) : forward input as-is to output (useful with [ParallelTable](table.md#nn.ParallelTable));
 * [Dropout](#nn.Dropout) : masks parts of the `input` using binary samples from a [bernoulli](http://en.wikipedia.org/wiki/Bernoulli_distribution) distribution ;
 * [Padding](#nn.Padding) : adds padding to a dimension ;
-
+* [L1Penalty](#nn.L1Penalty) : adds an L1 penalty to an input (for sparsity);
+
 <a name="nn.Linear"/>
 ## Linear ##
 
@@ -949,3 +950,32 @@ module:forward(torch.randn(2,3)) --batch input
 -1.0000 -1.0000 -0.2219 -0.6529 -1.9218
 [torch.DoubleTensor of dimension 2x5]
 ```
+
+<a name="nn.L1Penalty"/>
+## L1Penalty ##
+
+```lua
+penalty = nn.L1Penalty(L1weight, sizeAverage)
+```
+
+L1Penalty is an inline module that in its forward propagation copies the input Tensor directly to the output, and computes an L1 loss of the latent state (input) and stores it in the module's `loss` field.
+During backward propagation: `gradInput = gradOutput + gradLoss`.
+
+This module can be used in autoencoder architectures to apply L1 losses to internal latent state without having to use Identity and parallel containers to carry the internal code to an output criterion.
+
+Example (sparse autoencoder, note: decoder should be normalized):
+
+```lua
+encoder = nn.Sequential()
+encoder:add(nn.Linear(3, 128))
+encoder:add(nn.Threshold())
+decoder = nn.Linear(128,3)
+
+autoencoder = nn.Sequential()
+autoencoder:add(encoder)
+autoencoder:add(nn.L1Penalty(l1weight))
+autoencoder:add(decoder)
+
+criterion = nn.MSECriterion() -- To measure reconstruction error
+-- ...
+```
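
A minimal check of the behaviour described above (illustrative, not part of the commit): after a forward pass the penalty value is available in the module's `loss` field, while the output is just a copy of the input. The exact value assumes `sizeAverage` defaults to false.

```lua
require 'nn'

local l1weight = 0.1
local penalty = nn.L1Penalty(l1weight)
local x = torch.Tensor{1, -2, 3}

local out = penalty:forward(x)  -- out is a copy of x
print(penalty.loss)             -- l1weight * ||x||_1 = 0.6 (assuming sizeAverage = false)
```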
