
Commit

remove app.labml.ai links
vpj committed Apr 2, 2023
1 parent 97e53c0 commit c5685c9
Showing 154 changed files with 4,116 additions and 4,285 deletions.
139 changes: 69 additions & 70 deletions docs/adaptive_computation/ponder_net/index.html

Large diffs are not rendered by default.

98 changes: 49 additions & 49 deletions docs/capsule_networks/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/capsule_networks/readme.html
@@ -77,7 +77,7 @@ <h1><a href="https://nn.labml.ai/capsule_networks/index.html">Capsule Networks</
<p>This file holds the implementations of the core modules of Capsule Networks.</p>
<p>I used <a href="https://github.com/jindongwang/Pytorch-CapsuleNet">jindongwang/Pytorch-CapsuleNet</a> to clarify some confusion I had with the paper.</p>
<p>Here&#x27;s a notebook for training a Capsule Network on the MNIST dataset.</p>
<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a> <a href="https://app.labml.ai/run/e7c08e08586711ebb3e30242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen"></a> </p>
<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/capsule_networks/mnist.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a> </p>

</div>
<div class='code'>
180 changes: 90 additions & 90 deletions docs/cfr/kuhn/index.html

Large diffs are not rendered by default.

63 changes: 31 additions & 32 deletions docs/conv_mixer/experiment.html

Large diffs are not rendered by default.

125 changes: 62 additions & 63 deletions docs/conv_mixer/index.html

Large diffs are not rendered by default.

3 changes: 1 addition & 2 deletions docs/conv_mixer/readme.html
@@ -75,8 +75,7 @@ <h1><a href="https://nn.labml.ai/conv_mixer/index.html">Patches Are All You Need
<p>ConvMixer is similar to <a href="https://nn.labml.ai/transformers/mlp_mixer/index.html">MLP-Mixer</a>. MLP-Mixer separates the mixing of spatial and channel dimensions by applying an MLP across the spatial dimension and then an MLP across the channel dimension (the spatial MLP replaces the <a href="https://nn.labml.ai/transformers/vit/index.html">ViT</a> attention and the channel MLP is the <a href="https://nn.labml.ai/transformers/feed_forward.html">FFN</a> of ViT).</p>
<p>ConvMixer uses a 1x1 convolution for channel mixing and a depth-wise convolution for spatial mixing. Since it&#x27;s a convolution instead of a full MLP across the space, it mixes only the nearby patches, in contrast to ViT or MLP-Mixer. Also, MLP-Mixer uses two-layer MLPs for each mixing, while ConvMixer uses a single layer for each.</p>
<p>The paper recommends removing the residual connection across the channel mixing (point-wise convolution) and keeping only a residual connection over the spatial mixing (depth-wise convolution). They also use <a href="https://nn.labml.ai/normalization/batch_norm/index.html">Batch normalization</a> instead of <a href="../normalization/layer_norm/index.html">Layer normalization</a>.</p>
- <p>Here&#x27;s <a href="https://nn.labml.ai/conv_mixer/experiment.html">an experiment</a> that trains ConvMixer on CIFAR-10.</p>
- <p><a href="https://app.labml.ai/run/0fc344da2cd011ecb0bc3fdb2e774a3d"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen"></a></p>
+ <p>Here&#x27;s <a href="https://nn.labml.ai/conv_mixer/experiment.html">an experiment</a> that trains ConvMixer on CIFAR-10. </p>

</div>
<div class='code'>
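
A minimal PyTorch sketch of one ConvMixer block, following the description quoted in the diff above: a depth-wise convolution for spatial mixing with a residual connection, then a point-wise 1x1 convolution for channel mixing without one, each followed by GELU and batch normalization. This is an illustration under those assumptions, not code from the repository; the class name and default kernel size are made up.

import torch
import torch.nn as nn

class ConvMixerBlock(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 9):
        super().__init__()
        # Spatial mixing: depth-wise convolution (groups=dim keeps each channel separate)
        self.spatial_mixing = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding='same'),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )
        # Channel mixing: point-wise (1x1) convolution across channels
        self.channel_mixing = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection only over the spatial mixing,
        # none over the channel mixing, as the paper recommends
        x = x + self.spatial_mixing(x)
        return self.channel_mixing(x)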
151 changes: 75 additions & 76 deletions docs/distillation/index.html

Large diffs are not rendered by default.

73 changes: 36 additions & 37 deletions docs/distillation/large.html

Large diffs are not rendered by default.

3 changes: 1 addition & 2 deletions docs/distillation/readme.html
@@ -74,8 +74,7 @@ <h1><a href="https://nn.labml.ai/distillation/index.html">Distilling the Knowled
<p>This is a <a href="https://pytorch.org">PyTorch</a> implementation/tutorial of the paper <a href="https://papers.labml.ai/paper/1503.02531">Distilling the Knowledge in a Neural Network</a>.</p>
<p>It&#x27;s a way of training a small network using the knowledge in a trained larger network; i.e., distilling the knowledge from the large network.</p>
<p>A large model with regularization, or an ensemble of models (using dropout), generalizes better than a small model when trained directly on the data and labels. However, a small model can be trained to generalize better with the help of a large model. Smaller models are better in production: faster, with less compute and less memory.</p>
- <p>The output probabilities of a trained model give more information than the labels because they assign non-zero probabilities to incorrect classes as well. These probabilities tell us that a sample has a chance of belonging to certain classes. For instance, when classifying digits, a generalized model given an image of the digit <em>7</em> will give a high probability to 7 and a small but non-zero probability to 2, while assigning almost zero probability to other digits. Distillation uses this information to train a small model better.</p>
- <p><a href="https://app.labml.ai/run/d6182e2adaf011eb927c91a2a1710932"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen"></a> </p>
+ <p>The output probabilities of a trained model give more information than the labels because they assign non-zero probabilities to incorrect classes as well. These probabilities tell us that a sample has a chance of belonging to certain classes. For instance, when classifying digits, a generalized model given an image of the digit <em>7</em> will give a high probability to 7 and a small but non-zero probability to 2, while assigning almost zero probability to other digits. Distillation uses this information to train a small model better. </p>

</div>
<div class='code'>
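
A minimal PyTorch sketch of the distillation loss described in the diff above: soften both teacher and student logits with a temperature, match them with a KL-divergence term, and mix in the ordinary cross-entropy on the hard labels. This is an illustration, not code from the repository; the function name, temperature, and alpha weighting are assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.9) -> torch.Tensor:
    # Softened distributions; F.kl_div expects log-probabilities
    # as its first argument
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Multiply by T^2 so the soft-target gradients keep roughly the
    # same scale as the hard-label gradients (as the paper notes)
    soft_loss = F.kl_div(log_p_student, p_teacher,
                         reduction='batchmean') * temperature ** 2
    # Ordinary cross-entropy on the true labels
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss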