Rendered TeX expressions in 0a03049, e5e1b35, 111dcb6, 9502ad5, 4db807d
texify[bot] committed Nov 23, 2018
1 parent 35b190a · commit 26230fa
Showing 8 changed files with 346 additions and 5 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -8,3 +8,8 @@ The formulae are converted to images by [TeXify](https://github.com/apps/texify)

* [Lesson 2](notes/lesson-2.md)
* [Lesson 3](notes/lesson-3.md)
* [Lesson 4](notes/lesson-4.md) (in progress)

### Other resources

* [Tatiana's excellent Lesson 2 notes](Lesson2-Tatiana-notes.pdf)
10 changes: 5 additions & 5 deletions notes/lesson-3.md
@@ -25,9 +25,9 @@ Word then spread organically. Research labs then started switching.

### Debugging and Designing PyTorch

First public release 0.1.12.

In about v0.1.6, Justin Johnson (who co-runs the CS231n ConvNets course at Stanford) was interning at FAIR. His networks weren't training because a non-contiguous tensor was passed through a linear layer.

Python is extremely readable and writable, but the trade-off is speed.

@@ -51,7 +51,7 @@ If the focus is on production, then the library may not be as usable.

PyTorch 0.4 scaled to parallel training across thousands of GPUs.

PyTorch 1.0 allows exporting the model to a C++ runtime, or quantising it (running at 8-bit rather than 32-bit precision).

For production, function annotations are added to the model. PyTorch then creates the model in its own internal format, which can be shipped to production.

@@ -67,9 +67,9 @@ This is powered by PyTorch's JIT compiler. Its first aim was to allow easy pr…

The long-term JIT compiler objective is to allow the compiled parts of the model to be non-trivially optimised by fusing operations, e.g. turning memory-bandwidth-bound operations into compute-bound operations. As new hardware becomes available, it can be optimised for larger graphs.

Before 1.0, the ONNX open standard was released: a standard for all deep learning frameworks to talk to each other. Facebook partnered with Microsoft and other big players behind frameworks like Chainer, Caffe2 and TensorFlow so that a model trained in one framework can be exported to another.

Before 1.0, export would be via ONNX, to run in something like TensorFlow. But not all complex models could be exported, as the standard wasn't developed enough.

### Cutting-edge applications in PyTorch
189 changes: 189 additions & 0 deletions notes/lesson-4.md
@@ -0,0 +1,189 @@
# Lesson 4 - Introduction to PyTorch

It is mandatory to inherit from `nn.Module` when you're creating a class for your network.

PyTorch networks created with `nn.Module` must have a `forward` method defined. It takes in a tensor `x` and passes it through the operations you defined in the `__init__` method.
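
A minimal sketch of this pattern (the class name and layer size here are made up for illustration):

```python
from torch import nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # operations are defined once here...
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        # ...and composed here on the input tensor x
        return F.relu(self.fc(x))
```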

[PyTorch Basic Operations Summary](https://jhui.github.io/2018/02/09/PyTorch-Basic-operations/)

* `*` does element-wise multiplication.
* `@` does matrix multiplication, equivalent to `.matmul()`. `A.mm(B)` also does matrix multiplication, but only on 2-D tensors and without broadcasting.

`torch.dot()` treats both objects as 1D vectors (irrespective of their original shape) and computes their inner product.
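
A quick sketch of the difference (shapes chosen arbitrarily):

```python
import torch

A = torch.ones(2, 3)
B = torch.ones(2, 3)

A * B        # element-wise, shape (2, 3)
A @ B.t()    # matrix multiplication, shape (2, 2); same as A.mm(B.t()) or A.matmul(B.t())
torch.dot(torch.tensor([1., 2.]), torch.tensor([3., 4.]))  # 1D inner product -> tensor(11.)
```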

In-place operators end with a `_`, e.g. `a.add_(b)` is the in-place equivalent of `a = a + b`.

Many operators accept an `out` parameter specifying a pre-allocated tensor to store the result: `torch.add(x, y, out=r1)`
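
For example (a small sketch, not from the lesson):

```python
import torch

x = torch.tensor([1., 2.])
y = torch.tensor([10., 10.])

x.add_(y)                # in-place: x is now tensor([11., 12.])

r1 = torch.empty(2)
torch.add(x, y, out=r1)  # result written into the pre-allocated tensor r1
```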

`.numpy()` and `torch.from_numpy()` convert to/from numpy format. PyTorch uses the same memory layout, and by default the data is shared rather than copied.
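
A small sketch of the shared-memory behaviour:

```python
import numpy as np
import torch

a = np.ones(3)
t = torch.from_numpy(a)   # shares memory with a
a[0] = 5
print(t)                  # tensor([5., 1., 1.], dtype=torch.float64)

b = t.numpy()             # also shares memory; use t.numpy().copy() for an independent array
```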

* `.reshape()` returns a view of the same memory when possible, otherwise a copy.
* `.view((<shape>))` reshapes a tensor without copying its data; it raises an error if the new shape is incompatible with the number of elements.
* `.resize_()` can drop or add elements to satisfy the given shape.
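
A short sketch of how these differ (values are illustrative):

```python
import torch

t = torch.arange(6)        # tensor([0, 1, 2, 3, 4, 5])
v = t.view(2, 3)           # same storage, new shape
r = t.reshape(3, 2)        # a view when possible, otherwise a copy
# t.view(4, 2) would raise an error: 6 elements can't fill a (4, 2) shape
t.clone().resize_(2, 2)    # resize_ silently drops (or adds uninitialized) elements
```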

Set a random seed:

```
torch.manual_seed(1)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(1)
```

Tensors created from Python integers default to `torch.int64` (`long`); tensors created from Python floats use the default floating-point dtype, `torch.float32`.
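
For example:

```python
import torch

torch.tensor([1, 2]).dtype      # torch.int64 (long)
torch.tensor([1.0, 2.0]).dtype  # torch.float32
torch.get_default_dtype()       # torch.float32
```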

This course has the weight matrices arranged transposed compared to Andrew Ng:

<img src="/notes/tex/b381b810e5e1d2d2173ca9dd32ce8601.svg?invert_in_darkmode&sanitize=true" align=middle width=160.22186564999998pt height=24.65753399999998pt/>

Instead of <img src="/notes/tex/8f5653c8b9cca851b9adae8f54135c40.svg?invert_in_darkmode&sanitize=true" align=middle width=85.73797814999999pt height=22.831056599999986pt/> it is: <img src="/notes/tex/6a4eab0aeb6f15cde85dba4ab7e153ec.svg?invert_in_darkmode&sanitize=true" align=middle width=85.73797815pt height=22.831056599999986pt/>
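
The exact formulas are rendered as images above, but assuming the convention is a row-vector input multiplied on the right by the weight matrix (rather than Ng's `W @ x`), the shapes work out like this (sizes are illustrative):

```python
import torch

x = torch.randn(1, 784)    # one input as a row vector
W = torch.randn(784, 256)  # weights shaped (n_inputs, n_hidden)
b = torch.randn(256)

h = x @ W + b              # (1, 784) @ (784, 256) -> (1, 256)
# Ng's convention would instead shape W as (256, 784) and compute W @ x.T
```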


## Part 2 - Neural Networks in PyTorch

Display an image:
```
import matplotlib.pyplot as plt
plt.imshow(images[1].numpy().squeeze(), cmap='Greys_r');
```
The first argument is expected to be a matrix (a 2-D array).

Weights: `model.fc1.weight`
Bias: `model.fc1.bias`

```
# Set biases to all zeros
model.fc1.bias.data.fill_(0)
# Sample from random normal with standard dev = 0.01
model.fc1.weight.data.normal_(std=0.01)
```
## Part 3 - Training Neural Networks

### Gradient Descent

![l4-fwd-and-back-basses](l4-fwd-and-back-basses.png)

<p align="center"><img src="/notes/tex/31d5beaf8634c2a798828a5114558d86.svg?invert_in_darkmode&sanitize=true" align=middle width=187.3846755pt height=36.2778141pt/></p>

**Note:** I'm glossing over a few details here that require some knowledge of vector calculus, but they aren't necessary to understand what's going on.

We update our weights using this gradient with some learning rate <img src="/notes/tex/c745b9b57c145ec5577b82542b2df546.svg?invert_in_darkmode&sanitize=true" align=middle width=10.57650494999999pt height=14.15524440000002pt/>.

<p align="center"><img src="/notes/tex/2874884b5bad5695b3f8896adf9e77fc.svg?invert_in_darkmode&sanitize=true" align=middle width=132.89707694999998pt height=36.2778141pt/></p>

The learning rate <img src="/notes/tex/c745b9b57c145ec5577b82542b2df546.svg?invert_in_darkmode&sanitize=true" align=middle width=10.57650494999999pt height=14.15524440000002pt/> is set such that the weight update steps are small enough that the iterative method settles in a minimum.
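
A minimal manual version of such an update step (a sketch, not the lesson's code):

```python
import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3)
alpha = 0.1

loss = (w * x).sum() ** 2    # some scalar loss
loss.backward()              # populates w.grad with d(loss)/dw

with torch.no_grad():
    w -= alpha * w.grad      # gradient-descent step
    w.grad.zero_()           # clear the gradient before the next iteration
```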

### Losses in PyTorch

By convention, the loss function is assigned to `criterion`, e.g. `criterion = nn.CrossEntropyLoss()`.

The input to criterion functions is expected to be raw class scores (logits), not probabilities.

[`nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss) combines `nn.LogSoftmax()` and `nn.NLLLoss()` in a single class.

```
criterion = nn.CrossEntropyLoss()
...
# Calculate the loss with the pre-probability logits and the labels
loss = criterion(logits, labels)
```

It's recommended to have the network output `log_softmax`, use `criterion = nn.NLLLoss()`, and get prediction probabilities with `torch.exp(model(input))`.
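
A small sketch of that recommended pattern (the tensors are random stand-ins for real model output and labels):

```python
import torch
from torch import nn
import torch.nn.functional as F

log_ps = F.log_softmax(torch.randn(4, 10), dim=1)  # what the model's forward() would return
labels = torch.randint(0, 10, (4,))

criterion = nn.NLLLoss()
loss = criterion(log_ps, labels)
ps = torch.exp(log_ps)       # back to probabilities for making predictions
```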

### Autograd

Autograd works by keeping track of operations performed on tensors, then going backwards through those operations, calculating gradients along the way.

To make sure PyTorch keeps track of operations on a tensor and calculates the gradients, you need to set `requires_grad = True` on a tensor. You can do this at creation with the `requires_grad` keyword, or at any time with `x.requires_grad_(True)`.

You can turn off gradients for a block of code with the `torch.no_grad()` context manager:
```python
>>> x = torch.zeros(1, requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
```

Also, you can turn on or off gradients altogether with `torch.set_grad_enabled(True|False)`.

`.grad` holds a tensor's gradient as calculated by `loss.backward()`.

`.grad_fn` references the function that created the tensor, which autograd uses to compute gradients during the backward pass, e.g.:
```
y = x**2
print(y.grad_fn)
```
Gives: `<PowBackward0 object at 0x7f7ea8231a58>`
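
Continuing in the same spirit, calling `.backward()` on a scalar then fills in `.grad`:

```python
import torch

x = torch.randn(2, 2, requires_grad=True)
y = x ** 2
z = y.mean()
z.backward()
print(x.grad)    # equals x / 2, i.e. d(mean(x**2))/dx
```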

## Part 4 - Fashion-MNIST

```
from torch import nn, optim
import torch.nn.functional as F
```

The network can be defined as a subclass of `nn.Module`:
```
class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        return x
```

Note that `forward()` automatically flattens each image via `view()`.

Alternatively, simple models can be defined via `Sequential`:

```
model = nn.Sequential(nn.Linear(784, 384),
                      nn.ReLU(),
                      nn.Linear(384, 128),
                      nn.ReLU(),
                      nn.Linear(128, 10),
                      nn.LogSoftmax(dim=1))
```

### Create the network, define the criterion and optimizer
```
model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters())
```

### Train a network
```
epochs = 5
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        log_ps = model(images)            # forward pass: log-probabilities
        loss = criterion(log_ps, labels)
        optimizer.zero_grad()             # clear gradients accumulated from the previous step
        loss.backward()                   # backpropagate
        optimizer.step()                  # update the weights
        running_loss += loss.item()
    print(f"Training loss: {running_loss/len(trainloader)}")
```

## Part 5 - Inference and Validation

Todo

28 changes: 28 additions & 0 deletions notes/tex/2874884b5bad5695b3f8896adf9e77fc.svg
