Merge pull request FluxML#2030 from svilupp/fix-typo-in-docs
Fix typo in docs
ToucheSir authored Aug 1, 2022
2 parents 0b62a91 + a9bc48a commit c4837f7
Showing 6 changed files with 23 additions and 24 deletions.
12 changes: 6 additions & 6 deletions docs/src/gpu.md
@@ -97,9 +97,9 @@ Some of the common workflows involving the use of GPUs are presented below.

### Transferring Training Data

In order to train the model using the GPU both model and the training data have to be transferred to GPU memory. This process can be done with the `gpu` function in two different ways:

- 1. Iterating over the batches in a [DataLoader](@ref) object transfering each one of the training batches at a time to the GPU.
+ 1. Iterating over the batches in a [DataLoader](@ref) object transferring each one of the training batches at a time to the GPU.
```julia
train_loader = Flux.DataLoader((xtrain, ytrain), batchsize = 64, shuffle = true)
# ... model, optimizer and loss definitions
@@ -112,14 +112,14 @@ In order to train the model using the GPU both model and the training data have
end
```

- 2. Transferring all training data to the GPU at once before creating the [DataLoader](@ref) object. This is usually performed for smaller datasets which are sure to fit in the available GPU memory. Some possitilities are:
+ 2. Transferring all training data to the GPU at once before creating the [DataLoader](@ref) object. This is usually performed for smaller datasets which are sure to fit in the available GPU memory. Some possibilities are:
```julia
gpu_train_loader = Flux.DataLoader((xtrain |> gpu, ytrain |> gpu), batchsize = 32)
```
```julia
gpu_train_loader = Flux.DataLoader((xtrain, ytrain) |> gpu, batchsize = 32)
```
- Note that both `gpu` and `cpu` are smart enough to recurse through tuples and namedtuples. Other possibility is to use [`MLUtils.mapsobs`](https://juliaml.github.io/MLUtils.jl/dev/api/#MLUtils.mapobs) to push the data movement invocation into the background thread:
+ Note that both `gpu` and `cpu` are smart enough to recurse through tuples and namedtuples. Another possibility is to use [`MLUtils.mapsobs`](https://juliaml.github.io/MLUtils.jl/dev/api/#MLUtils.mapobs) to push the data movement invocation into the background thread:
```julia
using MLUtils: mapobs
# ...
@@ -159,7 +159,7 @@ let model = cpu(model)
BSON.@save "./path/to/trained_model.bson" model
end

- # is equivalente to the above, but uses `key=value` storing directve from BSON.jl
+ # is equivalent to the above, but uses `key=value` storing directive from BSON.jl
BSON.@save "./path/to/trained_model.bson" model = cpu(model)
```
The reason behind this is that models trained in the GPU but not transferred to the CPU memory scope will expect `CuArray`s as input. In other words, Flux models expect input data coming from the same kind device in which they were trained on.
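As a complement to the saving snippet above, here is a minimal sketch of loading the saved model back and moving it to the GPU for inference; the file path is reused from the example and the input size is made up for illustration:

```julia
using Flux, BSON

BSON.@load "./path/to/trained_model.bson" model   # loads the CPU copy saved above
gpu_model = model |> gpu                          # move the parameters back to GPU memory
x = rand(Float32, 10) |> gpu                      # inputs must live on the same device as the model
y_hat = gpu_model(x)
```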
@@ -181,4 +181,4 @@ $ export CUDA_VISIBLE_DEVICES='0,1'
```


- More information for conditional use of GPUs in CUDA.jl can be found in its [documentation](https://cuda.juliagpu.org/stable/installation/conditional/#Conditional-use), and information about the specific use of the variable is described in the [Nvidia CUDA blogpost](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/).
+ More information for conditional use of GPUs in CUDA.jl can be found in its [documentation](https://cuda.juliagpu.org/stable/installation/conditional/#Conditional-use), and information about the specific use of the variable is described in the [Nvidia CUDA blog post](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/).
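The conditional-use pattern described in the linked CUDA.jl page can be sketched along these lines; this assumes `CUDA.functional()` as the availability check and an arbitrary toy model, so treat it as an illustration rather than the documented recipe:

```julia
using Flux, CUDA

# pick the device once, falling back to the CPU when no usable GPU is present
device = CUDA.functional() ? gpu : cpu

model = Chain(Dense(10 => 5, relu), Dense(5 => 2)) |> device
x = rand(Float32, 10) |> device
y = model(x)
```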
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -3,7 +3,7 @@
Flux is a library for machine learning geared towards high-performance production pipelines. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:

* **Doing the obvious thing**. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work – and be fast.
- * **Extensible by default**. Flux is written to be highly extensible and flexible while being performant. Extending Flux is as simple as using your own code as part of the model you want - it is all [high level Julia code](https://github.com/FluxML/Flux.jl/blob/ec16a2c77dbf6ab8b92b0eecd11661be7a62feef/src/layers/recurrent.jl#L131). When in doubt, it’s well worth looking at [the source](https://github.com/FluxML/Flux.jl/). If you need something different, you can easily roll your own.
+ * **Extensible by default**. Flux is written to be highly extensible and flexible while being performant. Extending Flux is as simple as using your own code as part of the model you want - it is all [high-level Julia code](https://github.com/FluxML/Flux.jl/blob/ec16a2c77dbf6ab8b92b0eecd11661be7a62feef/src/layers/recurrent.jl#L131). When in doubt, it’s well worth looking at [the source](https://github.com/FluxML/Flux.jl/). If you need something different, you can easily roll your own.
* **Performance is key**. Flux integrates with high-performance AD tools such as [Zygote.jl](https://github.com/FluxML/Zygote.jl) for generating fast code. Flux optimizes both CPU and GPU performance. Scaling workloads easily to multiple GPUs can be done with the help of Julia's [GPU tooling](https://github.com/JuliaGPU/CUDA.jl) and projects like [DaggerFlux.jl](https://github.com/DhairyaLGandhi/DaggerFlux.jl).
* **Play nicely with others**. Flux works well with Julia libraries from [data frames](https://github.com/JuliaComputing/JuliaDB.jl) and [images](https://github.com/JuliaImages/Images.jl) to [differential equation solvers](https://github.com/JuliaDiffEq/DifferentialEquations.jl), so you can easily build complex data processing pipelines that integrate Flux models.

6 changes: 3 additions & 3 deletions docs/src/training/optimisers.md
@@ -71,7 +71,7 @@ AdaBelief

Flux's optimisers are built around a `struct` that holds all the optimiser parameters along with a definition of how to apply the update rule associated with it. We do this via the `apply!` function which takes the optimiser as the first argument followed by the parameter and its corresponding gradient.

- In this manner Flux also allows one to create custom optimisers to be used seamlessly. Let's work this with a simple example.
+ In this manner Flux also allows one to create custom optimisers to be used seamlessly. Let's work on this with a simple example.

```julia
mutable struct Momentum
@@ -135,7 +135,7 @@ end
loss(rand(10)) # around 0.9
```

- In this manner it is possible to compose optimisers for some added flexibility.
+ It is possible to compose optimisers for some added flexibility.
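For instance, a hedged sketch of one such composition; the particular rules and hyperparameters below are only an example:

```julia
using Flux
using Flux.Optimise: Optimiser, WeightDecay, Descent

# weight decay is applied to the gradient first, then the plain gradient-descent step
opt = Optimiser(WeightDecay(1f-4), Descent(0.1))
# `opt` can then be passed to `Flux.train!` or `update!` like any single optimiser
```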

```@docs
Flux.Optimise.Optimiser
@@ -145,7 +145,7 @@

In practice, it is fairly common to schedule the learning rate of an optimiser to obtain faster convergence. There are a variety of popular scheduling policies, and you can find implementations of them in [ParameterSchedulers.jl](https://darsnack.github.io/ParameterSchedulers.jl/dev/README.html). The documentation for ParameterSchedulers.jl provides a more detailed overview of the different scheduling policies, and how to use them with Flux optimizers. Below, we provide a brief snippet illustrating a [cosine annealing](https://arxiv.org/pdf/1608.03983.pdf) schedule with a momentum optimiser.

- First, we import ParameterSchedulers.jl and initalize a cosine annealing schedule to varying the learning rate between `1e-4` and `1e-2` every 10 steps. We also create a new [`Momentum`](@ref) optimiser.
+ First, we import ParameterSchedulers.jl and initialize a cosine annealing schedule to vary the learning rate between `1e-4` and `1e-2` every 10 steps. We also create a new [`Momentum`](@ref) optimiser.
```julia
using ParameterSchedulers

21 changes: 10 additions & 11 deletions docs/src/training/training.md
@@ -8,7 +8,7 @@ To actually train a model we need four things:
* An [optimiser](optimisers.md) that will update the model parameters appropriately.

Training a model is typically an iterative process, where we go over the data set,
- calculate the objective function over the datapoints, and optimise that.
+ calculate the objective function over the data points, and optimise that.
This can be visualised in the form of a simple loop.

```julia
@@ -41,7 +41,7 @@ more information can be found on [Custom Training Loops](../models/advanced.md).
## Loss Functions

The objective function must return a number representing how far the model is from its target – the *loss* of the model. The `loss` function that we defined in [basics](../models/basics.md) will work as an objective.
- In addition to custom losses, model can be trained in conjuction with
+ In addition to custom losses, a model can be trained in conjunction with
the commonly used losses that are grouped under the `Flux.Losses` module.
We can also define an objective in terms of some model:

@@ -57,18 +57,18 @@ ps = Flux.params(m)
Flux.train!(loss, ps, data, opt)
```

- The objective will almost always be defined in terms of some *cost function* that measures the distance of the prediction `m(x)` from the target `y`. Flux has several of these built in, like `mse` for mean squared error or `crossentropy` for cross entropy loss, but you can calculate it however you want.
+ The objective will almost always be defined in terms of some *cost function* that measures the distance of the prediction `m(x)` from the target `y`. Flux has several of these built-in, like `mse` for mean squared error or `crossentropy` for cross-entropy loss, but you can calculate it however you want.
For a list of all built-in loss functions, check out the [losses reference](../models/losses.md).

- At first glance it may seem strange that the model that we want to train is not part of the input arguments of `Flux.train!` too. However the target of the optimizer is not the model itself, but the objective function that represents the departure between modelled and observed data. In other words, the model is implicitly defined in the objective function, and there is no need to give it explicitly. Passing the objective function instead of the model and a cost function separately provides more flexibility, and the possibility of optimizing the calculations.
+ At first glance, it may seem strange that the model that we want to train is not part of the input arguments of `Flux.train!` too. However the target of the optimizer is not the model itself, but the objective function that represents the departure between modelled and observed data. In other words, the model is implicitly defined in the objective function, and there is no need to give it explicitly. Passing the objective function instead of the model and a cost function separately provides more flexibility and the possibility of optimizing the calculations.
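Putting the pieces of this hunk together, a small end-to-end sketch; the layer sizes and random data are made up for illustration:

```julia
using Flux

m = Dense(10 => 2)
x, y = rand(Float32, 10, 100), rand(Float32, 2, 100)

loss(x, y) = Flux.Losses.mse(m(x), y)   # cost function measuring the distance of m(x) from y
ps = Flux.params(m)                     # implicit parameters; the model enters only through `loss`
opt = Descent(0.1)
data = [(x, y)]

Flux.train!(loss, ps, data, opt)
```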

## Model parameters

The model to be trained must have a set of tracked parameters that are used to calculate the gradients of the objective function. In the [basics](../models/basics.md) section it is explained how to create models with such parameters. The second argument of the function `Flux.train!` must be an object containing those parameters, which can be obtained from a model `m` as `Flux.params(m)`.

Such an object contains a reference to the model's parameters, not a copy, such that after their training, the model behaves according to their updated values.

- Handling all the parameters on a layer by layer basis is explained in the [Layer Helpers](../models/basics.md) section. Also, for freezing model parameters, see the [Advanced Usage Guide](../models/advanced.md).
+ Handling all the parameters on a layer-by-layer basis is explained in the [Layer Helpers](../models/basics.md) section. For freezing model parameters, see the [Advanced Usage Guide](../models/advanced.md).
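A brief sketch of the layer-by-layer handling mentioned above: collecting parameters for only part of a model, or dropping individual arrays to leave them untrained. The layer sizes are illustrative, and `delete!` on the `Params` collection is assumed to be available for removing entries:

```julia
using Flux

m = Chain(Dense(10 => 5, relu), Dense(5 => 2))

ps = Flux.params(m[2])        # collect only the second layer's parameters

# alternatively, start from all parameters and remove the ones to freeze
ps_all = Flux.params(m)
delete!(ps_all, m[1].weight)
delete!(ps_all, m[1].bias)
```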

```@docs
Flux.params
@@ -93,7 +93,7 @@ using IterTools: ncycle
data = ncycle([(x, y)], 3)
```

- It's common to load the `x`s and `y`s separately. In this case you can use `zip`:
+ It's common to load the `x`s and `y`s separately. Here you can use `zip`:

```julia
xs = [rand(784), rand(784), rand(784)]
@@ -159,8 +159,7 @@ end
## Custom Training loops
The `Flux.train!` function can be very convenient, especially for simple problems.
Its also very flexible with the use of callbacks.
- But for some problems its much cleaner to write your own custom training loop.
+ For some problems, however, it's much cleaner to write your own custom training loop.
An example follows that works similar to the default `Flux.train` but with no callbacks.
You don't need callbacks if you just code the calls to your functions directly into the loop.
E.g. in the places marked with comments.
@@ -179,8 +178,8 @@ function my_custom_train!(loss, ps, data, opt)
end
# Insert whatever code you want here that needs training_loss, e.g. logging.
# logging_callback(training_loss)
- # Insert what ever code you want here that needs gradient.
- # E.g. logging with TensorBoardLogger.jl as histogram so you can see if it is becoming huge.
+ # Insert whatever code you want here that needs gradients.
+ # e.g. logging histograms with TensorBoardLogger.jl to check for exploding gradients.
update!(opt, ps, gs)
# Here you might like to check validation set accuracy, and break out to do early stopping.
end
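A possible usage sketch for the `my_custom_train!` defined in this listing; the model, data, and optimiser below are placeholders chosen for illustration:

```julia
using Flux

model = Chain(Dense(2 => 8, relu), Dense(8 => 1))
loss(x, y) = Flux.Losses.mse(model(x), y)
ps = Flux.params(model)
data = [(rand(Float32, 2, 16), rand(Float32, 1, 16)) for _ in 1:10]
opt = Descent(0.1)

my_custom_train!(loss, ps, data, opt)
```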
@@ -202,7 +201,7 @@ function my_custom_train!(loss, ps, data, opt)
# logging_callback(training_loss)
# Apply back() to the correct type of 1.0 to get the gradient of loss.
gs = back(one(train_loss))
- # Insert what ever code you want here that needs gradient.
+ # Insert whatever code you want here that needs gradient.
# E.g. logging with TensorBoardLogger.jl as histogram so you can see if it is becoming huge.
update!(opt, ps, gs)
# Here you might like to check validation set accuracy, and break out to do early stopping.
4 changes: 2 additions & 2 deletions docs/src/utilities.md
@@ -122,7 +122,7 @@ Flux.skip

Flux provides utilities for controlling your training procedure according to some monitored condition and a maximum `patience`. For example, you can use `early_stopping` to stop training when the model is converging or deteriorating, or you can use `plateau` to check if the model is stagnating.

- For example, below we create a pseudo-loss function that decreases, bottoms out, then increases. The early stopping trigger will break the loop before the loss increases too much.
+ For example, below we create a pseudo-loss function that decreases, bottoms out, and then increases. The early stopping trigger will break the loop before the loss increases too much.
```julia
# create a pseudo-loss that decreases for 4 calls, then starts increasing
# we call this like loss()
@@ -143,7 +143,7 @@ es = early_stopping(loss, 2; init_score = 9)
end
```

- The keyword argument `distance` of `early_stopping` is a function of the form `distance(best_score, score)`. By default `distance` is `-`, which implies that the monitored metric `f` is expected to be decreasing and mimimized. If you use some increasing metric (e.g. accuracy), you can customize the `distance` function: `(best_score, score) -> score - best_score`.
+ The keyword argument `distance` of `early_stopping` is a function of the form `distance(best_score, score)`. By default `distance` is `-`, which implies that the monitored metric `f` is expected to be decreasing and minimized. If you use some increasing metric (e.g. accuracy), you can customize the `distance` function: `(best_score, score) -> score - best_score`.
```julia
# create a pseudo-accuracy that increases by 0.01 each time from 0 to 1
# we call this like acc()
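Rounding off the `distance` discussion above, a minimal hedged sketch of early stopping on an increasing metric; the pseudo-accuracy closure is invented for illustration:

```julia
using Flux: early_stopping

# pseudo-accuracy that improves by 0.1 per call, capped at 1.0
acc = let v = 0.0
  () -> (v = min(v + 0.1, 1.0))
end

es = early_stopping(acc, 3; distance = (best_score, score) -> score - best_score)

for epoch in 1:20
  es() && break
end
```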
2 changes: 1 addition & 1 deletion src/optimise/train.jl
@@ -87,7 +87,7 @@ Here `pars` is produced by calling [`Flux.params`](@ref) on your model.
(Or just on the layers you want to train, like `train!(loss, params(model[1:end-2]), data, opt)`.)
This is the "implicit" style of parameter handling.
- Then, this gradient is used by optimizer `opt` to update the paramters:
+ This gradient is then used by optimizer `opt` to update the parameters:
```
update!(opt, pars, grads)
```
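For context, here is a sketch of the full implicit-parameter step the docstring describes: compute the gradients with respect to `pars`, then apply `update!`. The model, loss, and data are placeholders:

```julia
using Flux
using Flux.Optimise: update!, Descent

model = Dense(3 => 1)
pars  = Flux.params(model)
opt   = Descent(0.1)
x, y  = rand(Float32, 3, 8), rand(Float32, 1, 8)

loss(x, y) = Flux.Losses.mse(model(x), y)

grads = Flux.gradient(() -> loss(x, y), pars)   # implicit-style gradient over `pars`
update!(opt, pars, grads)                       # `opt` mutates the parameters in place
```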
