A pure Julia implementation of denoising diffusion probabilistic models, as popularised in *Denoising Diffusion Probabilistic Models* by Jonathan Ho, Ajay Jain and Pieter Abbeel (2020).
For detailed examples please see the notebooks in the companion project at github.com/LiorSinai/DenoisingDiffusion-examples. The notebooks were originally part of this repository but were removed using git-filter-repo to make this repository more lightweight.
For an explanation of the diffusion process and the code please see my blog posts at liorsinai.github.io.
*Reverse process (left) and final image estimate (right). These coincide on the final time step.*
Denoising diffusion starts from an image of pure noise and gradually removes this noise over many time steps, resulting in a natural looking image. At each time step a model predicts the noise that should be removed from the current image to arrive at the final image. This prediction also gives an estimate of the final image, which is updated at every time step. The image above shows this process with a trained model for number generation.
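For the standard DDPM forward process, the noisy image at step t is x_t = √ᾱ_t·x_0 + √(1-ᾱ_t)·ε, so the final image estimate follows from inverting this relation with the model's noise prediction. A minimal sketch (the function name is illustrative, not the package's API):

```julia
# Estimate the final image x̂_0 from the noisy image x_t, the model's noise
# prediction ϵ_pred and the cumulative schedule value ᾱ_t, using the
# standard DDPM relation x_t = √ᾱ_t⋅x_0 + √(1-ᾱ_t)⋅ϵ.
function estimate_x0(x_t::AbstractArray, ϵ_pred::AbstractArray, ᾱ_t::Real)
    (x_t .- sqrt(1 - ᾱ_t) .* ϵ_pred) ./ sqrt(ᾱ_t)
end
```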
Classifier-free guidance
It is possible to direct the outcome using classifier-free guidance, as introduced in Classifier-Free Diffusion Guidance by Jonathan Ho and Tim Salimans (2022). In this mode a label is passed to the model along with the time step. Two candidates for the noise to be removed are generated at each time step: unconditioned noise made using a generic label (label=1) and conditioned noise made using the target label. The noise that is removed is then a weighted combination of the two:
`noise = ϵ_uncond + guidance_scale * (ϵ_cond - ϵ_uncond)`

where `guidance_scale >= 1`. The difference `(ϵ_cond - ϵ_uncond)` represents a very rough gradient. The original paper uses `ϵ_cond + guidance_scale * (ϵ_cond - ϵ_uncond)`, but using `ϵ_uncond` as the baseline instead allows the unconditioned prediction to be cancelled and skipped for the special case of `guidance_scale = 1`.
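A minimal sketch of this combination (function and argument names are illustrative, not necessarily the package's API):

```julia
# Combine conditioned and unconditioned noise predictions for
# classifier-free guidance. `model` takes (x, t, label); label=1 is the
# generic "unconditioned" label described above.
function guided_noise(model, x, t, label; guidance_scale=1.0f0)
    ϵ_cond = model(x, t, label)
    guidance_scale == 1 && return ϵ_cond # baseline cancels; skip second pass
    ϵ_uncond = model(x, t, 1)
    ϵ_uncond .+ guidance_scale .* (ϵ_cond .- ϵ_uncond)
end
```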
The main export is the `GaussianDiffusion` struct and associated functions.
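A hypothetical usage sketch for the 2D spiral example. The schedule helper, constructor arguments and sampler name are assumptions based on the accompanying blog posts; see the examples repository for the exact API:

```julia
using Flux, DenoisingDiffusion

num_timesteps = 100
# A toy noise-prediction model taking (x, t); the architecture is illustrative.
model = ConditionalChain(
    Parallel(.+, Dense(2 => 16), Embedding(num_timesteps => 16)),
    Dense(16 => 2),
)
βs = linear_beta_schedule(num_timesteps)            # assumed helper name
diffusion = GaussianDiffusion(Vector{Float32}, βs, (2,), model)
Xs = p_sample_loop(diffusion, 1000)                 # denoise 1000 samples
```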
Various models and building blocks are included.
The models include a flexible `ConditionalChain` based on `Flux.Chain`. It can handle multiple inputs, where the first input is given priority.
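The forwarding idea can be sketched as follows (a simplified illustration, not the package's actual implementation): layers that accept the extra inputs receive all of them, while plain layers receive only the first input.

```julia
# Simplified illustration of a conditional chain: pass the extra inputs
# (e.g. the timestep) only to layers whose call signature accepts them.
# Training integration (e.g. Flux.@layer) is omitted for brevity.
struct SimpleConditionalChain{T<:Tuple}
    layers::T
end
SimpleConditionalChain(layers...) = SimpleConditionalChain(layers)

function (c::SimpleConditionalChain)(x, extras...)
    for layer in c.layers
        x = applicable(layer, x, extras...) ? layer(x, extras...) : layer(x)
    end
    x
end
```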
Two versions of the UNet (a convolutional autoencoder) are available: `UNet` and `UNetFixed`.
A `UNet` model made to the same specifications as `UNetFixed` is 100% equivalent.
`UNet` is flexible and can have an arbitrary number of downsample/upsample pairs (more than five is not advisable). It is based on nested skip connections.
`UNetFixed` is a linear implementation of the same model. It has three downsample/upsample pairs and three middle layers, with a total of 16 layers. The default configuration `UNetFixed(1, 8, 100)` has approximately 150,000 parameters. About 50% of these parameters are in the middle layers; 24% are in the attention layer alone.
For both models, every doubling of `model_channels` approximately quadruples the number of parameters, because the size of a convolution layer is proportional to the square of its channel dimension.
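This can be checked by counting parameters directly (assuming the `UNetFixed` arguments are `(in_channels, model_channels, num_timesteps)`, as in the default configuration above):

```julia
using Flux, DenoisingDiffusion

# Count trainable parameters; doubling model_channels from 8 to 16
# should roughly quadruple the total.
nparams(m) = sum(length, Flux.params(m))
nparams(UNetFixed(1, 16, 100)) / nparams(UNetFixed(1, 8, 100)) # ≈ 4
```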
For number generation the Fréchet Inception Distance (FID) is cumbersome: the Inception V3 model has 27.1 million parameters, which is overkill for number generation. Instead the simpler Fréchet LeNet Distance (FLD) is proposed. This uses the same calculation except with a smaller LeNet model with approximately 44,000 parameters. The output layer has 84 values as opposed to Inception V3's 2048.
No pretrained weights are necessary because the LeNet model can be easily trained on a CPU. However, results will not be standardised.
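Both FID and FLD rest on the same Fréchet distance between feature distributions. A minimal sketch, with feature vectors stored as columns (here they would come from the LeNet's 84-value output layer):

```julia
using Statistics, LinearAlgebra

# Fréchet distance between two sets of feature vectors (one per column):
# ||μ1 - μ2||² + tr(Σ1 + Σ2 - 2√(Σ1Σ2)).
# The matrix square root may have small imaginary parts; take the real trace.
function frechet_distance(X1::AbstractMatrix, X2::AbstractMatrix)
    μ1, μ2 = mean(X1, dims=2), mean(X2, dims=2)
    Σ1, Σ2 = cov(X1, dims=2), cov(X2, dims=2)
    sum(abs2, μ1 - μ2) + tr(Σ1) + tr(Σ2) - 2 * real(tr(sqrt(Σ1 * Σ2)))
end
```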
Example values are:
| Model | Parameters | FLD | Notes |
|---|---|---|---|
| training data | 0 | 0.5 | |
| UNetConditioned | 622,865 | 7.0 | Guidance with |
| UNet | 376,913 | 18.3 | No attention layer |
| UNet | 602,705 | 23.9 | |
| UNet | 602,705 | 26.3 | DDIM |
| Random | 0 | >337 | |
The loss is Mean Squared Error. All models were trained for 15 epochs.
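The objective can be sketched as follows (illustrative names, not the package's exact training loop): corrupt a clean sample at a random time step and regress the model's prediction onto the added noise.

```julia
using Flux

# DDPM-style noise-prediction loss. ᾱs holds the cumulative products of
# (1 - β) for each timestep; x0 is a batch of clean samples.
function noise_loss(model, x0::AbstractArray, ᾱs::AbstractVector)
    t = rand(1:length(ᾱs))
    ϵ = randn(Float32, size(x0)...)
    x_t = sqrt(ᾱs[t]) .* x0 .+ sqrt(1 - ᾱs[t]) .* ϵ
    Flux.mse(model(x_t, t), ϵ)
end
```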
Download the GitHub repository (it is not registered). Then in the Julia REPL:
```julia
julia> ] # enter package mode
(@v1.x) pkg> dev path\\to\\DenoisingDiffusion
julia> using Revise # allows dynamic edits to code
julia> using DenoisingDiffusion
```
Optionally, tests can be run with:
```julia
(@v1.x) pkg> test DenoisingDiffusion
```
This repository uses fastai's nbdev to manage the Jupyter notebooks for Git. This requires a Python installation of nbdev. To avoid using it, follow the steps in `.gitconfig`.
To run the examples:
```
julia examples\\train_images.jl --threads auto
```
Or start the Julia REPL and run it interactively.
There are three use cases:
- Spiral (2 values per data point).
- Numbers (28×28 = 784 values per data point).
- Pokemon (48×48×3 = 6912 values per data point).
The spiral use case requires approximately 1,000 parameters. Number generation requires at least 100 times that, and the Pokemon possibly more. So far, satisfying results for the Pokemon have not been achieved. See however This Pokémon Does Not Exist for an example trained with a 1.3 billion parameter model.
Also included:
- Self-attention blocks.
- DDIM for faster, more efficient image generation.
- Guided diffusion.