
Fine-Tuning Stable Diffusion


Fine-tuning a Stable Diffusion model on a limited amount of VRAM using a ControlNet. A dataset of Pokemon pictures paired with BLIP-generated captions was adapted to fine-tune Stable Diffusion 1.5: sketch-like masks were generated from the original training images, with the goal of teaching the model to accept scribbles (in addition to text prompts) as a conditioning signal for generating new Pokemon.
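
The mask-generation logic lives in prepare_dataset.py; as a rough illustration, scribble-style conditioning masks are often derived with an edge detector. Below is a minimal sketch assuming OpenCV's Canny detector; the repository's actual method may differ, and the file names are hypothetical:

import cv2

# Hypothetical example: derive a sketch-like mask from a training image
# with Canny edge detection. The repository's prepare_dataset.py may differ.
image = cv2.imread("data/raw/pokemon_0001.png")    # hypothetical path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)                  # binary edge map
mask = cv2.bitwise_not(edges)                      # invert: dark strokes on white
cv2.imwrite("data/processed/pokemon_0001_mask.png", mask)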

Prerequisites

You will need:

  • python (see pyproject.toml for full version)
  • Git
  • Make
  • a .secrets file with the required secrets and credentials
  • a .env file from which environment variables are loaded
  • a GPU with more than 8 GB of VRAM (a quick check is shown below)
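
If you are unsure how much VRAM is available, a quick check with PyTorch (assuming it is already installed):

import torch

# Print the name and total memory of the first CUDA device.
props = torch.cuda.get_device_properties(0)
print(props.name, f"{props.total_memory / 1024**3:.1f} GiB")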

Installation

Clone this repository (requires git ssh keys)

git clone --recursive [email protected]:caetas/FineTune_SD.git
cd finetune_sd

Install dependencies

conda create -y -n python3.9 python=3.9
conda activate python3.9

or, if the environment.yml file is available, create the environment from it:

conda env create -f environment.yml
conda activate python3.9

Then set up the virtual environment using the Makefile recipe:

(finetune_sd) $ make setup-all

Download the Pokemon Dataset

The dataset can be downloaded from the following link. The .parquet file should be moved to the data/raw directory.

To process the images and generate the masks, please run the following command:

python prepare_dataset.py

The processed dataset will be stored in the data/processed folder.
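
To sanity-check the output, you can inspect the raw .parquet and one processed image/mask pair. The file and column names below are illustrative, not necessarily the repository's actual ones:

import pandas as pd
from PIL import Image

# Hypothetical inspection of the raw dataset and a processed pair.
df = pd.read_parquet("data/raw/train.parquet")    # hypothetical file name
print(len(df), df.columns.tolist())               # e.g. image bytes and BLIP captions

img = Image.open("data/processed/0001.png")       # hypothetical processed pair
mask = Image.open("data/processed/0001_mask.png")
print(img.size, mask.size)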

Train the ControlNet

Run the following commands:

chmod +x control_execute.sh
bash train_script.sh

NOTE: You can skip the first command after the first execution.
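
For context, ControlNet training on a GPU with limited VRAM usually relies on a few standard memory-saving options. A minimal sketch with the Diffusers API, assuming a Diffusers-style training setup (the repository's scripts may do this differently):

import torch
from diffusers import ControlNetModel, UNet2DConditionModel

# Initialise the ControlNet from the Stable Diffusion 1.5 UNet;
# only the ControlNet is trained, the UNet stays frozen.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
controlnet = ControlNetModel.from_unet(unet)
unet.requires_grad_(False)
controlnet.train()

# Gradient checkpointing trades extra compute for a large memory saving.
unet.enable_gradient_checkpointing()
controlnet.enable_gradient_checkpointing()

# An 8-bit optimiser (bitsandbytes) shrinks the optimiser state:
# import bitsandbytes as bnb
# optimizer = bnb.optim.AdamW8bit(controlnet.parameters(), lr=1e-5)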

Results

Although my input sketch is very rudimentary, the trained network can follow the provided textual and visual instructions to generate a new (very ugly) Pokemon.

One of the images generated during training

You can find more examples in the reports/figures folder.

Inference Mode

With the Streamlit app, you can draw your own sketch of a Pokemon and ask the trained ControlNet to generate an image based on your sketch and a prompt. The influence of the prompt and of the ControlNet can be adjusted via sliders.

streamlit run app.py

The interface should look like this:

Interface Example
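
Under the hood, generation from a sketch and a prompt follows the standard Diffusers ControlNet pipeline. A minimal sketch of what the app is expected to do; the model path and parameter values here are illustrative:

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Load the trained ControlNet (hypothetical path) on top of Stable Diffusion 1.5.
controlnet = ControlNetModel.from_pretrained("models/controlnet", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # keeps peak VRAM usage low during inference

sketch = Image.open("my_sketch.png").convert("RGB")
image = pipe(
    "a blue dragon pokemon, high quality",
    image=sketch,
    num_inference_steps=30,
    guidance_scale=7.5,                 # the "influence of the prompt" slider
    controlnet_conditioning_scale=1.0,  # the "influence of the ControlNet" slider
).images[0]
image.save("generated_pokemon.png")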

Documentation

Full documentation is available here: docs/.

For more information on the Diffusers library implementation of ControlNet, you can visit the original tutorials.

This code was adapted from the aforementioned tutorials and from the training scripts in the Diffusers repository, specifically those in the examples folder.

Dev

See the Developer guidelines for more information.

Contributing

Contributions of any kind are welcome. Please read CONTRIBUTING.md for details and the process for submitting pull requests to us.

Changelog

See the Changelog for more information.

Security

Thank you for improving the security of the project. Please see the Security Policy for more information.

License

No license has been chosen for this project yet. See LICENSE for more details.

Citation

If you publish work that uses Fine-Tuning Stable Diffusion, please cite it as follows:

@misc{FineTune_SD,
  author = {None},
  title = {Fine-Tuning a Stable Diffusion model on a limited amount of VRAM},
  year = {2023},
}
