This project uses Stable Diffusion (SDXL/SDXL-Lightning) and the BLIP (Bootstrapping Language-Image Pre-training) captioning model to provide interactive image colorization. Users can influence the colors generated for objects in an image, which makes the colorization process more personalized and creative.
- Interactive Colorization: Users can specify desired colors for different objects in the image.
- ControlNet Approach: Enhanced colorization capabilities through retraining with ControlNet, allowing SDXL to better adapt to the image colorization task.
- High-Quality Outputs: Leverage the latest advancements in diffusion models to generate vibrant and realistic colorizations.
- User-Friendly Interface: Easy-to-use interface for seamless interaction with the model.
To set up the project locally, follow these steps:
- Clone the Repository: run `git clone https://github.com/nick8592/text-guided-image-colorization.git`, then `cd text-guided-image-colorization`.
- Install Dependencies: Make sure you have Python 3.7 or higher installed. Then, install the required packages: `pip install -r requirements.txt`

  Install `torch` and `torchvision` matching your CUDA version: `pip install torch torchvision --index-url https://download.pytorch.org/whl/cuXXX`

  Replace `XXX` with your CUDA version (e.g., `118` for CUDA 11.8). For more info, see PyTorch Get Started.
- Download Pre-trained Models:

| Models | Hugging Face (Recommended) | Other |
|---|---|---|
| SDXL-Lightning Caption | link | link (2kNJfV) |
| SDXL-Lightning Custom Caption | link | link (KW7Fpi) |

    text-guided-image-colorization/sdxl_light_caption_output
    ├── checkpoint-30000
    │   ├── diffusion_pytorch_model.safetensors
    │   └── config.json
    ├── optimizer.bin
    ├── random_states_0.pkl
    ├── scaler.pt
    └── scheduler.bin

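The install and download steps above can be sanity-checked with a short script. This is a minimal sketch and not part of the repository: `torch_index_url` and `missing_checkpoint_files` are hypothetical helper names, and the list of expected files is an assumption read off the directory tree above.

```python
from pathlib import Path
from typing import List

# Assumption: the model files expected after download, taken from the
# directory tree shown above. Adjust if your checkpoint layout differs.
EXPECTED_FILES = (
    "checkpoint-30000/diffusion_pytorch_model.safetensors",
    "checkpoint-30000/config.json",
)


def torch_index_url(cuda_version: str) -> str:
    """Build the PyTorch wheel index URL for a CUDA version such as '11.8'."""
    digits = cuda_version.replace(".", "")
    if not digits.isdigit():
        raise ValueError(f"unexpected CUDA version: {cuda_version!r}")
    return f"https://download.pytorch.org/whl/cu{digits}"


def missing_checkpoint_files(root: str) -> List[str]:
    """Return the expected files that are absent under the model directory."""
    base = Path(root)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    print("pip install torch torchvision --index-url", torch_index_url("11.8"))
    missing = missing_checkpoint_files("sdxl_light_caption_output")
    print("missing files:", missing or "none")
```

Running it from the repository root prints the `pip` command for CUDA 11.8 and lists any model files that still need to be downloaded.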
- Run the `gradio_ui.py` script: `python gradio_ui.py`
- Open the provided URL in your web browser to access the Gradio-based user interface.
- Upload an image and use the interface to control the colors of specific objects in the image. A prompt is optional: the model can also colorize the image without one.
- The model generates a colorized version of the image based on your input, or automatically if you gave none. See the demo video.
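To illustrate how user color choices might steer the result, the sketch below merges an automatic caption with per-object color hints into one text prompt. This README does not describe how the project actually builds its prompt, so `build_prompt` and its output format are hypothetical and serve only as an example.

```python
from typing import Dict, Optional


def build_prompt(caption: Optional[str], color_hints: Dict[str, str]) -> str:
    """Combine an automatic caption with per-object color hints.

    Hypothetical helper: the repository's actual prompt format may differ.
    """
    # Turn {"dress": "red"} into the phrase "red dress".
    hints = [f"{color} {obj}" for obj, color in color_hints.items()]
    # Use the caption only if it is non-empty after trimming whitespace.
    parts = [caption.strip()] if caption and caption.strip() else []
    return ", ".join(parts + hints)


if __name__ == "__main__":
    caption = "a woman sitting on a bench"  # e.g. produced by a captioning model
    print(build_prompt(caption, {"dress": "red", "bench": "green"}))
    print(build_prompt(caption, {}))  # no hints: the caption alone is used
```

With no hints the function returns the caption unchanged, which mirrors the automatic mode described above.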
For more details about dataset usage, see Dataset-for-Image-Colorization.
For training, you can use one of the following scripts:
- `train_controlnet.sh`: Trains a model using Stable Diffusion v2
- `train_controlnet_sdxl.sh`: Trains a model using SDXL
- `train_controlnet_sdxl_light.sh`: Trains a model using SDXL-Lightning
Although the training code for SDXL is provided, I was not able to train that model myself due to a lack of GPU resources, so you may run into errors when you try to train it.
For evaluation, you can use one of the following scripts:
- `eval_controlnet.sh`: Evaluates the model using Stable Diffusion v2 for a folder of images.
- `eval_controlnet_sdxl_light.sh`: Evaluates the model using SDXL-Lightning for a folder of images.
- `eval_controlnet_sdxl_light_single.sh`: Evaluates the model using SDXL-Lightning for a single image.
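The training and evaluation scripts listed above can be selected from Python with a small lookup table. This is a sketch under assumptions: the keys (`"sd2"`, `"sdxl"`, `"sdxl_light"`, `"sdxl_light_single"`) are labels invented for this example, the scripts are assumed to be runnable with `bash` from the repository root, and no script arguments are shown because none are documented here.

```python
import subprocess
from typing import List

# Script names as listed above; the (task, variant) keys are labels
# chosen for this example, not identifiers used by the repository.
SCRIPTS = {
    ("train", "sd2"): "train_controlnet.sh",
    ("train", "sdxl"): "train_controlnet_sdxl.sh",
    ("train", "sdxl_light"): "train_controlnet_sdxl_light.sh",
    ("eval", "sd2"): "eval_controlnet.sh",
    ("eval", "sdxl_light"): "eval_controlnet_sdxl_light.sh",
    ("eval", "sdxl_light_single"): "eval_controlnet_sdxl_light_single.sh",
}


def script_command(task: str, variant: str) -> List[str]:
    """Return the command that runs the script for a task and model variant."""
    try:
        script = SCRIPTS[(task, variant)]
    except KeyError:
        raise ValueError(f"no script for task={task!r}, variant={variant!r}") from None
    return ["bash", script]  # assumption: scripts are invoked with bash


def run(task: str, variant: str) -> None:
    """Run the selected script and raise if it exits with an error."""
    subprocess.run(script_command(task, variant), check=True)


if __name__ == "__main__":
    print(" ".join(script_command("eval", "sdxl_light_single")))
```

The table also makes the gap visible: no evaluation script is listed for plain SDXL, so `script_command("eval", "sdxl")` raises a `ValueError`.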
Ground truth images are provided solely for reference in the image colorization task.
Grayscale Image | Colorized Result | Ground Truth |
---|---|---|
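For context on how a grayscale input relates to its ground-truth color image, the sketch below converts RGB pixels to 8-bit luma using the ITU-R BT.601 weights (0.299, 0.587, 0.114). Whether this project's data pipeline uses exactly this conversion is an assumption; `to_grayscale` is a hypothetical helper for illustration only.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def to_grayscale(pixels: List[Pixel]) -> List[int]:
    """Convert RGB pixels to 8-bit luma with ITU-R BT.601 weights.

    Integer arithmetic with floor division keeps results in 0..255.
    """
    return [(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in pixels]


if __name__ == "__main__":
    sample = [(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 0, 0)]
    print(to_grayscale(sample))
```

The conversion discards hue: many different colors map to the same gray level, which is why colorization has no single correct answer and the ground truth serves only as a reference.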
This project is licensed under the MIT License. See the LICENSE file for more details.