Unofficial implementation of "Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator" ([arXiv:2411.15466](https://arxiv.org/abs/2411.15466))
Clone this project and install dependencies to set up the environment (Python 3.11 is recommended):
```bash
cd Diptych
pip install -r requirements.txt
```
Prepare GroundingDINO:
```bash
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e .
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ..
```
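The snippet below is a minimal sanity-check sketch (not part of this repo) to confirm the GroundingDINO install and checkpoint by grounding the subject in a reference image; the image path and text query are placeholders.

```python
# Minimal sanity check for the GroundingDINO setup (paths and query are illustrative).
from groundingdino.util.inference import load_model, load_image, predict

model = load_model(
    "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "GroundingDINO/weights/groundingdino_swint_ogc.pth",
)
image_source, image = load_image("reference.jpg")  # your reference subject image

# Boxes are returned normalized in (cx, cy, w, h) format.
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="a dog",          # text query describing the subject
    box_threshold=0.35,
    text_threshold=0.25,
)
print(boxes, phrases)
```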
Prepare SAM:
```bash
mkdir SAM_checkpoints
```
Then download the required checkpoints from [facebookresearch/segment-anything](https://github.com/facebookresearch/segment-anything) and place them under `./SAM_checkpoints/`. A rough sketch of how a detection box is turned into a subject mask with SAM follows below.
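This is only an illustrative sketch (assuming the ViT-H checkpoint `sam_vit_h_4b8939.pth`); the image path and box coordinates are placeholders.

```python
# Sketch: turn a detection box (e.g. from GroundingDINO) into a subject mask with SAM.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="SAM_checkpoints/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("reference.jpg").convert("RGB"))
predictor.set_image(image)

# Box in absolute (x0, y0, x1, y1) pixel coordinates; GroundingDINO boxes must first
# be converted from their normalized (cx, cy, w, h) format.
box = np.array([100, 50, 400, 480])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
Image.fromarray((masks[0] * 255).astype(np.uint8)).save("subject_mask.png")
```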
Finally, create an output directory and run inference:

```bash
mkdir output
python inference_diptych.py --arg1 * --arg2 *
```
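For reference, the sketch below illustrates (independently of `inference_diptych.py`, whose arguments are left as placeholders above) the diptych inputs described in the paper: the segmented reference subject fills the left panel, and the right panel is masked so the inpainting model generates the subject in the new context. File names and the panel size are illustrative.

```python
# Sketch of assembling diptych-prompting inputs for an inpainting model.
from PIL import Image

panel_w, panel_h = 768, 768                      # assumed per-panel resolution

subject = Image.open("subject_segmented.png").convert("RGB").resize((panel_w, panel_h))

# Diptych canvas: left = reference subject, right = empty panel to be inpainted.
diptych = Image.new("RGB", (2 * panel_w, panel_h), color=(255, 255, 255))
diptych.paste(subject, (0, 0))

# Inpainting mask: white (255) marks the region to generate, i.e. the right panel.
mask = Image.new("L", (2 * panel_w, panel_h), color=0)
mask.paste(255, (panel_w, 0, 2 * panel_w, panel_h))

# The accompanying prompt describes both panels, e.g. "A diptych with two side-by-side
# images of the same <subject>. On the left, a photo of <subject>. On the right, a
# photo of <subject> <target text prompt>."
diptych.save("diptych_input.png")
mask.save("diptych_mask.png")
```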
If you find this work useful, please cite the original paper:

```bibtex
@article{shin2024large,
  title={Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator},
  author={Shin, Chaehun and Choi, Jooyoung and Kim, Heeseung and Yoon, Sungroh},
  journal={arXiv preprint arXiv:2411.15466},
  year={2024}
}
```
The code is mainly based on diffusers and FLUX-Controlnet-Inpainting.