
Diptych

Unofficial code implementation of "Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator" [Link]

Setup

Clone this project and install dependencies to set up the environment (Python 3.11 is recommended):

git clone https://github.com/wuyou22s/Diptych.git
cd Diptych
pip install -r requirements.txt

Prepare GroundingDINO:

git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e .
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ../..

Prepare SAM:

mkdir SAM_checkpoints

Then download the required SAM checkpoints from facebookresearch/segment-anything and place them under ./SAM_checkpoints/.
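GroundingDINO localizes the subject from a text prompt and SAM refines that box into a pixel-level mask; the mask is then used to strip the background from the reference image before it is placed in the diptych. A minimal sketch of just the masking step, assuming the binary mask has already been produced by the two models (the `remove_background` helper and the toy arrays below are illustrative, not part of this repo):

```python
import numpy as np

def remove_background(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the masked subject, filling the background with white.

    `image` is an HxWx3 uint8 array; `mask` is an HxW boolean array
    (in practice produced by GroundingDINO + SAM, here assumed given).
    """
    out = np.full_like(image, 255)   # start from a white background
    out[mask] = image[mask]          # copy only the subject pixels
    return out

# Toy example: a 4x4 "image" with a 2x2 subject in the top-left corner.
image = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
subject = remove_background(image, mask)
```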

Running

mkdir output
python inference_diptych.py --arg1 * --arg2 *
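Under the hood, the script follows the paper's diptych-prompting recipe: the background-removed reference subject fills the left panel, the right panel is blank and fully masked, and the FLUX inpainting model fills it according to the text prompt. A rough sketch of the canvas and mask construction only (the `make_diptych` helper and the panel sizes are illustrative, not the repo's actual code):

```python
import numpy as np

def make_diptych(reference: np.ndarray):
    """Build the diptych canvas and its inpainting mask.

    The reference (subject with background removed) fills the left panel;
    the right panel is blank and fully masked, so the inpainting model
    regenerates the subject in the context described by the text prompt.
    """
    h, w, _ = reference.shape
    canvas = np.full((h, 2 * w, 3), 255, dtype=np.uint8)
    canvas[:, :w] = reference          # left panel: reference subject
    mask = np.zeros((h, 2 * w), dtype=np.uint8)
    mask[:, w:] = 255                  # right panel: region to inpaint
    return canvas, mask

reference = np.zeros((8, 8, 3), dtype=np.uint8)
canvas, mask = make_diptych(reference)
```

The canvas/mask pair would then be handed to an inpainting pipeline (here, the FLUX ControlNet inpainting model this repo builds on) together with a diptych-style prompt.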

Citation

@article{shin2024large,
  title={Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator},
  author={Shin, Chaehun and Choi, Jooyoung and Kim, Heeseung and Yoon, Sungroh},
  journal={arXiv preprint arXiv:2411.15466},
  year={2024}
}

Acknowledgements

The code is mainly based on diffusers and FLUX-Controlnet-Inpainting.
