Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models

Source code for Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models, accepted at ICCV 2023.

Prerequisites

You need a pretrained text-to-image diffusion model. We ran our main experiments on Stable Diffusion v1.4 but observed similar results with v1.3. Next, install the required packages:

pip install -r requirements.txt
pip install -e ./stable-diffusion 

Experiments are configured through experiment config files located in ./experiments. If you want to run experiments on your own dataset, you mainly have to change the following arguments:

import os

base_dir = "PATH"  # path to dataset
out_dir = os.path.join(base_dir, "preliminary_masks/", "compute_preliminary_bird_masks")  # where preliminary masks are stored
rev_diff_steps = 40  # T in paper
num_repeat_each_diffusion_step = 1  # N in paper
prompt = "a photo of a background"
goal_prompt = "bird"  # object we want to detect, currently unused (defined explicitly when AttentionExtractor is set to relevant_token_step_mean)
foreground_prompt = "a photo of a bird"
background_prompt = "a photo of a background"
attention_extractor = AttentionExtractor("relevant_token_step_mean", tok_idx=5, steps=rev_diff_steps)  # type of extraction method
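
To make the last argument concrete: relevant_token_step_mean averages the cross-attention map of one text token over the recorded reverse diffusion steps; tok_idx=5 plausibly points at "bird" in "a photo of a bird" once the start token is counted. A minimal sketch of that reduction (the actual class internals are assumptions):

import torch

# Hedged sketch of the "relevant_token_step_mean" reduction: average the
# attention of a single text token over all recorded diffusion steps.
# attention_maps: (steps, image_tokens, text_tokens), as collected from the
# attention hooks; tok_idx selects the token of interest.
def relevant_token_step_mean(attention_maps: torch.Tensor, tok_idx: int) -> torch.Tensor:
    return attention_maps[:, :, tok_idx].mean(dim=0)  # (image_tokens,)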

Expected layout: The dataloader expects the following structure:

basedir/
├── images.txt 
├── images/ 
│   ├── 036.Northern_Flicker/Northern_Flicker_0124_28966.jpg
│   ├── ...

where images.txt lists the relative path and split number of each image:

images/036.Northern_Flicker/Northern_Flicker_0124_28966.jpg 0
images/066.Western_Gull/Western_Gull_0143_54909.jpg 0 
...
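
If your dataset does not ship such a file, it can be generated in a few lines of Python (a sketch; assigning split 0 to every image is an assumption, adapt the split logic to your needs):

import os

basedir = "PATH"  # same base_dir as in the experiment config
image_dir = os.path.join(basedir, "images")
with open(os.path.join(basedir, "images.txt"), "w") as f:
    for root, _, files in os.walk(image_dir):
        for name in sorted(files):
            if name.lower().endswith((".jpg", ".jpeg", ".png")):
                rel = os.path.relpath(os.path.join(root, name), basedir)
                f.write(f"{rel} 0\n")  # 0 = train split (assumption)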

Pipeline

Figure: Overview of our model pipeline.

Key to our pipeline is the extraction of attention maps from text-to-image diffusion models. To do this, we take the official Stable Diffusion repository and change the attention implementation to save the attention probabilities (see src.stable-diffusion.ldm.modules.attention.CrossAttention.save_attention).
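
The pattern looks roughly like the following self-contained sketch; this is not the repository's exact code, and all names except save_attention are illustrative:

import torch
import torch.nn as nn

class CrossAttentionWithHook(nn.Module):
    # Minimal cross-attention that stores its attention probabilities,
    # mirroring the idea behind CrossAttention.save_attention.
    def __init__(self, dim, context_dim, heads=8):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(context_dim, dim, bias=False)
        self.to_v = nn.Linear(context_dim, dim, bias=False)
        self.saved_attention = None  # filled on every forward pass

    def save_attention(self, attn):
        # detach so the stored maps do not keep the autograd graph alive
        self.saved_attention = attn.detach().cpu()

    def forward(self, x, context):
        b, n, _ = x.shape
        m, h = context.shape[1], self.heads
        q = self.to_q(x).reshape(b, n, h, -1).transpose(1, 2)
        k = self.to_k(context).reshape(b, m, h, -1).transpose(1, 2)
        v = self.to_v(context).reshape(b, m, h, -1).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        self.save_attention(attn)  # (b, heads, image_tokens, text_tokens)
        return (attn @ v).transpose(1, 2).reshape(b, n, -1)

layer = CrossAttentionWithHook(320, 768)
x = torch.randn(1, 64, 320)    # 8x8 latent tokens
ctx = torch.randn(1, 77, 768)  # CLIP text embeddings
_ = layer(x, ctx)
print(layer.saved_attention.shape)  # torch.Size([1, 8, 64, 77])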

Step 1: Preliminary mask extraction

To save computation time, we precompute the attention masks of the entire dataset into the folder defined by out_dir:

python scripts/compute_attention_masks_raw.py experiments/birds/compute_preliminary_bird_masks_train.py

Step 2: Finetune Stable Diffusion on your custom dataset

Now change the value of data.params.train.params.opt_path to the output path from the previous step and run the following command to finetune the diffusion model:

python finetune-stable-diffusion/main.py -t --base ./experiments/configs/bird_finetuning.yaml --gpus 0,1 --scale_lr False --num_nodes 1  --check_val_every_n_epoch  1 --finetune_from ./stable-diffusion/sd-v1-4-full-ema.ckpt data.params.batch_size=4 lightning.trainer.accumulate_grad_batches=1 data.params.validation.params.n_gpus=2
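
For orientation, the relevant part of bird_finetuning.yaml follows the usual latent-diffusion Lightning config layout and looks roughly like this (a sketch; the surrounding keys may differ in detail):

data:
  params:
    batch_size: 4
    train:
      params:
        opt_path: PATH/preliminary_masks/compute_preliminary_bird_masks  # output folder from Step 1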

Sampling: Now we can sample foreground and background images using the prompts 'a photo of a bird' and 'a photo of a background'.
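
For instance, using the stock txt2img script that ships with the Stable Diffusion repository (a sketch; the checkpoint path is an assumption, point --ckpt at the checkpoint written by the finetuning run):

python stable-diffusion/scripts/txt2img.py --prompt "a photo of a bird" --ckpt logs/<finetune-run>/checkpoints/last.ckpt --n_samples 4 --plms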

Optional: Generate a synthetic foreground dataset and repeat Steps 1 and 2.

Step 3: Mask refinement through inpainting

This step uses the preliminary masks again, but this time inpaints the foreground region with background content. The difference image between the original and the inpainted image is then saved as a refined mask.

python ./scripts/sample_background_prelim_masks.py experiments/birds/compute_preliminary_bird_masks_train.py # computes inpainted mask (Step 3)
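
Conceptually, the refined mask is a thresholded difference image; a minimal sketch of that idea (file names and the threshold are illustrative, not the repository's):

import numpy as np
from PIL import Image

def difference_mask(original_path, inpainted_path, threshold=0.1):
    # both images must have the same resolution
    original = np.asarray(Image.open(original_path), dtype=np.float32) / 255.0
    inpainted = np.asarray(Image.open(inpainted_path), dtype=np.float32) / 255.0
    diff = np.abs(original - inpainted).mean(axis=-1)  # mean over RGB channels
    return diff > threshold  # True where inpainting changed the image

mask = difference_mask("bird.jpg", "bird_inpainted.jpg")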

Segmentation Evaluation

To evaluate foreground segmentation, we take the difference images from Step 3 as ground-truth labels and train a segmentation model on them.

python ./scripts/train_segmentation_refined.py experiments/birds/compute_preliminary_bird_masks_train.py
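
For reference, binary foreground masks are typically scored with intersection-over-union; a minimal sketch (the function is illustrative, not the repository's evaluation code):

import numpy as np

def iou(pred, target):
    # pred, target: boolean arrays of the same shape (foreground = True)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union > 0 else 1.0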

For more details, have a look at our paper:

@InProceedings{Dombrowski_2023_ICCV,
    author    = {Dombrowski, Mischa and Reynaud, Hadrien and Baugh, Matthew and Kainz, Bernhard},
    title     = {Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {988-998}
}
