
boomb0om/text2image-benchmark


This project aims to unify the evaluation of generative text-to-image models and to make it quick and easy to calculate the most popular metrics.

Goals of this benchmark:

  • Unified metrics and datasets for all text-to-image models
  • Reproducible results
  • User-friendly interface for the most popular metrics: FID and CLIP-score

Introduction

Generative text-to-image models have become widely used, and many papers on text-to-image generation present new, more advanced models. However, there is still no uniform way to measure the quality of such models. To address this issue, we provide an implementation of metrics and a dataset for comparing the quality of generative models.

We propose using the MS-COCO FID-30K metric together with OpenAI's CLIP score, a combination that has already become a standard for measuring the quality of text2image models. We provide the MS-COCO validation subset and precalculated metrics for it. We also fixed the 30,000 captions that need to be used to generate images for MS-COCO FID-30K.

You can easily contribute your model to the benchmark and make your FID results reproducible! See the Contribution section for details.

Main features

  • Standardized FID calculation: fixed image preprocessing and InceptionV3 model.
  • FID-30k on the MS-COCO validation set: we provide the dataset on Hugging Face 🤗, precomputed FID stats, and a fixed set of 30,000 captions from MS-COCO that should be used to generate images
  • Implementations of different popular text-to-image models to make metrics reproducible
  • CLIP-score calculation
  • User-friendly metrics calculation (check out Getting started)

Installation

pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/boomb0om/text2image-benchmark

Getting started

Metrics: FID

Calculate FID for two sets of images:

from T2IBenchmark import calculate_fid

fid, _ = calculate_fid('assets/images/cats/', 'assets/images/dogs/')
print(fid)

Calculate FID between model generations and the MS-COCO validation subset, using the precomputed statistics:

from T2IBenchmark import calculate_fid
from T2IBenchmark.datasets import get_coco_fid_stats

fid, _ = calculate_fid(
    'path/to/your/generations/',
    get_coco_fid_stats()
)

MS-COCO FID-30k for a T2IModelWrapper. This example uses the Kandinsky 2.1 model. First, install the model's requirements:

pip install -r T2IBenchmark/models/kandinsky21/requirements.txt

Then generate the images and calculate the metric:

from T2IBenchmark import calculate_coco_fid
from T2IBenchmark.models.kandinsky21 import Kandinsky21Wrapper

fid, fid_data = calculate_coco_fid(
    Kandinsky21Wrapper,
    device='cuda:0',
    save_generations_dir='coco_generations/'
)
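
For readers who want to see what the FID number represents, here is a minimal sketch of the Fréchet distance between two Gaussians fitted to feature vectors. It is not the code used by this benchmark: the function names `feature_stats` and `frechet_distance` are hypothetical, the features are assumed to be plain arrays (in a real FID they would come from InceptionV3), and the trace of the matrix square root is computed from eigenvalues rather than with a dedicated matrix square root routine.

```python
import numpy as np


def feature_stats(features):
    """Return the mean vector and covariance matrix of a (n_samples, dim) array."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1) + Tr(sigma2) - 2 * Tr(sqrt(sigma1 @ sigma2))
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)

    diff = mu1 - mu2
    # The trace of the matrix square root equals the sum of the square roots
    # of the eigenvalues. Tiny negative or imaginary parts caused by rounding
    # are discarded.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)
```

Identical statistics give a distance of zero, and the value grows as the means or covariances of the two feature sets drift apart. Because the result depends on how the features are extracted, image preprocessing and the InceptionV3 weights have to be fixed for scores to be comparable.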

Metrics: CLIP-score

Example of calculating CLIP-score for a set of images and a fixed prompt:

from T2IBenchmark import calculate_clip_score
from glob import glob

cat_paths = glob('assets/images/cats/*.jpg')
captions_mapping = {path: "a cat" for path in cat_paths}
clip_score = calculate_clip_score(cat_paths, captions_mapping=captions_mapping)
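
As a rough illustration of the idea behind CLIP-score, the sketch below averages the cosine similarity between paired image and text embeddings. It is an assumption-laden toy, not this library's implementation: the helper names `cosine_similarity` and `mean_cosine_score` are hypothetical, the embeddings are plain Python lists (real ones would come from CLIP's image and text encoders), and any scaling or clipping that a particular CLIP-score variant applies is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_cosine_score(image_embeddings, text_embeddings):
    """Average cosine similarity over paired image and caption embeddings."""
    if len(image_embeddings) != len(text_embeddings):
        raise ValueError("each image embedding needs exactly one caption embedding")
    scores = [
        cosine_similarity(img, txt)
        for img, txt in zip(image_embeddings, text_embeddings)
    ]
    return sum(scores) / len(scores)
```

A higher average means the images are, on this measure, closer to their captions. This is why the score complements FID, which compares image distributions without looking at the prompts.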

Project Structure

  • T2IBenchmark/
    • datasets/ - Datasets that can be used for evaluation
      • coco2014/ - MS-COCO 2014 validation subset
    • feature_extractors/ - Implementation of different neural nets used to extract features from images
    • metrics/ - Implementation of metrics
    • utils/ - Some utils
  • tests/ - Tests
  • docs/ - Documentation
  • examples/ - Benchmark usage examples
  • experiments/ - Experiments with metrics
  • assets/ - Assets

Examples

Examples of use are listed below in recommended order for study:

Documentation

  • FID.md - Explanation of the different parameters that affect FID calculation

Contribution

If you want to contribute your model to this benchmark and publish its metrics, follow these steps:

  1. Create a fork of this repository
  2. Create a wrapper for your model that inherits from the T2IModelWrapper class
  3. Generate images and calculate metrics using calculate_coco_fid. For more information see this example
  4. Create a pull request with your model
  5. Congrats!

TO-DO

  • Implementation of Inception Score (IS) and Kernel Inception Distance (KID)
  • FID-CLIPscore metric and plots
  • Implementation and FIDs for Kandinsky 2.X models with the help of Sber AI
  • Implementation and FIDs for popular models from diffusers: Stable Diffusion, IF

Contacts

Authors:

If you have any questions, please email [email protected].

Citing

If you use this repository in your research, consider citing it using the following BibTeX entry:

@misc{boomb0omT2IBenchmark,
  author={Pavlov, I. and Ivanov, A. and Stafievskiy, S.},
  title={{Text-to-Image Benchmark: A benchmark for generative models}},
  howpublished={\url{https://github.com/boomb0om/text2image-benchmark}},
  month={September},
  year={2023},
  note={Version 0.1.0},
}

Acknowledgments

Thanks to:

  • clean-fid - Explanation of how various parameters influence FID calculation.
  • pytorch-fid - Port of the official implementation of Frechet Inception Distance to PyTorch.