Create copy/paste synthetic images for object detection and instance segmentation.

JureHudoklin/CopyPaste_DatasetGenerator

Synthetic Dataset Generation

This repository is a modified and extended version of debidatta/syndata-generation that aims to be more convenient to use. All credit goes to the original authors (see Citation).

Overview
Figure: Minimal dataset example of 5 images where only assets from the image pool were used.

This repository lets you quickly create your own instance segmentation dataset. All you need are the relevant resources, i.e. an image pool of

  • background images (any sizes)
  • objects of interest (in RGBA format)
  • distractor objects (in RGBA format)
  • splits for training, validation and test (of your resources) as described here

The rest will be handled by this repository :) If you need help gathering data for the image pool, check our project page with details on image scraping and asset selection.
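The core copy/paste idea behind the resource list above can be sketched in a few lines. This is a minimal illustration and not this repository's implementation: it assumes images are plain nested lists of pixel tuples, and the function name `paste_rgba` is hypothetical. The alpha channel of the RGBA object decides both how the object is mixed into the background and which pixels belong to the instance mask.

```python
def paste_rgba(background, obj, top, left):
    """Alpha-composite an RGBA object onto an RGB background.

    background: list of rows of (r, g, b) tuples
    obj:        list of rows of (r, g, b, a) tuples, a in 0..255
    Returns the composited image and a binary instance mask.
    """
    height, width = len(background), len(background[0])
    out = [list(row) for row in background]  # copy, leave the input untouched
    mask = [[0] * width for _ in range(height)]
    for dy, obj_row in enumerate(obj):
        for dx, (r, g, b, a) in enumerate(obj_row):
            y, x = top + dy, left + dx
            if not (0 <= y < height and 0 <= x < width):
                continue  # object pixels outside the background are cropped
            alpha = a / 255.0
            bg_r, bg_g, bg_b = out[y][x]
            out[y][x] = (
                round(alpha * r + (1 - alpha) * bg_r),
                round(alpha * g + (1 - alpha) * bg_g),
                round(alpha * b + (1 - alpha) * bg_b),
            )
            if a > 0:
                mask[y][x] = 1  # any non-transparent pixel belongs to the instance
    return out, mask
```

Because the mask is derived from the same alpha channel that drives the paste, the segmentation label comes for free with every pasted object, which is why the objects of interest need to be in RGBA format.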

This version was developed as part of a paper (see Citation); check the project page for more details.

Usage

There are two places to make configurations:

config.py, to adjust for example (all variables are explained in the comments)

  • number of objects of interest
  • number of distractors
  • max IoU between objects
  • which blending methods are used
  • ... (see config.py)
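To make the "max IoU between objects" setting more concrete, here is a hedged sketch of how such a limit can be enforced when placing objects. It assumes axis-aligned bounding boxes in `(x_min, y_min, x_max, y_max)` form; the function names `box_iou` and `placement_allowed` are hypothetical, and the actual check in config.py and the generator may differ (for example, it could operate on masks).

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def placement_allowed(new_box, placed_boxes, max_iou):
    """Accept a candidate position only if it overlaps no placed box too much."""
    return all(box_iou(new_box, box) <= max_iou for box in placed_boxes)
```

A generator would typically sample a new position and retry while `placement_allowed` returns `False`. A lower limit yields cleaner, less occluded scenes, while a higher limit produces more overlap between objects.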

generate_synthetic_data.py to set

  • paths to the resources needed for dataset generation (also see Data)
  • number of images that should be generated
  • flags for enabling occlusion, rotation and scaling
  • flag for multithreading for faster image generation (recommended)
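The effect of the multithreading flag can be illustrated with a small sketch. Everything here is an assumption for illustration: `generate_one` is a hypothetical stand-in that only returns a placement plan instead of rendering an image, the value ranges are made up, and the script's real parallelization mechanism may differ.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def generate_one(index, seed=0):
    """Hypothetical stand-in for rendering one synthetic image."""
    # A per-image seed keeps the result independent of scheduling order.
    rng = random.Random(seed + index)
    return {
        "index": index,
        "num_objects": rng.randint(1, 3),       # illustrative range
        "rotation_deg": rng.uniform(-30, 30),   # illustrative range
    }


def generate_dataset(num_images, workers=4):
    """Generate all images in parallel; results come back in index order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(generate_one, range(num_images)))
```

Seeding each image from its index means a parallel run produces the same dataset as a sequential one. For CPU-bound rendering in Python, a process pool often scales better than threads; threads are used here only to keep the sketch short.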

Locally

Install the requirements

pip install -r requirements.txt

And run

python src/tools/generate_synthetic_data.py

Docker

Build using

source scripts/docker_build.sh           # for CPU
source scripts/GPU/docker_build.sh       # for GPU (faster Poisson Blending)

Run dataset generation using

source scripts/docker_run.sh             # for CPU
source scripts/GPU/docker_run.sh         # for GPU (faster Poisson Blending)

Please check the respective scripts if you need to make any changes.

Citation

If you use this code for scientific research, please consider citing the following two works.

Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection

The original work, including the code on which this repository is built. Thanks a lot to the authors for providing their code!

@InProceedings{Dwibedi_2017_ICCV,
author = {Dwibedi, Debidatta and Misra, Ishan and Hebert, Martial},
title = {Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}
}

Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics

Our work for which this repository was developed.

@inproceedings{naumannScrapeCutPasteLearn2022,
  title = {Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics},
  booktitle = {{{IEEE Conference}} on {{Machine Learning}} and {{Applications}} ({{ICMLA}})},
  author = {Naumann, Alexander and Hertlein, Felix and Zhou, Benchun and Dörr, Laura and Furmans, Kai},
  date = {2022},
}

Affiliations

FZI Logo
