My training and evaluation pipeline to detect trash objects in images
In this repo we work with the TACO dataset.
I used Detectron2 as the training framework.
You can find this model deployed on my personal website: Personal Website with TorchServe TSS
If you face issues with inference, keep in mind that the model server is hardware-demanding and I can only work with free-tier / limited student credits.
- Training on 2 parts
- Validation strategy: stratified group k-fold
- AP@50 : 27.257
- Training on 2 parts
- Validation strategy: group k-fold
- Heavy augs
- AP@50 : 32.242
- Although this pipeline scores higher, the other one seems to generalize better (its augmentations are more sensible)
- Group k-fold under-represents smaller classes
- The YAML config files experiment.yaml and detectron_config.yaml configure the whole project
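For illustration, an experiment config could look like the fragment below. All key names here are my own assumptions for the sketch, not the repo's actual schema — check configs/experiment.yaml for the real keys.

```yaml
# Hypothetical example -- not the repo's actual schema.
dataset:
  annotations: data/annotations.json
  images_dir: data/images
split:
  strategy: stratified_group_kfold
  n_splits: 5
training:
  max_iter: 10000
  base_lr: 0.00025
output:
  model_path: models/best_model.pth
```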
- Train an object detection model on the TACO dataset
- Evaluate an object detection model on the train and validation datasets
File/Folder | Description |
---|---|
configs | The project config |
configs/experiment.yaml | The general config |
configs/detectron_config.yaml | Detectron2 framework config |
data | The dataset repository |
notebooks | The notebooks I used for training |
src | The main module that is used for the training and evaluation |
train.py | Script to train the model |
eval.py | Script to evaluate a model |
requirements.txt | Python requirements |
install_req | Script to install the Python environment requirements
File/Folder | Description |
---|---|
augment.py | The file containing the training and validation preprocess and augmentation (Albumentations) |
configs.py | Main functions related to the config (loading, saving, ...)
evaluator.py | The main function that helps evaluate the model
mapper.py | An extension of Detectron2 DatasetMapper in order to be able to use albumentations |
preprocess.py | Fix annotation problems and remove negative bboxes |
split.py | Define train/val split strategy (Validation Strategy) |
trainer.py | An extension of Detectron2 DefaultTrainer to add hooks (Custom checkpoints, ...) |
utils.py | Util functions |
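As a rough illustration of the kind of cleanup preprocess.py performs (a simplified sketch under my own assumptions, not the repo's actual code), one can drop COCO annotations whose `[x, y, w, h]` boxes have negative coordinates or non-positive width/height:

```python
def clean_annotations(annotations):
    """Keep only annotations with a valid [x, y, w, h] bbox.

    Drops boxes with negative coordinates or non-positive width/height,
    the kind of degenerate entries that break training.
    """
    cleaned = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        if x >= 0 and y >= 0 and w > 0 and h > 0:
            cleaned.append(ann)
    return cleaned

anns = [
    {"id": 1, "bbox": [10, 10, 50, 40]},  # valid
    {"id": 2, "bbox": [-5, 10, 50, 40]},  # negative x -> dropped
    {"id": 3, "bbox": [10, 10, 0, 40]},   # zero width -> dropped
]
print([a["id"] for a in clean_annotations(anns)])  # [1]
```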
- Create a Python environment and install dependencies
virtualenv env
source env/bin/activate
./install_req
- Download the dataset and place it in data
You can find the dataset in:
I personally worked with the Kaggle dataset to be able to use the Kaggle API and train in Kaggle notebooks
python train.py
python eval.py
or
python eval.py --model_path path
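A minimal sketch of how eval.py's optional model_path argument could be handled (an illustration of the CLI behavior, not the repo's actual script):

```python
import argparse

def build_parser():
    # Sketch of eval.py's CLI: one optional checkpoint path.
    parser = argparse.ArgumentParser(description="Evaluate a trained model")
    parser.add_argument(
        "--model_path",
        default="models/best_model.pth",  # fallback when the flag is omitted
        help="Path to the model checkpoint to evaluate",
    )
    return parser

print(build_parser().parse_args([]).model_path)  # models/best_model.pth
print(build_parser().parse_args(["--model_path", "m.pth"]).model_path)  # m.pth
```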
- You can change experiment.yaml to suit your needs (I will put my best config there).
- You can change detectron_config.yaml too, but beware: some parameters in experiment.yaml overwrite parameters in detectron_config.yaml. Read both of them carefully.
- train.py doesn't take any arguments
- eval.py takes an optional model_path argument. If not specified, model_path defaults to models/best_model.pth
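The overwrite relationship between the two config files can be pictured as a recursive dict merge where experiment.yaml wins. This is a sketch of the general idea, not the repo's actual logic:

```python
def merge(base, override):
    """Recursively merge two config dicts; values in `override` win."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)  # descend into nested sections
        else:
            out[key] = value  # override scalar (or replace wholesale)
    return out

detectron_cfg = {"SOLVER": {"BASE_LR": 0.001, "MAX_ITER": 5000}}
experiment_cfg = {"SOLVER": {"BASE_LR": 0.00025}}  # overrides only the LR
print(merge(detectron_cfg, experiment_cfg))
# {'SOLVER': {'BASE_LR': 0.00025, 'MAX_ITER': 5000}}
```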
- Validation strategy used in training and evaluation: stratified group k-fold:
- Each set contains approximately the same percentage of samples of each target class as the complete set.
- The same group is not represented in both testing and training sets.
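The group-separation property above can be illustrated with a minimal, hand-rolled group k-fold. This is a toy sketch: it does no stratification, and the repo presumably relies on an existing implementation such as scikit-learn's StratifiedGroupKFold.

```python
from collections import defaultdict

def group_kfold(groups, n_splits):
    """Split sample indices into folds so no group is split across folds.

    Greedy assignment: largest groups first, each to the currently
    smallest fold (balances fold sizes; no stratification).
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    fold_sizes = [0] * n_splits
    fold_of_group = {}
    for g, idxs in sorted(by_group.items(), key=lambda kv: -len(kv[1])):
        f = fold_sizes.index(min(fold_sizes))  # currently smallest fold
        fold_of_group[g] = f
        fold_sizes[f] += len(idxs)
    folds = []
    for f in range(n_splits):
        val = [i for i, g in enumerate(groups) if fold_of_group[g] == f]
        train = [i for i, g in enumerate(groups) if fold_of_group[g] != f]
        folds.append((train, val))
    return folds

# Images sharing a scene id must never appear in both train and val.
scenes = ["s1", "s1", "s2", "s2", "s3", "s3"]
for train, val in group_kfold(scenes, n_splits=3):
    assert not set(train) & set(val)  # disjoint splits
```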
- Thank you for your patience
TACO dataset
@article{taco2020,
title={TACO: Trash Annotations in Context for Litter Detection},
author={Pedro F Proença and Pedro Simões},
journal={arXiv preprint arXiv:2003.06975},
year={2020}
}
Detectron2
@misc{wu2019detectron2,
author = {Yuxin Wu and Alexander Kirillov and Francisco Massa and
Wan-Yen Lo and Ross Girshick},
title = {Detectron2},
howpublished = {\url{https://github.com/facebookresearch/detectron2}},
year = {2019}
}