
You Only 👀 One Sequence

TL;DR: We study the transferability of the vanilla ViT pre-trained on mid-sized ImageNet-1k to the more challenging COCO object detection benchmark.
Code and model weights will be released soon, please stay tuned :)

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

by Yuxin Fang¹ *, Bencheng Liao¹ *, Xinggang Wang¹ 📧, Jiemin Fang²,¹, Jiyang Qi¹, Rui Wu³, Jianwei Niu³, Wenyu Liu¹.

¹ School of EIC, HUST, ² Institute of AI, HUST, ³ Horizon Robotics.

(*) equal contribution, (📧) corresponding author.

arXiv technical report (arXiv yolos.yolos)

You Only Look at One Sequence (YOLOS)

The Illustration of YOLOS

Highlights

Directly inherited from ViT (DeiT), YOLOS is not designed to be yet another high-performance object detector, but to unveil the versatility and transferability of Transformer from image recognition to object detection. Concretely, our main contributions are summarized as follows:

We use the mid-sized ImageNet-1k as the sole pre-training dataset, and show that a vanilla ViT (DeiT) can be successfully transferred to perform the challenging object detection task and produce competitive COCO results with the fewest possible modifications, i.e., by only looking at one sequence (YOLOS).
We demonstrate that 2D object detection can be accomplished in a pure sequence-to-sequence manner by taking a sequence of fixed-sized non-overlapping image patches as input. Among existing object detectors, YOLOS utilizes minimal 2D inductive biases. Moreover, it is feasible for YOLOS to perform object detection in a space of any dimension, without knowing the exact spatial structure or geometry.
For ViT (DeiT), we find the object detection results are quite sensitive to the pre-train scheme and the detection performance is far from saturating. Therefore the proposed YOLOS can be used as a challenging benchmark task to evaluate different pre-training strategies for ViT (DeiT).
We also discuss the impacts as well as the limitations of prevalent pre-train schemes and model scaling strategies for Transformer in vision through transferring to object detection.
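
The second highlight describes the input as a sequence of fixed-sized non-overlapping image patches. A minimal sketch of that flattening step in plain Python is given here. The helper name `patchify`, the toy 4x4 grid and the patch size of 2 are illustrative assumptions, not taken from this codebase, which works on image tensors rather than nested lists.

```python
from typing import List


def patchify(image: List[List[float]], patch: int) -> List[List[float]]:
    """Split an H x W grid into a row-major sequence of flattened patch x patch tiles.

    Hypothetical helper for illustration only; it is not part of YOLOS.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    sequence = []
    # Walk the grid tile by tile, left to right and top to bottom.
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            sequence.append(tile)
    return sequence


# Toy 4 x 4 "image" whose pixel value encodes its position.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = patchify(image, patch=2)
print(len(tokens))  # 4 patches
print(tokens[0])    # [0, 1, 4, 5]
```

Once the patches are flattened, the model only sees an ordered list of vectors, which is what "looking at one sequence" refers to.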

Results

Model	Pre-train Epochs	Backbone Weight / log	Fine-tune Epochs	Eval Size	YOLOS Checkpoint / log	AP
YOLOS-Ti	300	Deit-Ti	300	512	yolos_ti.pth / log	28.7
YOLOS-S	200	Deit-S	150	800	yolos_s_200_pre.pth	36.1
YOLOS-S	300	Deit-S	150	800	yolos_s_300_pre.pth / log	36.1
YOLOS-S(dWr)	300	Deit-S(dWr) / log	150	800	yolos_s_dWr.pth / log	37.6
YOLOS-B	1000	Deit-B (⚗)	150	800	yolos_base.pth / log	42.0

Notes:

Access code for pan.baidu.com is yolo, we will

Requirement

This codebase has been developed with Python 3.6, PyTorch 1.5+ and torchvision 0.6+:

conda install -c pytorch pytorch torchvision

Install pycocotools (for evaluation on COCO) and scipy (for training):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
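
To confirm that an existing environment meets the minimum versions stated above, a small check along the following lines may help. This is a sketch, not part of the repository: the helper names `version_tuple` and `meets_minimum` are hypothetical, and the version strings are passed in by hand here so the snippet runs without PyTorch installed. In practice one would pass `torch.__version__` and `torchvision.__version__`.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading major.minor of a version string such as '1.7.1+cu110'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version: str, minimum: Tuple[int, int]) -> bool:
    """Return True when `version` is at least `minimum` (major, minor)."""
    return version_tuple(version) >= minimum


# Minimums taken from the requirement line above; the installed versions are examples.
print(meets_minimum("1.7.1+cu110", (1, 5)))  # PyTorch 1.5+
print(meets_minimum("0.5.0", (0, 6)))        # torchvision 0.6+
```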

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
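
Before launching a long training run, it can be worth checking that the dataset root matches the layout above. The following is a minimal sketch under that assumption. The function name `missing_coco_dirs` is hypothetical and is not provided by this repository.

```python
from pathlib import Path
from typing import List

# Sub-directories expected under the COCO root, as listed above.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017")


def missing_coco_dirs(coco_root: str) -> List[str]:
    """Return the expected sub-directories that are absent under `coco_root`."""
    root = Path(coco_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


missing = missing_coco_dirs("path/to/coco")
if missing:
    print("missing directories:", ", ".join(missing))
else:
    print("COCO layout looks complete")
```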

Training

Before fine-tuning on COCO, you need to download the ImageNet pre-trained model to the /path/to/YOLOS/ directory.

To train the YOLOS-Ti model in the paper, run this command:


python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco \
    --batch_size 2 \
    --lr 5e-5 \
    --epochs 300 \
    --backbone_name tiny \
    --pre_trained /path/to/deit-tiny.pth \
    --eval_size 512 \
    --init_pe_size 800 1333 \
    --output_dir /output/path/box_model

To train the YOLOS-S model with 200 epoch pretrained Deit-S in the paper, run this command:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco \
    --batch_size 1 \
    --lr 2.5e-5 \
    --epochs 150 \
    --backbone_name small \
    --pre_trained /path/to/deit-small-200epoch.pth \
    --eval_size 800 \
    --init_pe_size 512 864 \
    --mid_pe_size 512 864 \
    --output_dir /output/path/box_model

To train the YOLOS-S model with 300 epoch pretrained Deit-S in the paper, run this command:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco \
    --batch_size 1 \
    --lr 2.5e-5 \
    --epochs 150 \
    --backbone_name small \
    --pre_trained /path/to/deit-small-300epoch.pth \
    --eval_size 800 \
    --init_pe_size 512 864 \
    --mid_pe_size 512 864 \
    --output_dir /output/path/box_model

To train the YOLOS-S(dWr) model in the paper, run this command:


python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco \
    --batch_size 1 \
    --lr 2.5e-5 \
    --epochs 150 \
    --backbone_name small_dWr \
    --pre_trained /path/to/deit-small-dWr-scale.pth \
    --eval_size 800 \
    --init_pe_size 512 864 \
    --mid_pe_size 512 864 \
    --output_dir /output/path/box_model

To train the YOLOS-B model in the paper, run this command:


python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco \
    --batch_size 1 \
    --lr 2.5e-5 \
    --epochs 150 \
    --backbone_name base \
    --pre_trained /path/to/deit-base.pth \
    --eval_size 800 \
    --init_pe_size 800 1344 \
    --mid_pe_size 800 1344 \
    --output_dir /output/path/box_model
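
The training commands above pair a larger learning rate with the larger batch size. The short calculation below makes that relationship explicit. It assumes that `--batch_size` is counted per process (per GPU), which the commands themselves do not state, so treat the totals as an estimate. The helper `effective_batch_size` is hypothetical.

```python
NUM_GPUS = 8  # --nproc_per_node in every training command above

# (per-process batch size, learning rate) copied from the commands above.
configs = {
    "YOLOS-Ti": (2, 5e-5),
    "YOLOS-S": (1, 2.5e-5),
    "YOLOS-S(dWr)": (1, 2.5e-5),
    "YOLOS-B": (1, 2.5e-5),
}


def effective_batch_size(per_gpu: int, num_gpus: int = NUM_GPUS) -> int:
    """Total images per optimizer step, assuming --batch_size is per process."""
    return per_gpu * num_gpus


for name, (per_gpu, lr) in configs.items():
    total = effective_batch_size(per_gpu)
    print(f"{name}: batch {total}, lr per image {lr / total:.4e}")
```

Under that assumption, YOLOS-Ti trains with 16 images per step and the other models with 8, and the learning rate per image comes out the same for all four configurations. When training on fewer GPUs, scaling `--lr` in proportion to the total batch size would keep that ratio, although the source does not prescribe this.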

Evaluation

To evaluate the YOLOS-Ti model on COCO, run:

python main.py --coco_path /path/to/coco --batch_size 2 --backbone_name tiny --eval --eval_size 512 --init_pe_size 800 1333 --resume /path/to/YOLOS-Ti

To evaluate the YOLOS-S model on COCO, run:

python main.py --coco_path /path/to/coco --batch_size 1 --backbone_name small --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume /path/to/YOLOS-S

To evaluate the YOLOS-S(dWr) model on COCO, run:

python main.py --coco_path /path/to/coco --batch_size 1 --backbone_name small_dWr --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume '/path/to/YOLOS-S(dWr)'

To evaluate the YOLOS-B model on COCO, run:

python main.py --coco_path /path/to/coco --batch_size 1 --backbone_name base --eval --eval_size 800 --init_pe_size 800 1344 --mid_pe_size 800 1344 --resume /path/to/YOLOS-B
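
Checkpoint paths that contain shell metacharacters, such as the parentheses in a name like YOLOS-S(dWr), need quoting when passed on the command line. One way to avoid mistakes is to assemble the evaluation command programmatically. The sketch below uses only flags that appear in the commands above. The function `eval_command` is a hypothetical convenience wrapper, not part of this repository.

```python
import shlex
from typing import Optional, Sequence


def eval_command(coco_path: str, backbone_name: str, batch_size: int,
                 eval_size: int, init_pe_size: Sequence[int], resume: str,
                 mid_pe_size: Optional[Sequence[int]] = None) -> str:
    """Build a shell-safe evaluation command string for main.py."""
    args = ["python", "main.py",
            "--coco_path", coco_path,
            "--batch_size", str(batch_size),
            "--backbone_name", backbone_name,
            "--eval",
            "--eval_size", str(eval_size),
            "--init_pe_size", *map(str, init_pe_size)]
    if mid_pe_size is not None:
        args += ["--mid_pe_size", *map(str, mid_pe_size)]
    args += ["--resume", resume]
    # shlex.quote leaves plain arguments alone and quotes anything the shell would interpret.
    return " ".join(shlex.quote(arg) for arg in args)


cmd = eval_command("/path/to/coco", "small_dWr", 1, 800, (512, 864),
                   "/path/to/YOLOS-S(dWr)", mid_pe_size=(512, 864))
print(cmd)
```

The printed command ends with `--resume '/path/to/YOLOS-S(dWr)'`, so it can be pasted into a shell unchanged.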

Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝 :

@article{YOLOS,
  title={You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection},
  author={All YOLOS Authors},
  journal={arXiv preprint arXiv:yolos.yolos},
  year={2021}
}
