This is the official release of the paper "Aligning Bag of Regions for Open-Vocabulary Object Detection".
Aligning Bag of Regions for Open-Vocabulary Object Detection,
Size Wu, Wenwei Zhang, Sheng Jin, Wentao Liu, Chen Change Loy
In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[Paper][Supp][project page(TBD)][Bibtex]
This project is based on MMDetection 3.x.
It requires the following packages:
- MMEngine >= 0.6.0
- MMCV >= v2.0.0rc4
- MMDetection >= v3.0.0rc6
- lvisapi
pip install openmim mmengine
mim install "mmcv>=2.0.0rc4"
pip install git+https://github.com/lvis-dataset/lvis-api.git
mim install "mmdet>=3.0.0rc6"
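After installation, the installed versions can be compared against the minimums listed above. The script below is a minimal sketch and is not part of the BARON codebase: it assumes the distribution names mmengine, mmcv and mmdet (as installed by the commands above), and its hand-written parser (the hypothetical helpers parse_version, meets_minimum and check_requirements) only understands version strings of the form X.Y.Z or X.Y.ZrcN.

```python
"""Check that the installed OpenMMLab packages meet the minimum versions.

A minimal sketch: the parser only handles "X.Y.Z" and "X.Y.ZrcN" prefixes,
which covers the versions listed in the requirements.
"""
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the requirement list (distribution name -> minimum).
MINIMUM_VERSIONS = {
    "mmengine": "0.6.0",
    "mmcv": "2.0.0rc4",
    "mmdet": "3.0.0rc6",
}

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?")


def parse_version(text):
    """Turn e.g. '3.0.0rc6' into a sortable tuple (3, 0, 0, 6)."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, patch, rc = match.groups()
    # A final release (no rc suffix) must sort after all of its release candidates.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch), rc_rank)


def meets_minimum(installed, required):
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def check_requirements():
    """Return a list of human-readable problems (empty if everything is fine)."""
    problems = []
    for dist, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist} is not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{dist} {installed} is older than required {minimum}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All OpenMMLab requirements satisfied.")
```

Run it with the Python interpreter of the environment used for training; an empty problem list means the three packages are new enough. lvis-api is not checked because no minimum version is listed for it.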
This project is released under the NTU S-Lab License 1.0.
We use CLIP's ViT-B/32 model in our implementation. Obtain the state_dict of the model from GoogleDrive and put it under checkpoints. Alternatively, install CLIP:
pip install git+https://github.com/openai/CLIP.git
and then run:
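import os
import clip
import torch
# Download (or load from cache) CLIP ViT-B/32; the preprocessing transform is not needed.
model, _ = clip.load("ViT-B/32")
os.makedirs('checkpoints', exist_ok=True)  # torch.save fails if the directory is missing
torch.save(model.state_dict(), 'checkpoints/clip_vitb32.pth')  # save weights only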
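Before training, it can be worth confirming that the saved file really contains ViT-B/32 weights. The sketch below assumes the checkpoint was exported with the snippet above (a plain CLIP state_dict); the file downloaded from GoogleDrive may use different key names, in which case the expected keys would need adjusting. The names CKPT_PATH, EXPECTED_SHAPES and find_shape_mismatches are hypothetical helpers, not part of BARON. The shapes follow from the ViT-B/32 configuration: a 768-wide vision transformer, a 512-dimensional joint embedding, and 32x32 patches on 224x224 inputs.

```python
"""Sanity-check an exported CLIP ViT-B/32 state_dict before training.

A sketch assuming the file was produced by clip.load("ViT-B/32") followed by
torch.save(model.state_dict(), ...); other exports may use different keys.
"""
import os

CKPT_PATH = "checkpoints/clip_vitb32.pth"

# Expected tensor shapes for OpenAI CLIP ViT-B/32 (width 768, embed dim 512,
# 224x224 input with 32x32 patches -> 7 * 7 + 1 = 50 visual tokens).
EXPECTED_SHAPES = {
    "visual.conv1.weight": (768, 3, 32, 32),
    "visual.positional_embedding": (50, 768),
    "visual.proj": (768, 512),
    "token_embedding.weight": (49408, 512),
    "positional_embedding": (77, 512),
    "text_projection": (512, 512),
}


def find_shape_mismatches(state_dict):
    """Return (key, expected, found) for every missing or mis-shaped entry."""
    mismatches = []
    for key, expected in EXPECTED_SHAPES.items():
        if key not in state_dict:
            mismatches.append((key, expected, None))
            continue
        found = tuple(state_dict[key].shape)
        if found != expected:
            mismatches.append((key, expected, found))
    return mismatches


if __name__ == "__main__":
    import torch  # imported here so the helper above has no hard torch dependency

    if not os.path.isfile(CKPT_PATH):
        raise SystemExit(f"{CKPT_PATH} not found; download or export it first.")
    state_dict = torch.load(CKPT_PATH, map_location="cpu")
    problems = find_shape_mismatches(state_dict)
    print(problems or f"{CKPT_PATH} looks like a CLIP ViT-B/32 state_dict.")
```

The check only inspects a handful of characteristic tensors rather than every key, which is enough to catch a wrong backbone (for example ViT-B/16 would give a different visual positional embedding) without hard-coding the full parameter list.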
Training and testing on OV-COCO are currently supported.
@inproceedings{wu2023baron,
title={Aligning Bag of Regions for Open-Vocabulary Object Detection},
author={Size Wu and Wenwei Zhang and Sheng Jin and Wentao Liu and Chen Change Loy},
year={2023},
booktitle={CVPR},
}