Recalling Unknowns without Losing Precision: An Effective Solution to Large Model-Guided Open World Object Detection (TIP 2024)
Open World Object Detection (OWOD) aims to adapt object detection to an open-world environment, so as to detect unknown objects and learn knowledge incrementally. Existing OWOD methods typically leverage training sets with a relatively small number of known objects. Due to the absence of generic object knowledge, they fail to comprehensively perceive objects beyond the scope of the training sets. Recent advancements in large vision models (LVMs), trained on extensive large-scale data, offer a promising opportunity to harness rich generic knowledge for the fundamental advancement of OWOD. Motivated by the Segment Anything Model (SAM), a prominent LVM lauded for its exceptional ability to segment generic objects, we first demonstrate the possibility of employing SAM for OWOD and establish the very first SAM-Guided OWOD baseline solution. Subsequently, we identify and address two fundamental challenges in SAM-Guided OWOD and propose a pioneering SAM-Guided Robust Open-world Detector (SGROD), which can significantly improve the recall of unknown objects without losing precision on known objects. Specifically, the two challenges in SAM-Guided OWOD are:
(1) Noisy labels caused by the class-agnostic nature of SAM;
(2) Precision degradation on known objects when more unknown objects are recalled.
For the first problem, we propose a dynamic label assignment (DLA) method that adaptively selects confident labels from SAM during training, effectively reducing the impact of label noise.
For the second problem, we introduce cross-layer learning (CLL) and SAM-based negative sampling (SNS), which enable SGROD to avoid precision loss by learning robust decision boundaries of objectness and classification.
Experiments on public datasets show that SGROD not only improves the recall of unknown objects by a large margin, but also maintains robust precision on known object detection.
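To make the idea of dynamic label assignment concrete, below is a minimal, hypothetical sketch of confidence-based selection of SAM pseudo-boxes with a training-dependent threshold. The function name, inputs, and the linear threshold schedule are illustrative assumptions only, not the implementation used in this repository.

```python
import torch

def select_confident_pseudo_labels(sam_boxes: torch.Tensor,
                                   objectness_scores: torch.Tensor,
                                   epoch: int, max_epochs: int,
                                   start_thresh: float = 0.9,
                                   end_thresh: float = 0.5) -> torch.Tensor:
    """Illustrative sketch (not the paper's code): keep only SAM pseudo-boxes
    whose current objectness score exceeds a threshold that is relaxed as
    training progresses, so more pseudo-labels are admitted once the detector
    becomes more reliable.

    sam_boxes: (N, 4) class-agnostic boxes proposed by SAM.
    objectness_scores: (N,) detector objectness score for each box.
    """
    # Linearly decay the threshold from start_thresh to end_thresh.
    thresh = start_thresh + (end_thresh - start_thresh) * (epoch / max_epochs)
    keep = objectness_scores > thresh
    return sam_boxes[keep]
```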
- To the best of our knowledge, we are the first to propose exploiting the rich generic knowledge of large vision models (LVMs) to enhance OWOD. We demonstrate the feasibility of employing SAM for OWOD and establish the very first SAM-Guided OWOD baseline method.
- We identify and address three vital challenges in SAM-Guided OWOD, i.e., learning from noisy SAM labels via a dynamic label assignment (DLA) module, mitigating the optimization conflict between objectness and classification learning via a cross-layer learning (CLL) strategy, and preventing uncontrolled expansion of objectness semantics via a SAM-based negative sampling (SNS) module.
- Our proposed SGROD method significantly improves the recall of unknown objects while achieving robust performance on known object detection, which proves the feasibility and promise of leveraging LVMs to advance OWOD for handling open-world environments.
conda create --name sgrod python==3.10.4
conda activate sgrod
pip install -r requirements.txt
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
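A quick way to confirm that the CUDA-enabled PyTorch build was picked up correctly (the expected versions below assume the cu113 wheels installed above):

```python
# Sanity check for the environment created above; expected values assume
# the torch==1.12.0+cu113 / torchvision==0.13.0+cu113 wheels.
import torch
import torchvision

print(torch.__version__)          # expected: 1.12.0+cu113
print(torchvision.__version__)    # expected: 0.13.0+cu113
print(torch.cuda.is_available())  # expected: True on a CUDA-capable machine
```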
Download the self-supervised backbone (DINO) from here and place it in the models folder.
cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py
SGROD/
└── data/
    └── OWOD/
        ├── JPEGImages
        ├── Annotations
        └── ImageSets
            ├── OWDETR
            ├── TOWOD
            └── VOC2007
The splits are present inside the data/OWOD/ImageSets/ folder.
- Download the COCO images and annotations from the COCO dataset into the data/ directory.
- Unzip the train2017 and val2017 folders. The current directory structure should look like:
SGROD/
└── data/
    └── coco/
        ├── annotations/
        ├── train2017/
        └── val2017/
- Move all images from train2017/ and val2017/ to the JPEGImages folder.
- Use the code ./datasets/coco2voc.py for converting the json annotations to xml files.
- Download the PASCAL VOC 2007 & 2012 images and annotations from the PASCAL VOC dataset into the data/ directory.
- Untar the trainval 2007, trainval 2012, and test 2007 folders.
- Move all the images to the JPEGImages folder and the annotations to the Annotations folder.
- Download the pseudo labels of the Segment Anything Model (SAM) from Annotations_segment into the data/OWOD directory. You can also generate the pseudo labels yourself by moving segment-anything/generate_proposal.py into the SAM project and running it (see the sketch below).
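If you generate the pseudo labels yourself, the following is a minimal sketch of what SAM-based proposal generation can look like using the official segment-anything API. The checkpoint name, output paths, and JSON format are illustrative assumptions and may differ from what generate_proposal.py actually produces.

```python
# Hypothetical sketch: generate class-agnostic box pseudo-labels with SAM.
# Paths, checkpoint, and output format are illustrative; the repository's
# generate_proposal.py may use different settings.
import json
import os

import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image_dir = "data/OWOD/JPEGImages"          # assumed input location
out_dir = "data/OWOD/Annotations_segment"   # assumed output location
os.makedirs(out_dir, exist_ok=True)

for name in os.listdir(image_dir):
    image = cv2.cvtColor(cv2.imread(os.path.join(image_dir, name)), cv2.COLOR_BGR2RGB)
    masks = mask_generator.generate(image)
    # Each mask record carries an XYWH box and a predicted IoU score.
    proposals = [{"bbox": m["bbox"], "score": m["predicted_iou"]} for m in masks]
    with open(os.path.join(out_dir, os.path.splitext(name)[0] + ".json"), "w") as f:
        json.dump(proposals, f)
```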
To train SGROD on a single node with 4 GPUs, run
bash ./run.sh
**Note:** you may need to give execute permissions to the .sh files under the 'configs' and 'tools' directories by running chmod +x *.sh in each directory.
By editing the run.sh file, you can choose to run any of the configurations defined in the configs/ directory:
- EVAL_M_OWOD_BENCHMARK.sh - evaluation of tasks 1-4 on the MOWOD Benchmark.
- EVAL_S_OWOD_BENCHMARK.sh - evaluation of tasks 1-4 on the SOWOD Benchmark.
- M_OWOD_BENCHMARK.sh - training for tasks 1-4 on the MOWOD Benchmark.
- S_OWOD_BENCHMARK.sh - training for tasks 1-4 on the SOWOD Benchmark.
To reproduce any of the aforementioned results, please download our pretrained weights and place them in the 'checkpoints' directory. Run the run_eval.sh file to utilize multiple GPUs.
**Note:** you may need to give execute permissions to the .sh files under the 'configs' and 'tools' directories by running chmod +x *.sh in each directory.
SGROD/
└── checkpoints/
    ├── MOWODB/
    │   ├── t1 checkpoint0040.pth
    │   ├── t2_ft checkpoint0110.pth
    │   ├── t3_ft checkpoint0180.pth
    │   └── t4_ft checkpoint0260.pth
    └── SOWODB/
        ├── t1 checkpoint0040.pth
        ├── t2_ft checkpoint0120.pth
        ├── t3_ft checkpoint0200.pth
        └── t4_ft checkpoint0300.pth
Note: For more training and evaluation details, please check the PROB repository.
Should you have any questions, please contact 📧 [email protected]
Acknowledgments:
SGROD builds on the code bases of previous works such as OW-DETR, Deformable DETR, PROB, SAM, and OWOD. If you find SGROD useful, please consider citing these works as well.