
Recalling Unknowns without Losing Precision: An Effective Solution to Large Model-Guided Open World Object Detection (TIP 2024)

Yulin He, Wei Chen, Siqi Wang, Tianrui Liu, and Meng Wan

Abstract

Open World Object Detection (OWOD) aims to adapt object detection to an open-world environment, so as to detect unknown objects and learn knowledge incrementally. Existing OWOD methods typically leverage training sets with a relatively small number of known objects. Due to the absence of generic object knowledge, they fail to comprehensively perceive objects beyond the scope of training sets. Recent advancements in large vision models (LVMs), trained on extensive large-scale data, offer a promising opportunity to harness rich generic knowledge for the fundamental advancement of OWOD. Motivated by the Segment Anything Model (SAM), a prominent LVM lauded for its exceptional ability to segment generic objects, we first demonstrate the possibility of employing SAM for OWOD and establish the very first SAM-Guided OWOD baseline solution. Subsequently, we identify and address two fundamental challenges in SAM-Guided OWOD and propose a pioneering SAM-Guided Robust Open-world Detector (SGROD) method, which can significantly improve the recall of unknown objects without losing precision on known objects. Specifically, the two challenges in SAM-Guided OWOD are: (1) noisy labels caused by the class-agnostic nature of SAM; (2) precision degradation on known objects when more unknown objects are recalled. For the first problem, we propose a dynamic label assignment (DLA) method that adaptively selects confident labels from SAM during training, effectively reducing the noise impact. For the second problem, we introduce cross-layer learning (CLL) and SAM-based negative sampling (SNS), which enable SGROD to avoid precision loss by learning robust decision boundaries of objectness and classification. Experiments on public datasets show that SGROD not only improves the recall of unknown objects by a large margin (~20%), but also preserves highly competitive precision on known objects.

Overview

  • To the best of our knowledge, we are the first to propose exploiting the rich generic knowledge of large vision models (LVMs) to enhance OWOD. We demonstrate the feasibility of employing SAM for OWOD and establish the very first SAM-Guided OWOD baseline method.
  • We identify and address three vital challenges in SAM-Guided OWOD, i.e., learning from noisy SAM labels with a dynamic label assignment (DLA) module, mitigating the optimization conflict between objectness and classification learning with a cross-layer learning (CLL) strategy, and preventing uncontrolled expansion of objectness semantics with a SAM-based negative sampling (SNS) module.
  • Our proposed SGROD method significantly improves the recall of unknown objects while achieving robust performance on known object detection, which proves the feasibility and promise of leveraging LVMs to advance OWOD for handling open-world environments.

Installation

Requirements

conda create --name sgrod python==3.10.4
conda activate sgrod
pip install -r requirements.txt
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113

Backbone features

Download the self-supervised backbone (DINO) from here and place it in the models folder.
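For example, assuming the downloaded DINO checkpoint is named dino_resnet50_pretrain.pth (the actual filename may differ depending on the release you download), placing it could look like:

mkdir -p models
# assumed filename; rename to match the file you actually downloaded
mv ~/Downloads/dino_resnet50_pretrain.pth models/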

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (all checks should report True)
python test.py

Data Structure

SGROD/
└── data/
    └── OWOD/
        ├── JPEGImages
        ├── Annotations
        └── ImageSets
            ├── OWDETR
            ├── TOWOD
            └── VOC2007
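If the image and annotation folders do not already exist in your checkout, you can create them up front (the ImageSets splits themselves ship with the repository, see Dataset Preparation below):

mkdir -p data/OWOD/JPEGImages data/OWOD/Annotations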

Dataset Preparation

The dataset splits are provided in the data/OWOD/ImageSets/ folder.

  1. Download the COCO images and annotations from the coco dataset into the data/ directory.
  2. Unzip the train2017 and val2017 folders. The current directory structure should look like:
SGROD/
└── data/
    └── coco/
        ├── annotations/
        ├── train2017/
        └── val2017/
  3. Move all images from train2017/ and val2017/ to the JPEGImages folder (see the shell sketch after this list).
  4. Use the script ./datasets/coco2voc.py to convert the JSON annotations to XML files.
  5. Download the PASCAL VOC 2007 & 2012 images and annotations from the pascal dataset into the data/ directory.
  6. Untar the trainval 2007 and 2012 archives and the test 2007 archive.
  7. Move all images to the JPEGImages folder and all annotations to the Annotations folder.
  8. Download the Segment Anything Model (SAM) pseudo labels from Annotations_segment into the data/OWOD directory. You can also generate the pseudo labels yourself by moving segment-anything/generate_proposal.py into the SAM project and running it.
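A minimal shell sketch of steps 3-7, run from the SGROD/ root. It assumes the PASCAL VOC archives keep their official names and that coco2voc.py needs no extra arguments; check the script and adjust paths before running:

# Step 3: move COCO images into the OWOD image folder
mv data/coco/train2017/* data/coco/val2017/* data/OWOD/JPEGImages/

# Step 4: convert COCO JSON annotations to VOC-style XML
python ./datasets/coco2voc.py

# Steps 5-7: extract PASCAL VOC 2007/2012 and move images/annotations
cd data
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtrainval_11-May-2012.tar
tar -xf VOCtest_06-Nov-2007.tar
mv VOCdevkit/VOC2007/JPEGImages/* VOCdevkit/VOC2012/JPEGImages/* OWOD/JPEGImages/
mv VOCdevkit/VOC2007/Annotations/* VOCdevkit/VOC2012/Annotations/* OWOD/Annotations/
cd ..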

Training

Training on single node

To train SGROD on a single node with 4 GPUs, run

bash ./run.sh

Note: you may need to make the .sh files under the 'configs' and 'tools' directories executable by running chmod +x *.sh in each directory.
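For example, from the repository root:

chmod +x configs/*.sh tools/*.sh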

By editing the run.sh file, you can choose which of the configurations defined in ./configs to run:

  1. EVAL_M_OWOD_BENCHMARK.sh - evaluation of tasks 1-4 on the MOWOD Benchmark.
  2. EVAL_S_OWOD_BENCHMARK.sh - evaluation of tasks 1-4 on the SOWOD Benchmark.
  3. M_OWOD_BENCHMARK.sh - training for tasks 1-4 on the MOWOD Benchmark.
  4. S_OWOD_BENCHMARK.sh - training for tasks 1-4 on the SOWOD Benchmark.
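As an illustration of what such an edit might look like (assuming run.sh simply dispatches to one of the scripts above; check the shipped file, which may pass extra arguments):

#!/usr/bin/env bash
# Illustrative run.sh: pick one configuration to execute
bash configs/M_OWOD_BENCHMARK.sh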

Evaluation & Result Reproduction

To reproduce any of the aforementioned results, please download our pretrained weights and place them in the 'checkpoints' directory. Run the run_eval.sh file to utilize multiple GPUs.
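For example, after placing the weights:

bash ./run_eval.sh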

Note: you may need to make the .sh files under the 'configs' and 'tools' directories executable by running chmod +x *.sh in each directory.

SGROD/
└── checkpoints/
    ├── MOWODB/
    │   ├── t1 checkpoint0040.pth
    │   ├── t2_ft checkpoint0110.pth
    │   ├── t3_ft checkpoint0180.pth
    │   └── t4_ft checkpoint0260.pth
    └── SOWODB/
        ├── t1 checkpoint0040.pth
        ├── t2_ft checkpoint0120.pth
        ├── t3_ft checkpoint0200.pth
        └── t4_ft checkpoint0300.pth

Note: For more training and evaluation details, please check the PROB repository.

Contact

Should you have any questions, please contact 📧 [email protected]

Acknowledgments

SGROD builds on the code bases of previous works such as OW-DETR, Deformable DETR, PROB, SAM, and OWOD. If you find SGROD useful, please consider citing these works as well.
