Renhao Wang*, Yu Sun*, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang
See installation instructions.
We release COCO-Videos, a new dataset for instance and panoptic segmentation which follows the COCO labeling format. We also rely on semantic-level labels in the KITTI-STEP dataset for evaluation on semantic segmentation.
All datasets can be downloaded here and should subsequently be unzipped to the path specified by the `$DETECTRON2_DATASETS` environment variable (see installation instructions).
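As a quick sanity check (a sketch, not part of this repo), you can print the dataset root that Detectron2 will read from the environment:

```python
# Sketch only: show the dataset root taken from the DETECTRON2_DATASETS
# environment variable (Detectron2 falls back to "datasets" relative to
# the working directory when the variable is unset).
import os

root = os.path.expanduser(os.environ.get("DETECTRON2_DATASETS", "datasets"))
print("dataset root:", root)
print("exists:", os.path.isdir(root))
```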
Relevant pretrained checkpoints can be obtained here. These should be downloaded and stored at some `/path/to/checkpoints`.
To evaluate a pretrained Mask2Former-S on COCO-Videos for panoptic segmentation:
```shell
python runner_coco_videos_baseline.py --gpu 0 \
    --videos bangkok bar berkeley havana house irvine paris restaurant school tokyo \
    --batch_size 8 \
    --weights /path/to/checkpoints/ttt_coco_panoptic_baseline.pkl \
    --output_dir coco_vid_panoptic_baseline \
    --eval_type pano \
    --num_imgs 4000
```
You can pass `--eval_type inst` (together with the corresponding pretrained instance segmentation checkpoint) to obtain the baseline instance numbers. Results will be logged under the directory specified by the `--output_dir` flag.
Runner script for instance segmentation:
```shell
python runner_ttt_mae_inst.py --gpu 0 \
    --videos bangkok bar berkeley havana house irvine paris restaurant school tokyo \
    --batch_size 32 \
    --accum_iter 8 \
    --base_lr 0.0001 \
    --weights /path/to/checkpoints/ttt_coco_instance_baseline.pkl \
    --restart_optimizer
```
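The `--batch_size` and `--accum_iter` flags interact in the usual gradient-accumulation way: gradients from several micro-batches are combined before each optimizer step, so each update reflects a larger effective batch. A minimal sketch of the technique (illustrative only, not the repo's training loop):

```python
# Gradient-accumulation sketch: average gradients over accum_iter
# micro-batches, then take a single SGD step.
def accumulated_sgd_step(param, micro_batches, lr, grad_fn):
    grad = 0.0
    for mb in micro_batches:       # one backward pass per micro-batch
        grad += grad_fn(param, mb)
    grad /= len(micro_batches)     # average, as if one big batch
    return param - lr * grad       # single optimizer step

# Toy example: gradient of 0.5 * (param - target)^2 is (param - target).
grad_fn = lambda p, target: p - target
new_param = accumulated_sgd_step(10.0, [0.0, 2.0, 4.0], lr=0.1, grad_fn=grad_fn)
print(new_param)  # 10 - 0.1 * mean([10, 8, 6]) = 9.2
```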
Runner script for panoptic segmentation:
```shell
python runner_ttt_mae_panoptic.py --gpu 0 \
    --videos bangkok bar berkeley havana house irvine paris restaurant school tokyo \
    --batch_size 32 \
    --accum_iter 8 \
    --base_lr 0.0001 \
    --weights /path/to/checkpoints/ttt_coco_panoptic_baseline.pkl \
    --restart_optimizer
```
For easy collation of numbers, we provide a utility script which can, for example, be invoked as `python mask2former/utils/tabulate_results_cv.py --root_dir exp_dir/mae_coco_inst_32_0.0001`.
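The hypothetical snippet below illustrates the kind of collation such a script performs: averaging one metric across per-video result files. The `<root_dir>/<video>/metrics.json` layout and the metric key are assumptions for illustration; the repo's `tabulate_results_cv.py` is the actual tool.

```python
# Illustrative collation sketch; assumes a <root_dir>/<video>/metrics.json
# layout, which is NOT necessarily how this repo stores its results.
import json
import pathlib

def average_metric(root_dir: str, key: str) -> float:
    files = sorted(pathlib.Path(root_dir).glob("*/metrics.json"))
    vals = [json.loads(f.read_text())[key] for f in files]
    return sum(vals) / len(vals)
```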
Runner script for semantic segmentation on KITTI-STEP:
```shell
python runner_ttt_mae.py --gpu 0 \
    --videos 0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 0020 \
    --batch_size 32 \
    --accum_iter 4 \
    --base_lrs 0.0001 \
    --weights /path/to/checkpoints/ttt_ks_semantic_baseline.pkl \
    --restart_optimizer
```
For easy collation of numbers, we provide a utility script which can, for example, be invoked as `python mask2former/utils/tabulate_results.py --root_dir exp_dir/mae_ks_sema_32_0.0001`.
This codebase inherits all licenses from the public release of Mask2Former.
```bibtex
@article{wang2023test,
  title={Test-time training on video streams},
  author={Wang, Renhao and Sun, Yu and Gandelsman, Yossi and Chen, Xinlei and Efros, Alexei A and Wang, Xiaolong},
  journal={arXiv preprint arXiv:2307.05014},
  year={2023}
}
```
Code is based on Mask2Former.