## imTED

Code for [Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection](https://arxiv.org/abs/2205.09613).

The code is based on [mmdetection](https://github.com/open-mmlab/mmdetection). Please refer to [get_started.md](docs/en/get_started.md) and [MMDET_README.md](MMDET_README.md) to set up the environment and prepare the data.

## Config Files and Performance

We provide 9 configuration files in the configs directory.

| Config File | Backbone | Epochs | Box AP | Mask AP |
| :--------------------------------------------------------------------------------: | :---------: | :-------: | :---------: | :-------: |
| configs/imted/imted_faster_rcnn_vit_small_3x_coco.py | ViT-S | 36 | 48.2 | |
| configs/imted/imted_faster_rcnn_vit_base_3x_coco.py | ViT-B | 36 | 52.9 | |
| configs/imted/imted_faster_rcnn_vit_large_3x_coco.py | ViT-L | 36 | 55.4 | |
| configs/imted/imted_mask_rcnn_vit_small_3x_coco.py | ViT-S | 36 | 48.7 | 42.7 |
| configs/imted/imted_mask_rcnn_vit_base_3x_coco.py | ViT-B | 36 | 53.3 | 46.4 |
| configs/imted/imted_mask_rcnn_vit_large_3x_coco.py | ViT-L | 36 | 55.5 | 48.1 |
| configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_base_training_coco.py | ViT-B | 24 | 50.6 | |
| configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_finetuning_10shot_coco.py | ViT-B | 108 | 22.5 | |
| configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_finetuning_30shot_coco.py | ViT-B | 108 | 30.2 | |

## MAE Pre-training

The pre-trained models are trained with the [official MAE code](https://github.com/facebookresearch/mae).
For ViT-S, we use a 4-layer decoder with dimension 256 for 800 epochs of pre-training.
For ViT-B, we use an 8-layer decoder with dimension 512 for 1600 epochs of pre-training. Pre-trained weights can be downloaded from the [official weights](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base_full.pth).
For ViT-L, we use an 8-layer decoder with dimension 512 for 1600 epochs of pre-training. Pre-trained weights can be downloaded from the [official weights](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large_full.pth).
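
For convenience, the ViT-B and ViT-L weights can be fetched directly from the links above, for example:

```bash
# Download the official MAE pre-trained weights (same links as above)
wget https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base_full.pth
wget https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large_full.pth
```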

For all experiments, remember to modify the path to the pre-trained weights in the corresponding configuration file, e.g. configs/imted/imted_faster_rcnn_vit_small_3x_coco.py, as sketched below.
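
A minimal sketch of the kind of edit, assuming the config exposes the MAE checkpoint through a `pretrained`-style field (the actual key in the imTED configs may differ; look for the existing weight path in the config):

```python
# Illustrative excerpt only -- the real key name may differ. Search the config
# (e.g. configs/imted/imted_faster_rcnn_vit_small_3x_coco.py) for the existing
# pre-trained weight path and replace it with your downloaded MAE checkpoint.
pretrained = 'path/to/mae_pretrain_vit_small.pth'  # assumed field name and path
```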

For few-shot experiments, please refer to [FsDet](https://github.com/ucbdrive/few-shot-object-detection/blob/master/datasets/README.md#:~:text=2%2C%20and%203.-,COCO%3A,-cocosplit/%0A%20%20datasplit/%0A%20%20%20%20trainvalno5k) for data preparation. Remember to modify the path to the JSON annotation files in the configuration files, e.g. configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_base_training_coco.py, as sketched below.
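
A minimal sketch of the kind of change, assuming the standard mmdetection `data` dict layout (the actual keys and file paths in the few-shot configs may differ):

```python
# Illustrative excerpt only -- point the training annotations at the split
# prepared with FsDet; the key layout and file path below are assumptions.
data = dict(
    train=dict(
        ann_file='path/to/cocosplit/datasplit/trainvalno5k.json',
    ),
)
```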

## Training with 8 GPUs

```bash
tools/dist_train.sh "path/to/config/file.py" 8
```
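
For example, to train the ViT-S Faster R-CNN model from the table above on 8 GPUs:

```bash
tools/dist_train.sh configs/imted/imted_faster_rcnn_vit_small_3x_coco.py 8
```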
