Description of run.sh Args
shallowtoil committed Jan 17, 2022
1 parent 862fac7 commit 564f363
Showing 4 changed files with 29 additions and 4 deletions.
22 changes: 21 additions & 1 deletion README.md
@@ -24,6 +24,26 @@ iBOT is a novel self-supervised pre-training framework that performs masked imag

See [installation instructions](https://github.com/bytedance/ibot/blob/main/INSTALL.md) for details.

+## One-Line Command Using `run.sh`
+
+We provide `run.sh`, with which you can complete the pre-training + fine-tuning experiment cycle in a one-line command.
+
+### Arguments
+
+- `TYPE` is named by the rule `dataset_task`. For example, pre-training on ImageNet-1K has a `TYPE` of `imagenet_pretrain`, and linear probing evaluation on ImageNet-1K has a `TYPE` of `imagenet_linear`. Several `TYPE`s can be joined with `+`, as in the example below.
+- `JOB_NAME` is a custom job name used to distinguish different groups of experiments.
+- `ARCH` is the architecture of the pre-trained models.
+- `KEY` chooses which pre-trained model is evaluated. Set it to either `teacher` (generally better) or `student` to evaluate one model. It can also be set to `teacher,student`, in which case the script distributes the evaluation of the two models to 2 of the nodes.
+- `GPUS` is the total number of GPUs needed for the evaluation. If the required `GPUS` exceeds `MAX_GPUS` (the number of GPUs per node), `GPUS` must be splittable as `GPUS_PER_NODE x TOTAL_NODES`.
+- Any other arguments can be appended directly after these required ones, for example, `--lr 0.001`.
+
+
+For example, the following commands pre-train a model and then automatically evaluate both the `student` and `teacher` models on the k-NN and linear probing benchmarks, distributed across 2 nodes:
+```
+TOTAL_NODES=2 NODE_ID=0 ./run.sh imagenet_pretrain+imagenet_knn+imagenet_linear vit_small student,teacher 16  # the first node
+TOTAL_NODES=2 NODE_ID=1 ./run.sh imagenet_pretrain+imagenet_knn+imagenet_linear vit_small student,teacher 16  # the second node
+```
+
## Training

For a glimpse at the full documentation of iBOT pre-training, please run:
@@ -177,7 +197,7 @@ You can choose to download only the weights of the pre-trained `backbone` used f
<td>307M</td>
<td>Rand</td>
<td>77.7%</td>
-<td>81.2%</td>
+<td>81.3%</td>
<td>85.0%</td>
<td><a href="https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16_rand_mask/checkpoint_teacher.pth">backbone (t)</a></td>
<td><a href="https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16_rand_mask/checkpoint.pth">full ckpt</a></td>
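The `run.sh` arguments described in the README diff can be illustrated with a small Python sketch. It is not taken from `run.sh`: the helper names `enabled_tasks`, `parse_keys`, and `split_gpus` are hypothetical, and the rule for deriving `GPUS_PER_NODE` and `TOTAL_NODES` from `GPUS` and `MAX_GPUS` is an assumption based only on the README's wording. The task check mirrors the `[[ $TYPE =~ imagenet_knn ]]` tests visible in the `run.sh` hunk, which for plain names like these behave as substring matches.

```python
from typing import List, Tuple


def enabled_tasks(type_str: str, known_tasks: List[str]) -> List[str]:
    """Return the known tasks whose names occur in TYPE.

    Mirrors run.sh tests such as `[[ $TYPE =~ imagenet_knn ]]`; for plain names
    without regex metacharacters, `=~` acts like a substring check.
    """
    return [task for task in known_tasks if task in type_str]


def parse_keys(key: str) -> List[str]:
    """Split KEY (e.g. "student,teacher") into the models to evaluate."""
    return [k for k in key.split(",") if k]


def split_gpus(gpus: int, max_gpus: int) -> Tuple[int, int]:
    """Hypothetical rule: return (GPUS_PER_NODE, TOTAL_NODES) for a GPUS request."""
    if gpus <= max_gpus:
        return gpus, 1  # the request fits on a single node
    if gpus % max_gpus != 0:
        raise ValueError(f"GPUS={gpus} cannot be split into nodes of {max_gpus} GPUs")
    return max_gpus, gpus // max_gpus


if __name__ == "__main__":
    tasks = ["imagenet_pretrain", "imagenet_knn", "imagenet_linear", "imagenet_reg"]
    print(enabled_tasks("imagenet_pretrain+imagenet_knn+imagenet_linear", tasks))
    print(parse_keys("student,teacher"))
    print(split_gpus(16, 8))  # assumes MAX_GPUS=8
```

Under the assumed `MAX_GPUS=8`, the 16-GPU example maps to 8 GPUs on each of 2 nodes, which is consistent with the `TOTAL_NODES=2` used in the example commands.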
3 changes: 3 additions & 0 deletions evaluation/README.md
@@ -1,5 +1,8 @@
# Evaluating iBOT on Downstream Tasks

+### Arguments
+- `KEY` chooses which pre-trained model is evaluated and can be set to either `teacher` (generally better) or `student` to evaluate one model.
+
### k-NN Classification & Logistic Regression on ImageNet
To evaluate k-NN classification or logistic regression on the frozen features, run:
```
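In the `run.sh` hunk, `KEY` is forwarded as `--checkpoint_key` to `extract_backbone_weights.py`. The sketch below shows, on plain dictionaries, what selecting one model's backbone from a full checkpoint could look like. It assumes a DINO-style layout in which the checkpoint stores `student` and `teacher` state dicts, the student's parameter names carry a `module.` prefix from `DistributedDataParallel`, and backbone parameters are prefixed with `backbone.`. The function names are hypothetical, and this is not the script's actual code.

```python
from typing import Any, Dict


def _strip_prefix(name: str, prefix: str) -> str:
    """Remove `prefix` from `name` if present (works on Python < 3.9 too)."""
    return name[len(prefix):] if name.startswith(prefix) else name


def select_backbone(checkpoint: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Return the backbone parameters of the `teacher` or `student` model.

    Assumed layout (DINO-style): checkpoint[key] is a state dict whose names may
    start with "module." (DDP wrapper) and whose backbone entries start with
    "backbone."; everything else (e.g. projection heads) is dropped.
    """
    if key not in ("teacher", "student"):
        raise ValueError(f"unsupported checkpoint key: {key!r}")
    backbone: Dict[str, Any] = {}
    for name, value in checkpoint[key].items():
        name = _strip_prefix(name, "module.")
        if name.startswith("backbone."):
            backbone[_strip_prefix(name, "backbone.")] = value
    return backbone
```

With real checkpoints the values would be tensors (loaded with `torch.load` and written back with `torch.save`), but the key selection and prefix handling would presumably follow the same pattern.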
4 changes: 2 additions & 2 deletions main_ibot.py
@@ -355,9 +355,9 @@ def train_ibot(args):
         }
         if fp16_scaler is not None:
             save_dict['fp16_scaler'] = fp16_scaler.state_dict()
-        utils.save_on_master(save_dict, os.path.join(args.output_dir, 'checkpoint.pth'))
+        torch.save(save_dict, os.path.join(args.output_dir, 'checkpoint.pth'))
         if args.saveckp_freq and (epoch % args.saveckp_freq == 0) and epoch:
-            utils.save_on_master(save_dict, os.path.join(args.output_dir, f'checkpoint{epoch:04}.pth'))
+            torch.save(save_dict, os.path.join(args.output_dir, f'checkpoint{epoch:04}.pth'))
         log_stats = {**{f'train_{k}': v for k, v in train_stats.items()},
                      'epoch': epoch}
         if utils.is_main_process():
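The periodic snapshot condition in this hunk, `args.saveckp_freq and (epoch % args.saveckp_freq == 0) and epoch`, can be checked in isolation. The hypothetical helper `snapshot_names` below replays it over a range of epochs and formats names with the same `f'checkpoint{epoch:04}.pth'` pattern; `checkpoint.pth` itself is overwritten every epoch and is not listed. A `saveckp_freq` of 0 disables snapshots, and the leading truthiness check also avoids a modulo by zero. Separately, the commit swaps `utils.save_on_master` (which, as in DINO's utilities, presumably writes only on the main process) for a direct `torch.save`, so these calls now run on every process.

```python
from typing import List


def snapshot_names(saveckp_freq: int, epochs: int, start_epoch: int = 0) -> List[str]:
    """List the periodic checkpoint files written for epochs in [start_epoch, epochs).

    Replays the condition from main_ibot.py:
    `args.saveckp_freq and (epoch % args.saveckp_freq == 0) and epoch`.
    """
    names = []
    for epoch in range(start_epoch, epochs):
        # Epoch 0 is skipped by the trailing `and epoch`.
        if saveckp_freq and (epoch % saveckp_freq == 0) and epoch:
            names.append(f"checkpoint{epoch:04}.pth")
    return names


if __name__ == "__main__":
    # -> ['checkpoint0020.pth', 'checkpoint0040.pth', 'checkpoint0060.pth', 'checkpoint0080.pth']
    print(snapshot_names(saveckp_freq=20, epochs=100))
```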
4 changes: 3 additions & 1 deletion run.sh
@@ -168,7 +168,9 @@ if [[ $TYPE =~ imagenet_knn ]] || [[ $TYPE =~ imagenet_reg ]] || \
 WEIGHT_FILE=$SUB_OUTPUT_DIR/checkpoint_${KEY_LIST[$K]}.pth
 python3 $CURDIR/evaluation/classification_layer_decay/extract_backbone_weights.py \
 $PRETRAINED $WEIGHT_FILE --checkpoint_key ${KEY_LIST[$K]}
-python3 -m torch.distributed.launch --nproc_per_node=$GPUS_PER_NODE \
+python3 -m torch.distributed.launch --nnodes ${TOTAL_NODES:-1} \
+--node_rank ${NODE_ID:-0} --nproc_per_node=$GPUS_PER_NODE \
+--master_addr=${MASTER_ADDR:-127.0.0.1} \
 --master_port=$[${MASTER_PORT:-29500}-$K] \
 $CURDIR/evaluation/classification_layer_decay/run_class_finetuning.py \
 --finetune $WEIGHT_FILE
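The new launch line relies on bash `${VAR:-default}` expansions, so a single-node run still works when `TOTAL_NODES`, `NODE_ID`, `MASTER_ADDR`, and `MASTER_PORT` are unset, and `$[${MASTER_PORT:-29500}-$K]` lowers the port by the index `K` of the evaluated key. The Python sketch below reproduces that argument assembly; `launch_argv` and `env_default` are hypothetical names. Giving each key its own port is presumably meant to keep the `teacher` and `student` evaluations from colliding when they share a host.

```python
from typing import Dict, List


def env_default(env: Dict[str, str], name: str, fallback: str) -> str:
    """Emulate bash ${NAME:-fallback}: use fallback if NAME is unset or empty."""
    value = env.get(name, "")
    return value if value else fallback


def launch_argv(env: Dict[str, str], gpus_per_node: int, k: int,
                script: str, extra: List[str]) -> List[str]:
    """Assemble the torch.distributed.launch command shown in the run.sh hunk."""
    master_port = int(env_default(env, "MASTER_PORT", "29500")) - k  # $[... - $K]
    return [
        "python3", "-m", "torch.distributed.launch",
        "--nnodes", env_default(env, "TOTAL_NODES", "1"),
        "--node_rank", env_default(env, "NODE_ID", "0"),
        f"--nproc_per_node={gpus_per_node}",
        f"--master_addr={env_default(env, 'MASTER_ADDR', '127.0.0.1')}",
        f"--master_port={master_port}",
        script,
        *extra,
    ]


if __name__ == "__main__":
    import os
    import shlex

    argv = launch_argv(dict(os.environ), gpus_per_node=8, k=1,
                       script="run_class_finetuning.py",
                       extra=["--finetune", "checkpoint_teacher.pth"])
    print(shlex.join(argv))
```

Treating an empty variable like an unset one matches the `:-` form used in `run.sh`; the plain `-` form would keep an empty value instead.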
