Initial commit
endernewton committed Jan 6, 2022
0 parents commit be47fef
Showing 20 changed files with 3,158 additions and 0 deletions.
80 changes: 80 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,80 @@
# Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to make participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.

This Code of Conduct also applies outside the project spaces when there is a
reasonable belief that an individual's behavior may have a negative impact on
the project or its community.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at <[email protected]>. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
31 changes: 31 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,31 @@
# Contributing to mae
We want to make contributing to this project as easy and transparent as
possible.

## Pull Requests
We actively welcome your pull requests.

1. Fork the repo and create your branch from `master`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").

## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Facebook's open source projects.

Complete your CLA here: <https://code.facebook.com/cla>

## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.

Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.

## License
By contributing to mae, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
133 changes: 133 additions & 0 deletions FINETUNE.md
@@ -0,0 +1,133 @@
## Fine-tuning Pre-trained MAE for Classification

### Evaluation

As a sanity check, run evaluation using our ImageNet **fine-tuned** models:

<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom"></th>
<th valign="bottom">ViT-Base</th>
<th valign="bottom">ViT-Large</th>
<th valign="bottom">ViT-Huge</th>
<!-- TABLE BODY -->
<tr><td align="left">fine-tuned checkpoint</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/mae/finetune/mae_finetuned_vit_base.pth">download</a></td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/mae/finetune/mae_finetuned_vit_large.pth">download</a></td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/mae/finetune/mae_finetuned_vit_huge.pth">download</a></td>
</tr>
<tr><td align="left">md5</td>
<td align="center"><tt>1b25e9</tt></td>
<td align="center"><tt>51f550</tt></td>
<td align="center"><tt>2541f2</tt></td>
</tr>
<tr><td align="left">reference ImageNet accuracy</td>
<td align="center">83.664</td>
<td align="center">85.952</td>
<td align="center">86.928</td>
</tr>
</tbody></table>
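The md5 values above appear to be 6-character prefixes of the full checksums. A quick way to check a downloaded checkpoint is to hash it and compare the prefix (a minimal sketch; the filename is just an example of a downloaded file):
```
import hashlib

# Hash a downloaded checkpoint in chunks and compare the prefix with the
# md5 row in the table above (the path is an example download location).
def md5sum(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(md5sum("mae_finetuned_vit_base.pth")[:6])  # expect 1b25e9
```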

Evaluate ViT-Base on a single GPU (`${IMAGENET_DIR}` is a directory containing the `{train, val}` sets of ImageNet):
```
python main_finetune.py --eval --resume mae_finetuned_vit_base.pth --model vit_base_patch16 --batch_size 16 --data_path ${IMAGENET_DIR}
```
This should give:
```
* Acc@1 83.664 Acc@5 96.530 loss 0.731
```

Evaluate ViT-Large:
```
python main_finetune.py --eval --resume mae_finetuned_vit_large.pth --model vit_large_patch16 --batch_size 16 --data_path ${IMAGENET_DIR}
```
This should give:
```
* Acc@1 85.952 Acc@5 97.570 loss 0.646
```

Evaluate ViT-Huge:
```
python main_finetune.py --eval --resume mae_finetuned_vit_huge.pth --model vit_huge_patch16 --batch_size 16 --data_path ${IMAGENET_DIR}
```
This should give:
```
* Acc@1 86.928 Acc@5 98.088 loss 0.584
```

### Fine-tuning

Get our pre-trained checkpoints from [here](https://github.com/fairinternal/mae/#pre-trained-checkpoints).

To fine-tune with **multi-node distributed training**, run the following on 4 nodes with 8 GPUs each:
```
python submitit_finetune.py \
--job_dir ${JOB_DIR} \
--nodes 4 \
--batch_size 32 \
--model vit_base_patch16 \
--finetune ${PRETRAIN_CHKPT} \
--epochs 100 \
--blr 5e-4 --layer_decay 0.65 \
--weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path ${IMAGENET_DIR}
```
- Install submitit (`pip install submitit`) first.
- Here the effective batch size is 32 (`batch_size` per gpu) * 4 (`nodes`) * 8 (gpus per node) = 1024.
- `blr` is the base learning rate. The actual `lr` is computed by the [linear scaling rule](https://arxiv.org/abs/1706.02677): `lr` = `blr` * effective batch size / 256 (a short worked example follows this list).
- We have run 4 trials with different random seeds. The results are 83.63, 83.66, 83.52, 83.46 (mean 83.57 and std 0.08).
- Training time is ~7h11m on 32 V100 GPUs.
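
For reference, the batch-size and learning-rate arithmetic described above works out as follows (a minimal sketch with the ViT-Base numbers; the variable names are illustrative, not taken from the code):
```
# Effective batch size and linearly scaled lr for the ViT-Base recipe above.
batch_size_per_gpu = 32
nodes = 4
gpus_per_node = 8

effective_batch_size = batch_size_per_gpu * nodes * gpus_per_node  # 1024

blr = 5e-4                              # --blr, the base learning rate
lr = blr * effective_batch_size / 256   # linear scaling rule -> 2e-3
print(effective_batch_size, lr)
```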

Script for ViT-Large:
```
python submitit_finetune.py \
--job_dir ${JOB_DIR} \
--nodes 4 --use_volta32 \
--batch_size 32 \
--model vit_large_patch16 \
--finetune ${PRETRAIN_CHKPT} \
--epochs 50 \
--blr 1e-3 --layer_decay 0.75 \
--weight_decay 0.05 --drop_path 0.2 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path ${IMAGENET_DIR}
```
- We have run 4 trials with different random seeds. The results are 85.95, 85.87, 85.76, 85.88 (mean 85.87 and std 0.07).
- Training time is ~8h52m on 32 V100 GPUs.
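
The `--layer_decay` flag passed in these scripts (e.g. `--layer_decay 0.75` above) sets layer-wise learning-rate decay. A minimal sketch of the usual scheme (an assumption about the general BEiT-style technique, not a copy of this repo's implementation):
```
# Hypothetical illustration of layer-wise lr decay: lower transformer blocks get
# geometrically smaller learning rates; the last block keeps the full lr.
def layer_lr_scales(num_layers, layer_decay):
    return [layer_decay ** (num_layers - i) for i in range(num_layers + 1)]

scales = layer_lr_scales(num_layers=24, layer_decay=0.75)  # ViT-Large has 24 blocks
print(scales[0], scales[-1])  # smallest scale for the earliest layers, 1.0 for the last
```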

Script for ViT-Huge:
```
python submitit_finetune.py \
--job_dir ${JOB_DIR} \
--nodes 8 --use_volta32 \
--batch_size 16 \
--model vit_huge_patch14 \
--finetune ${PRETRAIN_CHKPT} \
--epochs 50 \
--blr 1e-3 --layer_decay 0.75 \
--weight_decay 0.05 --drop_path 0.3 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path ${IMAGENET_DIR}
```
- Training time is ~13h9m on 64 V100 GPUs.

To fine-tune our pre-trained ViT-Base with **single-node training**, run the following on 1 node with 8 GPUs:
```
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 main_finetune.py \
--accum_iter 4 \
--batch_size 32 \
--model vit_base_patch16 \
--finetune ${PRETRAIN_CHKPT} \
--epochs 100 \
--blr 5e-4 --layer_decay 0.65 \
--weight_decay 0.05 --drop_path 0.1 --mixup 0.8 --cutmix 1.0 --reprob 0.25 \
--dist_eval --data_path ${IMAGENET_DIR}
```
- Here the effective batch size is 32 (`batch_size` per gpu) * 4 (`accum_iter`) * 8 (gpus) = 1024. `--accum_iter 4` simulates 4 nodes; a minimal sketch of the accumulation loop follows.
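
A self-contained sketch of how gradient accumulation enlarges the effective batch size (illustrative only; the tiny model and data are placeholders, not the repo's training engine):
```
import torch

# Gradients from accum_iter mini-batches are summed before one optimizer step,
# so the effective batch size is batch_size * accum_iter per GPU.
torch.manual_seed(0)
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()

accum_iter = 4
batches = [(torch.randn(32, 8), torch.randint(0, 2, (32,))) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = criterion(model(x), y) / accum_iter  # scale so summed grads average out
    loss.backward()                             # grads accumulate in param.grad
    if (step + 1) % accum_iter == 0:
        optimizer.step()
        optimizer.zero_grad()
```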

### Notes

- The [pre-trained models we provide](https://github.com/fairinternal/mae/#pre-trained-checkpoints) are trained with *normalized* pixels `--norm_pix_loss` (1600 epochs, Table 3 in paper). The fine-tuning hyper-parameters are slightly different from the default baseline using *unnormalized* pixels.

- The original MAE implementation was in TensorFlow+TPU with no explicit mixed precision. This re-implementation is in PyTorch+GPU with automatic mixed precision (`torch.cuda.amp`). We have observed different numerical behavior between the two platforms. In this repo, we use `--global_pool` for fine-tuning; using `--cls_token` performs similarly, but there is a chance of producing NaN when fine-tuning ViT-Huge on GPUs. We did not observe this issue on TPUs. Turning off amp could solve this issue, but it is slower. (A minimal sketch of the amp pattern appears at the end of these notes.)

- Here we use RandErase following DeiT: `--reprob 0.25`. Its effect on accuracy is smaller than the run-to-run random variance.
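
As noted above, this re-implementation trains with `torch.cuda.amp`. A minimal, generic sketch of the usual autocast/GradScaler pattern (illustrative; the tiny model is a placeholder, and the code falls back to a no-op on machines without CUDA):
```
import torch

# Standard torch.cuda.amp pattern: autocast runs the forward pass in reduced
# precision where safe, and GradScaler rescales the loss to avoid fp16 underflow.
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

model = torch.nn.Linear(8, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(32, 8, device=device)
y = torch.randint(0, 2, (32,), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_amp):
    loss = criterion(model(x), y)
scaler.scale(loss).backward()   # scaled backward pass
scaler.step(optimizer)          # unscales grads, then steps
scaler.update()
```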