SLAB

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

Jialong Guo*, Xinghao Chen*, Yehui Tang, Yunhe Wang (*Equal Contribution)

ICML 2024

[arXiv] [BibTeX]

🔥 Updates

2024/08/23: Unofficial pretrained checkpoints for Llama-350M-PRepBN are available on Hugging Face.
2024/05/13: Pre-trained models and code for SLAB are released for both PyTorch and MindSpore.

📸 Overview

This is the official PyTorch implementation of our paper "SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization". In this paper, we investigate the computational bottlenecks of efficient transformers, namely normalization layers and attention modules. Layer normalization is commonly used in transformer architectures but is not computationally friendly, because it must compute statistics during inference. However, replacing LayerNorm with the more efficient BatchNorm in transformers often leads to inferior performance and training collapse. To address this problem, we propose a novel method named PRepBN that progressively replaces LayerNorm with re-parameterized BatchNorm during training. At inference, PRepBN can be re-parameterized into a standard BatchNorm and thus fused with linear layers to reduce latency. Moreover, we propose a simplified linear attention (SLA) module that is simple yet effective. Extensive experiments on image classification and object detection demonstrate the effectiveness of the proposed method. For example, our SLAB-Swin obtains 83.6% top-1 accuracy on ImageNet with 16.2 ms latency, which is 2.4 ms less than that of Flatten-Swin with 0.1% higher accuracy.

Figure 1: The framework of our proposed Progressive Re-parameterized BatchNorm.
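The progressive replacement described in the overview can be pictured as a weighted blend whose weight moves from LayerNorm to the re-parameterized BatchNorm over training. The following is a minimal, dependency-free sketch and not the code in this repository: the linear decay schedule, the form RepBN(x) = BN(x) + eta * x, and every function name here are assumptions made for illustration, and the learnable affine parameters are omitted for brevity.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one token (a list of channel values) by its own statistics."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rep_bn(x, running_mean, running_var, eta, eps=1e-5):
    """Per-channel BatchNorm with stored statistics plus a skip term eta * x (assumed form)."""
    return [
        (v - m) / math.sqrt(s + eps) + eta * v
        for v, m, s in zip(x, running_mean, running_var)
    ]

def prepbn_gamma(step, total_steps):
    """Assumed linear schedule: 1.0 at step 0, 0.0 from total_steps onward."""
    return max(0.0, 1.0 - step / total_steps)

def prepbn(x, step, total_steps, running_mean, running_var, eta):
    """Blend LayerNorm and RepBN outputs with the scheduled weight gamma."""
    gamma = prepbn_gamma(step, total_steps)
    ln = layer_norm(x)
    bn = rep_bn(x, running_mean, running_var, eta)
    return [gamma * a + (1.0 - gamma) * b for a, b in zip(ln, bn)]

# Hypothetical token with three channels and made-up running statistics.
x = [1.0, 2.0, 4.0]
mean, var = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
start = prepbn(x, 0, 100, mean, var, eta=0.5)     # gamma = 1: LayerNorm only
end = prepbn(x, 100, 100, mean, var, eta=0.5)     # gamma = 0: RepBN only
```

Under this schedule the model starts training with LayerNorm behavior and ends with only the BatchNorm-style branch, which no longer needs per-token statistics at inference.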

Figure 2: Visualization of attention map for different methods.
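For readers new to linear attention, the sketch below shows the general idea that such modules build on: with a non-negative feature map in place of softmax, the product can be reordered so that the N x N attention matrix is never formed. This is a generic illustration in plain Python, not the SLA module from this repository. The ReLU feature map is an assumption, and normalization and any other components of the actual module are left out.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]

def relu(a):
    """Element-wise ReLU, used here as the (assumed) feature map."""
    return [[max(0.0, v) for v in row] for row in a]

def quadratic_attention(q, k, v):
    """Reference ordering: build the N x N score matrix, then apply it to V."""
    scores = matmul(relu(q), transpose(relu(k)))   # N x N
    return matmul(scores, v)

def linear_attention(q, k, v):
    """Reordered product: K^T V is only d x d, so cost grows linearly with N."""
    context = matmul(transpose(relu(k)), v)        # d x d
    return matmul(relu(q), context)

# Hypothetical inputs: N = 3 tokens, d = 2 channels.
q = [[1.0, -2.0], [0.5, 3.0], [-1.0, 2.0]]
k = [[2.0, 1.0], [-1.0, 0.5], [0.0, 1.5]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
out_quadratic = quadratic_attention(q, k, v)
out_linear = linear_attention(q, k, v)
```

Both orderings give the same output because matrix multiplication is associative; only the intermediate sizes, and therefore the cost for long sequences, differ.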

Figure 3: Results of our method for classification and detection.

Figure 4: Results of our method for LLaMA-350M on various benchmarks.

1️⃣ Image Classification

Dependencies

- torch
- torchvision
- numpy
- einops
- timm==0.4.12
- opencv-python==4.4.0.46
- termcolor==1.1.0
- yacs==0.1.8
- apex

Training

Train models from scratch using the following command:

python -m torch.distributed.launch --nproc_per_node=8 main.py --cfg <config-path> --data-path <imagenet-path> --output <output-path>

Evaluation

Merge PRepBN for Swin Transformer: We provide an implementation of PRepBN fusion for Swin-T models. To convert the whole model, call merge_bn on each relevant module as shown below; this is the recommended approach. See eval.py for a complete example.

# First pass: call merge_bn on every block-level module.
block_types = ('SwinTransformerBlock', 'PatchMerging', 'PatchEmbed')
for module in model.modules():
    if module.__class__.__name__ in block_types:
        module.merge_bn()
# Second pass: call merge_bn on the top-level SwinTransformer module.
for module in model.modules():
    if module.__class__.__name__ == 'SwinTransformer':
        module.merge_bn()

We also provide an example command for the conversion:

python -m torch.distributed.launch --nproc_per_node=1 eval.py --cfg cfgs/swin_t_prepbn.yaml --batch-size 128 --data-path <imagenet-path>  --pretrained <pretrained-path>
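To make the idea behind the merge concrete: at inference a BatchNorm-style layer uses fixed statistics, so it is a per-channel affine map that can be folded into an adjacent linear layer. The sketch below illustrates that arithmetic in plain Python under stated assumptions. It is not the repository's merge_bn. The RepBN form BN(x) + eta * x, the choice of folding into the linear layer that follows the normalization, and all function names are hypothetical.

```python
import math

def repbn_to_affine(weight, bias, running_mean, running_var, eta, eps=1e-5):
    """Collapse inference-time RepBN (assumed BN(x) + eta * x) to per-channel scale and shift."""
    scale, shift = [], []
    for w, b, m, v in zip(weight, bias, running_mean, running_var):
        s = w / math.sqrt(v + eps)
        scale.append(s + eta)       # the eta * x skip term folds into the scale
        shift.append(b - m * s)
    return scale, shift

def fuse_into_linear(lin_weight, lin_bias, scale, shift):
    """Fold y = scale * x + shift into a following linear layer z = W y + c."""
    fused_weight = [[w * s for w, s in zip(row, scale)] for row in lin_weight]
    fused_bias = [
        c + sum(w * t for w, t in zip(row, shift))
        for row, c in zip(lin_weight, lin_bias)
    ]
    return fused_weight, fused_bias

def linear(weight, bias, x):
    """Apply z = W x + c to a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + c for row, c in zip(weight, bias)]

# Hypothetical parameters: 2 input channels, 3 output features.
scale, shift = repbn_to_affine(
    weight=[1.5, 0.5], bias=[0.1, -0.2],
    running_mean=[0.3, -0.1], running_var=[4.0, 0.25], eta=0.2,
)
W = [[1.0, 2.0], [-0.5, 0.25], [0.0, 1.0]]
c = [0.0, 1.0, -1.0]
x = [0.7, -1.3]

unfused = linear(W, c, [s * v + t for s, v, t in zip(scale, x, shift)])
W_fused, c_fused = fuse_into_linear(W, c, scale, shift)
fused = linear(W_fused, c_fused, x)   # one layer, same output up to rounding
```

After fusion the normalization costs nothing at inference, since only the single linear layer remains.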

Checkpoints

Model	Top-1 accuracy	Config	Checkpoint
deit_t_prepbn	73.6%	deit_t_prepbn.yaml	deit_tiny_prepbn.pth
deit_s_prepbn	80.2%	deit_s_prepbn.yaml	deit_small_prepbn.pth
slab_deit_t	74.3%	slab_deit_t.yaml	slab_deit_tiny.pth
slab_deit_s	80.0%	slab_deit_s.yaml	slab_deit_small.pth
pvt_t_prepbn	76.0%	pvt_t_prepbn.yaml	pvt_tiny_prepbn.pth
pvt_s_prepbn	80.1%	pvt_s_prepbn.yaml	pvt_small_prepbn.pth
pvt_m_prepbn	81.7%	pvt_m_prepbn.yaml	pvt_medium_prepbn.pth
slab_pvt_t	76.5%	slab_pvt_t.yaml	slab_pvt_tiny.pth
swin_t_prepbn	81.4%	swin_t_prepbn.yaml	swin_tiny_prepbn.pth
slab_swin_t	81.8%	slab_swin_t.yaml	slab_swin_tiny.pth
slab_swin_s	83.6%	slab_swin_s.yaml	slab_swin_small.pth
slab_cswin_t	82.8%	slab_cswin_t.yaml	slab_cswin_tiny.pth

2️⃣ Object Detection

Installation

pip install torch 
pip install torchvision

pip install timm==0.4.12
pip install einops
pip install opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8
pip install -U openmim
pip install mmcv-full==1.4.0
pip install mmdet==2.11.0

Install apex

Training

SLAB-Swin-T

python -m torch.distributed.launch --nproc_per_node 8 --nnodes <world_size> --node_rank <rank> train.py configs/swin/mask_rcnn_slab_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py --work-dir <output_path> --launcher pytorch --init_method <init_method> --cfg-options model.pretrained=<pretrained_backbone_path>

SLAB-Swin-S

python -m torch.distributed.launch --nproc_per_node 8 --nnodes <world_size> --node_rank <rank> train.py configs/swin/mask_rcnn_slab_swin_small_patch4_window7_mstrain_480-800_adamw_1x_coco.py --work-dir <output_path> --launcher pytorch --init_method <init_method> --cfg-options model.pretrained=<pretrained_backbone_path>

Swin-T-RepBN

python -m torch.distributed.launch --nproc_per_node 8 --nnodes <world_size> --node_rank <rank> train.py configs/swin/mask_rcnn_swin_tiny_prepbn_patch4_window7_mstrain_480-800_adamw_1x_coco.py --work-dir <output_path> --launcher pytorch --init_method <init_method> --cfg-options model.pretrained=<pretrained_backbone_path>

Swin-S-RepBN

python -m torch.distributed.launch --nproc_per_node 8 --nnodes <world_size> --node_rank <rank> train.py configs/swin/mask_rcnn_swin_small_prepbn_patch4_window7_mstrain_480-800_adamw_1x_coco.py --work-dir <output_path> --launcher pytorch --init_method <init_method> --cfg-options model.pretrained=<pretrained_backbone_path>

PVT-T-RepBN

python -m torch.distributed.launch --nproc_per_node 8 --nnodes <world_size> --node_rank <rank> train.py configs/pvt/mask_rcnn_pvt_t_prepbn_fpn_1x_coco.py --work-dir <output_path> --launcher pytorch --init_method <init_method> --cfg-options model.pretrained=<pretrained_backbone_path>

PVT-S-RepBN

python -m torch.distributed.launch --nproc_per_node 8 --nnodes <world_size> --node_rank <rank> train.py configs/pvt/mask_rcnn_pvt_s_prepbn_fpn_1x_coco.py --work-dir <output_path> --launcher pytorch --init_method <init_method> --cfg-options model.pretrained=<pretrained_backbone_path>

Checkpoints

TBD

3️⃣ Language Task

Dependencies

- torch==1.13.1
- tensorboardX
- numpy
- rouge_score
- fire
- openai==0.27.6
- transformers==4.29.1
- datasets==2.17.0
- sentencepiece
- tokenizers==0.13.3
- deepspeed==0.8.3
- accelerate==0.27.2
- scikit-learn

Evaluation

Download the unofficial pretrained checkpoints for Llama-350M-PRepBN from Hugging Face, then run:

python evaluation.py --ckpt <checkpoint-path>

✏️ Reference

If you find SLAB useful in your research or applications, please consider giving a star ⭐ and citing using the following BibTeX:

@inproceedings{guo2024slab,
  title={SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization},
  author={Guo, Jialong and Chen, Xinghao and Tang, Yehui and Wang, Yunhe},
  booktitle={International Conference on Machine Learning},
  year={2024}
}
