In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
In-Place Activated BatchNorm (InPlace-ABN) is a novel approach to reduce the memory required for training deep networks. It allows for up to 50% memory savings in modern architectures such as ResNet, ResNeXt and Wider ResNet by redefining BN + non-linear activation as a single in-place operation, while dropping or recomputing intermediate buffers as needed.
This repository contains a PyTorch implementation of the InPlace-ABN layer, as well as some training scripts to reproduce the ImageNet classification results reported in our paper.
We have now also released the inference code for semantic segmentation, together with the Mapillary Vistas trained model that reached the #1 position on the Mapillary Vistas Semantic Segmentation leaderboard. More information can be found at the bottom of this page.
Update 08 Jan. 2019:
- Enabled multiprocessing and InPlace-ABN synchronization over multiple processes (previously using threads)
- Added compatibility with fp16
We have modified the ImageNet training code and BN synchronization to work with multiple processes. We have also made our InPlace-ABN module compatible with fp16.
If you use In-Place Activated BatchNorm in your research, please cite:
@inproceedings{rotabulo2017place,
title={In-Place Activated BatchNorm for Memory-Optimized Training of DNNs},
author={Rota Bul\`o, Samuel and Porzi, Lorenzo and Kontschieder, Peter},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2018}
}
When processing a BN-Activation-Convolution sequence in the forward pass, most deep learning frameworks need to store
two big buffers, i.e. the input x of BN and the input z of Conv.
This is necessary because the standard implementations of the backward passes of BN and Conv depend on their inputs to
calculate the gradients.
Using InPlace-ABN to replace the BN-Activation sequence, we can safely discard x, thus saving up to 50% GPU memory at
training time.
To achieve this, we rewrite the backward pass of BN in terms of its output y, which is in turn reconstructed from z
by inverting the activation function.
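The idea can be illustrated with a minimal, pure-Python sketch for a single channel. It assumes a Leaky ReLU activation with a positive slope and a non-zero scale gamma, since both are needed for the inversion. The function names and the list-of-scalars layout are illustrative assumptions and do not mirror the repository's CUDA implementation:

```python
import math


def leaky_relu(v, slope=0.01):
    return v if v >= 0 else slope * v


def inv_leaky_relu(v, slope=0.01):
    # Valid only for slope > 0, where the activation is invertible.
    return v if v >= 0 else v / slope


def bn_act_forward(x, gamma, beta, eps=1e-5, slope=0.01):
    """BN followed by Leaky ReLU over one channel (x is a list of floats).

    Only the output z and the batch standard deviation are returned,
    mimicking the fact that x does not need to be kept for the backward pass.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var + eps)
    y = [gamma * (v - mean) / std + beta for v in x]
    z = [leaky_relu(v, slope) for v in y]
    return z, std


def bn_act_backward(z, dz, gamma, beta, std, slope=0.01):
    """Gradients w.r.t. x, gamma and beta, computed from z instead of x."""
    n = len(z)
    # Reconstruct the BN output y from z by inverting the activation.
    y = [inv_leaky_relu(v, slope) for v in z]
    # Backward pass through the activation.
    dy = [g if v >= 0 else slope * g for v, g in zip(y, dz)]
    # Recover the normalized input from y (requires gamma != 0).
    x_hat = [(v - beta) / gamma for v in y]
    dbeta = sum(dy)
    dgamma = sum(g * h for g, h in zip(dy, x_hat))
    dx = [
        gamma / std * (g - dbeta / n - h * dgamma / n)
        for g, h in zip(dy, x_hat)
    ]
    return dx, dgamma, dbeta
```

Note that the backward function never touches x: everything it needs is derived from z, the incoming gradient, the affine parameters and the stored standard deviation.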
To install PyTorch, please refer to https://github.com/pytorch/pytorch#installation.
NOTE: our code requires PyTorch v1.0.
To install all dependencies using pip, just run:
pip install -r requirements.txt
Some parts of InPlace-ABN have native CUDA implementations, which are compiled using PyTorch v1.0's newly introduced
extension mechanism. This mechanism requires a package called ninja, which can easily be installed from most
distributions' package managers, e.g. in Ubuntu derivatives:
sudo apt-get install ninja-build
If PyTorch is installed via conda, ninja will be automatically installed too.
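As a quick sanity check before building the extensions, something like the following can confirm that a ninja executable is reachable. This is only a sketch using the standard library; the helper name ninja_available is hypothetical and not part of this repository:

```python
import shutil


def ninja_available():
    """Return True if a `ninja` executable can be found on the PATH."""
    return shutil.which("ninja") is not None


if __name__ == "__main__":
    if ninja_available():
        print("ninja found")
    else:
        print("ninja not found, please install it first")
```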
Here you can find the results from our arXiv paper (top-1 / top-5 scores), together with the corresponding trained models and their md5 checksums. The model files provided below are made available under the license attached to ImageNet.
Network | Batch | 224 | 224, 10-crops | 320 | Trained models (+md5) |
---|---|---|---|---|---|
ResNeXt101, Std-BN | 256 | 77.04 / 93.50 | 78.72 / 94.47 | 77.92 / 94.28 | 448438885986d14db5e870b95f814f91 |
ResNeXt101, InPlace-ABN | 512 | 78.08 / 93.79 | 79.52 / 94.66 | 79.38 / 94.67 | 3b7a221cbc076410eb12c8dd361b7e4e |
ResNeXt152, InPlace-ABN | 256 | 78.28 / 94.04 | 79.73 / 94.82 | 79.56 / 94.67 | 2c8d572587961ed74611d534c5b2e9ce |
WideResNet38, InPlace-ABN | 256 | 79.72 / 94.78 | 81.03 / 95.43 | 80.69 / 95.27 | 1c085ab70b789cc1d6c1594f7a761007 |
ResNeXt101, InPlace-ABN sync | 256 | 77.70 / 93.78 | 79.18 / 94.60 | 78.98 / 94.56 | 0a85a21847b15e5a242e17bf3b753849 |
DenseNet264, InPlace-ABN | 256 | 78.57 / 94.17 | 79.72 / 94.93 | 79.49 / 94.89 | 0b413d67b725619441d0646d663865bf |
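After downloading a model file, its integrity can be checked against the md5 value listed in the table. The following is a minimal sketch using only the standard library; the helper names and the file path in the usage note are placeholders:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's md5 equals the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_checksum("/path/to/model.pth.tar", "3b7a221cbc076410eb12c8dd361b7e4e")` would check a downloaded ResNeXt101 InPlace-ABN file against its table entry.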
Our script uses torchvision.datasets.ImageFolder for loading ImageNet data, which expects folders organized as follows:
root/train/[class_id1]/xxx.{jpg,png,jpeg}
root/train/[class_id1]/xxy.{jpg,png,jpeg}
root/train/[class_id2]/xxz.{jpg,png,jpeg}
...
root/val/[class_id1]/asdas.{jpg,png,jpeg}
root/val/[class_id1]/123456.{jpg,png,jpeg}
root/val/[class_id2]/__32_.{jpg,png,jpeg}
...
Images can have any name, as long as the extension is that of a recognized image format. Class ids are also free-form, but they are expected to match between train and validation data. Note that the training data in the standard ImageNet distribution is already given in the required format, while validation images need to be split into class sub-folders as described above.
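Splitting the validation images into class sub-folders can be sketched as below. The mapping from file name to class id must be built from the labels shipped with your copy of the dataset; the function names and the filename_to_class argument are hypothetical and not part of this repository:

```python
import shutil
from pathlib import Path


def split_val_into_class_folders(val_dir, filename_to_class):
    """Move each listed image from val_dir into val_dir/<class_id>/.

    filename_to_class is a dict {image file name: class id} supplied by
    the caller (hypothetical input). Returns the number of files moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    for name, class_id in filename_to_class.items():
        src = val_dir / name
        if not src.is_file():
            continue  # missing, or already moved by a previous run
        dst_dir = val_dir / str(class_id)
        dst_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst_dir / name))
        moved += 1
    return moved


def classes_match(root):
    """Check that root/train and root/val contain the same class folders."""
    root = Path(root)
    train = {p.name for p in (root / "train").iterdir() if p.is_dir()}
    val = {p.name for p in (root / "val").iterdir() if p.is_dir()}
    return train == val
```

Running classes_match on the dataset root afterwards is a cheap way to confirm that class ids agree between the two splits.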
The main training script is train_imagenet.py: this supports training on ImageNet, or any other dataset formatted
as described above, while keeping a log of relevant metrics in Tensorboard format and periodically saving snapshots.
Most training parameters can be specified in a json-formatted configuration file (see imagenet/config.py
for a complete list of configurable parameters).
All parameters not explicitly specified in the configuration file are set to their defaults, also available in
imagenet/config.py.
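The fallback-to-defaults behaviour can be pictured with a small recursive merge. This is a sketch only: the nesting of the keys and the default values below are hypothetical, chosen for illustration, and the actual structure is the one defined in imagenet/config.py:

```python
import json


def merge_config(defaults, overrides):
    """Return a new dict where values in overrides replace those in defaults.

    Nested dicts are merged recursively; neither input is modified.
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults: the grouping and the values are assumptions.
DEFAULTS = {
    "optimizer": {"batch_size": 256},
    "input": {"scale_val": 256, "crop_val": 224},
}

# A user file that only overrides the batch size.
user_config = json.loads('{"optimizer": {"batch_size": 512}}')
config = merge_config(DEFAULTS, user_config)
```

Here config keeps the default scale_val and crop_val while picking up the overridden batch_size.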
Our arXiv results can be reproduced by running train_imagenet.py with the configuration files in ./experiments.
As an example, the command to train ResNeXt101 with InPlace-ABN, Leaky ReLU and batch_size = 512 is:
python -m torch.distributed.launch --nproc_per_node <n. GPUs per node> train_imagenet.py --log-dir /path/to/tensorboard/logs experiments/resnext101_ipabn_lr_512.json /path/to/imagenet/root
Validation is run by train_imagenet.py at the end of every training epoch.
To validate a trained model, you can use the test_imagenet.py script, which allows for 10-crops validation and
transferring weights across compatible networks (e.g. from ResNeXt101 with ReLU to ResNeXt101 with Leaky ReLU).
This script accepts the same configuration files as train_imagenet.py, but note that the scale_val and crop_val
parameters are ignored in favour of the --scale and --crop command-line arguments.
As an example, to validate the ResNeXt101 trained above using 10-crops of size 224 from images scaled to 256
pixels, you can run:
python -m torch.distributed.launch --nproc_per_node <n. GPUs per node> test_imagenet.py --crop 224 --scale 256 --ten_crops experiments/resnext101_ipabn_lr_512.json /path/to/checkpoint /path/to/imagenet/root
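For readers unfamiliar with 10-crops evaluation, the sketch below lists the crop regions under the common convention of four corner crops plus a center crop, each also evaluated on its horizontal mirror. Whether test_imagenet.py uses exactly this layout is an assumption, and the function name is hypothetical:

```python
def ten_crop_boxes(width, height, crop):
    """Return ten (box, flipped) pairs for an image of the given size.

    Each box is (left, top, right, bottom). The first five entries are the
    four corners and the center; the last five repeat them with flipped=True,
    meaning the crop is taken from the horizontally mirrored image.
    """
    if crop > width or crop > height:
        raise ValueError("crop size must not exceed the image size")
    left_c = (width - crop) // 2
    top_c = (height - crop) // 2
    boxes = [
        (0, 0, crop, crop),                            # top-left
        (width - crop, 0, width, crop),                # top-right
        (0, height - crop, crop, height),              # bottom-left
        (width - crop, height - crop, width, height),  # bottom-right
        (left_c, top_c, left_c + crop, top_c + crop),  # center
    ]
    return [(box, flipped) for flipped in (False, True) for box in boxes]
```

The per-crop predictions are then typically averaged to obtain the final score for the image.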
We have successfully used InPlace-ABN with a DeepLab3 segmentation head that was trained on top of the WideResNet38 model above. Thanks to the memory savings of InPlace-ABN, we could significantly increase the amount of input data fed to this model, which eventually allowed us to obtain #1 positions on the Cityscapes, Mapillary Vistas, AutoNUE, KITTI and ScanNet segmentation leaderboards. The training settings mostly follow the description in our paper.
We release our WideResNet38 + DeepLab3 segmentation model trained on the Mapillary Vistas research set. This is the model used to reach the #1 position on the MVD semantic segmentation leaderboard. The segmentation model file provided below is made available under a CC BY-NC-SA 4.0 license.
Network | mIOU | Trained model (+md5) |
---|---|---|
WideResNet38 + DeepLab3 | 53.42 | 913f78486a34aa1577a7cd295e8a33bb |
To use this, please download the .pth.tar model file linked above and run the test_vistas.py script as follows:
python test_vistas.py /path/to/model.pth.tar /path/to/input/folder /path/to/output/folder
The script will process all .png, .jpg and .jpeg images from the input folder and write the predictions to the
output folder as .png images.
For additional options, e.g. test time augmentation, please consult the script's help message.
The results on the test data reported above were obtained by employing only scale 1.0 + flipping.
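Flipping as a test-time augmentation is commonly implemented by averaging the scores predicted on an image and on its horizontal mirror. The sketch below shows that scheme on plain nested lists; predict is a hypothetical callable returning per-pixel class scores, and whether test_vistas.py combines the two passes in exactly this way is an assumption:

```python
def hflip(rows):
    """Mirror a 2D grid (list of rows) horizontally."""
    return [list(reversed(row)) for row in rows]


def predict_with_flip(image, predict):
    """Average scores from the image and its mirror, then take the argmax.

    image: list of rows of pixels.
    predict: hypothetical callable mapping an image to a grid of the same
             height and width whose cells are lists of per-class scores.
    Returns a grid of predicted class indices.
    """
    scores = predict(image)
    # Predict on the mirrored image, then mirror the scores back.
    scores_flipped = hflip(predict(hflip(image)))
    averaged = [
        [[(a + b) / 2.0 for a, b in zip(px, px_f)]
         for px, px_f in zip(row, row_f)]
        for row, row_f in zip(scores, scores_flipped)
    ]
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in averaged
    ]
```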