Tricks, corresponding results, experimental settings, and running commands

  • This file contains the results, experimental settings, and running commands of the different tricks. The tricks are divided into four families: re-weighting, re-sampling, mixup training, and two-stage training. For more details on these four trick families, see the original paper.
  • If you run into any problem, such as a bug, feel free to open an issue.
  • Each method entry below lists its experimental-setting configs and running commands.

Re-weighting

  • Strictly speaking, the LDAM loss, CrossEntropyLabelSmooth, CDT, and SEQL are not re-weighting methods, but all of them take the long-tailed distribution into account when calculating the loss, and they can be combined with re-weighting in DRW. We therefore list them under the re-weighting family.
  • The re-weighting methods are implemented in loss.
The four numbers at the end of each entry are the top-1 error rates (%) on CIFAR-10-LT-100, CIFAR-10-LT-50, CIFAR-100-LT-100, and CIFAR-100-LT-50, in that order.
Baseline
  1. CONFIG (from left to right):
    • configs/cao_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 29.64 / 25.19 / 61.68 / 56.15
CS_CE
  1. Introduction:
    • The most commonly used re-weighting method; see Eq. (2) in our paper for more details.

  2. CONFIG:
    • configs/cao_cifar/re_weighting/csce/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 31.70 / 23.20 / 67.73 / 63.49
Square CS_CE
  1. Introduction:
    • This is a smooth version of CS_CE (smooth CS_CE), which adds a hyper-parameter $\gamma$ to vanilla CS_CE. In smooth CS_CE, the loss weight of class i is defined as $(\frac{N_{min}}{N_i})^\gamma$, where $\gamma \in [0, 1]$ and $N_i$ is the number of images in class i. We set $\gamma = 0.5$ to obtain a square-root version of CS_CE (Square CS_CE). A minimal weight sketch follows this entry.

  2. CONFIG:
    • configs/cao_cifar/re_weighting/csce/{cifar10_im100_square.yaml, cifar10_im50_square.yaml, cifar100_im100_square.yaml, cifar100_im50_square.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 31.70 / 22.22 / 61.64 / 57.23
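For reference, a minimal sketch (assuming a PyTorch setting; this is not the repository's own code in loss) of the smooth CS_CE weights described above. `num_per_class` is a hypothetical list of per-class image counts; `gamma=1.0` recovers vanilla CS_CE and `gamma=0.5` gives Square CS_CE:

```python
import torch
import torch.nn.functional as F

def csce_weights(num_per_class, gamma=0.5):
    """Per-class loss weights (N_min / N_i) ** gamma; gamma = 1 is vanilla CS_CE."""
    counts = torch.tensor(num_per_class, dtype=torch.float)
    return (counts.min() / counts) ** gamma

# Hypothetical usage with a weighted cross-entropy loss.
num_per_class = [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]  # a CIFAR-10-LT-100-like distribution
weights = csce_weights(num_per_class, gamma=0.5)
logits, labels = torch.randn(8, 10), torch.randint(0, 10, (8,))
loss = F.cross_entropy(logits, labels, weight=weights)
```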
Focal loss
  1. Introduction:
    • Focal loss makes the model focus training on difficult samples; see Eq. (4) in our paper for more details and the sketch after this entry.
    • The Focal loss paper link: Lin et al., ICCV 2017.

  2. CONFIG:
    • configs/cao_cifar/re_weighting/focal/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 28.44 / 22.09 / 62.78 / 58.21
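A minimal multi-class focal-loss sketch, assuming logits of shape (batch, num_classes) and integer targets; the actual implementation in loss may differ in details such as alpha weighting:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """FL = -(1 - p_t) ** gamma * log(p_t), averaged over the batch."""
    log_pt = F.log_softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()

loss = focal_loss(torch.randn(8, 10), torch.randint(0, 10, (8,)))
```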
ClassBalanceFocal
  1. Introduction:
    • A modified version of Focal loss based on the theory of effective numbers; see Eq. (5) in our paper for more details. The effective-number weights are sketched after the ClassBalanceCE entry below.
    • The ClassBalanceFocal paper link: Cui et al., CVPR 2019.

  2. CONFIG:
    • configs/cao_cifar/re_weighting/cbfocal/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 24.80 / 21.01 / 61.44 / 57.63
ClassBalanceCE
  1. Introduction:
    • A modified version of cross-entropy loss based on the theory of effective numbers; see Eq. (6) in our paper for more details and the weight sketch after this entry.
    • The ClassBalanceCE paper link: Cui et al., CVPR 2019.

  2. CONFIG:
    • configs/cao_cifar/re_weighting/cbce/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 29.52 / 22.52 / 61.03 / 56.22
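A minimal sketch of the effective-number weights used by ClassBalanceCE (and, combined with the focal term above, by ClassBalanceFocal); `beta=0.9999` is an assumed value, not necessarily what the configs use:

```python
import torch
import torch.nn.functional as F

def class_balanced_weights(num_per_class, beta=0.9999):
    """Weights proportional to 1 / E_n, with effective number E_n = (1 - beta^n) / (1 - beta)."""
    counts = torch.tensor(num_per_class, dtype=torch.float)
    effective_num = (1.0 - beta ** counts) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(num_per_class)  # normalize so the weights sum to the class count

weights = class_balanced_weights([5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50])
loss = F.cross_entropy(torch.randn(8, 10), torch.randint(0, 10, (8,)), weight=weights)
```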
CrossEntropyLabelSmooth
  1. Introduction:
    • Label smoothing, a commonly used regularization trick, applied on top of the cross-entropy loss.
    • The CrossEntropyLabelSmooth paper link: Szegedy et al., CVPR 2016.

  2. CONFIG:
    • configs/cao_cifar/re_weighting/cels/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 27.19 / 23.43 / 61.56 / 57.66
CrossEntropyLabelAwareSmooth
  1. Introduction:
    • Label-aware smoothing, a variant of label smoothing that assigns a different smoothing factor to each class according to the number of training images it contains.
    • The CrossEntropyLabelAwareSmooth paper link: Zhong et al., CVPR 2021.

  2. CONFIG:
    • configs/cao_cifar/re_weighting/celas/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 27.49 / 22.04 / 62.32 / 56.22
LDAM loss
  1. Introduction:
    • The LDAM loss is a metric-learning-style loss that assigns a different margin to each class; a minimal sketch follows this entry.
    • The LDAM loss paper link: Cao et al., NeurIPS 2019.

  2. CONFIG:
    • configs/cao_cifar/re_weighting/ldam/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 26.34 / 20.99 / 61.12 / 56.41
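A minimal LDAM sketch following Cao et al.: each class gets a margin proportional to 1 / n_j^(1/4), which is subtracted from the target-class logit before a scaled cross-entropy. `max_m` and `s` are assumed hyper-parameter values:

```python
import torch
import torch.nn.functional as F

def ldam_loss(logits, targets, num_per_class, max_m=0.5, s=30.0):
    """Subtract a per-class margin m_j proportional to 1 / n_j ** 0.25 from the true-class logit."""
    counts = torch.tensor(num_per_class, dtype=torch.float, device=logits.device)
    margins = 1.0 / counts ** 0.25
    margins = margins * (max_m / margins.max())               # largest margin equals max_m
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
    logits_m = logits - one_hot * margins[targets].unsqueeze(1)
    return F.cross_entropy(s * logits_m, targets)
```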
SEQL
  1. Introduction:
    • The softmax equalization loss (SEQL) reduces the gradients contributed by the negative samples of tail classes. The authors argue that the imbalance between the positive and negative gradients of tail classes harms their performance.
    • The SEQL paper link: Tan et al., CVPR 2020.

  2. CONFIG:
    • configs/cao_cifar/re_weighting/seql/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: -- / -- / 59.51 / 55.19
CDT
  1. Introduction:
    • The authors find that the model significantly over-fits the tail classes and argue that feature deviation between training and test samples causes this problem, so they propose class-dependent temperatures (CDT); a sketch follows this entry.
    • The CDT paper link: Ye et al., arXiv 2020.

  2. CONFIG:
    • configs/cao_cifar/re_weighting/cdt/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 22.90 / 18.19 / 60.41 / 55.17
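A minimal sketch of class-dependent temperatures as described by Ye et al.: each class logit is divided by a temperature that grows for rarer classes. `gamma=0.3` is an assumed value, not necessarily what the configs use:

```python
import torch
import torch.nn.functional as F

def cdt_loss(logits, targets, num_per_class, gamma=0.3):
    """Divide class j's logit by a_j = (n_max / n_j) ** gamma before cross-entropy."""
    counts = torch.tensor(num_per_class, dtype=torch.float, device=logits.device)
    temperatures = (counts.max() / counts) ** gamma
    return F.cross_entropy(logits / temperatures, targets)
```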
BalancedSoftmaxCE
  1. Introduction:
    • A simple and effective re-weighting method; see Eq. (4) in the original paper and the sketch after this entry.
    • The BalancedSoftmaxCE paper link: Ren et al., NeurIPS 2020.

  2. CONFIG:
    • configs/cao_cifar/re_weighting/bsce/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 22.46 / 18.89 / 57.01 / 53.45
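A minimal Balanced Softmax sketch following Ren et al.: the per-class sample counts enter the logits as a log-prior before the usual softmax cross-entropy (function name is ours, not the repository's):

```python
import torch
import torch.nn.functional as F

def balanced_softmax_loss(logits, targets, num_per_class):
    """Add log(n_j) to each class logit, then apply standard cross-entropy."""
    counts = torch.tensor(num_per_class, dtype=torch.float, device=logits.device)
    return F.cross_entropy(logits + counts.log(), targets)
```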

Re-sampling

  • The re-sampling methods are implemented in dataset.
The four numbers at the end of each entry are the top-1 error rates (%) on CIFAR-10-LT-100, CIFAR-10-LT-50, CIFAR-100-LT-100, and CIFAR-100-LT-50, in that order.
Baseline
  1. CONFIG (from left to right):
    • configs/cao_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 29.64 / 25.19 / 61.68 / 56.15
Class-balanced sampling
  1. Introduction:
    • Class-balanced sampling gives each class an equal probability of being selected; see the Re-sampling section in our paper for more details and the sampler sketch after this entry.
    • The class-balanced sampling paper link: Kang et al., ICLR 2020.

  2. CONFIG:
    • configs/cao_cifar/re_sampling/balance/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 27.55 / 21.92 / 64.76 / 60.54
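A minimal sketch of how such a sampler could be built with PyTorch's WeightedRandomSampler. The exponent `q` is the knob from Kang et al.: q = 0 gives class-balanced sampling, q = 0.5 the square-root sampling below, and q = 1 plain instance-balanced sampling. The function name and the use of WeightedRandomSampler are assumptions, not the repository's dataset code:

```python
import torch
from torch.utils.data import WeightedRandomSampler

def make_sampler(labels, q=0.0):
    """Sample class j with probability p_j proportional to n_j ** q, split evenly over its images."""
    labels = torch.as_tensor(labels)
    counts = torch.bincount(labels).float()
    class_prob = counts ** q
    class_prob = class_prob / class_prob.sum()
    sample_weights = class_prob[labels] / counts[labels]
    return WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)

# Hypothetical usage: DataLoader(train_set, batch_size=128, sampler=make_sampler(train_labels, q=0.0))
```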
Square-root sampling
  1. Introduction:
    • Square-root sampling produces a less imbalanced sampling distribution; see the Re-sampling section in our paper for more details (the sampler sketch above with q = 0.5 corresponds to this method).
    • The square-root sampling paper link: Kang et al., ICLR 2020.

  2. CONFIG:
    • configs/cao_cifar/re_sampling/square/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 28.58 / 22.32 / 63.21 / 58.87
Progressively-balanced sampling
  1. Introduction:
    • Progressively-balanced sampling gradually shifts the class sampling probabilities from instance-balanced (random) sampling to class-balanced sampling over the course of training; see the Re-sampling section in our paper for more details and the sketch after this entry.
    • The progressively-balanced sampling paper link: Kang et al., ICLR 2020.

  2. CONFIG:
    • configs/cao_cifar/re_sampling/progressive/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 28.02 / 21.43 / 60.96 / 56.88
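A minimal sketch of the progressively-balanced schedule: the class probabilities are a per-epoch interpolation between instance-balanced and class-balanced sampling, and the result can be turned into per-sample weights for a sampler like the one sketched above:

```python
import torch

def progressive_class_prob(num_per_class, epoch, total_epochs):
    """Interpolate p_j from instance-balanced to class-balanced as training progresses."""
    counts = torch.tensor(num_per_class, dtype=torch.float)
    p_instance = counts / counts.sum()
    p_class = torch.full_like(p_instance, 1.0 / len(num_per_class))
    t = epoch / total_epochs
    return (1.0 - t) * p_instance + t * p_class
```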
BBN-style sampling
  1. Introduction:
    • We combine the BBN sampling scheme, which consists of a uniform sampler and a reversed sampler, with input mixup. For more details about these two samplers, see the original paper.
    • The BBN paper link: Zhou et al., CVPR 2020.

  2. CONFIG:
    • configs/cao_cifar/re_sampling/bbn-style/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 28.38 / 21.89 / 62.94 / 57.97

Mixup training

  • The mixup training methods are implemented in combiner.py.
The four numbers at the end of each entry are the top-1 error rates (%) on CIFAR-10-LT-100, CIFAR-10-LT-50, CIFAR-100-LT-100, and CIFAR-100-LT-50, in that order.
Baseline
  1. CONFIG (from left to right):
    • configs/cao_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 29.64 / 25.19 / 61.68 / 56.15
Input mixup
  1. Introduction:
    • In input mixup, each new example is formed from two randomly sampled examples by weighted linear interpolation, and only the mixed example is used to train the network. See the Mixup training section in our paper for more details and the sketch after this entry.
    • The mixup paper link: Zhang et al., ICLR 2018.

  2. CONFIG:
    • configs/cao_cifar/mixup/input_mixup/{cifar10_im100_im_alpha10.yaml, cifar10_im50_im_alpha10.yaml, cifar100_im100_im_alpha10.yaml, cifar100_im50_im_alpha10.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 25.94 / 21.33 / 59.18 / 54.08
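A minimal input-mixup training step, assuming a standard PyTorch classifier; the mixing coefficient is drawn from Beta(alpha, alpha), and `alpha=1.0` is only an assumed value (the configs set their own alpha):

```python
import numpy as np
import torch
import torch.nn.functional as F

def input_mixup_step(model, images, labels, alpha=1.0):
    """Mix randomly paired images and train only on the mixed examples."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(images.size(0), device=images.device)
    mixed = lam * images + (1.0 - lam) * images[perm]
    logits = model(mixed)
    return lam * F.cross_entropy(logits, labels) + (1.0 - lam) * F.cross_entropy(logits, labels[perm])
```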
Manifold mixup
  1. Introduction:
    • Manifold mixup encourages neural networks to predict less confidently on interpolations of hidden representations. We apply manifold mixup to only one layer in our experiments. See the Mixup training section in our paper for more details.
    • The manifold mixup paper link: Verma et al., ICML 2019.

  2. CONFIG:
    • configs/cao_cifar/mixup/manifold_mixup/{cifar10_im100_mm_alpha10.yaml, cifar10_im50_mm_alpha10.yaml, cifar100_im100_mm_alpha10.yaml, cifar100_im50_mm_alpha10.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 24.81 / 20.42 / 60.12 / 54.76
Remix
  1. Introduction:
    • Remix assigns the mixed label in favor of the minority class by giving it a disproportionately higher weight.
    • The remix paper link: Chou et al., ECCV 2020 workshop.

  2. CONFIG:
    • configs/cao_cifar/mixup/remix/{cifar10_im100_remix_alpha10.yaml, cifar10_im50_remix_alpha10.yaml, cifar100_im100_remix_alpha10.yaml, cifar100_im50_remix_alpha10.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 26.57 / 20.74 / 58.61 / 54.30

Two-stage training

DRW
  • DRW (deferred re-weighting) keeps plain cross-entropy during the first stage and switches to the re-weighted loss in the later epochs. The DRW methods are implemented in loss; a minimal sketch of the schedule follows.
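A minimal sketch, assuming a per-epoch training loop; `drw_epoch` is a hypothetical hyper-parameter marking the start of the second stage:

```python
import torch.nn.functional as F

def drw_loss(logits, targets, epoch, drw_epoch, class_weights):
    """Plain cross-entropy in the first stage, weighted loss (e.g. CS_CE weights) afterwards."""
    if epoch < drw_epoch:
        return F.cross_entropy(logits, targets)
    return F.cross_entropy(logits, targets, weight=class_weights)
```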
Each entry lists the first-stage and second-stage methods; the four numbers at the end are the top-1 error rates (%) on CIFAR-10-LT-100, CIFAR-10-LT-50, CIFAR-100-LT-100, and CIFAR-100-LT-50, in that order.
CE
CE
  1. CONFIG (from left to right):
    • configs/cao_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 29.64 / 25.19 / 61.68 / 56.15
CE
CS_CE
  1. CONFIG:
    • configs/cao_cifar/two_stage/drw/csce/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 25.18 / 20.18 / 58.38 / 53.20
CE
Focal loss
  1. CONFIG:
    • configs/cao_cifar/two_stage/drw/focal/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 28.85 / 20.68 / 62.47 / 56.39
CE
ClassBalanceFocal
  1. CONFIG:
    • configs/cao_cifar/two_stage/drw/cbfocal/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 24.57 / 18.62 / 61.94 / 55.01
CE
ClassBalanceCE
  1. CONFIG:
    • configs/cao_cifar/two_stage/drw/cbce/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 25.36 / 20.65 / 60.79 / 56.63
CE
CrossEntropyLabelSmooth
  1. CONFIG:
    • configs/cao_cifar/two_stage/drw/cels/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 28.39 / 22.71 / 61.10 / 57.16
CE
CrossEntropyLabelAwareSmooth
  1. CONFIG:
    • configs/cao_cifar/two_stage/drw/celas/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 27.88 / 22.27 / 62.42 / 57.25
CE
LDAM loss
  1. CONFIG:
    • configs/cao_cifar/two_stage/drw/ldam/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 22.27 / 18.40 / 57.53 / 52.71
CE
CDT
  1. CONFIG:
    • configs/cao_cifar/two_stage/drw/cdt/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 22.45 / 18.73 / 57.78 / 53.20
CE
BalancedSoftmaxCE
  1. CONFIG:
    • configs/cao_cifar/two_stage/drw/bsce/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 22.83 / 19.16 / 58.18 / 53.51
CE
InfluenceBalancedLoss
  1. CONFIG:
    • configs/cao_cifar/two_stage/drw/ibloss/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 23.34 / 19.08 / 59.21 / 54.54
DRS
  • DRS (deferred re-sampling) keeps vanilla instance-balanced sampling during the first stage and switches to a re-balanced sampler in the later epochs.
Each entry lists the first-stage and second-stage methods; the four numbers at the end are the top-1 error rates (%) on CIFAR-10-LT-100, CIFAR-10-LT-50, CIFAR-100-LT-100, and CIFAR-100-LT-50, in that order.
CE
Vanilla sampling
  1. CONFIG (from left to right):
    • configs/cao_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 29.64 / 25.19 / 61.68 / 56.15
CE
Square-root sampling
  1. CONFIG:
    • configs/cao_cifar/two_stage/drs/squre/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 26.78 / 21.15 / 59.64 / 55.46
CE
Progressively-balanced sampling
  1. CONFIG:
    • configs/cao_cifar/two_stage/drs/progressive/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 26.15 / 19.47 / 59.59 / 54.79
CE
Class-balanced sampling
  1. CONFIG:
    • configs/cao_cifar/two_stage/drs/balance/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 24.93 / 19.27 / 59.36 / 54.58
CE
CAM-based square-sampling
  1. CONFIG:
    • FIRST-STAGE-CONFIG: configs/cao_cifar/two_stage/drs/cam_based_sampling/first_stage/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}
    • CAM-GENERATION-CONFIG: configs/cao_cifar/two_stage/drs/cam_based_sampling/cam_generation/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}
    • SECOND-STAGE-CONFIG: configs/cao_cifar/two_stage/drs/cam_based_sampling/second_stage/square/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • Training takes three steps: run the codebase with the first-stage, CAM-generation, and second-stage configs, in that order.
      • bash data_parallel_train.sh FIRST-STAGE-CONFIG GPU
        bash data_parallel_train.sh CAM-GENERATION-CONFIG GPU
        bash data_parallel_train.sh SECOND-STAGE-CONFIG GPU
Results: 26.45 / 20.46 / 59.33 / 54.58
CE
CAM-based progressive-sampling
  1. CONFIG:
    • FIRST-STAGE-CONFIG: configs/cao_cifar/two_stage/drs/cam_based_sampling/first_stage/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}
    • CAM-GENERATION-CONFIG: configs/cao_cifar/two_stage/drs/cam_based_sampling/cam_generation/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}
    • SECOND-STAGE-CONFIG: configs/cao_cifar/two_stage/drs/cam_based_sampling/second_stage/progressive/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • Training takes three steps: run the codebase with the first-stage, CAM-generation, and second-stage configs, in that order.
      • bash data_parallel_train.sh FIRST-STAGE-CONFIG GPU
        bash data_parallel_train.sh CAM-GENERATION-CONFIG GPU
        bash data_parallel_train.sh SECOND-STAGE-CONFIG GPU
Results: 27.08 / 20.76 / 58.92 / 53.90
CE
CAM-based balance-sampling
  1. CONFIG:
    • FIRST-STAGE-CONFIG: configs/cao_cifar/two_stage/drs/cam_based_sampling/first_stage/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}
    • CAM-GENERATION-CONFIG: configs/cao_cifar/two_stage/drs/cam_based_sampling/cam_generation/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}
    • SECOND-STAGE-CONFIG: configs/cao_cifar/two_stage/drs/cam_based_sampling/second_stage/balance/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • Training takes three steps: run the codebase with the first-stage, CAM-generation, and second-stage configs, in that order.
      • bash data_parallel_train.sh FIRST-STAGE-CONFIG GPU
        bash data_parallel_train.sh CAM-GENERATION-CONFIG GPU
        bash data_parallel_train.sh SECOND-STAGE-CONFIG GPU
Results: 23.10 / 19.28 / 58.05 / 53.27

Classifier-balancing

  • Classifier-balancing is another way to balance the backbone and classifier. Unlike DRS and DRW, it first trains the backbone and then freezes it while re-balancing the classifier.

  • The classifier-balancing methods introduced in Kang et al., ICLR 2020, include tau_normalization, LWS, and cRT. See Section 4 of the paper for details of these methods.

  • tau_normalization, LWS, and cRT are implemented in network.py.

The four numbers at the end of each entry are the top-1 error rates (%) on CIFAR-10-LT-100, CIFAR-10-LT-50, CIFAR-100-LT-100, and CIFAR-100-LT-50, in that order.
Baseline
  1. CONFIG (from left to right):
    • configs/cao_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 29.64 / 25.19 / 61.68 / 56.15
Tau_normalization
  1. Introduction:
    • The tau_normalization paper link: Kang et al., ICLR 2020.
    • When using tau_normalization, you first need a trained model; then set TEST.MODEL_FILE to your own checkpoint path. A minimal sketch of the normalization follows this entry.

  2. CONFIG:
    • configs/cao_cifar/two_stage/classifier_balance/tau_norm/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • python main/valid.py --cfg CONFIG --gpus GPU
Results: 27.30 / 20.22 / 59.04 / 54.31
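A minimal tau-normalization sketch: the trained classifier's per-class weight vectors are rescaled by their norms raised to the power tau, with no retraining. `model.fc` and `tau=0.9` are assumptions; the repository's own logic lives in network.py:

```python
import torch

def tau_normalize(fc_weight, tau=1.0):
    """Divide each class's weight vector by ||w_j|| ** tau."""
    norms = fc_weight.norm(p=2, dim=1, keepdim=True)
    return fc_weight / norms.pow(tau)

# Hypothetical usage on a trained model's last linear layer:
# model.fc.weight.data = tau_normalize(model.fc.weight.data, tau=0.9)
```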
cRT (Classifier Re-training)
  1. Introduction:
    • The cRT paper link: Kang et al., ICLR 2020.
    • When using cRT, you first need a trained model; then set NETWORK.PRETRAINED_MODEL to your own checkpoint path (a sketch follows this entry).

  2. CONFIG:
    • configs/cao_cifar/two_stage/classifier_balance/cRT/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 25.01 / 19.83 / 59.74 / 54.64
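A minimal cRT sketch: freeze the trained backbone, re-initialize the classifier, and retrain only the classifier with class-balanced sampling (e.g. the q = 0 sampler sketched in the Re-sampling section). `model.fc` and `feat_dim` are assumed names, not the repository's API:

```python
import torch.nn as nn

def prepare_crt(model, num_classes, feat_dim):
    """Freeze all trained parameters and attach a fresh, trainable classifier."""
    for p in model.parameters():
        p.requires_grad = False
    model.fc = nn.Linear(feat_dim, num_classes)  # only this layer is optimized in the second stage
    return model
```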
LWS (Learnable Weight Scaling)
  1. Introduction:
    • The LWS paper link: Kang et al., ICLR 2020.
    • When using LWS, you first need a trained model; then set NETWORK.PRETRAINED_MODEL to your own checkpoint path.

  2. CONFIG:
    • configs/cao_cifar/two_stage/classifier_balance/LWS/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  3. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 27.37 / 21.34 / 59.84 / 54.79

Knowledge distillation and knowledge transfer

The four numbers at the end of each entry are the top-1 error rates (%) on CIFAR-10-LT-100, CIFAR-10-LT-50, CIFAR-100-LT-100, and CIFAR-100-LT-50, in that order.
Baseline
  1. CONFIG (from left to right):
    • configs/cao_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}

  2. Running commands:
    • bash data_parallel_train.sh CONFIG GPU
Results: 29.64 / 25.19 / 61.68 / 56.15
DiVE
  1. Introduction:
    • The DiVE paper link: He et al., ICCV 2021.
    • When using DiVE, first train a teacher model and then use the trained teacher to distill a student model.

  2. CONFIG:
    • Teacher (step 1): configs/cao_cifar/DiVE/{cifar10_im100, cifar10_im50, cifar100_im100, cifar100_im50}/teacher.yaml
    • Student (step 2): configs/cao_cifar/DiVE/{cifar10_im100, cifar10_im50, cifar100_im100, cifar100_im50}/student.yaml

Results: 21.12 / 17.56 / 54.48 / 49.17