European Conference on Computer Vision (ECCV 2024), arXiv, Hugging Face
Saeed Ebrahimi★, Sahar Rahimi★, Ali Dabouei, Nasser Nasrabadi
★ Equal contribution
Aiming to enhance Face Recognition (FR) on Low-Quality (LQ) inputs, recent studies suggest incorporating synthetic LQ samples into training. Although promising, the quality factors considered in these works are general rather than FR-specific, e.g., atmospheric turbulence, resolution, etc. Motivated by the observation that current FR models are vulnerable to even small Face Alignment Errors (FAE) in LQ images, we present a simple yet effective method that treats FAE as another quality factor tailored to FR. We seek to improve LQ FR by enhancing FR models' robustness to FAE. To this end, we formalize the problem as a combination of differentiable spatial transformations and adversarial data augmentation in FR. We perturb the alignment of the training samples using a controllable spatial transformation and enrich the training with samples expressing FAE. We demonstrate the benefits of the proposed method by conducting evaluations on IJB-B, IJB-C, IJB-S (+4.3% Rank1), and TinyFace (+2.63%).
Visual comparison of aligned (a) and alignment-perturbed (b) samples from the IJB-B dataset. (c, d, e) The performance difference between aligned inputs and those with slight FAE. Models are robust to FAE in HQ samples but suffer significant performance drops on LQ faces, with over 50% reduction in TAR@FAR=1e-5. Results are from two distinct ResNet-100 models trained on MS1MV3 using the ArcFace and AdaFace objectives.
- We introduce Face Alignment Error (FAE) as an image degradation factor tailored to FR, which has previously been ignored in LQ FR studies.
- We propose an optimization method that is specifically tailored to increase FR model robustness against FAE.
- We show that the proposed optimization can greatly increase the FR performance in real-world LQ evaluations such as IJB-S and TinyFace. Moreover, our framework achieves these improvements without sacrificing the performance on datasets with both HQ and LQ samples such as IJB-B and IJB-C.
- We empirically show that the proposed method is a plug-and-play module, providing an orthogonal improvement to SOTA FR methods.
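The abstract describes perturbing the alignment of training samples with a controllable spatial transformation. As a rough illustration of what such a perturbation looks like geometrically, the sketch below builds a small affine transformation (scale, rotation, translation) and applies it to face landmark coordinates. This is not the authors' implementation: the perturbation here is sampled at random rather than found adversarially, it acts on coordinates rather than warping images with a differentiable sampler, and the function names, bounds, and landmark values are all hypothetical.

```python
import math
import random


def affine_matrix(scale, angle_deg, tx, ty):
    """Build a 2x3 affine matrix: uniform scaling and rotation, then translation."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[scale * c, -scale * s, tx],
            [scale * s, scale * c, ty]]


def apply_affine(matrix, points):
    """Apply a 2x3 affine matrix to a list of (x, y) points."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def sample_perturbation(rng, max_scale=0.05, max_angle=5.0, max_shift=0.03):
    """Draw a small random misalignment. The bounds are illustrative assumptions."""
    return affine_matrix(
        scale=1.0 + rng.uniform(-max_scale, max_scale),
        angle_deg=rng.uniform(-max_angle, max_angle),
        tx=rng.uniform(-max_shift, max_shift),
        ty=rng.uniform(-max_shift, max_shift),
    )


# Five landmarks in normalized [-1, 1] coordinates (hypothetical values).
landmarks = [(-0.3, -0.2), (0.3, -0.2), (0.0, 0.1), (-0.25, 0.4), (0.25, 0.4)]
rng = random.Random(0)

# A slightly misaligned copy of the landmarks, and an unperturbed copy for reference.
perturbed = apply_affine(sample_perturbation(rng), landmarks)
identity = apply_affine(affine_matrix(1.0, 0.0, 0.0, 0.0), landmarks)
```

Keeping the perturbation to a few bounded parameters is what makes it controllable: tightening `max_scale`, `max_angle`, or `max_shift` limits how far a sample can drift from its aligned position.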
Method | Training Set | Rank1 | Rank5 |
---|---|---|---|
URL | MS1MV2 | 63.89 | 68.67 |
CurricularFace | MS1MV2 | 63.68 | 67.65 |
ArcFace+CFSM★ | MS1MV2 | 64.69 | 68.80 |
ArcFace+ARoFace | MS1MV2 | 67.32 | 72.45 |
ArcFace | MS1MV3 | 63.81 | 68.80 |
ArcFace+ARoFace | MS1MV3 | 67.54 | 71.05 |
AdaFace★ | WebFace4M | 72.02 | 74.52 |
AdaFace+ARoFace | WebFace4M | 73.98 | 76.47 |
AdaFace | WebFace12M | 72.29 | 74.97 |
AdaFace+ARoFace | WebFace12M | 74.00 | 76.87 |
★ Re-run with the official code because the official repository provides no trained checkpoint for the specified dataset.
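The Rank1 and Rank5 columns in the table above report identification accuracy. As a minimal sketch of how a rank-k score is typically computed, the example below counts a probe as a hit when its true identity appears among the k most similar gallery entries. It assumes a simple closed-set setting with a precomputed similarity matrix; the function name and toy data are hypothetical, and the official TinyFace and IJB-S protocols add details (such as distractors and template construction) that are not modeled here.

```python
def rank_k_accuracy(similarity, probe_ids, gallery_ids, k):
    """Fraction of probes whose true identity is among the k most similar gallery entries."""
    hits = 0
    for scores, true_id in zip(similarity, probe_ids):
        # Gallery indices sorted from most to least similar to this probe.
        order = sorted(range(len(gallery_ids)), key=lambda j: scores[j], reverse=True)
        top_ids = [gallery_ids[j] for j in order[:k]]
        hits += true_id in top_ids
    return hits / len(probe_ids)


gallery_ids = ["a", "b", "c"]
probe_ids = ["a", "b", "c"]
similarity = [
    [0.9, 0.2, 0.1],  # probe "a": correct match ranked first
    [0.6, 0.5, 0.1],  # probe "b": correct match ranked second
    [0.7, 0.6, 0.3],  # probe "c": correct match ranked third
]

rank1 = rank_k_accuracy(similarity, probe_ids, gallery_ids, k=1)
rank2 = rank_k_accuracy(similarity, probe_ids, gallery_ids, k=2)
```

In this toy case only one of three probes is matched at rank 1, while two of three are matched within the top two, which is why Rank5 is always at least as high as Rank1 in the tables.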
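Method | Venue | Dataset | Surveillance-to-Single Rank1 | Surveillance-to-Single Rank5 | Surveillance-to-Single 1 | Surveillance-to-Booking Rank1 | Surveillance-to-Booking Rank5 | Surveillance-to-Booking 1 | Surveillance-to-Surveillance Rank1 | Surveillance-to-Surveillance Rank5 | Surveillance-to-Surveillance 1 |
---|---|---|---|---|---|---|---|---|---|---|---|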
ArcFace | CVPR2019 | MS1MV2 | 57.35 | 64.42 | 41.85 | 57.36 | 64.95 | 41.23 | - | - | - |
PFE | ICCV2019 | MS1MV2 | 50.16 | 58.33 | 31.88 | 53.60 | 61.75 | 35.99 | 9.20 | 20.82 | 0.84 |
URL | CVPR2020 | MS1MV2 | 59.79 | 65.78 | 41.06 | 61.98 | 67.12 | 42.73 | - | - | - |
ArcFace+ARoFace | ECCV2024 | MS1MV2 | 61.65 | 67.60 | 47.87 | 60.66 | 67.33 | 46.34 | 18.31 | 32.07 | 2.23 |
ArcFace | CVPR2019 | WebFace4M | 69.26 | 74.31 | 57.06 | 70.31 | 75.15 | 56.89 | 32.13 | 46.67 | 5.32 |
ArcFace+ARoFace | ECCV2024 | WebFace4M | 70.96 | 75.54 | 58.67 | 71.70 | 75.24 | 58.06 | 32.95 | 50.30 | 6.81 |
AdaFace | CVPR2022 | WebFace12M | 71.35 | 76.24 | 59.40 | 71.93 | 76.56 | 59.37 | 36.71 | 50.03 | 4.62 |
AdaFace+ARoFace | ECCV2024 | WebFace12M | 72.28 | 77.93 | 61.43 | 73.01 | 79.11 | 60.02 | 40.51 | 50.90 | 6.37 |
Download and prepare the datasets from the InsightFace repository.
We trained with a total batch size of 2048 on four NVIDIA RTX 6000 Ada GPUs. For stable training, scale the learning rate with the total batch size on your machine:
config.lr = (0.1*config.batch_size*config.ngpus)/(1024)
Set the number of GPUs in the configs according to your resources:
config.ngpus = 4
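To make the learning-rate rule concrete, the sketch below evaluates the same formula for a couple of hardware setups. It assumes that `config.batch_size` is the per-GPU batch size, which is how the formula reads since it multiplies by `config.ngpus`; the helper name `scaled_lr` is hypothetical and not part of the repository.

```python
def scaled_lr(batch_size_per_gpu, ngpus, base_lr=0.1, base_batch=1024):
    """Linear scaling: the learning rate grows in proportion to the total batch size."""
    total_batch = batch_size_per_gpu * ngpus
    return base_lr * total_batch / base_batch


# Reported setup: total batch size 2048 on four GPUs (assumed 512 per GPU).
lr_reported = scaled_lr(512, 4)

# A smaller machine: two GPUs with 128 samples each (hypothetical setup).
lr_small = scaled_lr(128, 2)

print(lr_reported, lr_small)
```

Under this reading, the reported setup corresponds to a learning rate of 0.2, and a machine with a quarter-sized total batch would use a proportionally smaller value.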
Then, for training on one machine using four GPUs:
torchrun --nproc_per_node=4 train_v2.py configs/ms1mv2_r100
Method | Arch | Dataset | Link |
---|---|---|---|
ArcFace+ARoFace | R100 | MS1MV2 | link |
ArcFace+ARoFace | R100 | MS1MV3 | link |
ArcFace+ARoFace | R100 | WebFace4M | link |
AdaFace+ARoFace | R100 | WebFace4M | link |
AdaFace+ARoFace | R100 | WebFace12M | link |
@misc{saadabadi2024arofacealignmentrobustnessimprove,
title={ARoFace: Alignment Robustness to Improve Low-Quality Face Recognition},
author={Mohammad Saeed Ebrahimi Saadabadi and Sahar Rahimi Malakshan and Ali Dabouei and Nasser M. Nasrabadi},
year={2024},
eprint={2407.14972},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.14972},
}
Here are some great resources we benefited from:
- ArcFace and AdaFace for the face recognition module.
- advertorch, RobustAdversarialNetwork, and CFSM for the adversarial regularization.
If you have a question about any part of the code, or something needs further clarification, please open an issue or send me an email: [email protected].