For two months I have been training face recognition models using Deep Insight's open-source project InsightFace (https://github.com/deepinsight/insightface.git). All my experiments were conducted on a Tesla P40 GPU. All models produced by these experiments are now kept by Shenzhen Sunwin Intelligent.
InsightFace provides a variety of network and loss-function choices. According to the author, training LResNet100E-IR with ArcFace yields the most accurate model, achieving LFW 99.8%+ and MegaFace 98.0%+. Unfortunately, that training process was too slow, so I went with the second-best option: training LResNet50E-IR with CosineFace. I used the provided cleaned MS1M dataset as my training set, and after about 400,000 iterations the most accurate model I obtained reaches LFW 99.7% and AgeDB-30 97.6%.
I set the parameter `ckpt` to 2 in order to save all checkpoints (otherwise only models that achieve LFW 99%+ are saved). To identify the best saved model quickly, I kept a log by appending the standard output of training to a file.
**Note:** appending `| tee -a <logfile>` to the training command saves stdout to the file while still printing it to the console.
To detect, align, and crop the faces, use /src/align/align_megaface.py with the argument "--name webface". However, that code expects photos that already carry `bbox` and `landmark` attributes, so I added code along the following lines.
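The exact snippet isn't reproduced here; below is a minimal sketch of the idea, assuming the MTCNN detector bundled with InsightFace's alignment code. The model folder path and the single-face assumption are illustrative.

```python
# Sketch: compute bbox and landmark for a raw photo with the MTCNN
# detector that ships with InsightFace's align scripts.
import cv2
import mxnet as mx
from mtcnn_detector import MtcnnDetector  # from src/align

detector = MtcnnDetector(model_folder='mtcnn-model',   # illustrative path
                         ctx=mx.gpu(0), accurate_landmark=True)

img = cv2.imread('photo.jpg')        # BGR image, as MTCNN expects
ret = detector.detect_face(img)
if ret is not None:
    bboxes, points = ret             # shapes (n, 5) and (n, 10)
    bbox = bboxes[0, 0:4]            # take the most confident face
    landmark = points[0, :]          # flat array: [x1..x5, y1..y5]
```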
Now `landmark` is a 10x1 NumPy array containing the x and y coordinates of 5 facial landmarks. Only 3 points are needed for the estimation, so I changed the line after the similarity transform accordingly.
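Again a sketch rather than the original line, continuing from the detection sketch above: one way to estimate the warp from 3 of the 5 points is OpenCV's exact 3-point affine estimate. The reference coordinates are the 5-point template used in InsightFace's face_preprocess.py for 112x112 crops.

```python
import cv2
import numpy as np

# 5-point reference template for a 112x112 crop
# (coordinates from InsightFace's face_preprocess.py).
src = np.array([
    [30.2946, 51.6963],
    [65.5318, 51.5014],
    [48.0252, 71.7366],
    [33.5493, 92.3655],
    [62.7299, 92.2041]], dtype=np.float32)
src[:, 0] += 8.0  # shift from the 96x112 template to 112x112

# landmark is flat [x1..x5, y1..y5]; reshape to (5, 2) point pairs
dst = landmark.reshape((2, 5)).T.astype(np.float32)

# An affine transform is fully determined by 3 point pairs,
# so the two eyes and the nose tip suffice.
M = cv2.getAffineTransform(dst[0:3], src[0:3])
warped = cv2.warpAffine(img, M, (112, 112))
```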
After running /src/align/align_megaface.py I obtained a dataset containing only cropped faces. The script also generates a .lst file that can be used directly in the next step. However, if photo filenames contain characters other than English letters or digits, an exception is raised during training. Also, the default raw-photo format for training is .jpg. To solve both problems, I wrote a script that uses PIL to convert all photos to RGB, then rename and save them as .jpg files.
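A minimal sketch of such a script; the folder names and the sequential renaming scheme are illustrative.

```python
# Sketch: convert every photo to RGB, give it a purely numeric name,
# and save it as .jpg.
import os
from PIL import Image

src_dir = 'raw_photos'      # hypothetical input folder
dst_dir = 'clean_photos'    # hypothetical output folder
os.makedirs(dst_dir, exist_ok=True)

for i, name in enumerate(sorted(os.listdir(src_dir))):
    try:
        img = Image.open(os.path.join(src_dir, name)).convert('RGB')
    except OSError:         # skip unreadable or non-image files
        continue
    img.save(os.path.join(dst_dir, '%08d.jpg' % i), 'JPEG')
```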
I modified /src/data/glint2lst.py so that it writes all photo names to a .lst file in the format 1 ADDRESS LABEL (see the sketch after the layout below).
Note that the folder structure should look like the following:

dataset
    folders named by label, one per individual
        photos of that individual
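A sketch of the .lst generation over this layout; the paths and output filename are illustrative, and the fields are assumed to be tab-separated as in InsightFace's .lst files.

```python
# Sketch: walk the dataset layout above and emit "1<TAB>path<TAB>label"
# lines, in the spirit of my modified glint2lst.py.
import os

dataset_dir = 'dataset'                  # hypothetical path
persons = sorted(d for d in os.listdir(dataset_dir)
                 if os.path.isdir(os.path.join(dataset_dir, d)))

with open('train.lst', 'w') as f:
    for label, person in enumerate(persons):
        person_dir = os.path.join(dataset_dir, person)
        for photo in sorted(os.listdir(person_dir)):
            f.write('1\t%s\t%d\n' % (os.path.join(person_dir, photo), label))
```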
The property file has the format `<TOTAL NUMBER OF IDENTITIES>,112,112`. The code in /src/face2rec2.py generates a new property file directly when merging two datasets.
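For example, for a merged dataset of roughly 135k identities as described below, the property file would be a single line like (the exact count is illustrative):

```
135000,112,112
```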
I used /src/data/dataset_merge.py to merge the two datasets.
Using train_triplet.py to fine-tune the model can sometimes improve accuracy by about 0.1%. All the parameters are given on InsightFace's README page.
I used /src/eval/verification.py to verify the accuracy of my models; the verification datasets are those provided by InsightFace.
| Model | LFW (%) | CFP-FF (%) | CFP-FP (%) | AgeDB-30 (%) | VGG2-FP (%) |
|---|---|---|---|---|---|
| R50 (CosineFace) | 99.717 | 99.814 | 92.714 | 97.600 | |
| R50 (triplet, Cos) | 99.717 | 99.800 | 93.114 | 97.783 | |
| MobileFaceNet (ArcFace) | 99.483 | 99.429 | 90.043 | 95.550 | |
| MobileFaceNet (triplet, Arc) | 99.583 | 99.671 | 95.357 | 96.533 | 94.320 |
The first two steps are from deepinsight/insightface#214. The dataset I used combines the MS1M-V1 dataset from InsightFace with a private dataset. The private dataset, provided by Shenzhen Sunwin Intelligent, contains 1,900,000 raw photos of 50,000 identities collected from Chinese social media platforms (Weibo, QQ, WeChat, TikTok, etc.). No data overlap with MS1M, LFW, or AgeDB-30 has been detected so far. The merged dataset contains around 135k identities.
After 140k iterations the highest accuracy on AgeDB-30 was 89.333%. I used that 89.3% model as the pretrained model and continued training with the argument "--lr_steps='100000,140000,160000'". After 400k iterations the highest accuracy on AgeDB-30 was 94.817%. I then used the 94.8% model as the pretrained model and trained it on MS1M-V1 from InsightFace; after 600k iterations the highest accuracy on AgeDB-30 was 95.767%. Next I fine-tuned the 95.7% model using /src/train_triplet.py and InsightFace's MS1M-V2 dataset; after 30k iterations I reached AgeDB-30 96.317%. Finally, fine-tuning with lr 0.004 gave the AgeDB-30 96.533% result in the table above.
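For reference, the lr_steps argument lists the global iteration counts at which the learning rate is divided by 10. In MXNet terms the schedule amounts to something like the following sketch; whether you pass it via a scheduler or adjust the optimizer in a callback is an implementation detail.

```python
import mxnet as mx

# Sketch: the learning-rate schedule implied by
# lr_steps='100000,140000,160000', a 10x drop at each listed step.
scheduler = mx.lr_scheduler.MultiFactorScheduler(
    step=[100000, 140000, 160000], factor=0.1)
```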
@article{deng2018arcface,
  title={ArcFace: Additive Angular Margin Loss for Deep Face Recognition},
  author={Deng, Jiankang and Guo, Jia and Zafeiriou, Stefanos},
  journal={arXiv preprint arXiv:1801.07698},
  year={2018}
}