Repository for "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks", implemented with the Caffe C++ interface. Compared with Cascade CNN, MTCNN integrates the detection net and the calibration net into one net. Moreover, face alignment is also performed in the same net.
The results of each stage of MTCNN are stored in the result folder. The final results are shown in the following. Both pictures come from FDDB. In the left picture, MTCNN fails to detect a face that is annotated in FDDB, while in the right picture it detects a face that is not annotated in FDDB. (Better than the benchmark in some cases.)
The average time cost is 0.197 s/frame, which is faster than Cascade CNN. This result was measured on a 1080P live video.
The accuracy on FDDB is higher than 0.9. The included model can be used as a pretrained model to improve the result.
An HDF5 dataset is necessary to train a Caffe model that has multiple labels. A sample script that generates the .hdf5 file is listed here.
In this sample, you need to prepare 4 txt files, which contain the labels (0 or 1), the landmarks (as ratios in the cropped image), the regression boxes (as ratios), and the cropped image paths. A dataset that contains landmark information, such as CelebA, is needed to generate samples with these multiple attributes. Then change the paths in the sample code. The output HDF5 file is written to train_file_path. Note that the image size needs to be changed to generate suitable data for your net.
label_path = '../dataset/label.txt'
landmark_path = '../dataset/landmark.txt'
regression_box_path = '../dataset/regression_box.txt'
crop_image_path = '../dataset/crop_image.txt'
train_file_path = '../dataset/train_24.hd5'
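As a rough illustration of what such a script might do, the following Python sketch reads the four txt files and writes them into one HDF5 file. It is not the repository's script. The helper names (`read_float_rows`, `read_lines`, `load_annotations`, `write_hdf5`), the file layout (one sample per line, numbers separated by whitespace), and the dataset names (`data`, `label`, `regression_box`, `landmark`) are all assumptions. Caffe's HDF5 data layer looks up datasets by the names of its `top` blobs, so the dataset names must be changed to match the prototxt. Loading and resizing the cropped images is left to the caller.

```python
def read_float_rows(path):
    """Parse one sample per line; each line holds whitespace-separated numbers."""
    with open(path) as f:
        return [[float(tok) for tok in line.split()] for line in f if line.strip()]


def read_lines(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def load_annotations(label_path, landmark_path, regression_box_path, crop_image_path):
    """Load the four annotation files and check that their line counts match."""
    labels = read_float_rows(label_path)
    landmarks = read_float_rows(landmark_path)
    boxes = read_float_rows(regression_box_path)
    image_paths = read_lines(crop_image_path)
    if not (len(labels) == len(landmarks) == len(boxes) == len(image_paths)):
        raise ValueError("the four annotation files must have the same number of lines")
    return labels, landmarks, boxes, image_paths


def write_hdf5(train_file_path, images, labels, boxes, landmarks):
    """Write images and their three label sets into one HDF5 file.

    `images` must already be an N x C x H x W array (for example N x 3 x 24 x 24).
    """
    # Imported here so the parsing helpers above work without these packages.
    import h5py
    import numpy as np

    with h5py.File(train_file_path, "w") as f:
        # Dataset names are assumptions; they must equal the `top` blob names
        # of the HDF5 data layer in the prototxt.
        f.create_dataset("data", data=np.asarray(images, dtype=np.float32))
        f.create_dataset("label", data=np.asarray(labels, dtype=np.float32))
        f.create_dataset("regression_box", data=np.asarray(boxes, dtype=np.float32))
        f.create_dataset("landmark", data=np.asarray(landmarks, dtype=np.float32))
```

Everything is stored as float32, including the 0/1 labels, which is the usual choice for Caffe HDF5 inputs. The line-count check is there because the four files are only linked by line order, so a missing line in one file would silently shift every later sample.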
Then write the path of the HDF5 file into a txt file and reference that txt file in the prototxt file.
hdf5_data_param {
source: "/Users/Young/Documents/Programming/MTCNN/MTCNN_train/test_48.txt"
batch_size: 100
}
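Note that `source` points to a plain-text list with one HDF5 file path per line, not to the HDF5 file itself. A minimal sketch for producing such a list is shown here; the function name `write_source_list` is hypothetical, and writing absolute paths is a choice made in this sketch so that the list does not depend on the directory from which Caffe is started.

```python
import os


def write_source_list(list_path, hdf5_paths):
    """Write one absolute HDF5 file path per line into the list file."""
    with open(list_path, "w") as f:
        for path in hdf5_paths:
            f.write(os.path.abspath(path) + "\n")
```

For example, `write_source_list('../dataset/train_24.txt', [train_file_path])` would create a list file (the name `train_24.txt` is only an illustration) whose path can then be used as `source`.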
Run the shell file and train the model.