
This repository captures our work on DeepFakes, as a part of our CS 534 AI course project, under Prof. Dmitry Korkin. We use generative modeling and deep learning methods to both apply a realistic fake mask on an unmasked image and unmask a real masked image.


DeepFakes: Masking and Unmasking Faces using Adversarial Network

The aim of this repository is to document the code and our work on our CS 534 AI term project at Worcester Polytechnic Institute (WPI), MA.

Project Goal

The goal is to use deep learning techniques, specifically generative modeling, to achieve unpaired image translation from

  1. Unmasked face domain to masked face domain
  2. Masked face domain to unmasked face domain

(Presentation) (Paper)

Dataset

We use two different datasets and curate them according to our use case:

  1. The Flickr-Faces-HQ (FFHQ) dataset for unmasked images
  2. The MaskedFace-Net dataset for masked images

For unmasked faces, we use FFHQ, a high-quality image dataset of human faces originally created as a benchmark for generative adversarial networks (GANs). The dataset consists of 70,000 high-quality PNG images at 1024×1024 resolution, with considerable variation and diversity in the subjects and objects in the frame.

For our masked face dataset, we use MaskedFace-Net, a dataset of human faces with a correctly or incorrectly worn mask (133,783 images) built on top of the Flickr-Faces-HQ (FFHQ) dataset. The masks are digitally composited onto the faces. Although the dataset is based on FFHQ, the masks are worn incorrectly in most of the images.

For both of these datasets, we curate images based on the number of faces in the image, mask placement, mask clarity, and how realistic the mask looks on the face. We use a subset of 6000 masked images and 6000 unmasked images for training, and 1000 images from each domain for testing.
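
The curation itself was done by visual inspection, but as a rough sketch, the selected images can be split into the trainA/trainB/testA/testB folder layout used by common CycleGAN training code. The source paths and exact file handling below are assumptions for illustration based on the split described above, not the exact script we used.

```python
import random
import shutil
from pathlib import Path

# Hypothetical source folders holding the curated unmasked (FFHQ) and
# masked (MaskedFace-Net) images; adjust the paths to your own copies.
SRC_UNMASKED = Path("curated/ffhq_unmasked")
SRC_MASKED = Path("curated/maskedface_net")
DST = Path("datasets/masks")  # dataset root in the trainA/trainB/testA/testB layout

def split_domain(src: Path, train_dir: Path, test_dir: Path,
                 n_train: int = 6000, n_test: int = 1000, seed: int = 0) -> None:
    """Randomly split one domain into train/test subsets and copy the files over."""
    images = sorted(src.glob("*.png"))
    random.Random(seed).shuffle(images)
    for out_dir, files in [(train_dir, images[:n_train]),
                           (test_dir, images[n_train:n_train + n_test])]:
        out_dir.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, out_dir / f.name)

# trainA/testA: unmasked faces (domain A); trainB/testB: masked faces (domain B).
split_domain(SRC_UNMASKED, DST / "trainA", DST / "testA")
split_domain(SRC_MASKED, DST / "trainB", DST / "testB")
```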

Sample image from the unmasked real domain


Sample image from the masked real domain


The curated dataset can be found in the following drive link: https://drive.google.com/drive/folders/1qKIMJx949qAPC71GlGS1cGPceBISQGma?usp=sharing

From here on, we will refer to

  1. The unmasked domain as domain A
  2. The masked domain as domain B

Methodology

We use the CycleGAN architecture to transform images of unmasked faces into masked faces, and masked faces into unmasked faces. The problem is made more intricate by the fact that no paired dataset of masked and unmasked faces exists; by a paired dataset we mean the same face photographed both with and without a mask.


This leads us to the task of unpaired image-to-image translation. CycleGANs have previously been used for tasks such as horse-to-zebra and summer-to-winter translation. The task can be formulated as translating an image from a source domain X to a target domain Y in the absence of paired examples. The goal is to learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y, using an adversarial loss. Because this mapping is highly under-constrained, CycleGAN couples it with an inverse mapping F : Y → X and introduces a cycle-consistency loss to enforce F(G(X)) ≈ X and G(F(Y)) ≈ Y.
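
For reference, the full objective from the CycleGAN paper (reference 4 below) combines the two adversarial losses with the cycle-consistency term, weighted by λ:

```latex
% Adversarial loss for the mapping G : X -> Y with discriminator D_Y
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) =
  \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}[\log D_Y(y)] +
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log(1 - D_Y(G(x)))]

% Cycle-consistency loss: F(G(x)) should reconstruct x, and G(F(y)) should reconstruct y
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\|F(G(x)) - x\|_1] +
  \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}[\|G(F(y)) - y\|_1]

% Full objective
\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) +
  \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X) +
  \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
```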

To explain GANs: a Generative Adversarial Network consists of two networks, a generator G and a discriminator D. The generator tries to generate data that follows the underlying distribution of the training data, whereas the discriminator tries to tell the fake images apart from the real ones. They play an adversarial game in which the generator tries to fool the discriminator by generating data similar to the training set; the discriminator is fooled when the generated fakes look so real that it cannot tell them apart. The two networks are trained simultaneously, typically on datasets of images, video, or audio. The generator takes random noise as input, outputs fake images, and gradually learns the data distribution needed to make them realistic; the discriminator receives both these fakes and real images from the training set and learns to differentiate between them. The output of the discriminator D(x) is the probability that the input x is real: an output close to 1 means the input is judged real, and an output close to 0 means it is judged fake. The generator's goal is therefore to make the discriminator output 1 (real) for all of its fakes.
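
As a minimal illustration of this adversarial game for a vanilla GAN (noise in, image out), the sketch below shows one training step. The two networks are toy placeholders, not our actual models, and the shapes are chosen only to make the example runnable.

```python
import torch
import torch.nn as nn

# Toy placeholder networks: any noise -> image generator and any
# image -> single-logit discriminator will do for this sketch.
G = nn.Sequential(nn.Linear(100, 64 * 64), nn.Tanh())   # noise -> flattened 64x64 "image"
D = nn.Sequential(nn.Linear(64 * 64, 1))                 # image -> real/fake logit
bce = nn.BCEWithLogitsLoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real: torch.Tensor) -> None:
    """One adversarial step; `real` is a batch of flattened real images."""
    batch = real.size(0)
    fake = G(torch.randn(batch, 100))

    # Discriminator: push its output towards 1 for real images, 0 for fakes.
    opt_D.zero_grad()
    loss_D = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    loss_D.backward()
    opt_D.step()

    # Generator: rewarded when the discriminator labels its fakes as real (1).
    opt_G.zero_grad()
    loss_G = bce(D(fake), torch.ones(batch, 1))
    loss_G.backward()
    opt_G.step()

train_step(torch.rand(8, 64 * 64))  # example call with random stand-in "images"
```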

In a nutshell, for our task the generator takes an image from the source domain as input and outputs a translated image (a fake). The discriminator, in turn, takes the fakes produced by the generator together with real images from domain B as input and outputs 1 for real / 0 for fake. If the discriminator correctly identifies a fake, the generator is penalized and the discriminator is rewarded; if the discriminator mistakes the fake for a real image, the generator is rewarded for fooling it and the discriminator is penalized. The generator and the discriminator play this adversarial game while improving and updating each other.

In essence, we have

  1. a generator G_AB, a CNN that learns to translate images from domain A to domain B.
  2. a discriminator D_AB, a classifier that learns to distinguish between images from the real domain B and the fakes generated by G_AB.
  3. a generator G_BA, a CNN that learns to translate images from domain B (including the fakes produced by G_AB, used for the cycle-consistency term) back to domain A.
  4. a discriminator D_BA, a classifier that learns to distinguish between images from the real domain A and the fakes generated by G_BA.
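
Below is a rough sketch of how these four networks combine into the CycleGAN objective, using least-squares adversarial losses and an L1 cycle-consistency term. The function signature and the cycle weight lambda_cyc are assumptions for illustration, not our exact training code.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()   # least-squares adversarial loss (the "lsgan" mode mentioned later)
l1 = nn.L1Loss()     # cycle-consistency loss
lambda_cyc = 10.0    # weight of the cycle term (a common default, assumed here)

def cyclegan_losses(real_A, real_B, G_AB, G_BA, D_AB, D_BA):
    """Return (generator loss, discriminator loss) for one batch.
    G_AB: A -> B, G_BA: B -> A; D_AB judges domain-B images, D_BA judges domain-A images."""
    fake_B = G_AB(real_A)   # unmasked -> masked
    fake_A = G_BA(real_B)   # masked -> unmasked
    rec_A = G_BA(fake_B)    # A -> B -> A, should reconstruct real_A
    rec_B = G_AB(fake_A)    # B -> A -> B, should reconstruct real_B

    # Generators: fool both discriminators while keeping the round trip close to the input.
    pred_fake_B, pred_fake_A = D_AB(fake_B), D_BA(fake_A)
    loss_G = (mse(pred_fake_B, torch.ones_like(pred_fake_B))
              + mse(pred_fake_A, torch.ones_like(pred_fake_A))
              + lambda_cyc * (l1(rec_A, real_A) + l1(rec_B, real_B)))

    # Discriminators: output 1 on real images and 0 on (detached) fakes.
    loss_D = torch.tensor(0.0)
    for disc, real, fake in [(D_AB, real_B, fake_B), (D_BA, real_A, fake_A)]:
        pred_real, pred_fake = disc(real), disc(fake.detach())
        loss_D = loss_D + mse(pred_real, torch.ones_like(pred_real)) \
                        + mse(pred_fake, torch.zeros_like(pred_fake))
    return loss_G, loss_D
```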

Training

The generator network architecture consists of three convolutions, several residual blocks, two fractionally-strided convolutions with stride 1/2, and one final convolution. CycleGAN uses 6 residual blocks for 128×128 images and 9 blocks for 256×256 and higher-resolution training images. For the discriminator networks we use 70×70 PatchGANs, which classify whether 70×70 overlapping image patches are real or fake. To train the network we varied a number of hyper-parameters: batch size, batch sequence, normalization type, optimization algorithm, and the choice of loss (L1, log-loss, or L2). We performed the experiments below on 4 Nvidia Tesla V100 GPUs; each run took around 12 hours for 200 epochs.
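
For concreteness, here is a sketch of the 70×70 PatchGAN discriminator described above, following the C64-C128-C256-C512 layout from the CycleGAN paper. It is illustrative rather than the exact network definition used in our training code.

```python
import torch.nn as nn

def patchgan_discriminator(in_channels: int = 3) -> nn.Sequential:
    """70x70 PatchGAN sketch (C64-C128-C256-C512): outputs a grid of logits,
    one per overlapping image patch, rather than a single real/fake score."""
    def block(c_in, c_out, stride=2, norm=True):
        layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=stride, padding=1)]
        if norm:
            layers.append(nn.InstanceNorm2d(c_out))
        layers.append(nn.LeakyReLU(0.2, inplace=True))
        return layers

    return nn.Sequential(
        *block(in_channels, 64, norm=False),  # no normalization on the first layer
        *block(64, 128),
        *block(128, 256),
        *block(256, 512, stride=1),
        nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # per-patch real/fake logit
    )
```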


In the picture below, we present fakes generated over the course of training as we progress through the epochs.


In the picture below, we present the various GAN losses and their variation as a function of epochs.


Evaluation and Results

Firstly, we present the fakes generated by our model in both domain A and domain B.


To evaluate our work, we use a manual evaluation in the form of visual inspection of images.


We also use a qualitative metric: a visual study similar to the perceptual studies run on Amazon Mechanical Turk, in which participants are shown a sequence of images and asked to label each one as real or fake. Rating and preference judgment is the most commonly used qualitative method: images are often presented in pairs and the human judge is asked which image they prefer, e.g., which image is more realistic. We conducted a survey in which each user was shown 22 randomly selected images and asked to pick out the ground-truth images from the GAN-generated outputs; each correct answer earns the user 1 point. We received 100 responses, giving 2200 predictions in total, of which the users guessed wrongly 881 times. In other words, the generated fakes were able to dupe the users roughly 40% of the time, which is impressive considering the discerning sight we humans possess, and it validates our model through visual inspection. The statistics for the survey can be seen in the figure below, where the average user score is 13.24.


We also use a quantitative metric in the form of the FID score. The Fréchet Inception Distance (FID) is a metric for evaluating the quality of generated images and is commonly used to assess the performance of generative adversarial networks. FID measures the distance between the distributions of generated and real samples; a lower FID is better, meaning the generated samples are closer to the real ones. We evaluated each of our generator models using this metric.
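
As a minimal sketch, an FID score can be computed with the torchmetrics implementation (assuming torchmetrics and its torch-fidelity dependency are installed); the random tensors below stand in for batches of real and generated images and are not our actual data or the exact evaluation script we used.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Stand-in batches: uint8 images of shape [N, 3, H, W] with values in [0, 255].
real_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)

fid = FrechetInceptionDistance(feature=2048)   # 2048-d InceptionV3 features
fid.update(real_images, real=True)             # accumulate statistics of real samples
fid.update(fake_images, real=False)            # accumulate statistics of generated samples
print(f"FID: {fid.compute().item():.2f}")      # lower is better
```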


Through the above training trials, we achieved the lowest FID_AB of 17.07 with a generator model G_AB trained with the following hyper-parameters: batch size 16, instance normalization, a linear learning-rate policy, and the LSGAN adversarial loss. We also observe that the FID_AB score ranges between 17.07 and 32.18. The best score of 17.07, coupled with this small range, suggests that the generator network G_AB finds it relatively easy to learn how to apply a fake mask onto an unmasked face, irrespective of the variations in the hyper-parameters. Similarly, we achieved the lowest FID_BA of 48.39 with a generator model G_BA trained with batch size 32, instance normalization, a linear learning-rate policy, and the vanilla (log-loss) GAN objective. The best score of 48.39 implies that the generator network G_BA finds it more difficult to reconstruct the area of the face hidden by the mask. We also observe that the FID_BA score ranges between 48.39 and 261.78, which implies that the performance of this network varies significantly with changes in the hyper-parameters.

References

  1. https://github.com/cabani/MaskedFace-Net (for masked face images) - Most of the images in this dataset are not well masked; we only select the images that are properly masked.
  2. https://github.com/NVlabs/ffhq-dataset (for unmasked images) - Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of 70,000 high-quality PNG images at 1024×1024 resolution of human faces.
  3. https://towardsdatascience.com/demystifying-gans-cc1ac011355
  4. CycleGAN paper: https://arxiv.org/abs/1703.10593
  5. https://towardsdatascience.com/cycle-gan-with-pytorch-ebe5db947a99
