A curated list of papers 🎈
China Computer Federation (CCF) Recommended List of International Academic Conferences and Journals, 2019 [PDF]
3D based approaches
- Real-time facial animation with image-based dynamic avatars. ACM Transactions on Graphics, 2016
- paGAN: real-time avatars using dynamic textures. SIGGRAPH Asia, 2018. (generates key facial-expression textures that can be deformed and blended in real time)
- Neural Voice Puppetry: Audio-driven Facial Reenactment. [[PDF]](Neural Voice Puppetry Audio-driven Facial Reenactment.pdf) (edits the expression basis of a 3DMM)
- Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion. [PDF]
- Talking-head Generation with Rhythmic Head Motion. [PDF]
2D landmark based approaches
- Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss. [PDF] CVPR, 2019.
- MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation. [PDF][Data] ECCV, 2020.
- Talking Face Generation with Expression-Tailored Generative Adversarial Network. [PDF] ACM Multimedia, 2020. (multimodal synthesis combining emotion, identity, and speech)
- Low Bandwidth video-chat compression using deep generative models. [PDF] arXiv, 2020.
- Fast face-swap using convolutional neural networks. ICCV, 2017.
- X2face: A network for controlling face generation using images, audio, and pose codes. [PDF] ECCV, 2018.
- FSGAN: Subject agnostic face swapping and reenactment. [PDF] ICCV, 2019.
- ObamaNet: Photo-realistic lip-sync from text. [PDF] arXiv, 2018.
- Speech-driven Facial Animation using Cascaded GANs for Learning of Motion and Texture. [PDF]
Optical-flow based approaches
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing. [[PDF]](./paper/One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing.pdf) arXiv, 2020.
Vid2vid approaches
- Few-shot video-to-video synthesis. NeurIPS, 2019.
- Video-to-video synthesis. NeurIPS, 2018.
- A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild. [PDF] ACM Multimedia, 2020. (builds on SyncNet and LipGAN: crops the face from the source video, transforms it, and pastes it back; used for visual translation)
- Towards Automatic Face-to-Face Translation. [PDF] ACM Multimedia, 2020. (LipGAN; face-to-face translation)
- Realistic speech-driven facial animation with GANs. [PDF] IJCV, 2019.
- CONFIG: Controllable Neural Face Image Generation. [PDF] (AdaIN)
Image based approaches
- Speech Driven Talking Face Generation from a Single Image and an Emotion Condition. [[PDF]](./paper/Speech Driven Talking Face Generation from a Single Image and an Emotion Condition.pdf) arXiv, 2020.
Disentanglement based approaches
- Talking Face Generation by Adversarially Disentangled Audio-Visual Representation. [[PDF]](./paper/Talking Face Generation by Adversarially Disentangled Audio-Visual Representation.pdf) CVPR, 2019.
- Animating Face using Disentangled Audio Representations. WACV, 2020. (disentangles emotion from content)
- APB2FaceV2: Real-time audio-guided multi-face reenactment. [PDF]
- Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach. [PDF]
- (SyncNet) Out of Time: Automated Lip Sync in the Wild. [PDF] ACCV, 2016.
- Lifting 2D StyleGAN for 3D-aware face generation. [PDF] arXiv, 2020.
- LandmarkGAN: Synthesizing Faces from Landmarks. [PDF] arXiv, 2020.
- Fast bi-layer neural synthesis of one-shot realistic head avatars. ECCV, 2020.
- Few-shot adversarial learning of realistic neural talking head models. ICCV, 2019.
- Semantic image synthesis with spatially-adaptive normalization. CVPR, 2019.
- First order motion model for image animation. NeurIPS, 2019. (face reenactment)
Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements. [Corrigendum] (extensive survey, 279 pages)
Emotion animation
- GANimation: Anatomically-aware Facial Animation from a Single Image. [PDF] CVPR, 2018. (Generates attention mask and color mask by a conditional GAN)
- Learning to Generate Customized Dynamic 3D Facial Expressions. [PDF]
- Controllable image-to-video translation: A case study on facial expression generation. AAAI, 2019.
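The composition step GANimation describes above (a conditional GAN producing an attention mask and a color mask) can be sketched as follows. This is a minimal NumPy illustration of the blending equation only, assuming (H, W, 3) images and an attention map in [0, 1]; in the paper both masks are network outputs, not plain arrays.

```python
import numpy as np

def attention_blend(original, color_mask, attention):
    """Compose a final frame from an attention mask and a color mask.

    attention[i, j] near 1 keeps the original pixel; near 0 takes the
    generated color-mask pixel. In GANimation both masks come from the
    conditional generator; here they are placeholder arrays.
    """
    a = attention[..., None]  # (H, W) -> (H, W, 1) so it broadcasts over RGB
    return a * original + (1.0 - a) * color_mask
```

The attention mask lets the generator focus its capacity on the regions that actually move (mouth, eyebrows) while copying static pixels straight from the input.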
Emotion recognition
- Audio emotion recognition
- f-Similarity Preservation Loss for Soft Labels: A Demonstration on Cross-Corpus Speech Emotion Recognition. [[PDF]](./paper/f-Similarity Preservation Loss for Soft Labels A Demonstration on Cross-Corpus Speech Emotion Recognition.pdf)
- Towards Discriminative Representation Learning for Speech Emotion Recognition. [PDF]
- Speech Emotion Recognition using Convolutional and Recurrent Neural Networks. [PDF]
- Acoustic Emotion Recognition: A Benchmark Comparison of Performances. [PDF]
- Visual emotion recognition
- Cross-modality emotion recognition
- M3ER: Multiplicative Multimodal Emotion Recognition using Facial, Textual, and Speech Cues. [PDF] AAAI, 2020. (recognizes a video's emotion by extracting information from all three modalities)
(Pix2PixGAN) Image-to-Image Translation with Conditional Adversarial Networks. [PDF] [Github] CVPR, 2017.
(CycleGAN) Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. [PDF] [Github] ICCV, 2017.
StyleGAN and its derivatives
- (StyleGAN) A Style-Based Generator Architecture for Generative Adversarial Networks. [Which face is real?][PDF][GitHub][Blog (Chinese)][YouTube][Bilibili][PyTorch] CVPR, 2019. Nvidia.
- (StyleGAN2) Analyzing and Improving the Image Quality of StyleGAN. [PDF] [GitHub][Translation (Chinese)] CVPR, 2020. Nvidia.
- GAN-Control: Explicitly Controllable GANs. [PDF] arXiv, 2021. Amazon.
- (InterFaceGAN) Interpreting the Latent Space of GANs for Semantic Face Editing. [PDF] [Github] CVPR, 2020.
(BEGAN) BEGAN: Boundary Equilibrium Generative Adversarial Networks. [PDF]
- (3DMM)
- (3DDFA-V2) Towards Fast, Accurate and Stable 3D Dense Face Alignment. [PDF]
- (3DDFA) Face Alignment in Full Pose Range: A 3D Total Solution. [PDF]
Towards Real-World Blind Face Restoration with Generative Facial Prior. [PDF] (blind face restoration; uses a pretrained GAN as a prior)
- The Creation and Detection of Deepfakes: A Survey. [PDF]
- 3D Morphable Face Models—Past, Present, and Future
- Transformers in Vision: A Survey
- A Survey on Visual Transformer. [[PDF]](./paper/A Survey on Visual Transformer.pdf)
- GAN Inversion: A Survey. [PDF]
- VinVL: Making Visual Representations Matter in Vision-Language Models.
- Anomaly Detection in Video Sequence with Appearance-Motion Correspondence. [PDF] [Github] ICCV, 2019. (uses a gradient loss: an L2 reconstruction loss blurs edges, whereas an image-gradient loss sharpens details; the gradient loss is implemented at line 224 of GAN_tf.py. Also uses optical flow for motion prediction)
- Arbitrary style transfer in real-time with adaptive instance normalization. [PDF] ICCV, 2017. (AdaIN)
- Understanding Ranking Loss, Contrastive Loss, Margin Loss, Triplet Loss, Hinge Loss and all those confusing names
- LaTeX notation for common mathematical symbols
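The image-gradient loss noted for the anomaly-detection entry above (sharpening details that a plain L2 reconstruction loss would blur) can be sketched as a penalty on finite differences. A minimal NumPy sketch for grayscale images; the paper's actual TensorFlow version lives in GAN_tf.py and may differ in detail.

```python
import numpy as np

def gradient_loss(pred, target):
    """L1 distance between the spatial gradients of two (H, W) images.

    Matching gradients rather than raw pixels penalizes missing or
    smeared edges directly, which is why it complements an L2
    reconstruction loss.
    """
    # Horizontal / vertical finite differences of each image.
    dx_p, dy_p = np.diff(pred, axis=1), np.diff(pred, axis=0)
    dx_t, dy_t = np.diff(target, axis=1), np.diff(target, axis=0)
    return np.mean(np.abs(dx_p - dx_t)) + np.mean(np.abs(dy_p - dy_t))
```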
Jupyter GitHub nbviewer: https://nbviewer.jupyter.org
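AdaIN, which several entries above rely on (CONFIG and the real-time style-transfer paper), boils down to re-normalizing content features with style statistics. A minimal NumPy sketch, assuming (C, H, W) feature maps; real implementations operate on batched network activations.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization on (C, H, W) feature maps.

    Each content channel is normalized to zero mean / unit std, then
    rescaled and shifted with the per-channel mean and std of the
    style features, transferring the style's feature statistics.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / (c_std + eps) + s_mean
```

After the transform, the output's per-channel mean and std match those of the style input, which is the whole trick.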