This repository records the papers I have read, along with my reading notes.
- [0] Multimodal Machine Learning: A Survey and Taxonomy | note
- [1] Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation | note
- [2] Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks | note
- [3] Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network | note (in progress)
- [4] Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals
- [5] Contextual Audio-Visual Switching For Speech Enhancement in Real-World Environments
- [6] The Conversation: Deep Audio-Visual Speech Enhancement | note
- [7] Audio-Visual Speech Enhancement using Hierarchical Extreme Learning Machine
- [8] AV Speech Enhancement Challenge using a Real Noisy Corpus
- [9] Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoder | note
- [10] CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement | note
- [11] Tutorial on Variational Autoencoders | note (in progress)
- [12] Visual Speech Enhancement | note
- [13] Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement
- [14] The Sound of Pixels
- [15] Seeing Through Noise: Visually Driven Speaker Separation and Enhancement | note
- [16] Audiovisual Speech Source Separation: An overview of key methodologies | note
- [17] Using Visual Speech Information in Masking Methods for Audio Speaker Separation | note (in progress)
- [18] Time Domain Audio Visual Speech Separation | note
- [19] Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
- [20] Supervised Speech Separation Based on Deep Learning: An Overview
- [21] Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation
- [22] Deep clustering: Discriminative embeddings for segmentation and separation
- [23] My lips are concealed: Audio-visual speech enhancement through obstructions | note
- [24] Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues | note
- [25] On Training Targets and Objective Functions for Deep-Learning-Based Audio-Visual Speech Enhancement
- [26] Effects of Lombard Reflex on the Performance of Deep-Learning-Based Audio-Visual Speech Enhancement Systems
- [27] Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
- [28] Deep-Learning-Based Audio-Visual Speech Enhancement in Presence of Lombard Effect