Stars
Research on Automatic Speech Recognition for dysarthric speech
VoiceBank-2023 is a speech corpus designed specifically for building personalized Mandarin text-to-speech (TTS) systems.
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Instant voice cloning by MIT and MyShell. Audio foundation model.
A library to inspect and extract intermediate layers of PyTorch models.
Chinese speech pretrained models
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
A pipeline that reads lips and generates speech for the read content, i.e., lip-to-speech synthesis.
A self-supervised learning framework for audio-visual speech
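AIdea is an all-in-one app that supports GPT and Chinese-developed large language models such as Tongyi Qianwen (通义千问) and Wenxin Yiyan (文心一言), as well as Stable Diffusion text-to-image and image-to-image generation, SDXL 1.0, super-resolution, and image colorization.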
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks
Simple samples for TensorRT programming
Automatic Depression Detection: a GRU/BiLSTM-based Model and an Emotional Audio-Textual Corpus
A one-of-a-kind resume builder that keeps your privacy in mind. Completely secure, customizable, portable, open-source and free forever. Try it out today!
Papers from the computer science community to read and discuss.