Stars
Official repository of SepReformer for speech separation
An awesome spoken LID repository. (Work in progress)
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
Unofficial vits2-TTS implementation in PyTorch
Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023
[AAAI-23 Oral] Official implementation of the paper "Are Transformers Effective for Time Series Forecasting?"
Implementation of "End-to-End Speaker Diarization as Post-Processing"
PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor
ICASSP 2023: 'Speaker recognition with two-step multi-modal deep cleansing'
code for "Supervised Prototypical Contrastive Learning for Emotion Recognition in Conversation, EMNLP 22"
This is the PyTorch implementation of the Universal Source Separation with Weakly labelled Data.
[IJCAI'23] Learning to Speak from Text for Low-Resource TTS
Some comprehensive papers about speaker diarization
This is an official implementation for "Block Selection Method for Using Feature Norm in Out-of-distribution Detection".
A deep neural network architecture for low-latency audio processing
Official implementation for the paper: A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units.
The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning"
Official PyTorch implementation of "Graphit: A Unified Framework for Diverse Image Editing Tasks"
Source code for ICASSP 2022 paper "MM-DFN: Multimodal Dynamic Fusion Network For Emotion Recognition in Conversations"
The proposed framework to retrieve the continuous chunk-level emotions via emo-rankers for Seq2Seq SER
Code for "Distribution-based Emotion Recognition in Conversation"
How to use our public wav2vec2 dimensional emotion model
S3PRL-VC: A Voice Conversion Toolkit based on S3PRL
[ICASSP 2023] FedAudio: A Federated Learning Benchmark for Audio and Speech Tasks
Web-crawl for "Audio Retrieval with WavText5K and CLAP Training"
Official implementation of SpeechFormer written in Python (PyTorch).