-
Freeze-Omni Public
Forked from VITA-MLLM/Freeze-Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Python Other Updated Dec 3, 2024 -
GLM-4-Voice Public
Forked from THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English speech dialogue model
Python Apache License 2.0 Updated Oct 25, 2024 -
External-Attention-pytorch Public
Forked from xmu-xiaoma666/External-Attention-pytorch
🍀 PyTorch implementation of various attention mechanisms, MLP, re-parameterization, and convolution modules, helpful for understanding the papers. ⭐⭐⭐
Python MIT License Updated Aug 29, 2024 -
Qifusion-net Public
The network module of Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
-
TIM-Net_SER Public
Forked from Jiaxin-Ye/TIM-Net_SER
[ICASSP 2023] Official TensorFlow implementation of "Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition".
Python GNU General Public License v3.0 Updated May 15, 2024 -
pytorch-metric-learning Public
Forked from KevinMusgrave/pytorch-metric-learning
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
Python MIT License Updated Dec 16, 2023 -
vits_chinese_0829 Public
Forked from PlayVoice/vits_chinese
Best-practice TTS based on BERT and VITS with some Natural Speech features from Microsoft; supports streaming output!
Python MIT License Updated Sep 19, 2023 -
so-vits-svc-5.0 Public
Forked from PlayVoice/whisper-vits-svc
Core Engine of Singing Voice Conversion & Singing Voice Clone
Python MIT License Updated Sep 11, 2023 -
auto_avsr Public
Forked from mpc001/auto_avsr
Auto-AVSR: Lip-Reading Sentences Project
Python Apache License 2.0 Updated Sep 3, 2023 -
audiocraft Public
Forked from facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Python MIT License Updated Aug 3, 2023 -
Whisper-Finetune Public
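Forked from yeyupiaoling/Whisper-Finetune
Fine-tune the Whisper speech recognition model, with support for training on data without timestamps, data with timestamps, and data without speech. Accelerated inference, with support for Web deployment, Windows desktop deployment, and Android deployment.
C Apache License 2.0 Updated Jul 30, 2023 -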
AttentionIsOFFByOne Public
Forked from kyegomez/AttentionIsOFFByOne
Implementation of "Attention Is Off By One" by Evan Miller
Python MIT License Updated Jul 25, 2023 -
VITS-fast-fine-tuning Public
Forked from Plachtaa/VITS-fast-fine-tuning
This repo is a VITS fine-tuning pipeline for fast speaker-adaptation TTS and many-to-many voice conversion
Python Apache License 2.0 Updated Jul 2, 2023 -
CIF-HieraDist Public
Forked from MingLunHan/CIF-HieraDist
[INTERSPEECH 2023] Knowledge Transfer from Pre-trained Language Models to Cif-based Recognizers via Hierarchical Distillation
Python Apache License 2.0 Updated Jun 16, 2023 -
generative-ai-roadmap Public
Forked from SeedV/generative-ai-roadmap
The roadmap of generative AI: use cases and applications
Creative Commons Attribution 4.0 International Updated Jun 11, 2023 -
FunASR Public
Forked from modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit
Python Other Updated Jun 8, 2023 -
wenet Public
Forked from wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
C++ Apache License 2.0 Updated Jun 7, 2023 -
whisper Public
Forked from openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Python MIT License Updated Jun 5, 2023 -
so-vits-svc Public
Forked from svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
Python BSD 3-Clause "New" or "Revised" License Updated May 23, 2023 -
ColossalAI Public
Forked from hpcaitech/ColossalAI
Making big AI models cheaper, easier, and scalable
Python Apache License 2.0 Updated Feb 15, 2023 -
LPCNet Public
Forked from xiph/LPCNet
Efficient neural speech synthesis
C BSD 3-Clause "New" or "Revised" License Updated Sep 30, 2022 -
Comprehensive-Transformer-TTS Public
Forked from keonlee9420/Comprehensive-Transformer-TTS
A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, …
Python MIT License Updated Sep 24, 2022 -
sound-separation Public
Forked from google-research/sound-separation
Python Apache License 2.0 Updated Sep 20, 2022 -
chinese_speech_pretrain Public
Forked from TencentGameMate/chinese_speech_pretrain
Chinese speech pretrained models
Shell Updated Jul 13, 2022 -
Leveraging-Self-Supervised-Learning-for-AVSR Public
Forked from LUMIA-Group/Leveraging-Self-Supervised-Learning-for-AVSR
Official PyTorch implementation of the paper "Leveraging Unimodal Self Supervised Learning for Multimodal Audio-Visual Speech Recognition"
Python MIT License Updated Jul 13, 2022 -
attention_keras Public
Forked from thushv89/attention_keras
Keras Layer implementation of Attention for Sequential models
Python MIT License Updated Jul 7, 2022 -
PerceptualAudio Public
Forked from pranaymanocha/PerceptualAudio
Perceptual Metrics of Audio - perceptually relevant loss function. DPAM and CDPAM
Python MIT License Updated Jun 22, 2022