Stars
Witness the aha moment of VLM with less than $3.
Explorations into whether a transformer with RL can direct a genetic algorithm to converge faster
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
Fully open reproduction of DeepSeek-R1
FireRedASR is a family of open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outs…
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Transformer with Local Modeling by Convolution for Speech Separation and Enhancement
This is the official repository of "Scalable Neural Vocoder from Range-Null Space Decomposition", which is submitted to TPAMI.
Taltt / RNDVoC
Forked from Andong-Li-speech/RNDVoC. This is the official repository of "Scalable Neural Vocoder from Range-Null Space Decomposition", which is submitted to TPAMI.
This is the repository of the manuscript "Residual Fusion Probabilistic Knowledge Distillation for Speech Enhancement".
Sky-T1: Train your own O1 preview model within $450
Code for Audio-Visual Target Speaker Extraction with Selective Auditory Attention (TASLP)
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
A PyTorch Implementation of Finite Scalar Quantization
Building large language models that understand social etiquette | Covers tutorials on prompt engineering, RAG, Agents, and LLM fine-tuning