Stars
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
🔊 Text-Prompted Generative Audio Model
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
A multi-voice TTS system trained with an emphasis on quality
Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
AirLLM 70B inference with single 4GB GPU
An Open Source text-to-speech system built by inverting Whisper.
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
Official Pytorch Implementation for "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" presenting "MultiDiffusion" (ICML 2023)
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Phi2-Chinese-0.2B 从0开始训练自己的Phi2中文小模型,支持接入langchain加载本地知识库做检索增强生成RAG。Training your own Phi2 small chat model from scratch.
Code for SuDoRm-Rf networks for efficient audio source separation. SuDoRm-Rf stands for SUccessive DOwnsampling and Resampling of Multi-Resolution Features which enables a more efficient way of sep…
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
In this repository, you will learn how code works in VITS(Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) in Jupyter Notebooks, including normalizing da…
Intel Neuromorphic DNS Challenge
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
Implementation for "Music Enhancement via Image Translation and Vocoding"
(R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.
A Pytorch-based implementation of the compression and decompression module in "Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression".
Generative Fixed-Filter Active Noise Control with CNN-Kalman Filtering