Audio Processing
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
VITS implementation for Japanese, Chinese, Korean, Sanskrit and Thai
Python script that slices audio with silence detection
Muzic: Music Understanding and Generation with Artificial Intelligence
Realtime Voice Changer
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a ca…
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
Clone a voice in 5 seconds to generate arbitrary speech in real-time
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Even 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning)
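
For readers unfamiliar with the idea behind the "slices audio with silence detection" entry above, here is a minimal sketch of one common approach: compute per-frame RMS energy and cut wherever the energy stays below a threshold for several consecutive frames. This is not the listed project's code. The function names (`frame_rms`, `slice_on_silence`) and all parameter defaults are hypothetical, chosen only for illustration, and the input is assumed to be a plain list of float samples in the range [-1, 1].

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame of `samples`."""
    rms = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return rms


def slice_on_silence(samples, frame_len=4, threshold=0.1, min_silence_frames=2):
    """Return (start, end) sample ranges of the non-silent segments.

    A run of at least `min_silence_frames` consecutive frames whose RMS is
    below `threshold` is treated as a cut point (hypothetical defaults).
    """
    rms = frame_rms(samples, frame_len)
    segments = []
    seg_start = None   # frame index where the current loud segment began
    silent_run = 0     # consecutive silent frames seen inside a segment

    for i, energy in enumerate(rms):
        if energy >= threshold:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first frame of the silent run.
                seg_end = i - silent_run + 1
                segments.append((seg_start * frame_len, seg_end * frame_len))
                seg_start = None
                silent_run = 0

    if seg_start is not None:
        # Audio ended mid-segment: drop any trailing silent frames.
        seg_end = len(rms) - silent_run
        segments.append(
            (seg_start * frame_len, min(seg_end * frame_len, len(samples)))
        )
    return segments
```

In practice the frame length would be tens of milliseconds of samples rather than 4, and the threshold is usually expressed in decibels; the tiny values here only keep the sketch easy to trace by hand.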