AudioGPT connects ChatGPT with a series of Audio Foundation Models so that users can send and receive speech, singing, audio, and talking-head video while chatting.
Up-to-date link: https://93868c7fa583f4b5.gradio.app
Here we list the current capabilities of AudioGPT. Support for more models and tasks is coming soon. For prompt examples, refer to asset.

Speech:

| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | Whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | WIP |
| Speech Separation | TF-GridNet | WIP |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural Speech | NeuralWarp | Yes |

Sing:

| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |

Audio:

| Task | Supported Foundation Models | Status |
|---|---|---|
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes (WIP) |
| Target Sound Detection | TSDNet | Yes (WIP) |
| Sound Extraction | LASSNet | Yes (WIP) |

Talking Head:

| Task | Supported Foundation Models | Status |
|---|---|---|
| Talking Head Synthesis | GeneFace | Yes (WIP) |

Updates:

- 4.3 Support Talking Head Synthesis
- 4.1 Support Audio Inpainting and clean up the code
- 3.27 Support Style Transfer and Talking Head Synthesis
- 3.23 Support Text-to-Sing
- 3.21 Support Image-to-Audio
- 3.19 Support Speech Recognition
- 3.17 Support Text-to-Audio

TODO:

- Clean up the text-to-sing and text-to-speech code
- Import ESPnet models for speech tasks
- Merge talking head synthesis into main
- Change the audio/video log output
- Support Hugging Face Spaces

We are grateful to the following open-source projects: