AudioGPT connects ChatGPT with a series of audio foundation models, enabling it to send and receive speech, singing voice, audio, and talking-head video during a conversation.
Here we list the current capabilities of AudioGPT. More supported models and tasks are coming soon. For prompt examples, refer to asset.
**Speech**

Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
Style Transfer | GenerSpeech | Yes |
Speech Recognition | Whisper, Conformer | Yes |
Speech Enhancement | ConvTasNet | Yes (WIP) |
Speech Separation | TF-GridNet | Yes (WIP) |
Speech Translation | Multi-decoder | WIP |
Mono-to-Binaural | NeuralWarp | Yes |
**Sing**

Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |
**Audio**

Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Audio | Make-An-Audio | Yes |
Audio Inpainting | Make-An-Audio | Yes |
Image-to-Audio | Make-An-Audio | Yes |
Sound Detection | Audio-transformer | Yes |
Target Sound Detection | TSDNet | Yes |
Sound Extraction | LASSNet | Yes |
**Talking Head**

Task | Supported Foundation Models | Status |
---|---|---|
Talking Head Synthesis | GeneFace | Yes (WIP) |
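To illustrate how a system like this can route a request to the right foundation model, here is a minimal task-dispatcher sketch. The registry and handler functions below are hypothetical stand-ins for illustration only; they are not AudioGPT's actual API, which relies on an LLM controller to select models.

```python
# Minimal sketch of a task -> foundation-model dispatcher.
# All handler names and outputs below are hypothetical placeholders,
# not AudioGPT's real interfaces.
from typing import Callable, Dict

def fastspeech_tts(text: str) -> str:
    # Placeholder standing in for a text-to-speech model call.
    return f"<waveform for: {text}>"

def whisper_asr(audio_path: str) -> str:
    # Placeholder standing in for a speech-recognition model call.
    return f"<transcript of: {audio_path}>"

# Registry mapping task names to handlers; new tasks are added by
# registering another entry.
REGISTRY: Dict[str, Callable[[str], str]] = {
    "text-to-speech": fastspeech_tts,
    "speech-recognition": whisper_asr,
}

def dispatch(task: str, payload: str) -> str:
    """Route a request to the model registered for `task`."""
    try:
        handler = REGISTRY[task]
    except KeyError:
        raise ValueError(f"unsupported task: {task}") from None
    return handler(payload)
```

A controller built this way fails loudly on unsupported tasks and can be extended simply by registering additional handlers.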
**Todo**

- clean up the text-to-sing/speech code
- merge talking head synthesis into main
- change the audio/video log output
- support Hugging Face Spaces
We appreciate the open-source work of the following projects: