forked from AIGC-Audio/AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

AudioGPT

AudioGPT connects ChatGPT with a series of audio foundation models so that users can send and receive speech, singing, audio, and talking-head video while chatting.

Capabilities

Up-to-date link: https://93868c7fa583f4b5.gradio.app

Here we list the capabilities of AudioGPT at this time. More supported models and tasks are coming soon. For prompt examples, refer to asset.

Speech

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | WIP |
| Speech Separation | TF-GridNet | WIP |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural Speech | NeuralWarp | Yes |

Sing

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |

Audio

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes (WIP) |
| Target Sound Detection | TSDNet | Yes (WIP) |
| Sound Extraction | LASSNet | Yes (WIP) |

Talking Head

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Talking Head Synthesis | GeneFace | Yes (WIP) |
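
The capability tables above can also be read as a simple lookup from task to models and status. The sketch below is only an illustration of that structure: the `CAPABILITIES` dictionary and the `models_for` and `tasks_with_status` helpers are hypothetical names, not part of the AudioGPT codebase, and the entries are copied from the tables as they stand at this time.

```python
# Hypothetical registry mirroring the capability tables in this README.
# This is an illustration only, not AudioGPT's actual code or API.
CAPABILITIES = {
    "Speech": {
        "Text-to-Speech": (["FastSpeech", "SyntaSpeech", "VITS"], "Yes (WIP)"),
        "Style Transfer": (["GenerSpeech"], "Yes"),
        "Speech Recognition": (["whisper", "Conformer"], "Yes"),
        "Speech Enhancement": (["ConvTasNet"], "WIP"),
        "Speech Separation": (["TF-GridNet"], "WIP"),
        "Speech Translation": (["Multi-decoder"], "WIP"),
        "Mono-to-Binaural Speech": (["NeuralWarp"], "Yes"),
    },
    "Sing": {
        "Text-to-Sing": (["DiffSinger", "VISinger"], "Yes (WIP)"),
    },
    "Audio": {
        "Text-to-Audio": (["Make-An-Audio"], "Yes"),
        "Audio Inpainting": (["Make-An-Audio"], "Yes"),
        "Image-to-Audio": (["Make-An-Audio"], "Yes"),
        "Sound Detection": (["Audio-transformer"], "Yes (WIP)"),
        "Target Sound Detection": (["TSDNet"], "Yes (WIP)"),
        "Sound Extraction": (["LASSNet"], "Yes (WIP)"),
    },
    "Talking Head": {
        "Talking Head Synthesis": (["GeneFace"], "Yes (WIP)"),
    },
}


def models_for(task):
    """Return (modality, models, status) for a task name, ignoring case."""
    wanted = task.lower()
    for modality, tasks in CAPABILITIES.items():
        for name, (models, status) in tasks.items():
            if name.lower() == wanted:
                return modality, models, status
    raise KeyError(f"unknown task: {task}")


def tasks_with_status(status):
    """Return the sorted task names whose status string matches exactly."""
    return sorted(
        name
        for tasks in CAPABILITIES.values()
        for name, (_, task_status) in tasks.items()
        if task_status == status
    )


if __name__ == "__main__":
    print(models_for("text-to-audio"))
    print(tasks_with_status("WIP"))
```

Keeping the status string verbatim (for example `Yes (WIP)`) avoids guessing what a partially supported task means; a caller that needs a boolean would have to decide that for itself.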

Internal Version Updates

4.3 Support Talking Head Synthesis
4.1 Support Audio Inpainting and clean up code
3.27 Support Style Transfer/Talking Head Synthesis
3.23 Support Text-to-Sing
3.21 Support Image-to-Audio
3.19 Support Speech Recognition
3.17 Support Text-to-Audio

Todo

  • clean up text-to-sing/speech code
  • import Espnet models for speech tasks
  • merge talking head synthesis into main
  • change audio/video log output
  • support huggingface space

Acknowledgement

We appreciate the following open-source projects:

Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion
