EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
The official implementation of EmoSphere-TTS

Demo page

Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee

Department of Artificial Intelligence, Korea University, Seoul, Korea

Abstract

Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressive emotional speech by using a spherical emotion vector to control the emotional style and intensity of the synthetic speech. Without any human annotation, we use the arousal, valence, and dominance pseudo-labels to model the complex nature of emotion via a Cartesian-spherical transformation. Furthermore, we propose a dual conditional adversarial network to improve the quality of generated speech by reflecting the multi-aspect characteristics. The experimental results demonstrate the model’s ability to control emotional style and intensity with high-quality expressive speech.
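The Cartesian-spherical transformation mentioned in the abstract can be sketched as follows. This is a minimal illustration using the textbook coordinate conversion, not the repository's code. The function names `to_spherical` and `from_spherical`, the `center` parameter (a hypothetical reference point, for example a neutral-emotion centroid), and the assignment of arousal, valence, and dominance to the x, y, and z axes are all assumptions made for this example.

```python
import math

def to_spherical(avd, center=(0.0, 0.0, 0.0)):
    """Map an (arousal, valence, dominance) point to (r, theta, phi).

    `center` is an assumed reference origin, subtracted before conversion.
    """
    x, y, z = (p - c for p, c in zip(avd, center))
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # Angles are undefined at the origin; return zeros by convention.
        return 0.0, 0.0, 0.0
    # Clamp guards against tiny floating-point overshoot outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # polar angle in [0, pi]
    phi = math.atan2(y, x)                         # azimuth in (-pi, pi]
    return r, theta, phi

def from_spherical(r, theta, phi, center=(0.0, 0.0, 0.0)):
    """Inverse of to_spherical: recover the Cartesian point."""
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x + center[0], y + center[1], z + center[2]

# Usage: halve the radius while keeping both angles fixed.
r, theta, phi = to_spherical((0.3, -0.2, 0.6))
weaker = from_spherical(0.5 * r, theta, phi)
```

Under this reading, the radius would act as the intensity and the two angles as the style, so scaling `r` with the angles held fixed changes only how strongly the emotion is expressed. Any normalization or quantization applied by the actual model is not covered here.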

Training Procedure

Environments

pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install forced alignment tools

1. Preprocess data

We use the ESD database, an emotional speech database that can be downloaded here: https://hltsingapore.github.io/ESD/.

sh preprocessing.sh

2. Training TTS module and Inference

sh train_run.sh

3. Pretrained checkpoints

TTS module trained on 160k [Download]

Acknowledgements

Our code is based on the following repos:
