WhisperX for LJ Speech

Automatically create datasets in LJ Speech format for training text-to-speech (TTS) models. LJ Speech is a common dataset format supported by TTS frameworks such as Tortoise and Piper.

Speech segments detected by WhisperX's voice activity detection (VAD) step are cut into short .wav samples, and WhisperX's ASR produces the corresponding transcriptions.
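
As a rough sketch of what that pipeline looks like with the WhisperX Python API (a minimal sketch, not the actual create_dataset.py: the file names, the fixed file-index prefix, and the soundfile dependency are assumptions; whisperx.load_audio resamples to 16 kHz):

    import whisperx
    import soundfile as sf  # assumed here for writing the clips

    SAMPLE_RATE = 16000  # whisperx.load_audio returns 16 kHz mono audio

    model = whisperx.load_model("base", device="cuda")
    audio = whisperx.load_audio("input_audio/recording.wav")  # hypothetical input file
    result = model.transcribe(audio, batch_size=16)  # VAD segmentation + batched ASR

    with open("output/metadata.csv", "w", encoding="utf-8") as meta:
        for i, seg in enumerate(result["segments"], start=1):
            # Cut the detected speech segment out of the long recording
            clip = audio[int(seg["start"] * SAMPLE_RATE):int(seg["end"] * SAMPLE_RATE)]
            sample_id = f"000001_{i:06d}"  # "000001_" stands in for a per-file index
            sf.write(f"output/audio/{sample_id}.wav", clip, SAMPLE_RATE)
            meta.write(f"{sample_id}|{seg['text'].strip()}\n")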

Install

pip install -r requirements.txt

Usage

  1. Put your (possibly long) audio files containing speech into input_audio
  2. Run python create_dataset.py --model base --gpu 0 --input input_audio --output output if a GPU is available, or run on CPU with python create_dataset.py --model tiny --cpu --input input_audio --output output
  3. Output:
    • Processed audio samples are saved as .wav files in the output/audio directory
    • A metadata.csv file is generated, containing one entry per sample in the format below (see the parsing sketch after this list)
      000001_000001|Transcribed text of the first audio sample.
      000001_000002|Transcribed text of the second audio sample.
      ...
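
The two-column id|transcription layout matches LJ Speech, so a training script can load it with a few lines of Python. A minimal sketch, assuming the output/ layout above:

    from pathlib import Path

    output_dir = Path("output")
    dataset = []
    with open(output_dir / "metadata.csv", encoding="utf-8") as f:
        for line in f:
            # Each line is "<sample id>|<transcription>"
            sample_id, text = line.rstrip("\n").split("|", 1)
            dataset.append((output_dir / "audio" / f"{sample_id}.wav", text))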
      

Docker

  • Build the Docker image: docker build -t whisperx4ljspeech .
  • Run the container: docker run --gpus '"device=0"' -v $(pwd)/input_audio:/app/input_audio -v $(pwd)/output:/app/output whisperx4ljspeech --input input_audio/ --output output --gpu 0 --model large-v3 --language de
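  • To run without a GPU (assuming the image forwards its arguments to create_dataset.py, as in the command above): docker run -v $(pwd)/input_audio:/app/input_audio -v $(pwd)/output:/app/output whisperx4ljspeech --input input_audio/ --output output --cpu --model tiny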
