
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing

MaxMax2016/AutoVocoder

Unofficial PyTorch implementation of Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing. This repository is based on the iSTFTNet GitHub repository (see the iSTFTNet paper).

Disclaimer: This repo is built for testing purposes.

Training:

python train.py --config config.json
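The same run can be scripted from Python, for example to train several variants in a row. This is a minimal sketch, assuming that latent_dim and dec_istft_input are top-level keys of config.json and that train.py accepts --input_wavs_dir as a command-line option (its leading -- suggests an argparse flag). config_variant and train_command are hypothetical helpers, not part of this repo, and the paths in the example are placeholders.

```python
import json        # used by the commented example below
import subprocess  # used by the commented example below
import sys
from pathlib import Path


def config_variant(base: dict, **overrides) -> dict:
    """Return a copy of a loaded config.json with some existing keys overridden."""
    unknown = set(overrides) - set(base)
    if unknown:  # catch typos such as "latent_dims" before a long run starts
        raise KeyError(f"keys not in base config: {sorted(unknown)}")
    return {**base, **overrides}


def train_command(config: str = "config.json",
                  wavs_dir: str = "LJSpeech-1.1/wavs",
                  check: bool = True) -> list:
    """Build the argv for train.py, optionally checking that wavs_dir holds .wav files."""
    wavs = Path(wavs_dir)
    if check and not any(wavs.glob("*.wav")):
        raise FileNotFoundError(f"no .wav files found in {wavs}")
    return [sys.executable, "train.py", "--config", config, "--input_wavs_dir", str(wavs)]


# Example (hypothetical paths): train an AV128 model.
# base = json.loads(Path("config.json").read_text())
# Path("config_av128.json").write_text(json.dumps(config_variant(base, latent_dim=128), indent=4))
# subprocess.run(train_command("config_av128.json", "/data/LJSpeech-1.1/wavs"), check=True)
```

The .wav check is only a cheap sanity test of the dataset path; pass check=False to skip it.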

In train.py, set --input_wavs_dir to the path of your LJSpeech-1.1/wavs directory.
In config.json, set latent_dim to 128, 192, or 256 (default) to train AV128, AV192, or AV256, respectively.
Following Section 3.3 of the paper, dec_istft_input can be set to cartesian (default), polar, or both.
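The dec_istft_input options differ in how the decoder's output is read as a complex spectrogram before the inverse STFT turns it into a waveform. This is a minimal PyTorch sketch, assuming the decoder emits 2·F channels split in half (real/imaginary parts for cartesian, magnitude/phase for polar). N_FFT, HOP, the channel layout, and the helper names are hypothetical rather than taken from this repo, and both is omitted because its layout is not described here.

```python
import torch

# Hypothetical STFT settings for illustration; the repo's own values live in config.json.
N_FFT = 16
HOP = 4  # n_fft / 4 with a Hann window satisfies the overlap-add condition torch.istft checks


def to_complex(dec_out: torch.Tensor, mode: str) -> torch.Tensor:
    """Read a (batch, 2 * F, frames) tensor as a complex (batch, F, frames) spectrogram."""
    a, b = dec_out.chunk(2, dim=1)
    if mode == "cartesian":
        return torch.complex(a, b)  # a = real part, b = imaginary part
    if mode == "polar":
        return torch.polar(a, b)  # a = magnitude (>= 0), b = phase in radians
    raise ValueError(f"unsupported mode: {mode!r}")


def from_complex(spec: torch.Tensor, mode: str) -> torch.Tensor:
    """Inverse of to_complex: stack the two real-valued components along dim 1."""
    if mode == "cartesian":
        return torch.cat([spec.real, spec.imag], dim=1)
    if mode == "polar":
        return torch.cat([spec.abs(), spec.angle()], dim=1)
    raise ValueError(f"unsupported mode: {mode!r}")


def synthesize(dec_out: torch.Tensor, mode: str, length: int) -> torch.Tensor:
    """Differentiable waveform synthesis: complex spectrogram -> inverse STFT."""
    window = torch.hann_window(N_FFT, device=dec_out.device)
    spec = to_complex(dec_out, mode)
    return torch.istft(spec, n_fft=N_FFT, hop_length=HOP, window=window, length=length)


if __name__ == "__main__":
    # Round trip: pack a random signal's STFT in each layout and resynthesize it.
    x = torch.randn(1, 256)
    window = torch.hann_window(N_FFT)
    spec = torch.stft(x, n_fft=N_FFT, hop_length=HOP, window=window, return_complex=True)
    for mode in ("cartesian", "polar"):
        y = synthesize(from_complex(spec, mode), mode, length=x.shape[-1])
        print(mode, torch.allclose(x, y, atol=1e-4))
```

Because torch.istft supports autograd, a loss computed on the synthesized waveform can backpropagate into whichever parameterization the decoder predicts.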

Notes:

  • [Figure: validation loss of AV256 during training]

  • In our tests, it converges almost 3x faster than HiFi-GAN V1 (compared against the official HiFi-GAN repo).

Citations:

@article{Webber2022AutovocoderFW,
  title={Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing},
  author={Jacob J. Webber and Cassia Valentini-Botinhao and Evelyn Williams and Gustav Eje Henter and Simon King},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.06989}
}
