This is a PyTorch implementation of mFVAE described in the paper: mixture factorization auto-encoder for unsupervised hierarchical deep factorization of speech signal. Note that we apply the reparameterization trick to the posteriors produced by both the frame tokenizer and the utterance embedder.
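For orientation, below is a minimal sketch of how such reparameterizations are typically implemented in PyTorch. The function names are illustrative and not taken from this repository, and the Gumbel-Softmax relaxation for a categorical tokenizer posterior is an assumption about a common choice, not a statement about the actual code.

```python
import torch
import torch.nn.functional as F

def reparameterize_gaussian(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping sampling
    differentiable with respect to the encoder outputs mu and logvar
    (e.g. those produced by the utterance embedder)."""
    std = torch.exp(0.5 * logvar)   # sigma = exp(logvar / 2)
    eps = torch.randn_like(std)     # noise sampled independently of parameters
    return mu + eps * std

def reparameterize_categorical(logits, tau=1.0):
    """Gumbel-Softmax relaxation for a categorical posterior over mixture
    components; a common trick for discrete latents (whether the frame
    tokenizer uses exactly this relaxation is an assumption here)."""
    return F.gumbel_softmax(logits, tau=tau, hard=False)
```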
Here is an online demo of the embeddings extracted from the embedder of mFVAE. It can be seen that speaker information is present in the embeddings. Since the aim of mFVAE/mFAE is to factorize linguistic information from paralinguistic information, the embeddings also contain channel distortion and background noise.
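As a rough offline counterpart to that inspection, the sketch below compares cosine similarities of embeddings within and across speakers. The file names and array layout (one embedding per row, a parallel array of speaker labels) are hypothetical.

```python
import numpy as np

# Hypothetical files: embeddings.npy holds one row per utterance (N x D),
# speakers.npy holds the matching integer speaker labels (N,).
emb = np.load("embeddings.npy")
spk = np.load("speakers.npy")

# L2-normalize so the dot product equals cosine similarity.
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
sim = emb @ emb.T

same_mask = spk[:, None] == spk[None, :]
np.fill_diagonal(same_mask, False)          # drop trivial self-similarities
diff_mask = spk[:, None] != spk[None, :]

print("mean cosine similarity, same speaker:", sim[same_mask].mean())
print("mean cosine similarity, diff speaker:", sim[diff_mask].mean())
```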
- Python 3.7
- PyTorch 1.1.0
- Kaldi
- PyKaldi
- kaldi_io
- GPUtil
- NumPy, datetime, argparse, pprint
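As an optional sanity check, the snippet below verifies that the Python-side dependencies import correctly (PyKaldi is imported as `kaldi`; the check itself is not part of this repository):

```python
# Optional sanity check: confirm the Python dependencies are importable.
import torch
import numpy as np
import GPUtil
import kaldi_io
import kaldi  # PyKaldi

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("NumPy:", np.__version__)
print("GPUs visible to GPUtil:", len(GPUtil.getGPUs()))
```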
- Download and unzip audio files from http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html
- Create a directory named voxceleb1 with two subdirectories named train and test. Move the dev data into the train directory and the test data into the test directory (a sketch of this layout follows below).
- Download the list of trial pairs for verification (http://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt) and move it to the voxceleb1 directory.
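The following sketch shows one way to arrange the files as described above. The source directory names (vox1_dev_wav, vox1_test_wav) are assumptions about where the archives were unzipped; adjust them to your own paths.

```python
import os
import shutil

root = "voxceleb1"
os.makedirs(os.path.join(root, "train"), exist_ok=True)
os.makedirs(os.path.join(root, "test"), exist_ok=True)

# Move the speaker subdirectories of the dev set into train/ and of the test set into test/.
for spk in os.listdir("vox1_dev_wav"):
    shutil.move(os.path.join("vox1_dev_wav", spk), os.path.join(root, "train", spk))
for spk in os.listdir("vox1_test_wav"):
    shutil.move(os.path.join("vox1_test_wav", spk), os.path.join(root, "test", spk))

# The verification trial list goes directly under voxceleb1/.
shutil.move("veri_test.txt", os.path.join(root, "veri_test.txt"))
```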
- Go to the voxceleb-mfvae directory.
- run cmd:
ln -fsr "your path to kaldi-trunk/egs/sre08/v1/utils" utils
- Modify root_data_dir in run.sh
- run cmd:
bash run.sh --stage 0