Spectro-Temporal Attention Based Voice Activity Detection (PyTorch)

My PyTorch implementation of STAM performs slightly better than the original TensorFlow implementation on every metric below except precision, which is marginally lower:

TensorFlow: Global AUC: 99.86, F1-score: 98.15, DCF: 1.32, accuracy: 97.90, precision: 99.10

PyTorch: Global AUC: 99.87, F1-score: 98.31, DCF: 1.18, accuracy: 98.07, precision: 99.06
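For readers unfamiliar with these metrics, here is a minimal sketch of how frame-level accuracy, precision, F1-score, and DCF can be computed from binary VAD decisions. It is not taken from this repository's evaluation script. The function name `vad_metrics` is hypothetical, and the DCF weighting (0.75 for misses, 0.25 for false alarms) is an assumption based on the common NIST OpenSAT convention. The sketch returns fractions in [0, 1], whereas the numbers above appear to be percentages. Note that a lower DCF is better.

```python
def vad_metrics(labels, preds):
    """Frame-level metrics for binary VAD decisions (1 = speech, 0 = non-speech)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # speech detected as speech
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # non-speech detected as non-speech
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false alarms
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # missed speech

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Assumed DCF definition: misses are weighted three times as heavily as false alarms.
    p_miss = fn / (tp + fn) if tp + fn else 0.0
    p_fa = fp / (fp + tn) if fp + tn else 0.0
    dcf = 0.75 * p_miss + 0.25 * p_fa

    return {"accuracy": accuracy, "precision": precision, "f1": f1, "dcf": dcf}
```

AUC is computed from the model's continuous speech scores rather than from thresholded decisions, so it is left out of this sketch.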

Training and test data

Training: TIMIT training data + NOISEX noise (SNR: -10, -5, 0, 5, 10 dB)

Testing: TIMIT testing data + AURORA noise (SNR: -5, 0, 5, 10 dB)
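To illustrate what mixing at a given SNR means, the sketch below scales a noise segment so that the speech-to-noise power ratio matches a target value in dB, then adds it to the speech. It is an illustration only and is not taken from `preprocess.py`. The function name `mix_at_snr` is hypothetical, signals are plain lists of float samples, and the two inputs are assumed to have equal length.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that 10*log10(P_speech / P_noise) equals snr_db."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")

    # Average power of each signal.
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise segment is silent")

    # Gain applied to the noise so that the scaled noise power is
    # p_speech / 10^(snr_db / 10).
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]
```

For example, `mix_at_snr(speech, noise, -10)` produces a mixture in which the noise carries ten times the power of the speech, which corresponds to the hardest training condition listed above.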
