This project implements a Voice Activity Detection algorithm based on the paper: Sofer, A., & Chazan, S. E. (2022). CNN self-attention voice activity detector. arXiv preprint arXiv:2203.02944.
The `Data Processing` folder primarily contains a notebook used for annotation processing and data augmentation.

- `data_utils.py` contains helper functions for annotation processing.
- `energy_vad.py` contains code that was not written by me: it is an implementation of an energy-based VAD found on GitHub, which I used to extract noise signals from LibriSpeech samples.
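To illustrate the idea behind an energy-based VAD (this is only a minimal sketch, not the third-party implementation used in `energy_vad.py`): frames whose short-term energy falls below a threshold are labelled as non-speech, and those frames can then be collected as noise-only material.

```python
import numpy as np

def energy_vad(signal, sample_rate, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Label each frame as speech (True) or non-speech (False) by its energy.

    Illustrative sketch only; `signal` is assumed to be a 1-D float array.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    labels = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len]
        # Short-term energy in dB; the small constant avoids log(0) on silence.
        energy_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-10)
        labels.append(energy_db > threshold_db)
    return np.array(labels)

# Frames labelled False can be concatenated to build noise-only excerpts.
```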
The training pipeline is organised as follows:

- `data.py` implements the PyTorch dataset together with the Lightning DataModule (see the sketch after this list).
- `modules.py` implements the PyTorch neural network model.
- `model.py` implements the LightningModule used for training.
- `train.py` is the main script to start a training run.
- `inference.py` is a simple script to test the model on real-world audio.
- The `config` folder groups the YAML files holding the experiment hyperparameters: `baseline_sa_cnn.yml` is the hyperparameter set described in the paper, while `128_mels.yml` is a slightly modified version.
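As a rough illustration of how the dataset and DataModule in `data.py` fit together (the class names, attributes, and batch size below are assumptions for the sketch, not the repository's actual API):

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader, Dataset

class VADDataset(Dataset):
    """Hypothetical dataset returning (log-mel features, frame-level labels)."""
    def __init__(self, features, labels):
        self.features = features  # e.g. list of (n_mels, n_frames) tensors
        self.labels = labels      # e.g. list of (n_frames,) 0/1 tensors

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

class VADDataModule(pl.LightningDataModule):
    """Hypothetical DataModule mirroring the role of data.py."""
    def __init__(self, train_set, val_set, batch_size=32):
        super().__init__()
        self.train_set, self.val_set = train_set, val_set
        self.batch_size = batch_size

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size)
```

`train.py` then presumably builds the LightningModule from `model.py` and hands both it and the DataModule to a Lightning `Trainer`, with the hyperparameters read from one of the YAML files in `config`.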
You will also find some artifacts created after training:

- The `checkpoints` folder holds the saved model checkpoints, containing weights, optimizer state, hyperparameters, etc. (see the loading example below).
- `tb_logs` contains the TensorBoard logs.
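As a hedged example of inspecting such a checkpoint (the file name below is a placeholder; the actual names under `checkpoints/` will differ):

```python
import torch

# Hypothetical checkpoint path; pick any .ckpt file saved under checkpoints/.
ckpt = torch.load("checkpoints/example.ckpt", map_location="cpu")

# A Lightning checkpoint is a plain dict bundling the model weights, the
# optimizer state, and the hyperparameters saved at training time.
print(ckpt.keys())           # e.g. state_dict, optimizer_states, hyper_parameters, ...
state_dict = ckpt["state_dict"]

# Alternatively, the LightningModule defined in model.py can restore itself:
#   model = TheLightningModule.load_from_checkpoint("checkpoints/example.ckpt")
```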