ZFF VAD

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

This repository contains the code developed for the Interspeech 2022 paper "Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering" by E. Sarkar, R. Prasad, and M. Magimai Doss.

Please cite the original authors in any publication that uses this work:

@inproceedings{sarkar22_interspeech,
author    = {Eklavya Sarkar and RaviShankar Prasad and Mathew Magimai Doss},
title     = {{Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering}},
year      = {2022},
booktitle = {Proc. Interspeech 2022},
pages     = {4626--4630},
doi       = {10.21437/Interspeech.2022-10535}
}

Approach

We jointly model voice source and vocal tract system information using the zero-frequency filtering (ZFF) technique for the purpose of voice activity detection. The ZFF filter outputs are combined into a composite signal carrying salient source and system information, such as the fundamental frequency $f_{0}$ and the formants $F_{1}$ and $F_{2}$, and a dynamic threshold is then applied after spectral entropy-based weighting. Our approach operates purely in the time domain, is robust across a range of SNRs, and is much more computationally efficient than neural methods.
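For readers unfamiliar with zero-frequency filtering, the sketch below illustrates the basic operation in its commonly described form: difference the signal, pass it through a cascade of two zero-frequency resonators (equivalent to four cumulative sums), then remove the resulting polynomial trend by repeatedly subtracting a local mean. This is a generic illustration only, not the code in this repository: the function names, the 10 ms window, and the number of trend-removal passes are assumptions, and the composite signal and entropy-based thresholding used by the package are not reproduced here.

```python
import numpy as np


def zero_frequency_filter(signal, sr, window_ms=10.0, n_trend_passes=3):
    """Generic ZFF sketch (hypothetical helper, not the package's zff_cs)."""
    x = np.asarray(signal, dtype=np.float64)

    # First-order difference removes any DC offset in the recording.
    x = np.diff(x, prepend=x[0])

    # One zero-frequency resonator, y[n] = 2*y[n-1] - y[n-2] + x[n],
    # is a double integrator; a cascade of two is four cumulative sums.
    y = x
    for _ in range(4):
        y = np.cumsum(y)

    # The integrators produce a polynomial trend. Subtract a local mean
    # over a window of roughly one pitch period (assumed value here).
    half = int(round(window_ms * 1e-3 * sr / 2))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(n_trend_passes):
        y = y - np.convolve(y, kernel, mode="same")

    # Samples near both edges are unreliable because of zero padding.
    return y


def positive_zero_crossings(y):
    """Indices where the signal goes from negative to non-negative."""
    return np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1
```

On a periodic excitation, the negative-to-positive zero crossings of the filtered signal repeat once per period, which is why the filter output is a convenient carrier of source information. Because the four cumulative sums grow quickly, very long recordings are usually processed in double precision or in chunks.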

Installation

This package has very few requirements. To create a new conda/mamba environment, install conda, then mamba, and follow these steps:

mamba env create -f environment.yml   # Create environment
conda activate zff                    # Activate environment
make install clean                    # Install packages

Command-line Usage

To segment a single audio file into a .csv file:

segment -w path/to/audio.wav -o path/to/save/segments

To segment a folder of audio files:

segment -f path/to/folder/of/audio/files -o path/to/save/segments

For more options check:

segment -h

Note: depending on the conditions of the given data, it may be necessary to tune the smoothing and theta parameters.

Python Usage

To compute VAD on a given audio file:

from zff import utils
from zff.zff import zff_vad

# Read audio at native sampling rate
sr, audio = utils.load_audio("audio.wav")

# Get segments
boundary = zff_vad(audio, sr)

# Smooth
boundary = utils.smooth_decision(boundary, sr)

# Convert from sample to time domain
segments = utils.sample2time(audio, sr, boundary)

# Save as .csv file
utils.save_segments("segments", "audio", segments)
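To make the "sample to time domain" step more concrete, here is a minimal, self-contained sketch of how a per-sample speech/non-speech decision can be turned into (start, end) times in seconds. It assumes the decision is a 0/1 array with one entry per audio sample, which this README does not specify, and `mask_to_segments` is a hypothetical helper, not the package's `utils.sample2time`.

```python
import numpy as np


def mask_to_segments(mask, sr):
    """Convert a per-sample 0/1 decision into (start_s, end_s) tuples.

    Hypothetical helper for illustration; the end time is exclusive.
    """
    mask = np.asarray(mask).astype(bool)

    # Pad with False on both sides so that segments touching the
    # beginning or end of the signal still produce a rising/falling edge.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.diff(padded.astype(np.int8))

    starts = np.flatnonzero(edges == 1)   # first speech sample of each run
    ends = np.flatnonzero(edges == -1)    # first non-speech sample after it

    return [(float(s) / sr, float(e) / sr) for s, e in zip(starts, ends)]
```

For example, with `sr=2` the mask `[0, 1, 1, 0, 0, 1, 1, 1]` contains two runs of speech, samples 1 to 2 and samples 5 to 7, which map to the intervals 0.5 s to 1.5 s and 2.5 s to 4.0 s.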

To extract the composite signal from a given audio file:

from zff.zff import zff_cs
from zff import utils

# Read audio at native sampling rate
sr, audio = utils.load_audio("audio.mp3")

# Get composite signal
composite = zff_cs(audio, sr)

# Get all signals
composite, y0, y1, y2, gcis = zff_cs(audio, sr, verbose=True)

Repository Structure

.
├── environment.yml          # Environment
├── img                      # Images
├── LICENSE                  # License
├── Makefile                 # Setup
├── MANIFEST.in              # Setup
├── pyproject.toml           # Setup
├── README.rst               # README
├── requirements.txt         # Setup
├── setup.py                 # Setup
├── version.txt              # Version
└── zff                      # Source code folder
    ├── arguments.py            # Arguments parser
    ├── segment.py              # Main method
    ├── utils.py                # Utility methods
    └── zff.py                  # ZFF methods

Contact

For questions about this software package, or to report issues, kindly contact the first author.
