The speech data simulator generates synthetic multispeaker audio sessions for training or evaluating multispeaker ASR and speaker diarization models. It addresses the lack of labelled multispeaker training data and helps models deal with overlapping speech.
The simulator loads audio files from different speakers, along with forced word alignments for each sentence, and concatenates the audio to build a synthetic multispeaker session. It uses the word alignments to segment each speaker's audio into utterances of the desired length, and it incorporates synthetic room impulse response (RIR) generation in order to simulate multi-microphone multispeaker sessions.
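For intuition, the segmentation step can be sketched as follows. This is a simplified illustration rather than the simulator's actual implementation, and the function and parameter names are hypothetical:

```python
import numpy as np

# Simplified sketch (hypothetical names): trim a sentence to at most
# `target_len` seconds by cutting at the last word boundary that fits,
# so that utterances always end on a whole word.
def cut_utterance(audio: np.ndarray, sr: int, word_end_times: list, target_len: float) -> np.ndarray:
    fitting = [t for t in word_end_times if t <= target_len]
    end_time = fitting[-1] if fitting else word_end_times[0]  # keep at least one word
    return audio[: int(end_time * sr)]
```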
The simulator is configurable and has several options, including:

- Amount of overlapping speech
  - The percentage of overlapping speech out of the total speaker time.
- Percentage of silence
  - The percentage of the overall audio session during which no speakers are talking.
- Sentence length distribution
  - The distribution used when sampling sentence lengths (the parameters passed in are for a negative binomial distribution; see the sketch after this list).
- Number of speakers per session
- Length of each session
- Variance in speaker dominance
  - Determines what portion of the speaking time is used by each speaker in a session. Increasing this value makes it more likely that a few speakers dominate the conversation.
- Turn taking
  - Determines how likely it is that a speaker keeps talking after completing an utterance.
- Background noise
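As a rough illustration of the sentence length option, lengths (in words) can be drawn from a negative binomial distribution as below; the parameter values are arbitrary and the names are not the simulator's actual config keys:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters for a negative binomial distribution; adding 1
# guarantees every sampled sentence contains at least one word.
sentence_length_words = rng.negative_binomial(n=2.0, p=0.05) + 1
print(sentence_length_words)
```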
The simulator can be used in two modes: near field (no room impulse response) and far field (with synthetic RIRs). When using synthetic RIR generation, multiple microphones can be placed in the simulated room environment for multichannel simulations.
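Conceptually, far-field simulation convolves each dry utterance with a per-microphone RIR; a minimal sketch of that step (not the simulator's actual code) is:

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_rirs(dry: np.ndarray, rirs: list) -> np.ndarray:
    """Convolve a single-channel utterance with one RIR per microphone,
    returning an array of shape (num_samples, num_mics)."""
    channels = [fftconvolve(dry, rir)[: len(dry)] for rir in rirs]
    return np.stack(channels, axis=1)
```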
The simulator also has a speaker enforcement mode that ensures the correct number of speakers appear in each session (this is otherwise not guaranteed, since speaker turns are stochastic). In speaker enforcement mode, the length of the session or the speaker probabilities may be adjusted to ensure that all speakers are present.
- LibriSpeech (or another single-speaker dataset)
- LibriSpeech word alignments from here (or alignments corresponding to another single-speaker dataset)
Example alignment format from the LibriSpeech dataset (to be passed as input to the scripts/speaker_tasks/create_alignment_manifest.py script):
- Alignment files are stored at `<split>/<speaker id>/<chapter id>/<speaker id>-<chapter id>.txt`, and each line in the alignment file corresponds to a separate sentence.
- Example of a line in `dev-clean/1272/128104/1272-128104.txt` (see the parser sketch after this list):

  ```
  1272-128104-0000 ",MISTER,QUILTER,IS,THE,APOSTLE,OF,THE,MIDDLE,CLASSES,,AND,WE,ARE,GLAD,TO,WELCOME,HIS,GOSPEL," "0.500,0.800,1.270,1.400,1.520,2.150,2.270,2.350,2.620,3.270,3.300,3.450,3.600,3.670,4.070,4.200,4.600,4.840,5.510,5.855"
  ```
- Room Impulse Response and Noise Database from here (or another background noise dataset)
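To make the alignment format concrete, here is a small parser for one alignment line; it is a hedged sketch (the function is ours, not part of the NeMo scripts), and empty entries in the word list correspond to silences:

```python
# Sketch: parse one LibriSpeech alignment line into its utterance ID, the
# comma-separated word list, and the per-word end times in seconds.
def parse_alignment_line(line: str):
    utt_id, words_field, times_field = line.strip().split(" ")
    words = words_field.strip('"').split(",")          # "" entries mark pauses
    end_times = [float(t) for t in times_field.strip('"').split(",")]
    return utt_id, words, end_times

utt_id, words, end_times = parse_alignment_line(
    '1272-128104-0000 ",MISTER,QUILTER" "0.500,0.800,1.270"'
)
print(words)       # ['', 'MISTER', 'QUILTER']
print(end_times)   # [0.5, 0.8, 1.27]
```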
Note that only one of gpuRIR and pyroomacoustics is required for RIR simulation (cmake is needed to build gpuRIR):

```
pip install cmake
pip install https://github.com/DavidDiazGuerra/gpuRIR/zipball/master
pip install pyroomacoustics
```
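As a quick sanity check that RIR simulation works after installation, the following minimal pyroomacoustics sketch generates RIRs for two microphones in a shoebox room; the room dimensions and positions are arbitrary illustration values:

```python
import numpy as np
import pyroomacoustics as pra

# Arbitrary 5 m x 4 m x 3 m shoebox room at 16 kHz (illustrative values only).
room = pra.ShoeBox([5.0, 4.0, 3.0], fs=16000, materials=pra.Material(0.3), max_order=10)
room.add_source([1.0, 2.0, 1.5])  # one speaker position

# Two microphones; the position array has shape (3, num_mics).
mic_positions = np.array([[2.5, 2.7], [2.0, 2.0], [1.2, 1.2]])
room.add_microphone_array(pra.MicrophoneArray(mic_positions, room.fs))

room.compute_rir()
print(len(room.rir[0][0]))  # number of RIR taps from source 0 to microphone 0
```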
- Data simulator parameters are contained in `conf/data_simulator.yaml`.
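For orientation, below is an abridged sketch of the config structure, reconstructed only from the override keys used in the commands further down; consult the actual `conf/data_simulator.yaml` for the full parameter set:

```yaml
# Abridged sketch based on the override keys used in this README;
# the real conf/data_simulator.yaml contains many more parameters.
data_simulator:
  random_seed: 42
  manifest_filepath: ???
  outputs:
    output_dir: ???
  background_noise:
    add_bg: false
    background_manifest: null
  rir_generation:
    use_rir: false
```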
Example multispeaker audio session (using LibriSpeech audio samples and word alignments). RTTM and CTM output labels are highlighted.
- Download the LibriSpeech dataset:

  ```
  python scripts/dataset_processing/get_librispeech_data.py \
    --data_root <path to download LibriSpeech dataset to> \
    --data_sets ALL
  ```
- Download LibriSpeech alignments from here (the base directory is the LibriSpeech-Alignments directory)
- Create the manifest file with alignments:

  ```
  python <NeMo base path>/scripts/speaker_tasks/create_alignment_manifest.py \
    --input_manifest_filepath <Path to train_clean_100.json manifest file> \
    --base_alignment_path <Path to LibriSpeech-Alignments directory> \
    --output_manifest_filepath train-clean-100-align.json \
    --ctm_output_directory ./ctm_out \
    --libri_dataset_split train-clean-100
  ```
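After the script finishes, it can be worth inspecting the generated manifest; the exact field names vary by NeMo version, so this sketch simply prints whatever keys the first entry contains:

```python
import json

# NeMo manifests store one JSON object per line; print the first entry's schema.
with open("train-clean-100-align.json") as f:
    entry = json.loads(f.readline())
print(sorted(entry.keys()))
```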
- (Optional) Create the background noise manifest file:

  ```
  python <NeMo base path>/scripts/speaker_tasks/pathfiles_to_diarize_manifest.py \
    --paths2audio_files <Path to noise list file> \
    --manifest_filepath bg_noise.json
  ```
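The noise list file is typically a plain text file with one audio path per line; a throwaway helper like the one below can generate it (the noise dataset directory is a placeholder):

```python
import glob

# Write one path per line for every wav file under the noise dataset root.
# "<noise dataset dir>" is a placeholder for your RIR/noise database location.
with open("noise_list.txt", "w") as f:
    for path in sorted(glob.glob("<noise dataset dir>/**/*.wav", recursive=True)):
        f.write(path + "\n")
```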
- Create audio sessions (near field):

  ```
  python multispeaker_simulator.py --config-path='conf' --config-name='data_simulator.yaml' \
    data_simulator.random_seed=42 \
    data_simulator.manifest_filepath=./train-clean-100-align.json \
    data_simulator.outputs.output_dir=./test \
    data_simulator.background_noise.add_bg=True \
    data_simulator.background_noise.background_manifest=./bg_noise.json
  ```
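Each generated session comes with RTTM (speaker segment) and CTM (word-level) label files. A minimal RTTM reader for sanity-checking the output is sketched below; the columns follow the standard RTTM layout, but the output file name shown is hypothetical, so check the output directory for the actual names:

```python
# Standard RTTM columns: TYPE FILE CHAN ONSET DURATION ORTHO STYPE SPEAKER CONF LOOKAHEAD.
def read_rttm(path: str):
    segments = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            # Keep (speaker, onset seconds, duration seconds) per segment.
            segments.append((fields[7], float(fields[3]), float(fields[4])))
    return segments

# Hypothetical file name; inspect ./test for the actual session names.
for speaker, onset, duration in read_rttm("./test/multispeaker_session_0.rttm")[:5]:
    print(f"{speaker}: starts at {onset:.2f}s, lasts {duration:.2f}s")
```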
- Create multi-microphone audio sessions (with synthetic RIR generation):

  ```
  python multispeaker_simulator.py --config-path='conf' --config-name='data_simulator.yaml' \
    data_simulator.random_seed=42 \
    data_simulator.manifest_filepath=./train-clean-100-align.json \
    data_simulator.outputs.output_dir=./test_rir \
    data_simulator.background_noise.add_bg=True \
    data_simulator.background_noise.background_manifest=./bg_noise.json \
    data_simulator.rir_generation.use_rir=True
  ```
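To confirm the far-field output is actually multichannel, a quick check with soundfile works; the session file name below is hypothetical:

```python
import soundfile as sf

# soundfile returns multichannel audio as (num_samples, num_channels).
audio, sample_rate = sf.read("./test_rir/multispeaker_session_0.wav")
print(audio.shape, sample_rate)
```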