
The VoxTube Dataset

A multilingual speaker recognition dataset by ID R&D Inc.

The VoxTube dataset is delivered as YouTube URLs together with per-video meta information that lists the filtered segments containing human speech.

Update (02.2024): a Hugging Face Datasets implementation of VoxTube is available here.

Meta file example and description

Meta information is stored per channel, in resources/meta/*.json files:

{
    "video_id_0": [
        [segment1_start, segment1_end],
        [segment2_start, segment2_end],
        ...,
        [segmentN_start, segmentN_end]
    ],
    ...
    "video_id_N": [
        [segment1_start, segment1_end],
        ...,
        [segmentN_start, segmentN_end]
    ]
}

where the name of each .json file is the ID of a YouTube channel, the JSON keys are YouTube video IDs, and each segmentX_start and segmentX_end is a timestamp in seconds. For example:

# cat VoxTube/resources/meta/UC__gC1TbqcY5j_owWKKUEUQ.json
{
    "LYdLsl4zJj0": [
        [114.0, 118.0],
        [78.0, 82.0],
        [172.0, 176.0],
        [302.0, 306.0],
        [372.0, 376.0],
        ...,
        [204.0, 208.0]
    ],
    "4arwR9j58BY": [
        [114.0, 118.0],
        [220.0, 224.0],
        [154.0, 158.0],
        ...,
        [342.0, 346.0]
    ],
    ...
}
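A minimal Python sketch for reading one channel's meta file into memory, assuming only the layout described above (file name = channel ID, keys = video IDs, values = [start, end] pairs in seconds). The helper name `load_channel_meta` is hypothetical and is not part of the provided example scripts.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

Segment = Tuple[float, float]


def load_channel_meta(path: str) -> Tuple[str, Dict[str, List[Segment]]]:
    """Read one resources/meta/<channel_id>.json file (hypothetical helper).

    Returns the channel ID (taken from the file name) and a mapping
    video_id -> list of (start, end) segments in seconds.
    """
    meta_path = Path(path)
    channel_id = meta_path.stem  # file name without the .json suffix
    with meta_path.open(encoding="utf-8") as f:
        raw = json.load(f)

    videos: Dict[str, List[Segment]] = {}
    for video_id, segments in raw.items():
        parsed = []
        for start, end in segments:
            start, end = float(start), float(end)
            # Basic sanity check: a segment must have positive length.
            if end <= start:
                raise ValueError(f"{video_id}: bad segment [{start}, {end}]")
            parsed.append((start, end))
        videos[video_id] = parsed
    return channel_id, videos
```

Note that segments inside a video are not necessarily stored in chronological order (in the example above, [114.0, 118.0] comes before [78.0, 82.0]), so call `sorted(...)` on a video's segment list if you need them in time order.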

Segments examples

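Below are examples of dataset samples obtained using the provided metadata. The spk_id column is the YouTube channel ID (that is, the meta file name), and timestamps are [start, end] in seconds.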

spk_id	video_id	timestamps
UC--EryqEbhW-VtG80N21TdA	0GSmioPWEQo	[138, 142]
UC--EryqEbhW-VtG80N21TdA	0GSmioPWEQo	[324, 328]
UC--EryqEbhW-VtG80N21TdA	a_CZzxUqKrY	[272, 276]
UCzy4jKI1KXgv8NpYzP2Ezaw	4K03k8nVgp4	[476, 480]
UCzy4jKI1KXgv8NpYzP2Ezaw	4K03k8nVgp4	[108, 112]
UCzy4jKI1KXgv8NpYzP2Ezaw	K4zDtpU435c	[218, 222]
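Each row of the table above is one [start, end] pair from a channel's meta file, flattened together with its channel and video IDs. The sketch below reproduces that flattening and builds a YouTube link that starts playback at the segment (using YouTube's standard `t=<seconds>s` URL parameter), which can be handy for spot-checking samples by ear. Both function names are hypothetical illustrations, not part of the example scripts.

```python
from typing import Dict, Iterator, List, Tuple
from urllib.parse import urlencode


def iter_samples(
    channel_id: str, videos: Dict[str, List[Tuple[float, float]]]
) -> Iterator[Tuple[str, str, List[float]]]:
    """Yield (spk_id, video_id, [start, end]) rows like the table above."""
    for video_id, segments in videos.items():
        for start, end in segments:
            yield channel_id, video_id, [start, end]


def segment_url(video_id: str, start: float) -> str:
    """Build a watch URL that begins playback at `start` (whole seconds)."""
    query = urlencode({"v": video_id, "t": f"{int(start)}s"})
    return f"https://www.youtube.com/watch?{query}"
```

For example, `segment_url("0GSmioPWEQo", 138)` gives `https://www.youtube.com/watch?v=0GSmioPWEQo&t=138s`, the first sample in the table.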

Dataset downloading

The following snippets show how to download the VoxTube data using the meta .json files.

Pre-requisites

Install ffmpeg and libsndfile1:

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install ffmpeg libsndfile1
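If you want to confirm the system packages are visible to Python before starting a long download, a quick check like the following may help. It is a hypothetical helper (not part of the repository) that only looks for an `ffmpeg` executable on `PATH` and for a shared library named `sndfile` via the standard library's `ctypes.util.find_library`.

```python
import ctypes.util
import shutil


def check_prerequisites() -> dict:
    """Report whether ffmpeg and libsndfile appear to be installed."""
    return {
        # ffmpeg is invoked as an external executable, so look it up on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # libsndfile1 provides the shared library usually named libsndfile.so.1.
        "libsndfile": ctypes.util.find_library("sndfile") is not None,
    }
```

Running `print(check_prerequisites())` should print `True` for both entries once the apt packages above are installed.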

Download the required .json files by cloning the VoxTube repository:

git clone https://github.com/IDRnD/VoxTube.git

Install the required Python packages, including the yt-dlp library:

cd VoxTube/examples
python3 -m pip install -r requirements.txt
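To see what yt-dlp does for a single video, here is a minimal sketch using its Python API (the `YoutubeDL` class and its `download` method). The option values are assumptions chosen for illustration (best available audio stream, saved as `<video_id>.<ext>`); they are not necessarily the settings used by load_example.py, and the helper names are hypothetical.

```python
import os


def build_ydl_opts(out_dir: str) -> dict:
    """yt-dlp options for an audio-only download (illustrative choices)."""
    return {
        "format": "bestaudio/best",  # prefer an audio-only stream
        "outtmpl": os.path.join(out_dir, "%(id)s.%(ext)s"),  # <video_id>.<ext>
        "quiet": True,
        "noplaylist": True,
    }


def download_audio(video_id: str, out_dir: str) -> None:
    """Download the audio track of one VoxTube video into out_dir."""
    from yt_dlp import YoutubeDL  # imported lazily; installed via requirements.txt

    url = f"https://www.youtube.com/watch?v={video_id}"
    with YoutubeDL(build_ydl_opts(out_dir)) as ydl:
        ydl.download([url])
```

For instance, `download_audio("LYdLsl4zJj0", "/tmp/voxtube_raw")` would fetch the full audio of one video from the meta example; cutting it into segments is a separate step.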

Example usage

Note that the default example script converts each audio track to a 16 kHz .wav file and splits it into 4-second segments.
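The conversion described in the note can be sketched as follows, assuming the audio has already been downloaded. `ffmpeg_cut_cmd` builds a standard ffmpeg command (`-ss` start, `-t` duration, `-ar` sample rate); downmixing to mono with `-ac 1` and dropping any trailing remainder shorter than 4 s are assumptions, not confirmed behaviour of the example script. Both helper names are hypothetical.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000  # 16 kHz, as stated in the note above
SEGMENT_SEC = 4.0     # 4-second segments, as stated in the note above


def split_into_windows(start: float, end: float,
                       win: float = SEGMENT_SEC) -> List[Tuple[float, float]]:
    """Split [start, end] into consecutive fixed-length windows.

    A trailing piece shorter than `win` is dropped (an assumption).
    """
    windows = []
    t = start
    while t + win <= end:
        windows.append((t, t + win))
        t += win
    return windows


def ffmpeg_cut_cmd(src: str, dst: str, start: float, end: float,
                   sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Build an ffmpeg command that cuts [start, end] and resamples it."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-ss", f"{start:.3f}",        # seek to segment start (seconds)
        "-t", f"{end - start:.3f}",   # keep this many seconds
        "-i", src,
        "-ar", str(sample_rate),      # resample to 16 kHz
        "-ac", "1",                   # mono (assumption)
        dst,
    ]
```

The segments in the meta files shown above are already exactly 4 s long (e.g. [114.0, 118.0]), so each one maps to a single window of 4 × 16000 = 64000 samples. The command list can be passed to `subprocess.run(cmd, check=True)`.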

cd VoxTube/examples

# download one speaker's data using its meta .json file
python3 load_example.py ../resources/meta/UC-9GWCoQoMr_ey6AMhClStQ.json <DATASET_ROOT>

# download the whole dataset in N parallel jobs
# WARNING: you might run into HTTP Error 429 (Too Many Requests) if too many
# requests (parallel jobs) are used; decrease the -j parameter in this case
python3 load_all_examples.py -r <DATASET_ROOT> -j N
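Besides lowering -j, another common way to cope with HTTP Error 429 is to retry rate-limited jobs with exponential backoff. The sketch below is a hypothetical illustration, not how load_all_examples.py works: it runs a per-channel worker over N threads and retries only errors whose message contains "429" (an assumption about how the error is reported).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def is_rate_limited(exc: Exception) -> bool:
    # Assumption: the error message mentions the HTTP status code.
    return "429" in str(exc)


def with_backoff(fn: Callable[[T], R], retries: int = 3, base_delay: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> Callable[[T], R]:
    """Wrap fn so rate-limited calls are retried after 5 s, 10 s, 20 s, ..."""
    def wrapped(arg: T) -> R:
        attempt = 0
        while True:
            try:
                return fn(arg)
            except Exception as exc:
                # Give up on non-429 errors or once retries are exhausted.
                if attempt >= retries or not is_rate_limited(exc):
                    raise
                sleep(base_delay * 2 ** attempt)
                attempt += 1
    return wrapped


def run_parallel(items: Iterable[T], worker: Callable[[T], R], n_jobs: int) -> List[R]:
    """Run worker over items with n_jobs threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(worker, items))
```

Threads are used here because downloading is mostly I/O-bound; a hypothetical per-channel function such as `process_channel(meta_path)` would be passed as `worker`.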
