layout | title | description |
---|---|---|
default | The VoxTube Dataset | A multilingual speaker recognition dataset by ID R&D Inc. |
The VoxTube dataset is distributed as YouTube URLs together with per-video meta information that lists the filtered segments containing human speech.
Update (02.2024): a HuggingFace datasets implementation of VoxTube is available here
Meta information is stored per channel in resources/meta/*.json files:
{
"video_id_0": [
[segment1_start, segment1_end],
[segment2_start, segment2_end],
...,
[segmentN_start, segmentN_end]
],
...
"video_id_N": [
[segment1_start, segment1_end],
...,
[segmentN_start, segmentN_end]
]
}
where the name of each .json file is the ID of a YouTube channel, the JSON keys are the IDs of YouTube videos, and each segmentX_start and segmentX_end is a timestamp in seconds. For example:
# cat VoxTube/resources/meta/UC__gC1TbqcY5j_owWKKUEUQ.json
{
"LYdLsl4zJj0": [
[114.0, 118.0],
[78.0, 82.0],
[172.0, 176.0],
[302.0, 306.0],
[372.0, 376.0],
...,
[204.0, 208.0]
],
"4arwR9j58BY": [
[114.0, 118.0],
[220.0, 224.0],
[154.0, 158.0],
...,
[342.0, 346.0]
],
...
}
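To make the layout above concrete, here is a minimal Python sketch that reads one channel's meta file and walks its segments. The helper names (`load_channel_meta`, `iter_segments`, `total_speech_seconds`) are hypothetical and are not part of the VoxTube repository. The watch-URL pattern is the standard YouTube one. The sample dictionary reuses a few of the values shown in the example above.

```python
import json
from pathlib import Path


def load_channel_meta(path):
    """Return (channel_id, meta) for one resources/meta/*.json file.

    The channel ID is taken from the file name, as described above.
    """
    path = Path(path)
    with path.open(encoding="utf-8") as f:
        return path.stem, json.load(f)


def iter_segments(meta):
    """Yield (video_url, start, end) for every segment in a channel's meta dict."""
    for video_id, segments in meta.items():
        url = f"https://www.youtube.com/watch?v={video_id}"
        for start, end in segments:
            yield url, float(start), float(end)


def total_speech_seconds(meta):
    """Sum the durations of all listed segments, in seconds."""
    return sum(end - start for _, start, end in iter_segments(meta))


# Small in-memory sample in the same shape as a meta file (illustration only).
sample = {
    "LYdLsl4zJj0": [[114.0, 118.0], [78.0, 82.0]],
    "4arwR9j58BY": [[220.0, 224.0]],
}

for url, start, end in iter_segments(sample):
    print(url, start, end)
print("total seconds:", total_speech_seconds(sample))
```

With a cloned repository, `load_channel_meta("VoxTube/resources/meta/UC__gC1TbqcY5j_owWKKUEUQ.json")` would return the channel ID and the parsed dictionary, which can then be passed to the same helpers.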
Please see below the examples of dataset samples obtained using the provided metadata.
The following snippets show how to download the VoxTube data using the meta .json files.
- Install ffmpeg and libsndfile1:
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install ffmpeg libsndfile1
- Download the required .json files by cloning the VoxTube repo:
git clone https://github.com/IDRnD/VoxTube.git
- Install the Python yt-dlp library:
cd VoxTube/examples
python3 -m pip install -r requirements.txt
Note that the default example script converts each audio file to a .wav file with a 16 kHz sampling frequency and splits it into 4-second segments.
cd VoxTube/examples
# example: download one speaker using its meta .json file
python3 load_example.py ../resources/meta/UC-9GWCoQoMr_ey6AMhClStQ.json <DATASET_ROOT>
# example: download the whole dataset in N parallel jobs
# WARNING: too many parallel jobs can trigger HTTP Error 429 (too many requests);
# if that happens, decrease the -j parameter
python3 load_all_examples.py -r <DATASET_ROOT> -j N
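
For readers who want to cut segments themselves, the conversion noted earlier (16 kHz .wav, one file per segment) could be approximated with ffmpeg as in the sketch below. This is an illustration under stated assumptions, not necessarily what load_example.py does. The function names are hypothetical, and mono output (`-ac 1`) is an assumption that the source does not state. It also assumes the audio has already been downloaded to a local file.

```python
import subprocess

SAMPLE_RATE = 16000  # Hz, matching the 16 kHz stated in the note


def build_ffmpeg_cmd(src, dst, start, end, sample_rate=SAMPLE_RATE):
    """Return an ffmpeg argument list that cuts [start, end) seconds from src
    and writes the result to dst as a WAV at the given sampling rate."""
    if end <= start:
        raise ValueError("segment end must be greater than segment start")
    return [
        "ffmpeg", "-y",               # overwrite the output file if it exists
        "-ss", f"{start:.3f}",        # seek to the segment start (input option)
        "-t", f"{end - start:.3f}",   # read only the segment duration
        "-i", str(src),
        "-ac", "1",                   # assumption: mono output
        "-ar", str(sample_rate),      # resample to 16 kHz
        str(dst),
    ]


def cut_segment(src, dst, start, end):
    """Run ffmpeg for one segment; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_cmd(src, dst, start, end), check=True)


# Print the command for one 4-second segment without running ffmpeg.
print(" ".join(build_ffmpeg_cmd("audio.m4a", "seg_0.wav", 114.0, 118.0)))
```

Building the command separately from running it makes the arguments easy to inspect, and to log when a download or conversion fails.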