This repository contains auxiliary code for the dataset described in the paper:
WikiMuTe: A web-sourced dataset of semantic descriptions for music audio. Benno Weck, Holger Kirchhoff, Peter Grosche, Xavier Serra. To appear in MultiMedia Modeling, 2024.
We provide a short Python script that can be used to download the audio files from the URLs provided in the dataset. Please update the user agent string in the script before running it. Moreover, the script shows how the dataset can be loaded to get the data in a convenient format.
The dataset contains rich text descriptions of music audio files, collected from Wikipedia articles. The audio files are available for download through the URLs provided in the dataset.
We provide three variants of the dataset in the data folder. All are described in the paper.
all.csv: contains all the data we collected, without any filtering.
filtered_sf.csv: contains the data obtained using the self-filtering method.
filtered_mc.csv: contains the data obtained using the MusicCaps dataset method.
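To get a quick sense of how much each filtering step removes, the three variants can be compared by row count. The following is a minimal sketch using only the standard library; the `VARIANTS` mapping, its labels, and the `count_rows` helper are illustrative names and not part of the repository.

```python
import csv
from pathlib import Path

# Illustrative labels mapped to the three file names listed in this README.
VARIANTS = {
    "all": "all.csv",
    "self-filtered": "filtered_sf.csv",
    "MusicCaps-filtered": "filtered_mc.csv",
}


def count_rows(data_dir: Path = Path("data")) -> dict:
    """Return the number of data rows per variant, skipping missing files."""
    counts = {}
    for label, file_name in VARIANTS.items():
        path = data_dir / file_name
        if not path.exists():
            continue  # variant not downloaded yet
        with path.open(newline="", encoding="utf-8") as handle:
            # DictReader consumes the header line, so only data rows are counted.
            counts[label] = sum(1 for _ in csv.DictReader(handle))
    return counts
```

Calling `count_rows()` from the repository root returns a dictionary with one entry per variant found in the data folder.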
The dataset is available to download from Zenodo.
A download script is provided in download.bash.
bash download.bash
The audio files have to be downloaded separately using the provided Python script.
python3 -m venv venv # create a virtual environment
source venv/bin/activate # activate the virtual environment
pip install -r requirements.txt # install dependencies
python3 wikimute.py # run the script
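The snippet below is not the code in wikimute.py. It is a minimal sketch of how a download with a custom user agent string could look using only the standard library. The names `USER_AGENT`, `build_request`, and `download_audio` are hypothetical, and the placeholder user agent should be replaced with one that identifies you.

```python
import urllib.request
from pathlib import Path

# Hypothetical placeholder: replace with a string that identifies you or your project.
USER_AGENT = "MyWikiMuTeDownloader/0.1 (replace-me@example.org)"


def build_request(audio_url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Create a GET request that carries a custom User-Agent header."""
    return urllib.request.Request(audio_url, headers={"User-Agent": user_agent})


def download_audio(audio_url: str, out_dir: Path, file_name: str) -> Path:
    """Download one audio file to out_dir/file_name and return the target path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / file_name
    with urllib.request.urlopen(build_request(audio_url), timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

Here `audio_url` and `file_name` would come from the `audio_url` and `file` columns of the dataset.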
Each CSV file contains the following columns:
file: the name of the audio file
pageid: the ID of the Wikipedia article where the text was collected from
aspects: the short-form (tag) description texts collected from the Wikipedia articles
sentences: the long-form (caption) description texts collected from the Wikipedia articles
audio_url: the URL of the audio file
url: the URL of the Wikipedia article where the text was collected from
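The columns above can be read with the standard `csv` module, as in the following minimal sketch. It assumes that the `aspects` and `sentences` cells are serialized as Python-style list literals (for example `"['rock', 'guitar']"`); this README does not confirm that, so the code falls back to the raw string when parsing fails. The helper names `parse_list_field` and `read_rows` are illustrative.

```python
import ast
import csv
import io


def parse_list_field(value):
    """Try to read a cell as a Python-style list literal; fall back to the raw text.

    Assumption: list-valued columns are serialized like "['rock', 'guitar']".
    """
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value
    return parsed if isinstance(parsed, list) else value


def read_rows(handle):
    """Yield one dict per CSV row, with `aspects` and `sentences` parsed if possible."""
    for row in csv.DictReader(handle):
        for column in ("aspects", "sentences"):
            if column in row:
                row[column] = parse_list_field(row[column])
        yield row
```

To use it, open one of the CSV files with `open(path, newline="", encoding="utf-8")` and iterate over `read_rows(handle)`; each row then exposes `file`, `pageid`, `aspects`, `sentences`, `audio_url`, and `url` as dictionary keys.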
If you use this dataset in your research, please cite the following paper:
@inproceedings{wikimute,
  title = {WikiMuTe: A web-sourced dataset of semantic descriptions for music audio},
  author = {Weck, Benno and Kirchhoff, Holger and Grosche, Peter and Serra, Xavier},
  booktitle = {to appear in MultiMedia Modeling.},
  year = {2024},
  publisher = {Springer International Publishing},
  address = {Cham},
}
Note: The final paper is not yet published. The citation will be updated once the paper is published.
This repository is released under the MIT License. Please see the LICENSE file for more details.
The data is available under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license. Each entry in the dataset contains a URL linking to the article from which the text data was collected.