This repository contains the code for the following paper:
Jiawen Huang, Emmanouil Benetos, Sebastian Ewert, "Improving Lyrics Alignment through Joint Pitch Detection," International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
This repo is written in Python 3 and uses PyTorch as the deep learning framework. To install the required Python packages, run
pip install -r requirements.txt
You may also want to install a source-separation tool (e.g. Spleeter, Open-Unmix), or use your own system, to prepare source-separated vocals.
Check the notebook for a quick example.
DALI v2.0 is required for training. See instructions on how to get the dataset:
To use the DALI data loader, it is recommended to clone the DALI wrapper repo and link it into the root of this repo by running:
ln -s path/to/dali_wrapper/ DALI
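The same link can be created from Python, which may help when the setup is scripted. This is a minimal sketch, not part of the repository: the function name `link_dali_wrapper` and its arguments are hypothetical, and it only assumes that the wrapper checkout is a directory and that the link should be named `DALI`, as in the command above.

```python
import os
from pathlib import Path


def link_dali_wrapper(wrapper_dir, repo_root, link_name="DALI"):
    """Create <repo_root>/<link_name> as a symlink to the DALI wrapper checkout.

    Hypothetical helper: mirrors `ln -s path/to/dali_wrapper/ DALI`.
    """
    target = Path(wrapper_dir).resolve()
    link = Path(repo_root) / link_name
    if not target.is_dir():
        raise FileNotFoundError(f"DALI wrapper directory not found: {target}")
    if link.is_symlink():
        # Replace an existing link so the call is safe to repeat.
        link.unlink()
    elif link.exists():
        # Refuse to overwrite a real file or directory.
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(target, link, target_is_directory=True)
    return link
```

Calling `link_dali_wrapper("path/to/dali_wrapper", ".")` from the root of this repo should have the same effect as the `ln -s` command.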
The annotated Jamendo dataset (jamendolyrics) is used for evaluation:
All songs in both datasets must be source-separated and saved in advance.
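Before training or testing, it can be useful to confirm that every song has a separated file. The sketch below is an illustration only: the helper `find_missing_separated` is hypothetical, and the naming scheme (one file per song, named `<song_id>.mp3`) is an assumption that may not match how your separation tool writes its output.

```python
from pathlib import Path


def find_missing_separated(song_ids, sepa_dir, ext=".mp3"):
    """Return the song ids that have no separated file in sepa_dir.

    Assumes one file per song named <song_id><ext>; adjust to your layout.
    """
    sepa_dir = Path(sepa_dir)
    return [sid for sid in song_ids if not (sepa_dir / f"{sid}{ext}").is_file()]
```

An empty list means every listed song has a matching file under the directory you pass as `--sepa_dir`.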
The first time you run the training or testing scripts, HDF5 files are generated.
python --dataset_dir=/path/to/DALI_v2.0/annotation/ --sepa_dir=/path/to/separated/DALI/vocals/ \
    --checkpoint_dir=/where/to/save/checkpoints/ --log_dir=/where/to/save/tensorboard/logs/ \
    --model=baseline --cuda
python --dataset_dir=/path/to/DALI_v2.0/annotation/ --sepa_dir=/path/to/separated/DALI/mp3s/ \
    --hdf_dir=/where/to/save/hdf5/files/ --loss_w=0.5 \
    --checkpoint_dir=/where/to/save/checkpoints/ --log_dir=/where/to/save/tensorboard/logs/ \
    --model=MTL --cuda
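The `--loss_w` option suggests that the MTL model weights a pitch-detection loss against the alignment loss. The exact formula is not stated here, so the following is only a sketch under an assumed additive form, `alignment_loss + loss_w * pitch_loss`; the function name `combine_losses` is hypothetical, and a convex combination would be another plausible choice.

```python
def combine_losses(alignment_loss, pitch_loss, loss_w=0.5):
    """Weighted sum of two task losses (assumed form, not the repo's code).

    Uses only arithmetic, so it works with plain floats or with tensors.
    """
    if loss_w < 0:
        raise ValueError("loss_w must be non-negative")
    return alignment_loss + loss_w * pitch_loss
```

Under this assumed form, `loss_w=0` would reduce the objective to the alignment loss alone, and larger values give the pitch task more influence on the shared parameters.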
Run python -h for more options.
The following script runs alignment using a pretrained baseline model without boundary information (Baseline) on Jamendo:
python --jamendo_dir=/path/to/jamendolyrics/ --sepa_dir=/path/to/separated/jamendo/mp3s/ \
    --load_model=./checkpoints/checkpoint_Baseline --pred_dir=/where/to/save/predictions/
The following script runs alignment using the pretrained MTL model with boundary information (MTL+BDR) on Jamendo:
python --jamendo_dir=/path/to/jamendolyrics/ --sepa_dir=/path/to/separated/jamendo/mp3s/ \
    --load_model=./checkpoints/checkpoint_MTL --pred_dir=/where/to/save/predictions/ \
    --bdr_model=./checkpoints/checkpoint_BDR --model=MTL
The generated CSV files under pred_dir can be evaluated using the evaluation script in jamendolyrics.
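For a quick sanity check before running the full evaluation, the predicted word start times can be compared with reference times directly. This sketch is not the jamendolyrics evaluation script. It assumes that each CSV row holds one word with its start time in seconds in the first column and no header row, and the helper names `read_start_times` and `mean_absolute_error` are hypothetical.

```python
import csv
import io


def read_start_times(csv_text):
    """Parse the first column of each non-empty row as a start time in seconds."""
    reader = csv.reader(io.StringIO(csv_text))
    return [float(row[0]) for row in reader if row]


def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired start times, in seconds."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("no start times to compare")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)
```

To use it on files, read each file's text (for example with `pathlib.Path.read_text`) and pass it to `read_start_times`. The official script remains the reference for reported metrics.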
Jiawen Huang