🎶 Music-Driven Conducting Motion Generation (IEEE ICME'21 Best Demo)

ChenDelong1999/VirtualConductor

Virtual Conductor

The first step towards deep-learning-based, music-driven conducting motion generation.

model pipeline

This repository is the official implementation of “Self-Supervised Music-Motion Synchronization Learning for Music-Driven Conducting Motion Generation” by Fan Liu, Delong Chen, Ruizhi Zhou, Sai Yang, and Feng Xu. It also provides access to the ConductorMotion100 dataset, which consists of 100 hours of orchestral conductor motions and the aligned music Mel spectrograms.

The above figure gives a high-level illustration of the proposed two-stage approach. The contrastive learning and generative learning stages are bridged by transferring the learned music and motion encoders, as indicated by the dotted lines. Our approach can generate plausible, diverse, and music-synchronized conducting motion.

Updates 🔔

  • Mar 2021. Demo Video (preliminary version) released at bilibili.
  • Apr 2021. ICME 2021 Demo Video released at bilibili.
  • Apr 2021. Demo Video (with Dynamic Frequency Domain Decomposition) released.
  • Jun 2021. The recording of the graduation thesis defense released. The thesis was awarded Outstanding Graduation Thesis of Hohai University (河海大学优秀毕业论文) and First-class Outstanding Graduation Thesis of Jiangsu Province (江苏省优秀毕业论文一等奖)!
  • Jul 2021. The VirtualConductor project was awarded Best Demo at the IEEE International Conference on Multimedia and Expo (ICME) 2021!
  • Mar 2022. ConductorMotion100 is made publicly available as a track in the “Prospective Cup” competition (远见杯) held by JSCS (江苏省计算机学会, Jiangsu Computer Society). Please see here for details.
  • May 2022. Our paper is published in the Journal of Computer Science and Technology (JCST). Check our paper!
  • Nov 2022. Code for JCST paper is released.

Getting Started

Install

  • Clone this repo:

    git clone https://github.com/ChenDelong1999/VirtualConductor.git
    cd VirtualConductor
  • Create a conda virtual environment and activate it:

    conda create -n VirtualConductor python=3.6 -y
    conda activate VirtualConductor
  • Install CUDA Toolkit 11.3 (link) and cudnn==8.2.1 (link), then install PyTorch==1.10.1:

    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y
    # if you prefer other cuda versions, please choose suitable pytorch versions
    # see: https://pytorch.org/get-started/locally/
  • Install other requirements:

    conda install ffmpeg -c conda-forge -y
    pip install librosa matplotlib scipy tqdm moviepy opencv-python tensorboard
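
After the install steps above, it can help to confirm that the packages are visible from the active environment before running anything heavy. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` and the module list are assumptions derived from the install commands above (note that the pip package `opencv-python` is imported as `cv2`).

```python
import importlib.util

# Import names for the packages installed above (assumed from the install
# commands; "opencv-python" is imported as "cv2").
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio",
    "librosa", "matplotlib", "scipy", "tqdm",
    "moviepy", "cv2", "tensorboard",
]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that each module can be located. It does not verify versions or whether PyTorch can see a CUDA device.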

Test on Your Own Music 🎶

  • Copy your music files to the /test/test_samples/ folder. We have prepared some samples for you.
  • You need the pretrained weights of an M2S-GAN to generate motions. We have prepared a pretrained checkpoint, which is placed at checkpoints/M2SGAN/M2SGAN_official_pretrained.pt.
  • Now run the following command. The test_unseen.py script will:
    1. enumerate all samples in the /test/test_samples/ folder,
    2. extract the Mel spectrogram from each music file,
    3. generate conducting motions, and
    4. save the result videos to /test/result/.

      python test_unseen.py --model 'checkpoints/M2SGAN/M2SGAN_official_pretrained.pt'
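
Before launching the command above, a quick pre-flight check can confirm that the checkpoint and at least one music file are in place. This is a hedged sketch rather than anything shipped with the repository: the function names and the set of audio extensions are hypothetical, and the formats actually accepted by test_unseen.py may differ.

```python
from pathlib import Path

# Assumption: a hypothetical set of audio extensions; the formats actually
# accepted by test_unseen.py may differ.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a"}


def list_test_samples(sample_dir):
    """Return audio files found directly inside `sample_dir`, sorted by path."""
    sample_dir = Path(sample_dir)
    if not sample_dir.is_dir():
        return []
    return sorted(
        p for p in sample_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def preflight(sample_dir="test/test_samples",
              checkpoint="checkpoints/M2SGAN/M2SGAN_official_pretrained.pt"):
    """Report whether the inputs described above appear to be in place."""
    samples = list_test_samples(sample_dir)
    return {
        "checkpoint_found": Path(checkpoint).is_file(),
        "num_samples": len(samples),
        "samples": [p.name for p in samples],
    }


if __name__ == "__main__":
    # Run from the repository root so the relative paths resolve.
    print(preflight())
```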

Data Preparation (ConductorMotion100)

The ConductorMotion100 dataset can be downloaded in the following ways:

You can also access the dataset via Google Drive

ConductorMotion100 has 3 splits: train, val, and test, each distributed as a separate .rar file. After extracting them to the <Your Dataset Dir> folder, the file structure will be:

tree <Your Dataset Dir>
<Your Dataset Dir>
    ├───train
    │   ├───0
    │   │       mel.npy
    │   │       motion.npy
    │  ...
    │   └───5268
    │           mel.npy
    │           motion.npy
    ├───val
    │   ├───0
    │   │       mel.npy
    │   │       motion.npy
    │  ...
    │   └───290
    │           mel.npy
    │           motion.npy
    └───test
        ├───0
        │       mel.npy
        │       motion.npy
       ...
        └───293
                mel.npy
                motion.npy
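
To check an extracted copy against the layout above, one option is to count the sample folders in each split that contain both files. The sketch below is illustrative only; `count_complete_samples` is a hypothetical helper, not part of utils/dataset.py.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")


def count_complete_samples(dataset_dir):
    """Count sample folders per split that hold both mel.npy and motion.npy."""
    counts = {}
    for split in SPLITS:
        split_dir = Path(dataset_dir) / split
        if not split_dir.is_dir():
            counts[split] = 0  # split not extracted (or wrong dataset_dir)
            continue
        counts[split] = sum(
            1 for sample in split_dir.iterdir()
            if sample.is_dir()
            and (sample / "mel.npy").is_file()
            and (sample / "motion.npy").is_file()
        )
    return counts
```

If the folder indices are contiguous, as the tree suggests but does not guarantee, a complete copy would report 5269 train, 291 val, and 294 test samples.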

Each mel.npy and motion.npy pair corresponds to 60 seconds of Mel spectrogram and motion data, sampled at 90 Hz and 30 Hz respectively. The Mel spectrogram has 128 frequency bins, so mel.shape = (5400, 128). The motion data contains 13 2D keypoints, so motion.shape = (1800, 13, 2).
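
The shapes follow directly from the clip length and the two sampling rates. As a small worked example (the constant and function names are illustrative, and the frame mapping assumes both streams start at the same instant):

```python
MEL_RATE_HZ = 90      # Mel spectrogram frames per second
MOTION_RATE_HZ = 30   # motion frames per second
CLIP_SECONDS = 60     # duration of one sample
N_MEL_BINS = 128      # frequency bins per Mel frame
N_KEYPOINTS = 13      # 2D keypoints per motion frame


def expected_shapes(seconds=CLIP_SECONDS):
    """Return (mel_shape, motion_shape) for a clip of the given duration."""
    mel_shape = (seconds * MEL_RATE_HZ, N_MEL_BINS)
    motion_shape = (seconds * MOTION_RATE_HZ, N_KEYPOINTS, 2)
    return mel_shape, motion_shape


def mel_frames_for_motion_frame(motion_index):
    """Half-open range of Mel frame indices spanning one motion frame.

    Assumes both streams start at t = 0, so each motion frame covers
    90 / 30 = 3 consecutive Mel frames.
    """
    ratio = MEL_RATE_HZ // MOTION_RATE_HZ
    return motion_index * ratio, (motion_index + 1) * ratio


print(expected_shapes())  # ((5400, 128), (1800, 13, 2))
```

After loading a sample with NumPy, comparing the array shapes against `expected_shapes()` is a cheap way to catch a truncated or mismatched file.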

We provide code to load and visualize the dataset in utils/dataset.py. You can run this file with:

python utils/dataset.py --dataset_dir <Your Dataset Dir>

Then the script will enumerate all the samples in the dataset. You will get:

matshow

motion_plot

Training

During training, run tensorboard --logdir runs to monitor progress in TensorBoard. Model checkpoints will be saved to the /checkpoints/ folder.

  • Step 1

    • Start the contrastive learning stage to train the M2S-Net:

      python M2SNet_train.py --dataset_dir <Your Dataset Dir>

      Training takes ~36 hours on a Titan Xp GPU. With TensorBoard (tensorboard --logdir runs), you can monitor the training progress:

      M2SNet-tensorboard

      We also provide a visualization of the features extracted by M2S-Net:

      M2SNet-features

  • Step 2 (optional)

    • Train an M2S-Net on the test set to calculate the 'sync error' (see our paper for more details):

      python M2SNet_train.py --dataset_dir <Your Dataset Dir> --mode hard_test

      Training takes ~2.5 hours.

      img.png

  • Step 3

    • Start the generative learning stage to train the M2S-GAN:

      python M2SGAN_train.py --dataset_dir <Your Dataset Dir>

      Training takes ~28 hours on a Titan Xp GPU.

      img.png
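
The three steps above can be chained into one unattended run. The sketch below is a hypothetical driver, not a script from the repository: it only assembles the commands shown above, and it assumes they are launched from the repository root. Whether Step 3 picks up the Step 1 checkpoint automatically is not stated here, so check the training scripts before relying on it.

```python
import subprocess
import sys


def build_training_commands(dataset_dir, include_hard_test=True):
    """Return the training commands from Steps 1-3 as argument lists."""
    commands = [
        # Step 1: contrastive learning stage (M2S-Net)
        [sys.executable, "M2SNet_train.py", "--dataset_dir", dataset_dir],
    ]
    if include_hard_test:
        # Step 2 (optional): M2S-Net on the test set, used for the 'sync error'
        commands.append([sys.executable, "M2SNet_train.py",
                         "--dataset_dir", dataset_dir, "--mode", "hard_test"])
    # Step 3: generative learning stage (M2S-GAN)
    commands.append([sys.executable, "M2SGAN_train.py",
                     "--dataset_dir", dataset_dir])
    return commands


def run_all(dataset_dir, include_hard_test=True):
    """Run the steps in order, stopping at the first failure."""
    for command in build_training_commands(dataset_dir, include_hard_test):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
```

Calling `run_all("<Your Dataset Dir>")` would take roughly 36 + 2.5 + 28 = 66.5 hours on the hardware quoted above, or about 64 hours with `include_hard_test=False`.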

Prospective Cup (首届国际“远见杯”元智能数据挑战大赛, the First International “Prospective Cup” Meta-Intelligence Data Challenge)

For more details of the "Prospective Cup" competition, please see here.

License

Copyright (c) 2022 Delong Chen. Contact me for commercial use (or rather any use that is not academic research) (email: [email protected]). Free for research use, as long as proper attribution is given and this copyright notice is retained.

Papers

  1. Delong Chen, Fan Liu*, Zewen Li, Feng Xu. VirtualConductor: Music-driven Conducting Video Generation System. IEEE International Conference on Multimedia and Expo (ICME) 2021, Demo Track (Best Demo).

    @article{chen2021virtualconductor,
      author    = {Delong Chen and
                   Fan Liu and
                   Zewen Li and
                   Feng Xu},
      title     = {VirtualConductor: Music-driven Conducting Video Generation System},
      journal   = {CoRR},
      volume    = {abs/2108.04350},
      year      = {2021},
      url       = {https://arxiv.org/abs/2108.04350},
      eprinttype = {arXiv},
      eprint    = {2108.04350}
    }
  2. Fan Liu, Delong Chen*, Ruizhi Zhou, Sai Yang, and Feng Xu. Self-Supervised Music-Motion Synchronization Learning for Music-Driven Conducting Motion Generation. Journal of Computer Science and Technology.

     @article{liu2022self,
       author    = {Fan Liu and
                    Delong Chen and
                    Ruizhi Zhou and
                    Sai Yang and
                    Feng Xu},
       title     = {Self-Supervised Music Motion Synchronization Learning for Music-Driven
                    Conducting Motion Generation},
       journal   = {Journal of Computer Science and Technology},
       volume    = {37},
       number    = {3},
       pages     = {539--558},
       year      = {2022},
       url       = {https://doi.org/10.1007/s11390-022-2030-z},
       doi       = {10.1007/s11390-022-2030-z}
     }