VSR_LRS3

Performance and checkpoints

We only train the linear projector in this recipe.

Encoder	Projector	LLM	test
AV-HuBERT Large + Self-Training	Linear(~15.74M)	vicuna-7b-v1.5	29.47

Follow the preparation steps of av_hubert to pre-process the LRS3 dataset.

Use the specific fairseq version of av_hubert, which is compatible with hydra-core versions below 1.0.7 and omegaconf versions below 2.0.6.
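The version limits above can be checked before launching anything. The following is a minimal sketch, not part of the recipe: the helper names `parse`, `is_below`, and `check_pins` are hypothetical, the limits are copied from the sentence above and treated as exclusive upper bounds, and only the standard-library `importlib.metadata` is used.

```python
from importlib import metadata

# Exclusive upper bounds, copied from the compatibility note above.
LIMITS = {"hydra-core": "1.0.7", "omegaconf": "2.0.6"}


def parse(version):
    """Turn '1.0.6' into (1, 0, 6), stopping at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # e.g. '7rc1': keep the 7, ignore the suffix and the rest
            break
    return tuple(parts)


def is_below(installed, limit):
    """True if the installed version is strictly lower than the limit."""
    return parse(installed) < parse(limit)


def check_pins(lookup=metadata.version):
    """Return {package: status} for every package in LIMITS."""
    report = {}
    for name, limit in LIMITS.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = "ok" if is_below(installed, limit) else "too new"
    return report


if __name__ == "__main__":
    for name, status in check_pins().items():
        print(name, status)
```

Pre-release suffixes are ignored by this simple parser, so a version such as 1.0.7rc1 is reported as too new, which errs on the safe side.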

bash decode_avhubert_vo_vicuna_7b.sh

Before running the shell script, set the paths in it: speech_encoder_path, llm_path, output_dir, ckpt_path and decode_log.
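A quick check of these settings can catch a wrong path before the models start loading. This is a sketch under assumptions: the helper `missing_inputs` is hypothetical, the values shown are placeholders, and treating speech_encoder_path, llm_path and ckpt_path as inputs that must already exist (with output_dir and decode_log written by the run) is inferred from the names for the decode case, not stated by the recipe.

```python
import os

# Assumed to be inputs that must exist before decoding.
# output_dir and decode_log are assumed to be written by the run.
INPUT_KEYS = ("speech_encoder_path", "llm_path", "ckpt_path")


def missing_inputs(paths):
    """Return the input settings whose path is unset or does not exist."""
    return [key for key in INPUT_KEYS if not os.path.exists(paths.get(key, ""))]


if __name__ == "__main__":
    # Placeholder values; replace them with the ones set in the shell script.
    paths = {
        "speech_encoder_path": "/path/to/speech_encoder",
        "llm_path": "/path/to/llm",
        "ckpt_path": "/path/to/ckpt",
        "output_dir": "/path/to/output_dir",
        "decode_log": "/path/to/decode_log",
    }
    for key in missing_inputs(paths):
        print("missing:", key, "->", paths[key])
```

An empty list from `missing_inputs` means all three assumed inputs were found on disk.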

bash finetune_avhubert_vo_vicuna_7b.sh