We only train the linear projector in this recipe.

| Encoder | Projector | LLM | test |
|---|---|---|---|
| AV-HuBERT Large + Self-Training | Linear (~15.74M) | vicuna-7b-v1.5 | 29.47 |

Follow the data preparation steps of av_hubert to pre-process the LRS3 dataset.
Use the specific fairseq version of av_hubert, which is compatible with hydra-core versions below 1.0.7 and omegaconf versions below 2.0.6.
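
One way to sanity-check the version bounds above before running the recipe is a small version comparison in shell. This is only a sketch: `version_lt` and `check_pin` are hypothetical helper names, the example version strings are made up, and the comparison relies on `sort -V` from GNU coreutils, which may order pre-release tags differently from pip.

```shell
# version_lt A B: succeeds when version A is strictly lower than version B.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_lt() {
  [ "$1" != "$2" ] && \
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# check_pin NAME INSTALLED LIMIT: report whether INSTALLED is below LIMIT.
check_pin() {
  if version_lt "$2" "$3"; then
    echo "$1 $2: ok (< $3)"
  else
    echo "$1 $2: too new (need < $3)"
  fi
}

# Hypothetical installed versions; substitute what `pip show` reports.
check_pin hydra-core 1.0.6 1.0.7
check_pin omegaconf 2.1.0 2.0.6
```

The limits `1.0.7` and `2.0.6` are taken from the sentence above; everything else is illustrative.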
bash decode_avhubert_vo_vicuna_7b.sh
Before running the shell script, modify the paths in it, including `speech_encoder_path`, `llm_path`, `output_dir`, `ckpt_path`, and `decode_log`.
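
As a rough illustration of the edits described above, the snippet below assumes the five names are plain shell variable assignments inside the script. Every path is a hypothetical placeholder, not a location from this repository, and the existence check is an optional extra that is not part of the original script.

```shell
# Hypothetical placeholder paths; replace each with a real location.
speech_encoder_path=/path/to/avhubert_checkpoint.pt
llm_path=/path/to/vicuna-7b-v1.5
output_dir=/path/to/output_dir
ckpt_path=/path/to/projector_checkpoint
decode_log=/path/to/decode_log

# Optional check: report input paths that do not exist yet.
# If ckpt_path is used as a prefix rather than a file, drop it from the list.
missing=0
for p in "$speech_encoder_path" "$llm_path" "$ckpt_path"; do
  if [ ! -e "$p" ]; then
    echo "missing: $p"
    missing=$((missing + 1))
  fi
done
echo "missing paths: $missing"
```

Running the check before launching the script surfaces a mistyped path early instead of partway through model loading.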
bash finetune_avhubert_vo_vicuna_7b.sh