
Why is there no use of MI or contrastive learning in the training code? #2

Closed
GYee opened this issue Jan 13, 2024 · 2 comments

Comments

@GYee

GYee commented Jan 13, 2024

No description provided.

@xuyaoxun
Collaborator

Thank you for your interest in our project! I'd like to address your question with two main points:

  1. Our training process is divided into two stages. The first stage is the pre-training of the Q-former, which is where contrastive learning and mutual information (MI) are used. The second stage is the overall training, in which the Q-former is fine-tuned again using the output of LLaMA. For simplicity and clarity, this repository only provides the second stage. Since the Q-former pre-trained in the first stage still needs to be fine-tuned in the second stage, we believe it is more useful to provide a pre-trained checkpoint and let users fine-tune it on their own data. Moreover, the method used in the first stage is tied directly to our training dataset: we partition the data according to manually annotated speech emotion labels and use this partition for the MI and contrastive learning objectives. Unfortunately, due to certain restrictions, I cannot release the complete training dataset, and preparing the data for the first stage is more demanding than for the second stage.
  2. If you would like to learn more about contrastive learning and MI training, I recommend the following resources: CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information (https://proceedings.mlr.press/v119/cheng20b.html) and its GitHub repository (https://github.com/Linear95/CLUB), as well as Learning Transferable Visual Models From Natural Language Supervision (https://arxiv.org/abs/2103.00020) and its GitHub repository (https://github.com/openai/CLIP). A minimal sketch of how both losses could look is included after this list.
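
For readers who want a concrete starting point, here is a minimal PyTorch sketch of the two losses referenced above: a CLUB-style MI upper bound and a CLIP-style symmetric contrastive loss. The class/function names, embedding dimensions, and the way they would plug into the (unreleased) stage-1 Q-former pre-training are assumptions; the CLUB estimator follows the reference implementation linked above.

```python
# Sketch only: a CLUB MI upper bound plus a CLIP-style contrastive loss.
# How these attach to the Q-former is an assumption; stage-1 code is not in this repo.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CLUB(nn.Module):
    """CLUB MI upper bound, following https://github.com/Linear95/CLUB.

    A variational network q(y|x) (Gaussian with learned mean / log-variance) is
    fitted with learning_loss(); the bound returned by forward() is what the
    main model minimizes to reduce mutual information between x and y.
    """

    def __init__(self, x_dim: int, y_dim: int, hidden: int = 512):
        super().__init__()
        self.p_mu = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))
        self.p_logvar = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim), nn.Tanh())

    def forward(self, x, y):
        # CLUB estimate: E_p(x,y)[log q(y|x)] - E_p(x)p(y)[log q(y|x)]
        mu, logvar = self.p_mu(x), self.p_logvar(x)
        positive = -((mu - y) ** 2) / 2.0 / logvar.exp()                      # matched (x_i, y_i) pairs
        negative = -((y.unsqueeze(0) - mu.unsqueeze(1)) ** 2).mean(dim=1) / 2.0 / logvar.exp()  # all (x_i, y_j) pairs
        return (positive.sum(-1) - negative.sum(-1)).mean()

    def learning_loss(self, x, y):
        # Negative log-likelihood for fitting q(y|x); optimized in a separate step.
        mu, logvar = self.p_mu(x), self.p_logvar(x)
        return (((mu - y) ** 2) / logvar.exp() + logvar).sum(-1).mean()


def clip_contrastive_loss(emb_a, emb_b, temperature: float = 0.07):
    """Symmetric InfoNCE over paired embeddings, as in CLIP."""
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature                  # [N, N] similarity matrix
    labels = torch.arange(emb_a.size(0), device=emb_a.device)  # diagonal entries are the positives
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))


if __name__ == "__main__":
    # Toy check with random "speech" and "text" embeddings of a hypothetical Q-former.
    speech_emb, text_emb = torch.randn(8, 256), torch.randn(8, 256)
    club = CLUB(256, 256)
    print(clip_contrastive_loss(speech_emb, text_emb).item(), club(speech_emb, text_emb).item())
```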

@lucashueda

Excuse me, regarding the first-stage training: in Figure 3 of the paper, both the transcription embeddings and the caption embeddings are passed through a Q-Former block. Does that mean both outputs are Q-Former-like, i.e. the shape of the Q-Embedding == T-Embedding == C-Embedding?
