example for audio caption using SLAM-LLM #54

cwx-worst-one · 2024-04-27T07:37:57Z

What does this PR do?

Fixes # (issue)

Feature/Issue validation/testing

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

Test A
Logs for Test A
Test B
Logs for Test B

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Thanks for contributing 🎉!

ddlBoJack · 2024-04-27T08:28:21Z

examples/aac_audiocaps/README.md

+We only train the linear projector in this recipe. We use [EAT](https://github.com/cwx-worst-one/EAT) and [BEATs](https://github.com/microsoft/unilm/tree/master/beats) as the main audio encoder for SLAM-AAC. Be sure to set up the corresponding environments based on the instructions provided in each repository.
+Audio Encoder | Projector | LLM | SPIDEr
+|---|---|---|---|
+[EAT-base (fine-tuned)](https://drive.google.com/file/d/1aCYiQmoZv_Gh1FxnR-CCWpNAp6DIJzn6/view?usp=sharing) | Linear(~18.88M) | [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) | 0.4692


add more metrics besides SPIDEr
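
For context on the request above: SPIDEr is conventionally defined as the arithmetic mean of CIDEr and SPICE, so reporting those two alongside it costs little. The sketch below shows how an extended table row could be assembled. The choice of extra columns (METEOR, CIDEr, SPICE) and the `metrics_row` helper are assumptions for illustration, and the zero values are placeholders, not results from this PR.

```python
def spider(cider: float, spice: float) -> float:
    """SPIDEr as the arithmetic mean of CIDEr and SPICE."""
    return 0.5 * (cider + spice)


def metrics_row(name: str, scores: dict[str, float]) -> str:
    """Render one markdown table row in the README's pipe-separated style."""
    cells = [name] + [f"{value:.4f}" for value in scores.values()]
    return " | ".join(cells)


# Placeholder scores only (hypothetical); replace with real evaluation output.
placeholder = {"METEOR": 0.0, "CIDEr": 0.0, "SPICE": 0.0}
placeholder["SPIDEr"] = spider(placeholder["CIDEr"], placeholder["SPICE"])

print(" | ".join(["Audio Encoder", *placeholder]))   # header row
print(metrics_row("EAT-base (fine-tuned)", placeholder))
```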

ddlBoJack · 2024-04-27T08:29:00Z

examples/aac_audiocaps/README.md

+We only train the linear projector in this recipe. We use [EAT](https://github.com/cwx-worst-one/EAT) and [BEATs](https://github.com/microsoft/unilm/tree/master/beats) as the main audio encoder for SLAM-AAC. Be sure to set up the corresponding environments based on the instructions provided in each repository.
+Audio Encoder | Projector | LLM | SPIDEr
+|---|---|---|---|
+[EAT-base (fine-tuned)](https://drive.google.com/file/d/1aCYiQmoZv_Gh1FxnR-CCWpNAp6DIJzn6/view?usp=sharing) | Linear(~18.88M) | [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) | 0.4692


add projector weights on Google Drive

ddlBoJack · 2024-04-27T08:33:43Z

scripts/conf/aac_vicuna_lora.yaml

@@ -0,0 +1,126 @@
+
+model_config:


use Python config
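
One way to read this request is to replace the YAML file with a dataclass-based config module. The following is a minimal sketch under that assumption; only the `model_config` key appears in the diff, so every field name and default below is hypothetical. The freeze flags reflect the README's statement that only the linear projector is trained.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ModelConfig:
    # Hypothetical fields; the PR's YAML only shows the `model_config:` key.
    llm_name: str = "vicuna-7b-v1.5"
    encoder_name: str = "eat"
    encoder_projector: str = "linear"


@dataclass
class TrainConfig:
    # Nested config mirrors the top-level `model_config` section of the YAML.
    model_config: ModelConfig = field(default_factory=ModelConfig)
    # Only the projector is trained, so encoder and LLM stay frozen (assumed flag names).
    freeze_encoder: bool = True
    freeze_llm: bool = True


cfg = TrainConfig()
print(asdict(cfg))  # plain nested dict, convenient for logging or overrides
```

Compared with a standalone YAML file, a dataclass gives typed defaults and lets the training script import the config directly.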

update

259d99c

ddlBoJack closed this Apr 27, 2024
