
Long context rewards #1

Closed
nicolas-dufour opened this issue Nov 26, 2024 · 5 comments
@nicolas-dufour

Hey,
Out of these models, do any support longer texts? Some of the base reward models support fewer than 40 text tokens!

@RE-N-Y
Owner

RE-N-Y commented Nov 26, 2024

Which base model are you referring to?
Most multimodal reward models are CLIP-based, so they should generally support max token lengths of 60~77.
The ones I've trained, SiglipPreferenceScorer and CLIPPreferenceScorer, should support more than 40 text tokens.
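To make the limit concrete, here is a minimal sketch of the truncation that CLIP-style tokenizers apply. The helper name `truncate_to_context` is hypothetical and illustrative, not the scorers' actual tokenizer code:

```python
# Minimal sketch: CLIP-style tokenizers silently truncate prompts to a fixed
# context length (77 for CLIP, including the BOS/EOS tokens), so only ~75
# content tokens actually reach the model.
def truncate_to_context(token_ids, max_length=77):
    """Drop tokens beyond the context window, preserving the final (EOS) slot."""
    if len(token_ids) <= max_length:
        return token_ids
    # keep the last slot for the EOS token, as CLIP tokenizers do
    return token_ids[:max_length - 1] + [token_ids[-1]]

ids = list(range(100))  # pretend these are 100 text token ids
print(len(truncate_to_context(ids)))  # 77
```

Anything past the window is simply discarded, which is why long synthetic prompts lose information with these backbones.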

If you're interested in models with longer prompts, I can train one on top of jina-embeddings-v3, which supports up to 8192 tokens.

Also, if you have suggestions for good backbones for multimodal reward models, happy to train one and support those too.

@nicolas-dufour
Author

Hey,
I was thinking of ImageReward, which has a 35-token cutoff: https://github.com/THUDM/ImageReward/blob/2ca71bac4ed86b922fe53ddaec3109fe94d45fd3/ImageReward/ImageReward.py#L110

It would indeed be super useful to have a long-context version of these models, but I think the recent jina-clip-v2 is a better base model (jina-embeddings doesn't have an image tower, I believe): https://huggingface.co/jinaai/jina-clip-v2.

This would be a great resource to have, as it would help with scoring synthetic prompts!

@RE-N-Y
Owner

RE-N-Y commented Nov 27, 2024

I actually haven't added ImageReward yet. But since it's a standard model used for image generation benchmarks, I will be adding it in the coming weeks. I will probably also train a jina-clip-v2-based image preference model on pick-a-pic-v2 soon.
I will keep the issue open until I implement those.
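For reference, preference scorers on datasets like pick-a-pic-v2 are typically trained with a pairwise Bradley-Terry objective: given scores for a preferred and a rejected image, minimize `-log(sigmoid(s_pref - s_rej))`. The sketch below is illustrative, with hypothetical names, and is not the repo's actual training code:

```python
import math

# Hedged sketch of the pairwise (Bradley-Terry) preference loss:
# the loss shrinks as the preferred image scores higher than the rejected one.
def preference_loss(score_preferred: float, score_rejected: float) -> float:
    margin = score_preferred - score_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(round(preference_loss(2.0, 0.0), 4))  # 0.1269 (correct ordering, small loss)
print(round(preference_loss(0.0, 2.0), 4))  # 2.1269 (wrong ordering, large loss)
```

Reported accuracy for such models is just how often `score_preferred > score_rejected` on held-out pairs, so ~50% is chance level.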

@RE-N-Y RE-N-Y self-assigned this Nov 27, 2024
@RE-N-Y
Owner

RE-N-Y commented Dec 16, 2024

@nicolas-dufour quick update: I briefly tried to train one on jina-clip-v2 sometime last week, but accuracy is around 55%~60% for some reason. It was trained using the same recipe as the standard CLIP / SigLIP based scorers, so this is very strange. I will try again in the next few days and let you know.

@RE-N-Y
Owner

RE-N-Y commented Mar 3, 2025

Closing since #5 has been added.

@RE-N-Y RE-N-Y closed this as completed Mar 3, 2025