
Long context rewards #1

Closed
nicolas-dufour opened this issue Nov 26, 2024 · 5 comments
@nicolas-dufour

Hey,
Out of these models, do any support longer texts? Some of the base reward models support fewer than 40 text tokens!

@RE-N-Y
Owner

RE-N-Y commented Nov 26, 2024

Which base model are you referring to?
Most multimodal reward models are CLIP-based, so they should generally support max token lengths of 60~77.
The ones I've trained, SiglipPreferenceScorer and CLIPPreferenceScorer, should support more than 40 text tokens.
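To make the limit concrete, here is a minimal sketch of the truncation that CLIP-style tokenizers apply. The helper name `truncate_to_context` is hypothetical and illustrative, not the scorers' actual tokenizer code:

```python
# Minimal sketch: CLIP-style tokenizers silently truncate prompts to a fixed
# context length (77 for CLIP, including the BOS/EOS tokens), so only ~75
# content tokens actually reach the model.
def truncate_to_context(token_ids, max_length=77):
    """Drop tokens beyond the context window, preserving the final (EOS) slot."""
    if len(token_ids) <= max_length:
        return token_ids
    # keep the last slot for the EOS token, as CLIP tokenizers do
    return token_ids[:max_length - 1] + [token_ids[-1]]

ids = list(range(100))  # pretend these are 100 text token ids
print(len(truncate_to_context(ids)))  # 77
```

Anything past the window is simply discarded, which is why long synthetic prompts lose information with these backbones.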

If you're interested in models with longer prompts, I can train one on top of jina-embeddings-v3, which supports up to 8192 tokens.

Also, if you have suggestions for good backbones for multimodal reward models, happy to train one and support those too.

@nicolas-dufour
Author

Hey,
I was thinking of ImageReward, which has a 35-token cutoff: https://github.com/THUDM/ImageReward/blob/2ca71bac4ed86b922fe53ddaec3109fe94d45fd3/ImageReward/ImageReward.py#L110

It would indeed be super useful to have a long-context version of these models, but I think the recent jina-clip-v2 is a better base model (jina-embeddings doesn't have an image tower, I believe): https://huggingface.co/jinaai/jina-clip-v2.

This would be a great resource to have, as it would help with scoring synthetic prompts!

@RE-N-Y
Owner

RE-N-Y commented Nov 27, 2024

I actually haven't added ImageReward yet. But since it's a standard model used for image generation benchmarks, I will be adding it in the coming weeks. I will probably also train a jina-clip-v2-based image preference model on pick-a-pic-v2 soon.
I will keep the issue open until I implement those.
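For reference, preference scorers on datasets like pick-a-pic-v2 are typically trained with a pairwise Bradley-Terry objective: given scores for a preferred and a rejected image, minimize `-log(sigmoid(s_pref - s_rej))`. The sketch below is illustrative, with hypothetical names, and is not the repo's actual training code:

```python
import math

# Hedged sketch of the pairwise (Bradley-Terry) preference loss:
# the loss shrinks as the preferred image scores higher than the rejected one.
def preference_loss(score_preferred: float, score_rejected: float) -> float:
    margin = score_preferred - score_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(round(preference_loss(2.0, 0.0), 4))  # 0.1269 (correct ordering, small loss)
print(round(preference_loss(0.0, 2.0), 4))  # 2.1269 (wrong ordering, large loss)
```

Reported accuracy for such models is just how often `score_preferred > score_rejected` on held-out pairs, so ~50% is chance level.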

@RE-N-Y RE-N-Y self-assigned this Nov 27, 2024
@RE-N-Y
Owner

RE-N-Y commented Dec 16, 2024

@nicolas-dufour quick update: I briefly tried to train one on jina-clip-v2 sometime last week, but accuracy is around 55%~60% for some reason. It was trained using the same recipe as the standard CLIP / SigLIP based scorers, so this is very strange. I will try again in the next few days and let you know.

@RE-N-Y
Owner

RE-N-Y commented Mar 3, 2025

Closing since #5 has been added.

@RE-N-Y RE-N-Y closed this as completed Mar 3, 2025