Firstly, thanks for the insightful paper and code! 💯
I am carefully going through the code as it's closely related to one of my current use cases. I have a few questions about some details inside `is_better_than_prob`, and I hope we can have a discussion. 😸
I'm a bit confused about `prob_C`. The prompt is simply asking the LLM to output A or B, so why would we expect it to give small probabilities to C? Is it just because C is the next byte after B?
What exactly does `prob_C` mean? Does it act as a "tie" between A and B?
Regarding the calibration part, as I understand it: `compare_result.prob_A` is the probability of selecting `id1` when it's placed first, while `compare_result_reverse.prob_B` is the probability of selecting `id1` when it's placed second. Shouldn't we calculate `raw_prob_A` as `compare_result.prob_A + compare_result_reverse.prob_B`? If `id1` is truly better, we should expect both `compare_result.prob_A` and `compare_result_reverse.prob_B` to be high. If `compare_result.prob_A` is high but `compare_result_reverse.prob_B` turns out to be unexpectedly low, then we should conclude that `id1` is not really better, right? So why does the code use `1 - compare_result_reverse.prob_B`?
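To make the question concrete, here is a small numeric sketch (the numbers are made up, and the variable names just mirror my reading of the fields above, not the repo's actual code):

```python
# Forward order: id1 is shown as option A, id2 as option B.
prob_A_forward = 0.8   # P(model picks id1 when id1 is listed first)

# Reversed order: id2 is shown as option A, id1 as option B.
prob_B_reverse = 0.7   # P(model picks id1 when id1 is listed second)

# If id1 really is better, both numbers should be high, so combining
# them directly seems like the natural calibration:
raw_prob_A = prob_A_forward + prob_B_reverse   # 1.5, i.e. 0.75 on average

# Whereas 1 - prob_B_reverse (= 0.3) is the probability of picking id2
# in the reversed order, which points the opposite way.
```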
Hi @SauceCat , thanks for your interest in this project.
`prob_C` is mostly deprecated in the current repo. It was designed to handle two cases: 1) when the LLM outputs A=B (a tie), and 2) when the output cannot be parsed as A or B. Both cases are now handled by forcing `prob_A = prob_B = 0.5`. If you would like to customize this behavior, that also makes sense.
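For anyone reading along, here is a rough sketch of the kind of fallback being described; the function name and the shape of the logprob input are illustrative assumptions, not the repo's actual API:

```python
import math

def choice_probs(top_logprobs: dict) -> tuple:
    """Convert first-token logprobs into (prob_A, prob_B).

    top_logprobs maps candidate first tokens (e.g. "A", "B", "C") to
    log-probabilities, as returned by a logprobs-enabled completion call.
    """
    prob_A = math.exp(top_logprobs["A"]) if "A" in top_logprobs else 0.0
    prob_B = math.exp(top_logprobs["B"]) if "B" in top_logprobs else 0.0

    # Tie (A=B) or unparseable output: fall back to a 50/50 split,
    # i.e. the prob_A = prob_B = 0.5 behavior described above.
    if prob_A == 0.0 and prob_B == 0.0:
        return 0.5, 0.5

    # Otherwise renormalize over the two valid choices.
    total = prob_A + prob_B
    return prob_A / total, prob_B / total
```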
Thank you for spotting this. The calibrated probability should be `compare_result.prob_A + compare_result_reverse.prob_B`. The calibration is basically averaging the probabilities across the two presentation orders. I guess we introduced this bug when we reorganized the code base. I will fix this ASAP.
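For reference, the fixed calibration would then look roughly like this (illustrative names, not the exact ones in the repo):

```python
def calibrated_prob_id1(compare_result, compare_result_reverse) -> float:
    """Average P(prefer id1) over both presentation orders.

    compare_result:          id1 shown first  -> prob_A is P(pick id1)
    compare_result_reverse:  id1 shown second -> prob_B is P(pick id1)
    """
    return (compare_result.prob_A + compare_result_reverse.prob_B) / 2
```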
I'm also developing an improved version of the PairS-beam algorithm, which will be released within a month or two.
If you would like to discuss anything related to this project, you are very welcome to! Just drop me an email and we can start from there.
That makes sense to me. I think it might be worth trying a more intuitive setting: prompting the model to output A, B, or Tie explicitly. Comparing log probabilities makes it almost impossible to produce A = B, I guess.
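A minimal sketch of that alternative, purely as an assumption of how the prompt and parsing could look:

```python
TIE_PROMPT = (
    "Which response is better, A or B? "
    "If they are equally good, answer Tie. "
    "Answer with exactly one word: A, B, or Tie."
)

def parse_verdict(output: str) -> tuple:
    """Map an explicit A/B/Tie verdict to (prob_A, prob_B)."""
    verdict = output.strip().lower()
    if verdict.startswith("a"):
        return 1.0, 0.0
    if verdict.startswith("b"):
        return 0.0, 1.0
    # "Tie" or anything unparseable -> split evenly, matching the
    # current 0.5/0.5 fallback.
    return 0.5, 0.5
```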
LGTM!
That would be great! In fact, the existing PairS-beam algorithm looks a bit complicated to me. Looking forward to the improved version! 😸
I also read the ZEPO paper as well as the Batch Calibration paper and briefly went through the code. I indeed have some thoughts derived from real-world use cases to share and discuss. Let me draft the email.