A few questions about is_better_than_prob #2

Open
SauceCat opened this issue Dec 13, 2024 · 2 comments

Comments

@SauceCat

Firstly, thanks for the insightful paper and code! 💯
I am carefully going through the code as it's closely related to one of my current use cases. I have a few questions about some details inside is_better_than_prob, and I hope we can have a discussion. 😸

  • I'm a bit confused about prob_C. The prompt is simply asking the LLM to output A or B, so why would we expect it to give small probabilities to C? Is it just because C is the next byte after B?
  • What exactly does prob_C mean? Does it act as a "tie" between A and B?
  • Regarding the calibration part, as I understand it: compare_result.prob_A is the probability of selecting id1 when it is placed first, while compare_result_reverse.prob_B is the probability of selecting id1 when it is placed second. Shouldn't we calculate raw_prob_A as compare_result.prob_A + compare_result_reverse.prob_B? If id1 is truly better, both compare_result.prob_A and compare_result_reverse.prob_B should be high; if compare_result.prob_A is high but compare_result_reverse.prob_B turns out unexpectedly low, we should conclude that id1 is not really better, right? So why does the code use 1 - compare_result_reverse.prob_B? (See the sketch after the snippet below.)
prompt = prompt_template.render(
    input=input,
    output_1=output[id1],
    output_2=output[id2],
)
compare_result = params['model'].compare([prompt])[0]

if params['calibrate'] == True:
    prompt = prompt_template.render(
        input=input,
        output_1=output[id2],
        output_2=output[id1],
    )
    compare_result_reverse = params['model'].compare([prompt])[0]
    compare_result = CompareResultObject(
        raw_prob_A=compare_result.prob_A + 1-compare_result_reverse.prob_B, 
        raw_prob_B=compare_result.prob_B + 1-compare_result_reverse.prob_A, 
        raw_prob_C=compare_result.prob_C + compare_result_reverse.prob_C, 
        uncertainty=(compare_result.uncertainty + compare_result_reverse.uncertainty)/2
    )
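
In other words, I would have expected something like the sketch below (field names taken from the snippet above; the division by 2 is only my guess at how the averaging would be normalized, not code from the repo):

# Sketch: average the order-swapped probabilities directly, so an output that
# wins in both presentation orders gets a high calibrated score.
compare_result = CompareResultObject(
    raw_prob_A=(compare_result.prob_A + compare_result_reverse.prob_B) / 2,
    raw_prob_B=(compare_result.prob_B + compare_result_reverse.prob_A) / 2,
    raw_prob_C=(compare_result.prob_C + compare_result_reverse.prob_C) / 2,
    uncertainty=(compare_result.uncertainty + compare_result_reverse.uncertainty) / 2
)
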
@williamLyh
Collaborator

williamLyh commented Dec 16, 2024

Hi @SauceCat, thanks for your interest in this project.

  1. prob_C is mostly deprecated in the current repo. It was designed to handle two cases: 1) when the LLM produces A=B (a tie), and 2) when the output cannot be parsed into A or B. Both cases are now handled by forcing prob_A = prob_B = 0.5 (a rough sketch of this fallback is after this list). It would also make sense to customize this behavior if you prefer.

  2. Thank you for spotting this. The calibrated probability should indeed be compare_result.prob_A + compare_result_reverse.prob_B. The calibration is basically averaging the probabilities across the two presentation orders. I guess we introduced this bug when we reorganized the code base. I will fix it ASAP.

  3. I'm also developing an improved version of the PairS-beam algorithm, which will be released within a month or two.

  4. If you would like to discuss anything related to this project, you are very welcome to! Just drop me an email and we can start from there.
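
Roughly, the fallback mentioned in (1) behaves like this (a sketch, not the exact code in the repo):

def resolve_probs(parsed_choice, prob_A, prob_B):
    # Ties ("A=B") and unparseable answers both collapse to an even split.
    if parsed_choice not in ("A", "B"):
        return 0.5, 0.5
    return prob_A, prob_B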

Best,
Yinhong

@SauceCat
Author

Hey @williamLyh,

Thanks for the prompt reply!

  1. That makes sense to me. I think it might be worth trying a more intuitive setting: prompting the model to output A, B, or Tie explicitly, since using log_probs makes it almost impossible for the model to actually produce A = B, I guess. (A rough sketch of what I mean is after this list.)
  2. LGTM!
  3. That would be great! In fact, the existing PairS-beam algorithm looks a bit complicated to me. Looking forward to the improved version! 😸
  4. I also read the ZEPO paper as well as the Batch Calibration paper and briefly went through the code. I do have some thoughts drawn from real-world use cases to share and discuss. Let me draft the email.
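
For item 1, here is a minimal sketch of the three-way variant I have in mind (the function name and the example log-prob values are placeholders, not code from this repo; how the per-token log-probs are obtained depends on the model API):

import math

# The prompt asks the model to answer "A", "B", or "Tie"; option probabilities
# come from a softmax over the log-probs of those three candidate tokens.
def three_way_probs(logprob_A, logprob_B, logprob_tie):
    exps = [math.exp(lp) for lp in (logprob_A, logprob_B, logprob_tie)]
    total = sum(exps)
    return tuple(e / total for e in exps)

# Example usage with placeholder log-prob values.
prob_A, prob_B, prob_tie = three_way_probs(-0.4, -1.5, -3.0)
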
