The FT T5 beam search algorithm generates inconsistent results with HF's #522
Comments
Do you use early_stopping in HF?
Yes, I tried early_stopping and there was no change in the results. We found that no_repeat_ngram_size has a greater impact, even at beam=1. When no_repeat_ngram_size in HF is set to null, HF's result is the same as FT's.
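A minimal sketch of the HF generate() call under discussion; the model name, input text, and parameter values below are illustrative placeholders, not taken from this report:

```python
# Sketch of the HF side of the comparison (model name, input, and
# parameter values are placeholders, not from the original report).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

inputs = tokenizer("summarize: some input text", return_tensors="pt")

# With no_repeat_ngram_size left unset (the "null" case above), HF's
# output matched FT's; setting it changes results even at num_beams=1.
output_ids = model.generate(
    **inputs,
    num_beams=4,
    early_stopping=True,
    no_repeat_ngram_size=2,  # remove this line to reproduce the FT-matching run
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```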
FT does not support no_repeat_ngram_size.
If we want to add the no_repeat_ngram_size feature ourselves, how should we develop it? Any suggestions?
You can add a penalty kernel in https://github.com/NVIDIA/FasterTransformer/blob/main/src/fastertransformer/layers/DynamicDecodeLayer.cc, like the other penalties, to make sure the logits of tokens that would repeat an n-gram are set to -inf.
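A hedged sketch of the logic such a penalty kernel would need to implement, written in Python for clarity (it mirrors the behavior of HF's NoRepeatNGramLogitsProcessor; the function name and tensor shapes are illustrative, and an FT version would be a CUDA kernel invoked from DynamicDecodeLayer.cc):

```python
# Reference logic for an n-gram repeat ban. This is the behavior a new
# penalty kernel would reproduce; names and shapes here are illustrative,
# not FT's actual API.
import torch

def ban_repeated_ngrams(logits: torch.Tensor,
                        output_ids: torch.Tensor,
                        ngram_size: int) -> torch.Tensor:
    """Set the logit of any token that would complete an already-generated
    n-gram to -inf.

    logits:     [batch * beam, vocab_size], scores for the next token
    output_ids: [batch * beam, cur_len], tokens generated so far
    """
    cur_len = output_ids.size(1)
    # Not enough context yet to form an n-gram with the next token.
    if cur_len + 1 < ngram_size:
        return logits
    for i in range(output_ids.size(0)):
        seq = output_ids[i].tolist()
        # The last (n-1) tokens form the prefix the next token would extend.
        prefix = tuple(seq[cur_len - ngram_size + 1:])
        # Scan earlier positions for the same prefix and ban the token
        # that followed it there.
        for start in range(cur_len - ngram_size + 1):
            if tuple(seq[start:start + ngram_size - 1]) == prefix:
                banned = seq[start + ngram_size - 1]
                logits[i, banned] = float("-inf")
    return logits
```

Setting the banned logits to -inf (rather than subtracting a finite penalty) guarantees those tokens survive neither greedy selection nor beam expansion, which is why the per-step scan must run before the beam search top-k.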
Given that I'm not familiar with this library, how long do you estimate it would take me to add this feature?
Sorry, I cannot provide an estimated time because it depends on too many factors.
Thanks for the suggestion!
Branch/Tag/Commit: main
Docker Image Version: nvcr.io/nvidia/pytorch:22.09-py3
GPU name: A30
CUDA Driver: 11.6
Model: mt5
Reproduced Steps
Hi! I tested the results of FT and HF, and I found that the beam search algorithm in HF (https://arxiv.org/pdf/1610.02424.pdf) may be different from the one in FT and may generate different results. Is the beam search in FT the implementation from this paper (https://arxiv.org/pdf/1601.00372.pdf)? I look forward to your reply!