The FT T5 beam search algorithm generates inconsistent results with HF's #522
Comments
Do you use early_stopping in HF?
Yes, I tried early_stopping and there was no change in the results. We found that no_repeat_ngram_size has a greater impact, even at beam=1. When no_repeat_ngram_size in HF is set to null, HF's result is the same as FT's.
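A minimal sketch of the HF generate() call under discussion; the model name, input text, and parameter values below are illustrative placeholders, not taken from this report:

```python
# Sketch of the HF side of the comparison (model name, input, and
# parameter values are placeholders, not from the original report).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

inputs = tokenizer("summarize: some input text", return_tensors="pt")

# With no_repeat_ngram_size left unset (the "null" case above), HF's
# output matched FT's; setting it changes results even at num_beams=1.
output_ids = model.generate(
    **inputs,
    num_beams=4,
    early_stopping=True,
    no_repeat_ngram_size=2,  # remove this line to reproduce the FT-matching run
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```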
FT does not support no_repeat_ngram_size.
If we want to add the no_repeat_ngram_size feature ourselves, how should we develop it? Any suggestions?
You can add a penalty kernel in https://github.com/NVIDIA/FasterTransformer/blob/main/src/fastertransformer/layers/DynamicDecodeLayer.cc, like the other penalties, to make sure the logits of tokens that would repeat an n-gram are set to -inf.
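A hedged sketch of the logic such a penalty kernel would need to implement, written in Python for clarity (it mirrors the behavior of HF's NoRepeatNGramLogitsProcessor; the function name and tensor shapes are illustrative, and an FT version would be a CUDA kernel invoked from DynamicDecodeLayer.cc):

```python
# Reference logic for an n-gram repeat ban. This is the behavior a new
# penalty kernel would reproduce; names and shapes here are illustrative,
# not FT's actual API.
import torch

def ban_repeated_ngrams(logits: torch.Tensor,
                        output_ids: torch.Tensor,
                        ngram_size: int) -> torch.Tensor:
    """Set the logit of any token that would complete an already-generated
    n-gram to -inf.

    logits:     [batch * beam, vocab_size], scores for the next token
    output_ids: [batch * beam, cur_len], tokens generated so far
    """
    cur_len = output_ids.size(1)
    # Not enough context yet to form an n-gram with the next token.
    if cur_len + 1 < ngram_size:
        return logits
    for i in range(output_ids.size(0)):
        seq = output_ids[i].tolist()
        # The last (n-1) tokens form the prefix the next token would extend.
        prefix = tuple(seq[cur_len - ngram_size + 1:])
        # Scan earlier positions for the same prefix and ban the token
        # that followed it there.
        for start in range(cur_len - ngram_size + 1):
            if tuple(seq[start:start + ngram_size - 1]) == prefix:
                banned = seq[start + ngram_size - 1]
                logits[i, banned] = float("-inf")
    return logits
```

Setting the banned logits to -inf (rather than subtracting a finite penalty) guarantees those tokens survive neither greedy selection nor beam expansion, which is why the per-step scan must run before the beam search top-k.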
Given that I'm not familiar with this library, how long do you estimate it would take me to add this feature?
Sorry, I cannot provide an estimated time because it depends on too many factors.
Thanks for the suggestion!
Branch/Tag/Commit: main
Docker Image Version: nvcr.io/nvidia/pytorch:22.09-py3
GPU name: A30
CUDA Driver: 11.6
Model: mt5
Reproduced Steps
Hi! I tested the results of FT and HF, and I found that the beam search algorithm in HF (https://arxiv.org/pdf/1610.02424.pdf) may be different from the one in FT and may generate different results. Is the beam search in FT the implementation from this paper (https://arxiv.org/pdf/1601.00372.pdf)? I look forward to your reply!