Sorry, I just found out that I accidentally used the llama2-7B model instead of the llama2-7B-chat model.
The scores I got with the chat version are:
{
"narrativeqa": 18.82,
"qasper": 23.65,
"multifieldqa_en": 36.52,
"multifieldqa_zh": 10.59,
"hotpotqa": 26.4,
"2wikimqa": 31.85,
"musique": 7.76,
"dureader": 5.2,
"gov_report": 26.56,
"qmsum": 21.28,
"multi_news": 26.3,
"vcsum": 0.18,
"trec": 65.0,
"triviaqa": 83.17,
"samsum": 41.0,
"lsht": 18.75,
"passage_count": 1.57,
"passage_retrieval_en": 7.5,
"passage_retrieval_zh": 9.5,
"lcc": 59.04,
"repobench-p": 52.91
}
I think these are pretty close to the numbers on the leaderboard.
Reopen issue #55
Hi @bys0318 (@slatter666),
I tried running Llama2-7B-chat-4k (https://huggingface.co/meta-llama/Llama-2-7b-chat-hf),
and my results also differ from those on your leaderboard, by quite a large margin.
My execution environment has no network access, so the only difference from your pred.py is that I load the data, model, and tokenizer from local files.
Could you tell me why the scores differ so much? Thanks.
My scores (using the original seed from pred.py, seed=42):
{
"narrativeqa": 14.57,
"qasper": 6.6,
"multifieldqa_en": 3.65,
"multifieldqa_zh": 4.29,
"hotpotqa": 4.27,
"2wikimqa": 5.67,
"musique": 1.3,
"dureader": 15.71,
"gov_report": 24.53,
"qmsum": 16.13,
"multi_news": 2.41,
"vcsum": 0.03,
"trec": 68.0,
"triviaqa": 88.59,
"samsum": 41.38,
"lsht": 19.75,
"passage_count": 0.5,
"passage_retrieval_en": 3.0,
"passage_retrieval_zh": 0.0,
"lcc": 66.64,
"repobench-p": 60.06
}
Results copied from the GitHub leaderboard:
{
"narrativeqa": 18.7,
"qasper": 19.2,
"multifieldqa_en": 36.8,
"multifieldqa_zh": 11.9,
"hotpotqa": 25.4,
"2wikimqa": 32.8,
"musique": 9.4,
"dureader": 5.2,
"gov_report": 27.3,
"qmsum": 20.8,
"multi_news": 25.8,
"vcsum": 0.2,
"trec": 61.5,
"triviaqa": 77.8,
"samsum": 40.7,
"lsht": 19.8,
"passage_count": 2.1,
"passage_retrieval_en": 9.8,
"passage_retrieval_zh": 0.5,
"lcc": 52.4,
"repobench-p": 43.8
}
My data was downloaded from the link provided in your README: https://huggingface.co/datasets/THUDM/LongBench/resolve/main/data.zip
I load the data with
data = [json.loads(line) for line in open(path, "r", encoding="utf-8")]
and load the tokenizer and model with
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, torch_dtype=torch.bfloat16).to(device)
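For reference, a minimal self-contained sketch of the JSONL loading pattern above (the file contents here are placeholder samples, not actual LongBench data; note that `json.loads` must be applied to each `line`, with the closing parenthesis around `line` rather than around the whole comprehension):

```python
import json
import os
import tempfile

# Write a tiny JSONL file just to demonstrate the pattern
# (placeholder fields; real LongBench samples contain more keys).
samples = [{"input": "q1", "answers": ["a1"]},
           {"input": "q2", "answers": ["a2"]}]
path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s) + "\n")

# One json.loads call per line of the file.
with open(path, "r", encoding="utf-8") as f:
    data = [json.loads(line) for line in f]

print(len(data))  # 2
```

Using a `with` block (rather than a bare `open(...)` inside the comprehension) also guarantees the file handle is closed.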