
Why is there no need for special tokens for chatglm3 when counting tokens? #77

Closed
condy0919 opened this issue Sep 6, 2024 · 2 comments

Comments

@condy0919

LongBench/pred.py

Lines 58 to 59 in 26505c8

if "chatglm3" in model_name:
    tokenized_prompt = tokenizer(prompt, truncation=False, return_tensors="pt", add_special_tokens=False).input_ids[0]

@bys0318
Member

bys0318 commented Sep 6, 2024

Hi, you can change it to add_special_tokens=True, but I guess a few special tokens do not matter much compared to a context window of 32k tokens.
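For context, in Hugging Face tokenizers the `add_special_tokens` flag controls whether tokens such as BOS/EOS markers are prepended to the encoded prompt; for chatglm3 this is reportedly a prefix like `[gMASK]` and `sop`. A minimal sketch with a stand-in tokenizer (the real `AutoTokenizer` would require downloading model weights, so the class below is a hypothetical stand-in for illustration only) shows why the count differs by only a couple of tokens:

```python
# Stand-in tokenizer illustrating the effect of add_special_tokens on
# token counts. This is NOT the real chatglm3 tokenizer; a real run would
# use transformers.AutoTokenizer with trust_remote_code=True.

class FakeChatGLM3Tokenizer:
    # chatglm3 is assumed here to prepend two special tokens when
    # add_special_tokens=True (an assumption for illustration).
    SPECIAL_PREFIX = ["[gMASK]", "sop"]

    def encode(self, text, add_special_tokens=True):
        tokens = text.split()  # crude whitespace "tokenization"
        if add_special_tokens:
            return self.SPECIAL_PREFIX + tokens
        return tokens

tokenizer = FakeChatGLM3Tokenizer()
prompt = "a " * 32000  # pretend this fills a 32k-token context window

with_special = len(tokenizer.encode(prompt, add_special_tokens=True))
without_special = len(tokenizer.encode(prompt, add_special_tokens=False))

# The difference is just the handful of special tokens, which is
# negligible next to a 32k-token window.
print(with_special - without_special)  # → 2
```

This is the point of the reply above: the miscount from omitting special tokens is a constant of a few tokens, far below the truncation granularity that matters at 32k context.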

@condy0919
Author

> I guess a few special tokens do not matter much compared to a context window of 32k tokens.

Agreed.
