
Why is there no need for special tokens for chatglm3 when counting tokens? #77

Closed
condy0919 opened this issue Sep 6, 2024 · 2 comments

Comments

@condy0919

LongBench/pred.py

Lines 58 to 59 in 26505c8

if "chatglm3" in model_name:
    tokenized_prompt = tokenizer(prompt, truncation=False, return_tensors="pt", add_special_tokens=False).input_ids[0]

@bys0318
Member

bys0318 commented Sep 6, 2024

Hi, you can change it to add_special_tokens=True, but I guess a few special tokens do not matter much compared to a context window of 32k tokens.
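For context, in Hugging Face tokenizers the `add_special_tokens` flag controls whether tokens such as BOS/EOS markers are prepended to the encoded prompt; for chatglm3 this is reportedly a prefix like `[gMASK]` and `sop`. A minimal sketch with a stand-in tokenizer (the real `AutoTokenizer` would require downloading model weights, so the class below is a hypothetical stand-in for illustration only) shows why the count differs by only a couple of tokens:

```python
# Stand-in tokenizer illustrating the effect of add_special_tokens on
# token counts. This is NOT the real chatglm3 tokenizer; a real run would
# use transformers.AutoTokenizer with trust_remote_code=True.

class FakeChatGLM3Tokenizer:
    # chatglm3 is assumed here to prepend two special tokens when
    # add_special_tokens=True (an assumption for illustration).
    SPECIAL_PREFIX = ["[gMASK]", "sop"]

    def encode(self, text, add_special_tokens=True):
        tokens = text.split()  # crude whitespace "tokenization"
        if add_special_tokens:
            return self.SPECIAL_PREFIX + tokens
        return tokens

tokenizer = FakeChatGLM3Tokenizer()
prompt = "a " * 32000  # pretend this fills a 32k-token context window

with_special = len(tokenizer.encode(prompt, add_special_tokens=True))
without_special = len(tokenizer.encode(prompt, add_special_tokens=False))

# The difference is just the handful of special tokens, which is
# negligible next to a 32k-token window.
print(with_special - without_special)  # → 2
```

This is the point of the reply above: the miscount from omitting special tokens is a constant of a few tokens, far below the truncation granularity that matters at 32k context.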

@condy0919
Author

> I guess a few special tokens do not matter much compared to a context window of 32k tokens.

Agreed.
