
transformers 4.34 caused NotImplementedError when calling CTransformersTokenizer(PreTrainedTokenizer) #154

Open
victorlee0505 opened this issue Oct 4, 2023 · 17 comments


@victorlee0505

transformers version: pip install transformers==4.34.0
ctransformers version: pip install ctransformers==0.2.27

I encountered the following error:

File ".venv\lib\site-packages\ctransformers\transformers.py", line 84, in __init__kages\ctransformers\transformers.py", line 84, in __init__
    super().__init__(**kwargs)

File ".venv\lib\site-packages\transformers\tokenization_utils.py", line 366, in __init__
    self._add_tokens(self.all_special_tokens_extended, special_tokens=True)

File ".venv\lib\site-packages\transformers\tokenization_utils.py", line 462, in _add_tokens
    current_vocab = self.get_vocab().copy()

File ".venv\lib\site-packages\transformers\tokenization_utils_base.py", line 1715, in ``get_vocab
    raise NotImplementedError()``
NotImplementedError

transformers changed PreTrainedTokenizer in tokenization_utils.py (commit 2da8853): _add_tokens now calls current_vocab = self.get_vocab().copy() (line 454).

PreTrainedTokenizer itself implements added_tokens_decoder and __len__, so get_vocab is the only method left that raises NotImplementedError().
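
For illustration, here is a minimal sketch of that failure path, independent of ctransformers (MinimalTokenizer is just a stand-in subclass, not the real CTransformersTokenizer):

from transformers import PreTrainedTokenizer

class MinimalTokenizer(PreTrainedTokenizer):
    # Deliberately no get_vocab() override, mirroring the situation described
    # above; the other required hooks are stubbed out.
    @property
    def vocab_size(self):
        return 0

    def _tokenize(self, text):
        return text.split()

    def _convert_token_to_id(self, token):
        return 0

    def _convert_id_to_token(self, index):
        return ""

try:
    # With transformers 4.34, __init__ -> _add_tokens -> get_vocab()
    MinimalTokenizer()
except NotImplementedError:
    print("transformers 4.34 now needs get_vocab() during __init__")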

@CHesketh76

CHesketh76 commented Oct 4, 2023

The issue can be reproduced with this code from the README:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2", hf=True)

print(llm("AI is going to"))

or in https://colab.research.google.com/drive/1GMhYMUAv_TyZkpfvUI1NirM8-9mCXQyL.

I hope this issue is addressed, because finding the correct tokenizer from a different source may not be possible for most models.

@victorlee0505
Author

I submitted a PR and it works for me now; this is my setup:

model = AutoModelForCausalLM.from_pretrained(..., hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)

transformers 4.34.0 now supports Mistral, so I really want to use it. 😁

@CHesketh76

I spent all day trying to get Mistral working with ctransformers, but it is returning garbage text on my end. I believe it may be the tokenizer, because tokenizer = AutoTokenizer.from_pretrained(model) will not work for any model.


@victorlee0505
Author

Yes, they refactored PreTrainedTokenizer, which the ctransformers tokenizer extends. I ran OpenOrca Mistral and it runs fine with 4.34, but all quantized models failed unless I went back to 4.33, so my PR fixes that.
I will try to run quantized Mistral tomorrow to see if it works.

@victorlee0505
Author

I just ran TheBloke/Mistral-7B-OpenOrca-GGUF and it works fine for me.

@CHesketh76

Are you able to use model.generate(...)? I got everything to run until I start generating text; then it just runs indefinitely.

@victorlee0505
Author

OK, I quickly wrote this up and it works fine (you will need transformers==4.34.0, then build ctransformers from #155 and install it):


import os
from ctransformers import (
    AutoModelForCausalLM as cAutoModelForCausalLM,
    AutoTokenizer as cAutoTokenizer,
)

model = cAutoModelForCausalLM.from_pretrained(
            model_path_or_repo_id="TheBloke/Mistral-7B-OpenOrca-GGUF", 
            model_file="mistral-7b-openorca.Q5_K_M.gguf", 
            model_type="mistral",
            hf=True,
            temperature=0.7,
            top_p=0.7,
            top_k=50,
            repetition_penalty=1.2,
            context_length=8096,
            max_new_tokens=2048,
            threads=os.cpu_count(),
            stream=True,
            gpu_layers=0
            )
tokenizer = cAutoTokenizer.from_pretrained(model)

mistral_no_mem_prompt_template = """
<|im_start|>system
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.
<|im_end|>
{placeholder}
"""

mistral_openorca_prompt = """
<|im_start|>user
{input}<|im_end|>
<|im_start|>assistant
"""

mistral_no_mem_template = mistral_no_mem_prompt_template.replace("{placeholder}", mistral_openorca_prompt)
question = "The cafeteria had 23 apples. If they used 20 for lunch and bought 6 more, how many apples do they have?"
prompt = mistral_no_mem_template.replace("{input}", question)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cpu")
generated_ids = model.generate(input_ids, max_new_tokens=2048, temperature=0.7, do_sample=True)
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")

@CHesketh76

CHesketh76 commented Oct 5, 2023

Still having issues with tokenizer = cAutoTokenizer.from_pretrained(model), but using Open-Orca/Mistral-7B-OpenOrca for the tokenizer appears to resolve it. I am not too happy about the speed, though. When using llm = cAutoModelForCausalLM.from_pretrained(...) and then llm('Tell me a story about a knight'), it generates a full story in 10-24 seconds (200-800 tokens). But when using the generate function, it takes about 15 minutes to generate 200 tokens. I am using a 3070 Ti, for reference.

@CHesketh76

So I get 15x faster token output by having no GPU layers... I think something is wrong.

@victorlee0505
Author

victorlee0505 commented Oct 6, 2023

Yes, something is wrong; for me, gpu_layers has no effect. 😅

I found that if I build it myself, gpu_layers does not work; no idea why.

@victorlee0505
Author

I think my lib was a bit messy yesterday. I copied get_vocab from transformers and pushed it to PR #155. I tested with the OpenOrca Mistral code from above (type mistral) and with the exact same code switched to a Vicuna 1.5 GGUF model (type llama), and both work.
@CHesketh76 can you rebuild and give it a try?

model = cAutoModelForCausalLM.from_pretrained(
            model_path_or_repo_id="TheBloke/vicuna-13B-v1.5-16K-GGUF", 
            model_file="vicuna-13b-v1.5-16k.Q6_K.gguf", 
            model_type="llama",
            hf=True,
            temperature=0.7,
            top_p=0.7,
            top_k=50,
            repetition_penalty=1.2,
            context_length=8096,
            max_new_tokens=2048,
            threads=os.cpu_count(),
            stream=True,
            gpu_layers=0
            )

@Girrajjangid

@victorlee0505 how do I rebuild #155?

@victorlee0505
Author

victorlee0505 commented Oct 11, 2023

@victorlee0505 how do I rebuild #155?

pip uninstall ctransformers

Straight from my fork:

pip install --no-cache-dir git+https://github.com/victorlee0505/ctransformers.git@vlee/transformers#egg=ctransformers[cuda]

(even though I put [cuda], it does not work 😕)

Or build locally:

git clone https://github.com/victorlee0505/ctransformers.git
cd ctransformers
git checkout vlee/transformers

# I use venv
python -m venv .venv
.venv\Scripts\activate   # activate the venv first (on Linux/macOS: source .venv/bin/activate)
pip install scikit-build
pip install cmake
python setup.py sdist

Under the dist folder you will have your new package; get the full path and install it:

pip install --no-cache-dir full\path\to\ctransformers\dist\ctransformers-0.2.27.tar.gz[cuda]
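
To double-check that the rebuilt package is the one actually being imported afterwards (plain standard-library introspection, nothing specific to the PR):

from importlib.metadata import version
import ctransformers

# Should print 0.2.27 and a path inside your .venv if the rebuilt
# package is the one being picked up.
print(version("ctransformers"), ctransformers.__file__)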

@kirill578

Make sure to run export CT_CUBLAS=ON before python setup.py sdist, otherwise it won't build the CUDA support.

You might also need to set these two in your .bashrc and confirm that the nvcc version matches nvidia-smi:

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64"
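
A quick pre-build sanity check (just a small helper script; it assumes you run it from the same shell and environment you will build in):

import os
import shutil

# CT_CUBLAS must be visible to the build and nvcc must be on PATH,
# otherwise the sdist is built without CUDA support.
print("CT_CUBLAS =", os.environ.get("CT_CUBLAS"))
print("nvcc path =", shutil.which("nvcc"))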

@heiko-braun

Hi @victorlee0505. I've rebuilt with PR #155 and can confirm the NotImplementedError is gone. Thanks!

@victorlee0505 victorlee0505 changed the title transformers 4.34 broke CTransformersTokenizer(PreTrainedTokenizer) transformers 4.34 caused NotImplementedError when calling CTransformersTokenizer(PreTrainedTokenizer) Oct 27, 2023
@victorlee0505
Author

I won't move forward with this PR; I don't think it is a good fix, but it is OK to use as-is.

I only copied one of the def get_vocab(self): implementations from transformers, namely transformers.models.llama.tokenization_llama.LlamaTokenizer.get_vocab. There are different get_vocab implementations for different model types; search for def get_vocab(self): in transformers and you will see what I mean.

Therefore I cannot guarantee it, nor do I have time to figure out the perfect solution. 😥
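
For reference, the implementation I copied is roughly this (a sketch of the LlamaTokenizer version, not the exact PR diff):

def get_vocab(self):
    # Rebuilds the vocab dict from ids, so it assumes convert_ids_to_tokens
    # and vocab_size behave sensibly for the loaded model type; that is
    # exactly why it may not fit every tokenizer.
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
    vocab.update(self.added_tokens_encoder)
    return vocab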

aleneum added a commit to caretech-owl/gerd that referenced this issue Nov 6, 2023
@pechaut78

pechaut78 commented Nov 7, 2023

OK, now I do not get the error on this:

tokenizer = AutoTokenizer.from_pretrained(model)

but now I get it on:

model_inputs = tokenizer([text], return_tensors="pt")

with this error:

File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 803, in _batch_encode_plus
first_ids = get_input_ids(ids)
File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 770, in get_input_ids
tokens = self.tokenize(text, **kwargs)
File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 617, in tokenize
tokenized_text.extend(self._tokenize(token))
File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 628, in _tokenize
raise NotImplementedError
NotImplementedError
