transformers 4.34 caused NotImplementedError when calling CTransformersTokenizer(PreTrainedTokenizer) #154
Comments
The issue can be reproduced with the code from the README, or in https://colab.research.google.com/drive/1GMhYMUAv_TyZkpfvUI1NirM8-9mCXQyL. I hope this issue is addressed, because finding the correct tokenizer from a different source may not be possible for most models.
PR submitted, and it works for me now; this is my setup.
transformers 4.34.0 now supports Mistral, so I really want to use it. 😁
Yes, they refactored PreTrainedTokenizer, which the cTransformers tokenizer extends. I ran OpenOrca Mistral and it runs fine with 4.34, but all quantized models failed unless I went back to 4.33; my PR fixes that.
I just ran TheBloke/Mistral-7B-OpenOrca-GGUF, and it works fine for me.
Are you able to use the
OK, I quickly wrote this up and it works fine (you will need transformers==4.34.0, then build ctransformers from #155 and install it).
Still having issues with
So I get 15x faster token output with no GPU layers... I think something is wrong.
Yes, something is wrong; for me, gpu_layers has no effect. 😅 I found that if I build it myself, gpu_layers does not work, no idea why.
I think my lib was a bit messy yesterday. I copied get_vocab from transformers and pushed it to PR #155. I tested with the OpenOrca Mistral code from above (type mistral), and the exact same code with the model switched to Vicuna 1.5 GGUF (type llama) also works.
@victorlee0505 how do I rebuild #155?
Straight from my fork (even though I put [cuda], it does not work 😕), locally.
Under the dist folder you will have your new package; get the full path and install it.
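The build-and-install steps described above can be sketched roughly as follows. This is a hedged sketch only: the fork URL and the choice of the PEP 517 `build` frontend are assumptions not confirmed in the thread, and the actual fork may use a different build setup.

```shell
# Sketch of building ctransformers from a fork and installing the wheel.
# The fork URL below is an assumption based on the username in the thread.
git clone https://github.com/victorlee0505/ctransformers.git
cd ctransformers
pip install build            # PEP 517 build frontend (assumed)
python -m build --wheel      # produces a .whl under ./dist
pip install dist/ctransformers-*.whl
```

After this, `pip show ctransformers` should report the locally built version.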
Make sure to run it. You might also need to set up these two in your bashrc and confirm the
Hi @victorlee0505. I've rebuilt with PR #155 and can confirm the
I won't move forward with this PR; I don't think it is a good fix, but it is OK to use as-is. I only copied one of the def get_vocab(self): implementations from transformers, namely transformers.models.llama.tokenization_llama.LlamaTokenizer.get_vocab. There are different get_vocab implementations for different model types (search the transformers source for get_vocab). Therefore I cannot guarantee, nor have time to figure out, the perfect solution. 😥
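For reference, a minimal standalone sketch of the LlamaTokenizer-style get_vocab that the comment above describes copying. MockTokenizer and its tiny vocabulary are hypothetical stand-ins for illustration; a real tokenizer would wrap an actual vocabulary rather than a hard-coded dict.

```python
# Standalone sketch of a LlamaTokenizer-style get_vocab; MockTokenizer and
# its tiny vocabulary are hypothetical stand-ins, not real library code.
class MockTokenizer:
    def __init__(self):
        self._id_to_token = {0: "<unk>", 1: "<s>", 2: "</s>", 3: "hello"}
        # Tokens added after training, kept outside the base vocabulary.
        self.added_tokens_encoder = {"<pad>": 4}

    @property
    def vocab_size(self):
        return len(self._id_to_token)

    def convert_ids_to_tokens(self, index):
        return self._id_to_token[index]

    def get_vocab(self):
        # Same shape as the transformers implementation: build the base
        # vocab by mapping every id back to its token, then merge in the
        # added tokens.
        vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
        vocab.update(self.added_tokens_encoder)
        return vocab

tok = MockTokenizer()
print(tok.get_vocab())  # {'<unk>': 0, '<s>': 1, '</s>': 2, 'hello': 3, '<pad>': 4}
```

The caveat in the comment above applies: different tokenizer types implement get_vocab differently, so this one shape is not guaranteed to be right for every model type.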
OK, no, I do not get the error on this:
with this error:
transformers version: 4.34.0 (pip install transformers==4.34.0)
ctransformers version: 0.2.27 (pip install ctransformers==0.2.27)
I encounter the following error. transformers has PreTrainedTokenizer in tokenization_utils.py; a code change (2da8853) added current_vocab = self.get_vocab().copy() to _add_tokens on line 454. PreTrainedTokenizer itself has added_tokens_decoder and __len__ implemented, so only get_vocab would cause NotImplementedError().