Hi, thank you for this interesting project. This may be the same case as issues/24.
I can't run local-gemma from my Python code.
Script
from local_gemma import LocalGemma2ForCausalLM
from transformers import AutoTokenizer
model = LocalGemma2ForCausalLM.from_pretrained("google/gemma-2-27b-it", preset="memory_extreme")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
messages = [
{"role": "user", "content": "What is your favourite condiment?"},
{"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
{"role": "user", "content": "Do you have mayonnaise recipes?"}
]
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(model.device)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded_text = tokenizer.batch_decode(generated_ids)
Error message
$ python3 check.py
Traceback (most recent call last):
File "/dataset/localgemma/check.py", line 4, in <module>
model = LocalGemma2ForCausalLM.from_pretrained(
File "/dataset/localgemma/gemma-venv/lib/python3.10/site-packages/local_gemma/modeling_local_gemma_2.py", line 153, in from_pretrained
model = super().from_pretrained(
File "/dataset/localgemma/gemma-venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3787, in from_pretrained
hf_quantizer.validate_environment(device_map=device_map)
File "/dataset/localgemma/gemma-venv/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 86, in validate_environment
raise ValueError(
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
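The error means the `memory_extreme` preset quantized the model but not all layers fit in GPU RAM, so some were dispatched to CPU/disk. One possible workaround, following the error message's suggestion, is to bypass the preset and load with an explicit quantization config that enables fp32 CPU offload. This is a sketch, not a confirmed fix: it assumes `transformers`, `accelerate`, and `bitsandbytes` are installed, and it uses the current `BitsAndBytesConfig` parameter name (`llm_int8_enable_fp32_cpu_offload`) rather than the older flag quoted in the error text.

```python
# Sketch (untested on this setup): load quantized with fp32 CPU offload enabled,
# letting accelerate split layers between GPU and CPU. Downloads ~27B of weights,
# so this only runs on a machine with the model available.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    # Keep any modules that end up on the CPU in fp32, as the error suggests.
    llm_int8_enable_fp32_cpu_offload=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b-it",
    quantization_config=quant_config,
    device_map="auto",  # let accelerate dispatch layers across GPU/CPU
)
```

Whether this helps depends on how much of the model actually fits on the GPU; if almost everything is offloaded, generation will be very slow, and a smaller checkpoint (e.g. gemma-2-9b-it) may be more practical.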
pip
nvidia-smi