GPU is not used even after specifying gpu_layers #163

Open
YogeshTembe opened this issue Oct 15, 2023 · 3 comments

Comments

@YogeshTembe

I installed ctransformers with:

pip install ctransformers[cuda]

I am trying the following code:

from langchain.llms import CTransformers
config = {'max_new_tokens': 512, 'repetition_penalty': 1.1, 'context_length': 8000, 'temperature':0, 'gpu_layers':50}
llm = CTransformers(model = "./codellama-7b.Q4_0.gguf", model_type = "llama", gpu_layers=50, config=config)

The gpu_layers parameter is specified here, yet the GPU is not being used and the entire load stays on the CPU.
Can someone please point out whether I am missing a step?

@RicardoDominguez

I am observing the same issue:

import torch
from ctransformers import AutoModelForCausalLM

local_model = 'Llama-2-7B-GGML'
llm = AutoModelForCausalLM.from_pretrained(local_model, model_file='llama-2-7b-chat.Q4_K_M.gguf', gpu_layers=50)
print("torch.cuda.memory_allocated: %fGB"%(torch.cuda.memory_allocated(0)/1024/1024/1024))
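A note on the measurement above: torch.cuda.memory_allocated only reports memory held by PyTorch's own allocator, so it would not reflect memory that ctransformers allocates outside of PyTorch, whether or not layers are offloaded. A minimal sketch of a device-level check, assuming an NVIDIA GPU with nvidia-smi on the PATH (the helper names parse_memory_used and gpu_memory_used_mib are made up for illustration, not part of ctransformers):

```python
import shutil
import subprocess


def parse_memory_used(csv_text):
    """Parse nvidia-smi CSV output: one integer (MiB) per GPU, one per line."""
    values = []
    for line in csv_text.splitlines():
        line = line.strip()
        if line.isdigit():  # skip blank lines and non-numeric entries
            values.append(int(line))
    return values


def gpu_memory_used_mib():
    """Return used memory per GPU in MiB, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return None
    return parse_memory_used(result.stdout)


if __name__ == "__main__":
    # Call once before and once after loading the model and compare the values.
    print("GPU memory used (MiB):", gpu_memory_used_mib())
```

If the reported value barely changes after the model is loaded with gpu_layers set, that would be consistent with no layers having been offloaded.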

@jamestwhedbee

jamestwhedbee commented Oct 17, 2023

I am seeing this too, after installing with:
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers

@peter65374

Same here. Still digging into it...
