Error: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. (#96)
Comments
This is the full log in API mode:
versions:
I have the same problem. Have you solved it?
No. I tried different versions of torch, but it did not solve the issue. What GPUs are you using? Mine are P40s.
My GPU is a V100. I may have found the reason: the V100 uses the Volta architecture, so I can't use flash-attn normally, which makes most of this project incompatible.
Sorry, we use the Marlin operator to compute layers on the GPU, and it requires Compute Capability 8.0 or above. The Compute Capability of the P40 is only 6.1, so it cannot run. You may be able to run it by removing the Marlin operator. @Azure-Tang Maybe you can show how to remove Marlin?
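Before changing any code, it may help to confirm whether your GPUs actually meet the requirement. A minimal sketch, assuming PyTorch is installed with CUDA support; the `MARLIN_MIN_CC` constant and the helper names are my own, based only on the 8.0 figure quoted above:

```python
"""Check whether each visible GPU meets the Compute Capability 8.0
minimum that the Marlin kernel reportedly requires.
For reference, the P40 is CC 6.1 and the V100 is CC 7.0 -- both below it."""

# Assumed minimum, taken from the maintainer's comment above.
MARLIN_MIN_CC = (8, 0)

def supports_marlin(capability):
    # capability is a (major, minor) tuple, as returned by
    # torch.cuda.get_device_capability(); tuples compare lexicographically.
    return tuple(capability) >= MARLIN_MIN_CC

def report():
    import torch  # imported lazily so supports_marlin() works without CUDA

    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        major, minor = torch.cuda.get_device_capability(i)
        ok = supports_marlin((major, minor))
        print(f"GPU {i}: {name} (CC {major}.{minor}) Marlin-capable: {ok}")

if __name__ == "__main__":
    report()
```

Running this on a dual-P40 box should report `CC 6.1` and `Marlin-capable: False` for both devices, which would confirm the diagnosis above.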
I'm trying to run a DeepSeek-V2.5 model.
Command used:
python -m ktransformers.local_chat --model_path ./DeepSeek-V2.5/ --gguf_path ../
I've tried both ktransformers.local_chat and web interface mode, with and without the --optimize_config_path option. The model loads, but on the first interaction it fails with this traceback. Other backends (koboldcpp and llama.cpp) run fine.
Server specs: 200 GB RAM, dual P40.