ExLlamaV2

A fast inference library for running LLMs locally on modern consumer-class GPUs.

This is a fork of ExLlamaV2. Its purpose is to make ExLlama run entirely on the GPU for a modest gain in inference speed, and to add a Transformers-like API so that Guidance and other third-party libraries can be used with it.

Original repo: https://github.com/turboderp/exllamav2
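
To illustrate what a Transformers-like API makes possible, here is a minimal sketch of the familiar load-and-generate pattern. The class names `ExLlamaV2ForCausalLM` and `ExLlamaV2Tokenizer` are hypothetical placeholders, not this fork's confirmed API; they only show the interface shape that lets third-party tooling treat the model like any Hugging Face model.

```python
# A minimal sketch, assuming a Transformers-style wrapper; all class and
# method names here are hypothetical, not the fork's confirmed API.

from exllamav2 import ExLlamaV2ForCausalLM, ExLlamaV2Tokenizer  # hypothetical names

model_dir = "/path/to/quantized-model"

# Load the quantized model and its tokenizer, mirroring the Hugging Face
# from_pretrained() convention.
model = ExLlamaV2ForCausalLM.from_pretrained(model_dir)
tokenizer = ExLlamaV2Tokenizer.from_pretrained(model_dir)

# Tokenize a prompt and generate, as one would with a Transformers model.
input_ids = tokenizer.encode("The quick brown fox")
output_ids = model.generate(input_ids, max_new_tokens=64)

print(tokenizer.decode(output_ids))
```

Exposing this interface shape is what would allow libraries written against Transformers, such as Guidance, to drive the model without library-specific glue code.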

Performance

TBD

Updates
