This is a fork of ExLlamaV2. Its purpose is to run ExLlama entirely on the GPU to gain a bit more inference speed, and to add a transformers-like API so it can be used with [guidance](https://github.com/guidance-ai/guidance) and other third-party libraries.
Original repo: [https://github.com/turboderp/exllamav2](https://github.com/turboderp/exllamav2)
TBD