CacheBlend (Under Construction):

This is the code repo for CacheBlend: Fast Large Language Model Serving with Cached Knowledge Fusion. The current implementation is based on vLLM.

Installation

Python >= 3.9 and CUDA >= 12.1 are required. An NVIDIA GPU with at least 40 GB of memory is recommended. To install the CacheBlend dependencies:

git clone git@github.com:YaoJiayi/CacheBlend.git
cd CacheBlend/vllm_blend
pip install -e .
cd ..
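
Before installing, it can help to confirm that the machine meets the requirements listed above. The script below is a minimal sketch and is not part of the repository: the helper names (`parse_version`, `version_at_least`, `check_environment`) are hypothetical, and the CUDA and GPU checks assume PyTorch is importable (they are skipped otherwise). Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the system toolkit.

```python
import sys

# Thresholds taken from the Installation section.
MIN_PYTHON = (3, 9)
MIN_CUDA = (12, 1)
RECOMMENDED_GPU_GB = 40


def parse_version(text):
    """Turn a version string such as '12.1' into a tuple of ints, e.g. (12, 1)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop at suffixes such as '0+cu121'
    return tuple(parts)


def version_at_least(found, minimum):
    """Compare two version tuples component by component."""
    return tuple(found) >= tuple(minimum)


def check_environment():
    """Return a dict describing which requirements are met."""
    report = {"python_ok": version_at_least(sys.version_info[:3], MIN_PYTHON)}
    try:
        import torch  # assumption: PyTorch is available in this environment
    except ImportError:
        report["torch"] = "not installed; CUDA and GPU checks skipped"
        return report
    cuda = torch.version.cuda  # None on CPU-only builds
    report["cuda_ok"] = cuda is not None and version_at_least(
        parse_version(cuda), MIN_CUDA
    )
    if torch.cuda.is_available():
        # Decimal gigabytes, so a nominal 40 GB card passes the check.
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        report["gpu_memory_gb"] = round(total_gb, 1)
        report["gpu_memory_ok"] = total_gb >= RECOMMENDED_GPU_GB
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `False` value for `gpu_memory_ok` is not fatal, since the 40 GB figure is a recommendation rather than a hard requirement.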

Example run

Run LLM inference with CacheBlend

python example/blend.py

Run Musique dataset

Compare LLM inference with CacheBlend against inference with normal prefill

python example/blend_musique.py
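
For a rough wall-clock figure per example, the two scripts can be wrapped in a small timing harness. This is only a sketch under stated assumptions: `time_command` is a hypothetical helper, not part of the repository, and it must be run from the CacheBlend repository root. End-to-end time includes model loading, so it is a coarse number and not a substitute for any metrics the example scripts may report themselves.

```python
import subprocess
import sys
import time
from pathlib import Path


def time_command(argv):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, check=False)
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Script paths are the ones named in this README.
    for script in ("example/blend.py", "example/blend_musique.py"):
        if not Path(script).exists():
            print(f"{script}: not found, skipping (run from the repository root)")
            continue
        code, seconds = time_command([sys.executable, script])
        print(f"{script}: exit code {code}, {seconds:.1f} s")
```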
