This is the code repo for CacheBlend: Fast Large Language Model Serving with Cached Knowledge Fusion. The current implementation is based on vLLM.
Python >= 3.9 and CUDA >= 12.1 are required. An NVIDIA GPU with >= 40 GB of memory is recommended.
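The version requirements above can be checked before installing. The following is a minimal sketch, not part of the CacheBlend repo: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and it assumes PyTorch may already be installed so that `torch.version.cuda` can be read. That attribute reports the CUDA version PyTorch was built against, which may differ from the system toolkit.

```python
import re
import sys

# Minimums taken from the requirements stated above.
MIN_PYTHON = (3, 9)
MIN_CUDA = (12, 1)


def parse_version(text):
    """Return the first 'major.minor' pair in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def meets_minimum(version, minimum):
    """True if version is known and is at least minimum (tuple comparison)."""
    return version is not None and version >= minimum


def check_environment(cuda_version_text):
    """Hypothetical helper: report which stated requirements are satisfied."""
    return {
        "python": meets_minimum(sys.version_info[:2], MIN_PYTHON),
        "cuda": meets_minimum(parse_version(cuda_version_text), MIN_CUDA),
    }


if __name__ == "__main__":
    # Assumption: if PyTorch is importable, use the CUDA version it was built with.
    try:
        import torch
        cuda_text = torch.version.cuda or ""
    except ImportError:
        cuda_text = ""
    print(check_environment(cuda_text))
```

An empty or unparseable CUDA string is treated as "requirement not met", so the check fails closed when no CUDA build is detected.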
To install CacheBlend dependencies:
git clone git@github.com:YaoJiayi/CacheBlend.git
cd CacheBlend/vllm_blend
pip install -e .
cd ..
To run the examples:
python example/blend.py
python example/blend_musique.py