Sarathi-Serve

Sarathi-Serve is a high-throughput, low-latency LLM serving framework. Please refer to our OSDI'24 paper for more details.

Setup

Setup CUDA

Sarathi-Serve has been tested with CUDA 12.3 on H100 and A100 GPUs.
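To see which CUDA toolkit a machine currently exposes before installing, a quick check along these lines may help. This is a minimal sketch and not part of the project's tooling: the helper name `cuda_status` is hypothetical, and it only reports what `nvcc` prints, without verifying compatibility with Sarathi-Serve.

```shell
# Hypothetical helper (not part of Sarathi-Serve): report the CUDA toolkit
# release that nvcc advertises, or say that nvcc is not on PATH.
cuda_status() {
    if command -v nvcc >/dev/null 2>&1; then
        # nvcc prints a line such as "Cuda compilation tools, release X.Y, ..."
        nvcc --version | grep -i "release"
    else
        echo "nvcc not found on PATH"
    fi
}

cuda_status
```

If the reported release differs from the tested version above, the installation may still work, but that configuration has not been validated here.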

Clone repository

git clone git@github.com:microsoft/sarathi-serve.git

Create mamba environment

Set up mamba if you don't already have it:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
bash Mambaforge-Linux-x86_64.sh # follow the instructions from there

Create a Python 3.10 environment:

mamba create -p ./env python=3.10

Install Sarathi-Serve

pip install -e . --extra-index-url https://flashinfer.ai/whl/cu121/torch2.3/
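After the install finishes, a simple import check can confirm that the package is visible to the active interpreter. This is a sketch under assumptions: it assumes the package is importable under the module name `sarathi` (inferred from the repository's top-level source directory), and the helper name `check_sarathi` is hypothetical.

```shell
# Hypothetical helper (not part of Sarathi-Serve): try to import the package
# with the interpreter that is first on PATH and report the outcome.
# Assumption: the installed module is named "sarathi".
check_sarathi() {
    if python -c "import sarathi" >/dev/null 2>&1; then
        echo "sarathi import OK"
    else
        echo "sarathi import failed"
    fi
}

check_sarathi
```

A failed import usually means the command ran outside the environment created earlier, so it is worth confirming that `python` resolves to the interpreter inside `./env`.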

Reproducing Results

Refer to the READMEs in the individual folders under osdi-experiments; each folder corresponds to one figure.

Citation

If you use our work, please consider citing our paper:

@article{agrawal2024taming,
  title={Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve},
  author={Agrawal, Amey and Kedia, Nitin and Panwar, Ashish and Mohan, Jayashree and Kwatra, Nipun and Gulavani, Bhargav S and Tumanov, Alexey and Ramjee, Ramachandran},
  journal={Proceedings of 18th USENIX Symposium on Operating Systems Design and Implementation, 2024, Santa Clara},
  year={2024}
}

Acknowledgment

This repository started as a fork of the vLLM project. Sarathi-Serve is a research prototype and does not have complete feature parity with open-source vLLM. We have retained only the most critical features and adapted the codebase for faster research iterations.
