This repo contains a collection of examples for LLM serving on Modal. For comparison across the various serving frameworks, a benchmarking setup heavily referencing vLLM's is also provided.
Currently, the following frameworks have been deployed and tested to be working via Modal Deployments.
| Framework | GitHub Repo | Modal Script |
| --- | --- | --- |
| vLLM | https://github.com/vllm-project/vllm | script |
| Text Generation Inference (TGI) | https://github.com/huggingface/text-generation-inference | script |
| LMDeploy | https://github.com/InternLM/lmdeploy | script |
To deploy the respective examples, you can set up the environment using the following commands.
This project uses uv for dependency management. To install uv, please refer to this guide:
# On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows.
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# With pip.
pip install uv
# With pipx.
pipx install uv
# With Homebrew.
brew install uv
# With Pacman.
pacman -S uv
To install the required dependencies:
# create a virtual env
uv venv
# install dependencies
uv pip install -r requirements.txt # Install from a requirements.txt file.
If you are looking to contribute to the repo, you will also need to install the pre-commit hooks so that your code changes are linted and formatted accordingly:
pip install pre-commit
pre-commit install
pre-commit install --hook-type commit-msg
To deploy on Modal, simply use the CLI and deploy the respective serving framework as desired.
For example, to deploy a vLLM server:
source .venv/bin/activate
modal deploy src/vllm/server.py
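For reference, below is a minimal sketch of what such a Modal server script could look like. It is not the repo's actual src/vllm/server.py: the model name, GPU type, volume name, and paths are illustrative assumptions, and only Modal's documented App/Image/Volume and @modal.web_server APIs are used. The function names mirror the download_hf_model and serve objects shown in the deployment output below.

```python
# Illustrative sketch only -- not the repo's actual src/vllm/server.py.
# The model name, volume name, GPU type, and local paths are placeholders.
import subprocess

import modal

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed model for illustration

# Container image with vLLM and the Hugging Face downloader installed.
image = modal.Image.debian_slim().pip_install("vllm", "huggingface_hub")

# Persistent volume so model weights are downloaded only once.
volume = modal.Volume.from_name("hf-model-cache", create_if_missing=True)

app = modal.App("vllm-mistral-7b-instruct")


@app.function(image=image, volumes={"/models": volume}, timeout=30 * 60)
def download_hf_model():
    """Snapshot the model weights into the shared volume."""
    from huggingface_hub import snapshot_download

    snapshot_download(MODEL_NAME, local_dir="/models/mistral-7b-instruct")
    volume.commit()


@app.function(image=image, gpu="A100", volumes={"/models": volume})
@modal.web_server(port=8000, startup_timeout=300)
def serve():
    """Launch vLLM's OpenAI-compatible API server; Modal forwards HTTP traffic to port 8000."""
    subprocess.Popen(
        [
            "python",
            "-m",
            "vllm.entrypoints.openai.api_server",
            "--model",
            "/models/mistral-7b-instruct",
            "--port",
            "8000",
        ]
    )
```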
Upon successful deployment, you should see the following (or similar) information in your terminal:
┌───────────────────
│ 📁 ~/c/modal-llm-serving master [!]
└─❯ modal deploy src/vllm/server.py
✓ Created objects.
├── 🔨 Created mount /Users/xxx/code/modal-llm-serving/template_mistral_7b_instruct.jinja
├── 🔨 Created mount /Users/xxx/code/modal-llm-serving/src/vllm/server.py
├── 🔨 Created download_hf_model.
└── 🔨 Created serve => https://xxx--vllm-mistralai--mistral-7b-instruct-v02-serve.modal.run
✓ App deployed! 🎉
View Deployment:
https://modal.com/xxx/main/apps/deployed/vllm-mistralai--mistral-7b-instruct-v02
To access the respective Swagger UI, you can either access the serve URL directly or append /docs to the URL, depending on the serving framework.
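As a quick smoke test of a vLLM deployment, you can also call the OpenAI-compatible completions endpoint directly. The snippet below is a hedged example: the base URL is the placeholder from the deployment output above, the /v1/completions route comes from vLLM's OpenAI-compatible server (TGI and LMDeploy expose different routes), and the model id must match whatever the server was launched with.

```python
# Hypothetical smoke test against a vLLM deployment's OpenAI-compatible API.
# Replace BASE_URL with your own deployment URL; the route and model id are assumptions.
import requests

BASE_URL = "https://xxx--vllm-mistralai--mistral-7b-instruct-v02-serve.modal.run"

resp = requests.post(
    f"{BASE_URL}/v1/completions",
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.2",  # must match the served model id
        "prompt": "Explain what Modal is in one sentence.",
        "max_tokens": 64,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```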
To benchmark the deployed LLM inference servers, run the benchmark script as follows:
python benchmark/benchmark_server.py --backend vllm \
--model "mistralai--mistral-7b-instruct" \
--num-request 1000 \
--request-rate 64 \
--num-benchmark-runs 3 \
--max-input-len 1024 \
--max-output-len 1024 \
--base-url "https://xxx--vllm-mistralai--mistral-7b-instruct-v02-serve.modal.run"
Important

NOTE: Replace the --base-url with your own deployment URL, as indicated upon successful deployment with modal deploy.
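For context on the --request-rate flag: vLLM-style serving benchmarks typically model request arrivals as a Poisson process, sampling exponential inter-arrival times at the target rate. The sketch below illustrates that idea; it is an assumption about how this repo's benchmark_server.py schedules requests, not its actual code.

```python
# Sketch of Poisson-process request scheduling, as used by vLLM-style benchmarks.
# Illustrates the meaning of --request-rate; the repo's script may differ.
import asyncio
import random


async def send_request(i: int) -> None:
    # Placeholder for the real HTTP call to the deployed server.
    print(f"sending request {i}")


async def run(num_requests: int, request_rate: float) -> None:
    tasks = []
    for i in range(num_requests):
        tasks.append(asyncio.create_task(send_request(i)))
        # Exponential inter-arrival times => Poisson arrivals at `request_rate` req/s.
        await asyncio.sleep(random.expovariate(request_rate))
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(run(num_requests=1000, request_rate=64.0))
```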