
Modal LLM Serving Examples and Benchmarks


About

This repo contains a collection of examples for serving LLMs on Modal. To compare the various serving frameworks, a benchmarking setup heavily referenced from vLLM is also provided.

Currently, the following frameworks have been deployed and verified to work via Modal Deployments; a minimal sketch of such a serving script follows the table.

Framework                          GitHub Repo                                                Modal Script
vLLM                               https://github.com/vllm-project/vllm                       script
Text Generation Inference (TGI)    https://github.com/huggingface/text-generation-inference   script
LMDeploy                           https://github.com/InternLM/lmdeploy                       script
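
Each Modal script follows roughly the same shape: build a container image with the framework installed, attach a GPU, and expose a web endpoint. Below is a minimal, hypothetical vLLM sketch; the model name, GPU type, and /generate route are illustrative, and the actual scripts under src/ differ (for example, they also handle model downloads and caching).

# Hypothetical sketch of a Modal LLM serving script; not the repo's actual code.
import modal

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative

# Container image with the serving framework installed.
image = modal.Image.debian_slim().pip_install("vllm", "fastapi")
app = modal.App("vllm-serving-sketch", image=image)


@app.function(gpu="A100", timeout=600)
@modal.asgi_app()
def serve():
    import fastapi
    from vllm import SamplingParams
    from vllm.engine.arg_utils import AsyncEngineArgs
    from vllm.engine.async_llm_engine import AsyncLLMEngine
    from vllm.utils import random_uuid

    # Async engine so concurrent requests are batched by vLLM.
    # Loading weights here means the first request waits for model load.
    engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model=MODEL_NAME))
    web_app = fastapi.FastAPI()

    @web_app.post("/generate")
    async def generate(request: fastapi.Request):
        body = await request.json()
        params = SamplingParams(max_tokens=body.get("max_tokens", 128))
        final = None
        # Consume the streaming generator; the last item holds the full output.
        async for output in engine.generate(body["prompt"], params, random_uuid()):
            final = output
        return {"text": final.outputs[0].text}

    return web_app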

Getting Started

To deploy the respective examples, first set up the environment using the following commands.

This project uses uv for dependency management. To install uv, please refer to this guide:

# On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows.
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# With pip.
pip install uv

# With pipx.
pipx install uv

# With Homebrew.
brew install uv

# With Pacman.
pacman -S uv

To install the required dependencies:

# create a virtual env
uv venv

# install dependencies
uv pip install -r requirements.txt  # Install from a requirements.txt file.

If you are looking to contribute to the repo, you will also need to install the pre-commit hooks to ensure that your code changes are linted and formatted accordingly:

pip install pre-commit

pre-commit install &&
pre-commit install --hook-type commit-msg

Deployment

To deploy on Modal, simply use the CLI to deploy the desired serving framework.

For example, to deploy a vLLM server:

source .venv/bin/activate

modal deploy src/vllm/server.py

Upon successful deployment, you should see output similar to the following in your terminal:

┌───────────────────
│ 📁 ~/c/modal-llm-serving  master [!]
└─❯  modal deploy src/vllm/server.py
✓ Created objects.
├── 🔨 Created mount /Users/xxx/code/modal-llm-serving/template_mistral_7b_instruct.jinja
├── 🔨 Created mount /Users/xxx/code/modal-llm-serving/src/vllm/server.py
├── 🔨 Created download_hf_model.
└── 🔨 Created serve => https://xxx--vllm-mistralai--mistral-7b-instruct-v02-serve.modal.run
✓ App deployed! 🎉

View Deployment:
https://modal.com/xxx/main/apps/deployed/vllm-mistralai--mistral-7b-instruct-v02

To access the Swagger UI, you can either open the serve URL directly or append /docs to it, depending on the serving framework.
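
For a quick smoke test, you can also hit the endpoint programmatically. The snippet below is a hedged example assuming the vLLM deployment exposes an OpenAI-compatible /v1/completions route; check the /docs page for the actual routes and payloads of your deployment.

import requests

# Replace with your own deployment URL from `modal deploy`.
BASE_URL = "https://xxx--vllm-mistralai--mistral-7b-instruct-v02-serve.modal.run"

# Assumed OpenAI-compatible route; verify against the Swagger UI at /docs.
resp = requests.post(
    f"{BASE_URL}/v1/completions",
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.2",
        "prompt": "Explain what Modal is in one sentence.",
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.json())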

Benchmark

To benchmark the deployed LLM inference servers, run the benchmark script as follows:

python benchmark/benchmark_server.py --backend vllm \
    --model "mistralai--mistral-7b-instruct" \
    --num-request 1000 \
    --request-rate 64 \
    --num-benchmark-runs 3 \
    --max-input-len 1024 \
    --max-output-len 1024 \
    --base-url "https://xxx--vllm-mistralai--mistral-7b-instruct-v02-serve.modal.run"

Important

NOTE: Replace --base-url with your own deployment URL, as shown after a successful modal deploy.
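
For context, the benchmark follows the general approach of vLLM's serving benchmark: requests are fired at a target rate and per-request latencies are aggregated across runs. The sketch below is purely conceptual (route, payload, and helper names are hypothetical); see benchmark/benchmark_server.py for the actual implementation.

import asyncio
import random
import time

import aiohttp


async def send_request(session: aiohttp.ClientSession, url: str, payload: dict) -> float:
    # Send one request and return its end-to-end latency in seconds.
    start = time.perf_counter()
    async with session.post(url, json=payload) as resp:
        await resp.read()
    return time.perf_counter() - start


async def run_benchmark(url: str, payload: dict, num_requests: int, request_rate: float) -> None:
    tasks = []
    async with aiohttp.ClientSession() as session:
        for _ in range(num_requests):
            tasks.append(asyncio.create_task(send_request(session, url, payload)))
            # Poisson arrival process: exponential inter-arrival times at `request_rate` req/s.
            await asyncio.sleep(random.expovariate(request_rate))
        latencies = await asyncio.gather(*tasks)
    print(f"mean latency: {sum(latencies) / len(latencies):.2f}s over {num_requests} requests")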
