Refactor and update readme for triton server
vectornguyen76 committed Dec 1, 2024
1 parent a9cabac commit 0d1d7a3
Showing 11 changed files with 122 additions and 92 deletions.
2 changes: 1 addition & 1 deletion docker-compose.yaml
```diff
@@ -9,7 +9,7 @@ services:
       - 8002:8002
     command: tritonserver --model-repository=/models
     volumes:
-      - ./image-search-engine/model_repository:/models
+      - ./triton-server/model_repository:/models
     deploy:
       resources:
         reservations:
```
2 changes: 1 addition & 1 deletion helm-charts/README.md
````diff
@@ -73,7 +73,7 @@ Instructions to create an S3 bucket and copy a model repository from local to S3
   ```
 - **Copy Model Repository**
   ```bash
-  aws s3 cp ./../image-search-engine/model_repository s3://qai-triton-repository/model_repository --recursive
+  aws s3 cp ./../triton-server/model_repository s3://qai-triton-repository/model_repository --recursive
   ```

 ## Install aws-ebs-csi-driver
````
86 changes: 0 additions & 86 deletions image-search-engine/model_repository/README.md

This file was deleted.

116 changes: 116 additions & 0 deletions triton_server/README.md
@@ -0,0 +1,116 @@
# Triton Model Conversion Guide

This guide explains how to convert PyTorch models to ONNX and TensorRT formats for use with NVIDIA Triton Inference Server.

## Directory Structure

```
model_repository/
├── efficientnet_b3/            # PyTorch model
│   ├── 1/
│   │   └── model.pt
│   └── config.pbtxt
├── efficientnet_b3_onnx/       # ONNX model
│   ├── 1/
│   │   └── model.onnx
│   └── config.pbtxt
└── efficientnet_b3_trt/        # TensorRT model
    ├── 1/
    │   └── model.plan
    └── config.pbtxt
```

## Quick Start

### 1. Set Up Development Environment

```bash
# Create and activate conda environment
conda create -n triton-convert python=3.9
conda activate triton-convert

# Install dependencies
pip install -r requirements.txt

# Download and save PyTorch model
python fetch_model.py
```
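
For reference, `fetch_model.py` saves a TorchScript trace of EfficientNet-B3 into the repository layout above (the trace and save calls are visible in this commit's diff further down); a condensed sketch, with the torchvision weights argument as an assumption:

```python
import torch
import torchvision

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load pretrained EfficientNet-B3 (the weights enum is an assumption)
model = torchvision.models.efficientnet_b3(weights="DEFAULT").eval().to(device)

# Trace with a dummy 300x300 batch and save where Triton expects version 1
traced_model = torch.jit.trace(model, torch.randn(1, 3, 300, 300).to(device))
torch.jit.save(traced_model, "./model_repository/efficientnet_b3/1/model.pt")
```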

### 2. Convert Models Using TensorRT Docker

```bash
# Pull TensorRT container
docker pull nvcr.io/nvidia/tensorrt:23.01-py3

# Run container with mounted volume
docker run -it --rm --gpus all \
  -v $(pwd):/workspace \
  -w /workspace \
  nvcr.io/nvidia/tensorrt:23.01-py3

# Convert PyTorch to ONNX
python pytorch_to_onnx.py --dynamic_axes True --batch_size 32

# Convert ONNX to TensorRT
python onnx_to_tensorrt.py \
  --dynamic_axes True \
  --batch_size 32 \
  --engine_precision FP16
```
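
For a quick smoke test of the converted models, you can point a Triton container at the repository. The `tritonserver` image tag below is assumed to match the 23.01 TensorRT release, and the health check assumes the default HTTP port 8000 is published (the compose file in this commit only shows 8002):

```bash
docker run --rm --gpus all -p 8000:8000 \
  -v $(pwd)/model_repository:/models \
  nvcr.io/nvidia/tritonserver:23.01-py3 \
  tritonserver --model-repository=/models

# In another shell: returns HTTP 200 once all three models are loaded
curl -v localhost:8000/v2/health/ready
```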

## Conversion Details

### PyTorch to ONNX Conversion

- Supports dynamic batch sizes
- Preserves model parameters and weights
- Validates numerical accuracy between PyTorch and ONNX outputs
- Configurable options (a minimal export sketch follows this list):
  - `--dynamic_axes`: Enable/disable dynamic batch sizing
  - `--batch_size`: Set batch size for conversion
  - `--opset_version`: ONNX opset version (default: 11)
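
The core of the script is `torch.onnx.export`; a minimal sketch under the flags shown above (the tensor names `input`/`output` are assumptions, not necessarily the script's identifiers):

```python
import torch
import torchvision

model = torchvision.models.efficientnet_b3(weights="DEFAULT").eval()
dummy_input = torch.randn(32, 3, 300, 300)  # --batch_size 32

torch.onnx.export(
    model,
    dummy_input,
    "./model_repository/efficientnet_b3_onnx/1/model.onnx",
    opset_version=11,  # --opset_version default
    input_names=["input"],
    output_names=["output"],
    # --dynamic_axes True: mark the batch dimension as variable
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
)
```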

### ONNX to TensorRT Conversion

- Optimizes model for NVIDIA GPUs
- Supports FP16/FP32 precision
- Configurable batch size ranges (see the `trtexec` sketch below):
  - `--min_engine_batch_size`: Minimum batch size
  - `--opt_engine_batch_size`: Optimal batch size
  - `--max_engine_batch_size`: Maximum batch size
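
An equivalent conversion can also be done with `trtexec`, which ships in the TensorRT container; the tensor name `input` and the shape triplet below are assumptions based on the 300x300 trace used elsewhere in this commit:

```bash
trtexec \
  --onnx=./model_repository/efficientnet_b3_onnx/1/model.onnx \
  --saveEngine=./model_repository/efficientnet_b3_trt/1/model.plan \
  --fp16 \
  --minShapes=input:1x3x300x300 \
  --optShapes=input:32x3x300x300 \
  --maxShapes=input:64x3x300x300
```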

## Model Configurations

Each model format has its own configuration in `config.pbtxt`:

- **PyTorch Model**: Uses `pytorch_libtorch` backend
- **ONNX Model**: Uses `onnxruntime_onnx` backend
- **TensorRT Model**: Uses `tensorrt_plan` backend

All models support:

- Dynamic batching
- GPU execution
- Multiple model instances
- Configurable input/output shapes
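
As an illustration, a minimal `config.pbtxt` for the TensorRT variant could look like the sketch below; the tensor names, dims, and instance count are assumptions, not the repository's actual values:

```protobuf
name: "efficientnet_b3_trt"
platform: "tensorrt_plan"
max_batch_size: 64
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 300, 300 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
dynamic_batching {}
instance_group [ { count: 1, kind: KIND_GPU } ]
```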

## System Requirements

- NVIDIA GPU with compute capability 6.0+
- NVIDIA Driver 525+ (or 450.51+ for data center GPUs)
- Docker with NVIDIA Container Toolkit
- CUDA 12.0.1 (via TensorRT container)
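
A quick way to confirm the driver and the NVIDIA Container Toolkit are wired up before converting anything:

```bash
# Should print the GPU name and driver version from inside the container
docker run --rm --gpus all nvcr.io/nvidia/tensorrt:23.01-py3 nvidia-smi
```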

## Troubleshooting

Common issues and solutions:

- Memory errors: Reduce batch size or model precision
- Conversion failures: Check input shapes and ONNX opset compatibility
- Performance issues: Tune batch sizes and instance counts
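
For conversion failures specifically, validating the exported graph with the `onnx` package's checker is a cheap first diagnostic:

```python
import onnx

# Verifies graph structure and declared opset imports
model = onnx.load("./model_repository/efficientnet_b3_onnx/1/model.onnx")
onnx.checker.check_model(model)
print(model.opset_import)  # confirm the opset TensorRT must support
```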

## References

- [TensorRT Container Documentation](https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel-23-01.html)
- [Triton Inference Server Documentation](https://github.com/triton-inference-server/server)
- [ONNX Model Zoo](https://github.com/onnx/models)
fetch_model.py

```diff
@@ -14,4 +14,4 @@

 # Save the entire model to a file
 traced_model = torch.jit.trace(model, torch.randn(1, 3, 300, 300).to(device))
-torch.jit.save(traced_model, "./efficientnet_b3/1/model.pt")
+torch.jit.save(traced_model, "./model_repository/efficientnet_b3/1/model.pt")
```
onnx_to_tensorrt.py

```diff
@@ -34,13 +34,13 @@ def parse_args():
     parser.add_argument(
         "--onnx_model_path",
         help="onnx model path",
-        default="./efficientnet_b3_onnx/1/model.onnx",
+        default="./model_repository/efficientnet_b3_onnx/1/model.onnx",
     )
     parser.add_argument(
         "--tensorrt_engine_path",
         help="tensorrt engine path",
         # default="./tensorrt_engine.engine",
-        default="./efficientnet_b3_trt/1/model.plan",
+        default="./model_repository/efficientnet_b3_trt/1/model.plan",
     )

     # TensorRT engine params
```
pytorch_to_onnx.py

```diff
@@ -30,7 +30,7 @@ def parse_args():
     parser.add_argument(
         "--output_path",
         help="onnx model path",
-        default="./efficientnet_b3_onnx/1/model.onnx",
+        default="./model_repository/efficientnet_b3_onnx/1/model.onnx",
     )

     # ONNX params
```
File renamed without changes.
