Refactor and update readme for triton server
vectornguyen76 committed Dec 1, 2024
1 parent a9cabac commit 0d1d7a3
Showing 11 changed files with 122 additions and 92 deletions.
2 changes: 1 addition & 1 deletion docker-compose.yaml
```diff
@@ -9,7 +9,7 @@ services:
       - 8002:8002
     command: tritonserver --model-repository=/models
     volumes:
-      - ./image-search-engine/model_repository:/models
+      - ./triton-server/model_repository:/models
     deploy:
       resources:
         reservations:
```
2 changes: 1 addition & 1 deletion helm-charts/README.md
````diff
@@ -73,7 +73,7 @@ Instructions to create an S3 bucket and copy a model repository from local to S3
   ```
 - **Copy Model Repository**
   ```bash
-  aws s3 cp ./../image-search-engine/model_repository s3://qai-triton-repository/model_repository --recursive
+  aws s3 cp ./../triton-server/model_repository s3://qai-triton-repository/model_repository --recursive
   ```

 ## Install aws-ebs-csi-driver
````
86 changes: 0 additions & 86 deletions image-search-engine/model_repository/README.md

This file was deleted.

116 changes: 116 additions & 0 deletions triton_server/README.md
@@ -0,0 +1,116 @@
# Triton Model Conversion Guide

This guide explains how to convert PyTorch models to ONNX and TensorRT formats for use with NVIDIA Triton Inference Server.

## Directory Structure

```
model_repository/
├── efficientnet_b3/            # PyTorch model
│   ├── 1/
│   │   └── model.pt
│   └── config.pbtxt
├── efficientnet_b3_onnx/       # ONNX model
│   ├── 1/
│   │   └── model.onnx
│   └── config.pbtxt
└── efficientnet_b3_trt/        # TensorRT model
    ├── 1/
    │   └── model.plan
    └── config.pbtxt
```

## Quick Start

### 1. Set Up Development Environment

```bash
# Create and activate conda environment
conda create -n triton-convert python=3.9
conda activate triton-convert

# Install dependencies
pip install -r requirements.txt

# Download and save PyTorch model
python fetch_model.py
```
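
For reference, `fetch_model.py` saves a TorchScript trace of EfficientNet-B3 into the repository layout above (the trace and save calls are visible in this commit's diff further down); a condensed sketch, with the torchvision weights argument as an assumption:

```python
import torch
import torchvision

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load pretrained EfficientNet-B3 (the weights enum is an assumption)
model = torchvision.models.efficientnet_b3(weights="DEFAULT").eval().to(device)

# Trace with a dummy 300x300 batch and save where Triton expects version 1
traced_model = torch.jit.trace(model, torch.randn(1, 3, 300, 300).to(device))
torch.jit.save(traced_model, "./model_repository/efficientnet_b3/1/model.pt")
```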

### 2. Convert Models Using TensorRT Docker

```bash
# Pull TensorRT container
docker pull nvcr.io/nvidia/tensorrt:23.01-py3

# Run container with mounted volume
docker run -it --rm --gpus all \
  -v $(pwd):/workspace \
  -w /workspace \
  nvcr.io/nvidia/tensorrt:23.01-py3

# Convert PyTorch to ONNX
python pytorch_to_onnx.py --dynamic_axes True --batch_size 32

# Convert ONNX to TensorRT
python onnx_to_tensorrt.py \
  --dynamic_axes True \
  --batch_size 32 \
  --engine_precision FP16
```
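
For a quick smoke test of the converted models, you can point a Triton container at the repository. The `tritonserver` image tag below is assumed to match the 23.01 TensorRT release, and the health check assumes the default HTTP port 8000 is published (the compose file in this commit only shows 8002):

```bash
docker run --rm --gpus all -p 8000:8000 \
  -v $(pwd)/model_repository:/models \
  nvcr.io/nvidia/tritonserver:23.01-py3 \
  tritonserver --model-repository=/models

# In another shell: returns HTTP 200 once all three models are loaded
curl -v localhost:8000/v2/health/ready
```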

## Conversion Details

### PyTorch to ONNX Conversion

- Supports dynamic batch sizes
- Preserves model parameters and weights
- Validates numerical accuracy between PyTorch and ONNX outputs
- Configurable options (a minimal export sketch follows this list):
  - `--dynamic_axes`: Enable/disable dynamic batch sizing
  - `--batch_size`: Set batch size for conversion
  - `--opset_version`: ONNX opset version (default: 11)
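
The core of the script is `torch.onnx.export`; a minimal sketch under the flags shown above (the tensor names `input`/`output` are assumptions, not necessarily the script's identifiers):

```python
import torch
import torchvision

model = torchvision.models.efficientnet_b3(weights="DEFAULT").eval()
dummy_input = torch.randn(32, 3, 300, 300)  # --batch_size 32

torch.onnx.export(
    model,
    dummy_input,
    "./model_repository/efficientnet_b3_onnx/1/model.onnx",
    opset_version=11,  # --opset_version default
    input_names=["input"],
    output_names=["output"],
    # --dynamic_axes True: mark the batch dimension as variable
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
)
```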

### ONNX to TensorRT Conversion

- Optimizes model for NVIDIA GPUs
- Supports FP16/FP32 precision
- Configurable batch size ranges (see the `trtexec` sketch below):
  - `--min_engine_batch_size`: Minimum batch size
  - `--opt_engine_batch_size`: Optimal batch size
  - `--max_engine_batch_size`: Maximum batch size
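
An equivalent conversion can also be done with `trtexec`, which ships in the TensorRT container; the tensor name `input` and the shape triplet below are assumptions based on the 300x300 trace used elsewhere in this commit:

```bash
trtexec \
  --onnx=./model_repository/efficientnet_b3_onnx/1/model.onnx \
  --saveEngine=./model_repository/efficientnet_b3_trt/1/model.plan \
  --fp16 \
  --minShapes=input:1x3x300x300 \
  --optShapes=input:32x3x300x300 \
  --maxShapes=input:64x3x300x300
```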

## Model Configurations

Each model format has its own configuration in `config.pbtxt`:

- **PyTorch Model**: Uses `pytorch_libtorch` backend
- **ONNX Model**: Uses `onnxruntime_onnx` backend
- **TensorRT Model**: Uses `tensorrt_plan` backend

All models support:

- Dynamic batching
- GPU execution
- Multiple model instances
- Configurable input/output shapes
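
As an illustration, a minimal `config.pbtxt` for the TensorRT variant could look like the sketch below; the tensor names, dims, and instance count are assumptions, not the repository's actual values:

```protobuf
name: "efficientnet_b3_trt"
platform: "tensorrt_plan"
max_batch_size: 64
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 300, 300 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
dynamic_batching {}
instance_group [ { count: 1, kind: KIND_GPU } ]
```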

## System Requirements

- NVIDIA GPU with compute capability 6.0+
- NVIDIA Driver 525+ (or 450.51+ for data center GPUs)
- Docker with NVIDIA Container Toolkit
- CUDA 12.0.1 (via TensorRT container)
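
A quick way to confirm the driver and the NVIDIA Container Toolkit are wired up before converting anything:

```bash
# Should print the GPU name and driver version from inside the container
docker run --rm --gpus all nvcr.io/nvidia/tensorrt:23.01-py3 nvidia-smi
```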

## Troubleshooting

Common issues and solutions:

- Memory errors: Reduce batch size or model precision
- Conversion failures: Check input shapes and ONNX opset compatibility
- Performance issues: Tune batch sizes and instance counts
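
For conversion failures specifically, validating the exported graph with the `onnx` package's checker is a cheap first diagnostic:

```python
import onnx

# Verifies graph structure and declared opset imports
model = onnx.load("./model_repository/efficientnet_b3_onnx/1/model.onnx")
onnx.checker.check_model(model)
print(model.opset_import)  # confirm the opset TensorRT must support
```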

## References

- [TensorRT Container Documentation](https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel-23-01.html)
- [Triton Inference Server Documentation](https://github.com/triton-inference-server/server)
- [ONNX Model Zoo](https://github.com/onnx/models)
fetch_model.py

```diff
@@ -14,4 +14,4 @@

 # Save the entire model to a file
 traced_model = torch.jit.trace(model, torch.randn(1, 3, 300, 300).to(device))
-torch.jit.save(traced_model, "./efficientnet_b3/1/model.pt")
+torch.jit.save(traced_model, "./model_repository/efficientnet_b3/1/model.pt")
```
onnx_to_tensorrt.py

```diff
@@ -34,13 +34,13 @@ def parse_args():
     parser.add_argument(
         "--onnx_model_path",
         help="onnx model path",
-        default="./efficientnet_b3_onnx/1/model.onnx",
+        default="./model_repository/efficientnet_b3_onnx/1/model.onnx",
     )
     parser.add_argument(
         "--tensorrt_engine_path",
         help="tensorrt engine path",
         # default="./tensorrt_engine.engine",
-        default="./efficientnet_b3_trt/1/model.plan",
+        default="./model_repository/efficientnet_b3_trt/1/model.plan",
     )

     # TensorRT engine params
```
pytorch_to_onnx.py

```diff
@@ -30,7 +30,7 @@ def parse_args():
     parser.add_argument(
         "--output_path",
         help="onnx model path",
-        default="./efficientnet_b3_onnx/1/model.onnx",
+        default="./model_repository/efficientnet_b3_onnx/1/model.onnx",
     )

     # ONNX params
```
File renamed without changes.
