- Build the TRT 8.5 OSS container:
```bash
bash build_trt_oss_docker.sh
```
- Launch the container:
```bash
bash run_trt_oss_docker.sh
```
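The two helper scripts above wrap standard Docker commands; a minimal sketch of what they likely do (the image name, tag, and mounts are assumptions, the real values live in the scripts):

```bash
# Hypothetical equivalent of the two scripts above; image name/tag and mount
# paths are assumptions -- check the scripts for the actual values.
docker build -t trt_oss:8.5 .                                     # build the TRT 8.5 OSS image
docker run --gpus all -it --rm -v "$(pwd)":/workspace trt_oss:8.5  # launch with GPU access
```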
- Change directory and pip-install the HuggingFace demo requirements:
```bash
cd demo/HuggingFace
pip install -r requirements.txt
```
- Run `build_t5_trt.py` to build T5 TRT engines and `build_bart_trt.py` to build BART engines. Run
```bash
python3 build_t5_trt.py --help
```
to see all options.
- `gen_t5_bs1_beam2.sh` is a bash script that uses `build_t5_trt.py` to generate T5 engines with batch size 1 and beam size 2 for the `t5-small` variant, and saves the TRT T5 engines in the Triton Model Repository.
- `gen_bart_bs1_greedy.sh` uses `build_bart_trt.py` to generate BART engines with batch size 1 and greedy search for the `bart-base` variant, and saves the TRT BART engines in the Triton Model Repository. Both scripts can be invoked directly, as shown below.
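For example (invocation assumed from the script names; run them from wherever they live in this repo):

```bash
# Generate the engines with the preconfigured settings and write them into the
# Triton Model Repository.
bash gen_t5_bs1_beam2.sh      # t5-small, batch size 1, beam size 2
bash gen_bart_bs1_greedy.sh   # bart-base, batch size 1, greedy search
```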
The Triton Model Repository is located at `model_repository`. Each model has a `model.py` and a `config.pbtxt` associated with it, along with the T5/BART TRT OSS code dependencies. We showcase two models here, T5 and BART. Currently, TRT T5 supports both beam search and greedy search; TRT BART supports only greedy search.

- `trt_t5_bs1_beam2` = TRT T5 model with max batch size 1 and beam search (beam size 2)
- `trt_bart_bs1_greedy` = TRT BART model with max batch size 1 and greedy search
Currently, the TensorRT engines for T5 and BART do not produce correct output for batch sizes > 1 (this bug is being worked on), so we only show batch size 1 examples for T5 and BART here.
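For reference, a sketch of the repository layout implied by the description above, following the standard Triton Python-backend convention (`config.pbtxt` at the model root, `model.py` in a numbered version directory); the TRT OSS dependency files are omitted:

```bash
tree model_repository
# model_repository/
# ├── trt_t5_bs1_beam2/
# │   ├── config.pbtxt
# │   └── 1/
# │       └── model.py   # Python backend wrapping the T5 TRT engines
# └── trt_bart_bs1_greedy/
#     ├── config.pbtxt
#     └── 1/
#         └── model.py   # Python backend wrapping the BART TRT engines
```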
- Build the custom Triton container with TRT and other dependencies. The Dockerfile is `docker/triton_trt.Dockerfile`:
```bash
cd docker
bash build_triton_trt_docker.sh
cd ..
```
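`build_triton_trt_docker.sh` presumably reduces to a `docker build` against that Dockerfile; a minimal sketch (the image tag `triton_trt` is an assumption):

```bash
# Hypothetical equivalent of build_triton_trt_docker.sh, run from docker/.
docker build -f triton_trt.Dockerfile -t triton_trt .
```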
- Launch the custom Triton container:
```bash
bash run_triton_trt_docker.sh
```
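The run script likely maps the standard Triton ports plus the JupyterLab port into the container; a sketch under those assumptions (the mount path and image tag are also assumed):

```bash
# Hypothetical equivalent of run_triton_trt_docker.sh: GPU access, the standard
# Triton ports (8000 HTTP, 8001 gRPC, 8002 metrics), and 8888 for JupyterLab.
docker run --gpus all -it --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 -p 8888:8888 \
  -v "$(pwd)":/workspace triton_trt
```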
- Launch JupyterLab at port 8888:
```bash
bash start_jupyter.sh
```
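`start_jupyter.sh` is presumably equivalent to starting JupyterLab bound to all interfaces so it is reachable from outside the container (the flags below are standard JupyterLab options, not taken from the script):

```bash
# Hypothetical equivalent of start_jupyter.sh.
jupyter lab --ip 0.0.0.0 --port 8888 --allow-root --no-browser
```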
- Run through `1_triton_server.ipynb` to launch the Triton Server.
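At its core, the server notebook starts Triton against the model repository. `--model-repository` is the standard `tritonserver` flag; the path below is an assumption based on this repo's layout:

```bash
# Start Triton and load the T5 and BART models from the repository.
tritonserver --model-repository=/workspace/model_repository
```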
- Run through `2_triton_client.ipynb` to perform sample inference for the T5 and BART TRT OSS HuggingFace models using the Triton Server.
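Outside the notebook, you can also exercise Triton's HTTP/REST v2 inference endpoint directly. The request shape follows the standard KServe v2 protocol, but the tensor name, shape, and datatype below are assumptions; check each model's `config.pbtxt` for the real ones:

```bash
# Sketch of a raw v2 inference request against the T5 model. The tensor name
# "input" and the BYTES datatype are hypothetical -- verify via config.pbtxt.
curl -X POST localhost:8000/v2/models/trt_t5_bs1_beam2/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "input", "shape": [1, 1], "datatype": "BYTES",
        "data": ["translate English to German: Hello, world!"]}]}'
```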