This is a simple repository for accelerating CLIP inference with TensorRT. It covers converting the ONNX model directly into a TensorRT inference engine, as well as loading a hand-written LayerNorm plugin to further speed up inference.
CLIP is a multimodal pre-trained neural network and an efficient, scalable approach to learning from natural language supervision. Its core idea is to pre-train on a large number of image-text pairs so that the model learns the alignment between images and text. CLIP handles two modalities, text and image, through two main components:
- Text Encoder: converts text into a low-dimensional embedding vector.
- Image Encoder: converts an image into a vector representation in the same embedding space.
At prediction time, CLIP produces predictions by computing the cosine similarity between the text and image embeddings.
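As a concrete illustration, a zero-shot prediction with the original PyTorch model looks roughly like the sketch below; the `ViT-B/32` checkpoint, image path, and candidate captions are placeholders, not requirements of this repository.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)   # placeholder checkpoint

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)   # placeholder image
text = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)   # Image Encoder
    text_features = model.encode_text(text)      # Text Encoder
    # Normalize so that the dot product equals the cosine similarity
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(similarity)   # probability of each caption matching the image
```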
We have tested it on CUDA 12.1, TensorRT 8.6.1, ONNX 1.17.0, and onnxruntime 1.13.1.
- Export the visual encoder and the text encoder of CLIP to `clip_visual.onnx` and `clip_textual.onnx`, respectively (a rough sketch of the underlying export call follows the commands below).
```bash
git clone https://github.com/xiaolu-luu/CLIP-TensorRT.git
python clip_onnx_export.py
```
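For reference, the export performed by `clip_onnx_export.py` boils down to `torch.onnx.export` calls along the following lines; the wrapper modules, dummy shapes, and opset shown here are illustrative assumptions rather than the script's exact code.

```python
import clip
import torch

model, _ = clip.load("ViT-B/32", device="cpu")   # placeholder checkpoint
model.eval()

# Dummy inputs: one 224x224 RGB image and one 77-token text sequence (shapes are assumptions)
dummy_image = torch.randn(1, 3, 224, 224)
dummy_text = clip.tokenize(["a photo of a cat"])

class Visual(torch.nn.Module):
    def __init__(self, m):
        super().__init__()
        self.m = m
    def forward(self, x):
        return self.m.encode_image(x)

class Textual(torch.nn.Module):
    def __init__(self, m):
        super().__init__()
        self.m = m
    def forward(self, x):
        return self.m.encode_text(x)

common = dict(input_names=["input"], output_names=["output"],
              dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
              opset_version=12)
torch.onnx.export(Visual(model), dummy_image, "clip_visual.onnx", **common)
torch.onnx.export(Textual(model), dummy_text, "clip_textual.onnx", **common)
```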
- Build the engine (without the plugin) and run inference.
```bash
cd clip_trt
python clip-inference.py >result.log 2>&1
```
- Compile the handwritten TextualLayerNorm and VisualLayerNorm plugins, then test the correctness of both plugins (a sketch of how the compiled library is loaded follows the commands below).
```bash
cd LayerNormPlugin
make clean
make
python testLayerNormPlugin.py >test.log 2>&1
```
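The key requirement is that the compiled library is visible to TensorRT before any engine work happens. A minimal sketch of that loading step is shown below; the `LayerNorm.so` file name is an assumption, so use whatever the Makefile actually produces.

```python
import ctypes
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)

# Load the compiled plugin library so its creators register themselves with TensorRT
# (the .so name is an assumption)
ctypes.cdll.LoadLibrary("./LayerNorm.so")
trt.init_libnvinfer_plugins(logger, "")

# Confirm the LayerNorm plugin creators now appear in the global plugin registry
creators = trt.get_plugin_registry().plugin_creator_list
print([c.name for c in creators])
```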
- Build the engine with the plugins and run inference (see the sketch after the commands below).
```bash
cd clip_trt
python clip-inference-plugin.py >result-plugin.log 2>&1
```
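The important detail is that the plugin library must be loaded in the same process before the engine is built or deserialized. The sketch below shows one way to run a serialized engine with the TensorRT 8.6 Python API; the file paths, tensor names, shapes, and input dtype are assumptions, not the exact contents of `clip-inference-plugin.py`.

```python
import ctypes
import numpy as np
import tensorrt as trt
from cuda import cudart   # nvidia cuda-python

logger = trt.Logger(trt.Logger.WARNING)
ctypes.cdll.LoadLibrary("./LayerNormPlugin/LayerNorm.so")   # assumed plugin path
trt.init_libnvinfer_plugins(logger, "")

with open("clip_textual_trt.engine", "rb") as f:            # assumed engine path
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Assume a single input "input" of shape [4, 77] (int32 tokens) and a single output "output"
context.set_input_shape("input", (4, 77))
h_input = np.zeros((4, 77), dtype=np.int32)
h_output = np.empty(tuple(context.get_tensor_shape("output")), dtype=np.float32)

_, d_input = cudart.cudaMalloc(h_input.nbytes)
_, d_output = cudart.cudaMalloc(h_output.nbytes)
cudart.cudaMemcpy(d_input, h_input.ctypes.data, h_input.nbytes,
                  cudart.cudaMemcpyKind.cudaMemcpyHostToDevice)

context.set_tensor_address("input", d_input)
context.set_tensor_address("output", d_output)
_, stream = cudart.cudaStreamCreate()
context.execute_async_v3(stream)
cudart.cudaStreamSynchronize(stream)
cudart.cudaMemcpy(h_output.ctypes.data, d_output, h_output.nbytes,
                  cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost)

print(h_output.shape)
```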
Occasionally the ONNX export fails on the first attempt; in that case it is worth simply running it again. If that does not help, try changing the export settings.
The default ONNX export options look like this:
```python
DEFAULT_EXPORT = dict(input_names=['input'], output_names=['output'],
                      export_params=True, verbose=False, opset_version=12,
                      do_constant_folding=True,
                      dynamic_axes={'input': {0: 'batch_size'},
                                    'output': {0: 'batch_size'}})
```
You can change them pretty easily:
```python
from clip_onnx.utils import DEFAULT_EXPORT

DEFAULT_EXPORT["opset_version"] = 15
```
Alternative option (change only visual or textual):
```python
from clip_onnx import clip_onnx
from clip_onnx.utils import DEFAULT_EXPORT

visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"

textual_export_params = DEFAULT_EXPORT.copy()
textual_export_params["dynamic_axes"] = {'input': {1: 'batch_size'},
                                         'output': {0: 'batch_size'}}
textual_export_params["opset_version"] = 12

Textual = lambda x: x  # identity wrapper: export the textual model unchanged

# model, dummy_input_image and dummy_input_text are the loaded CLIP model and the
# example inputs used for tracing (see clip_onnx_export.py)
onnx_model = clip_onnx(model.cpu(), visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(dummy_input_image, dummy_input_text, verbose=True,
                        textual_wrapper=Textual,
                        textual_export_params=textual_export_params)
```
This section introduces some uses of trtexec and polygraphy for engine export and performance verification.
Export the engine using trtexec:
```bash
trtexec --onnx=../clip_textual.onnx \
        --memPoolSize=workspace:2048 \
        --saveEngine=./engines/clip_textual_trt.engine \
        --profilingVerbosity=detailed \
        --dumpOutput \
        --dumpProfile \
        --dumpLayerInfo \
        --exportOutput=./build/log/build_output_textual.log \
        --exportProfile=./build/log/build_profile_textual.log \
        --exportLayerInfo=./build/log/build_layer_info_textual.log \
        --warmUp=200 \
        --iterations=50 \
        --verbose \
        > ./build/log/result_trt_build_textual.log
```
Export the engine using polygraphy:
```bash
polygraphy run ../clip_textual.onnx \
    --trt \
    --save-engine ./engines/clip_textual_poly.plan \
    --save-timing-cache ./engines/clip_textual_poly.cache \
    --save-tactics ./engines/clip_textual_poly_tactics.json \
    --trt-min-shapes 'input:[1,77]' \
    --trt-opt-shapes 'input:[4,77]' \
    --trt-max-shapes 'input:[16,77]' \
    --fp16 \
    --pool-limit workspace:1G \
    --builder-optimization-level 5 \
    --max-aux-streams 4 \
    --input-shapes 'input:[4,77]' \
    --verbose \
    > ./build/log/result-poly-01.log 2>&1
```
Compare the per-layer outputs between ONNX Runtime and TensorRT:
```bash
polygraphy run ../clip_textual.onnx \
    --onnxrt --trt \
    --save-engine=./engines/clip_textual_poly.plan \
    --onnx-outputs mark all \
    --trt-outputs mark all \
    --trt-min-shapes 'input:[1,77]' \
    --trt-opt-shapes 'input:[4,77]' \
    --trt-max-shapes 'input:[16,77]' \
    --input-shapes 'input:[4,77]' \
    --atol 1e-3 --rtol 1e-3 \
    --verbose \
    > ./build/log/result-poly-3.log 2>&1
```
The code for exporting the ONNX model is adapted from the CLIP-ONNX repository, and we are very grateful for the support it has provided to our work.