
# Launch Modes

Triton Model Analyzer supports three different launch modes for Triton Inference Server. In the first two modes, Triton Inference Server is launched by Model Analyzer; in the third mode, it is assumed that an instance of Triton Inference Server is already running. Example invocations for each mode are sketched after the list below.

  1. The `tritonserver` binary is available in `$PATH`. In this case, use the `--triton-launch-mode local` flag and Model Analyzer will launch a Triton Inference Server locally and use it to benchmark the models.

  2. Using the Docker API to launch the Triton Inference Server container. If you use this mode and run Model Analyzer inside a Docker container, make sure that the container is launched with the appropriate flags. The following flags are mandatory for correct behavior:

    --gpus 1 -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged
    

    Use the `--triton-launch-mode docker` flag to have Model Analyzer use this mode.

  3. Using an already running Triton Inference Server. Use the `--triton-launch-mode remote` flag for this mode. Depending on your chosen client protocol, provide the URL of the Triton instance's HTTP or GRPC endpoint with the `--triton-http-endpoint` or `--triton-grpc-endpoint` flag. The Inference Server and Model Analyzer must be on the same machine, with the same GPUs available to both, since Model Analyzer does not currently support profiling remote GPUs. The Triton Server must also be started with `--model-control-mode=explicit` for this mode to work.
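
For reference, the sketch below shows how each launch mode might be invoked. The `model-analyzer profile` subcommand and the `--model-repository`/`--profile-models` flags are assumptions not covered in this document, so the exact command line may differ between Model Analyzer versions; only the launch-mode and endpoint flags are the ones described above. Adjust paths, model names, and the container image tag for your setup.

```bash
# 1. Local mode: tritonserver is available in $PATH.
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models my_model \
    --triton-launch-mode local

# 2. Docker mode: Model Analyzer starts a Triton container via the Docker API.
#    If Model Analyzer itself runs in a container, launch that container with the
#    mandatory flags listed above, e.g. (image tag is illustrative):
#
#    docker run -it --gpus 1 \
#        -v /var/run/docker.sock:/var/run/docker.sock \
#        -v /path/to/model_repository:/path/to/model_repository \
#        --net host --privileged \
#        <model-analyzer-image>
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models my_model \
    --triton-launch-mode docker

# 3. Remote mode: point Model Analyzer at a Triton Server on the same machine
#    that was started with --model-control-mode=explicit.
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models my_model \
    --triton-launch-mode remote \
    --triton-http-endpoint localhost:8000 \
    --triton-grpc-endpoint localhost:8001
```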