Depth estimation is the task of measuring the distance of each pixel relative to the camera. This repository contains a C++ implementation of the Depth-Anything model using the TensorRT API for real-time inference.
The inference time includes the pre-processing and post-processing stages:
Device | Model | Model Input (WxH) | Image Resolution (WxH) | Inference Time (ms) |
---|---|---|---|---|
RTX4090 | Depth-Anything-S | 518x518 | 1280x720 | 3 |
RTX4090 | Depth-Anything-B | 518x518 | 1280x720 | 6 |
RTX4090 | Depth-Anything-L | 518x518 | 1280x720 | 12 |
Note that inference was run in FP16 precision with a warm-up period of 10 frames; the reported time is that of the last inference.
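The measurement procedure described in the note can be sketched roughly as follows. This is an illustrative harness, not the code used in this repository: `timeLastInferenceMs` is a hypothetical helper, and `infer` stands in for whatever callable runs pre-processing, the TensorRT engine, and post-processing for one frame. For the number to be meaningful on a GPU, `infer` is assumed to block until the device has finished (for example by synchronizing the CUDA stream).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of this repository).
// Calls `infer` `warmup` times without timing, then times one further call
// and returns its wall-clock duration in milliseconds.
// Assumption: `infer` returns only after the GPU work has completed.
double timeLastInferenceMs(const std::function<void()>& infer, int warmup = 10) {
    assert(warmup >= 0);

    // Warm-up runs: let lazy initialization and clock ramp-up settle.
    for (int i = 0; i < warmup; ++i) {
        infer();
    }

    // Timed run: only this final call contributes to the reported time.
    const auto start = std::chrono::steady_clock::now();
    infer();
    const auto end = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Passing a lambda that wraps the full per-frame pipeline, such as `timeLastInferenceMs([&] { /* preprocess, run engine, postprocess */ });`, would give a figure comparable in spirit to the table above, although averaging over many timed runs is usually more stable than a single sample.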
Linux:
# infer image
./depth-anything-tensorrt-simplified depth_anything_vitb14.engine test.jpg
# infer folder(images)
./depth-anything-tensorrt-simplified depth_anything_vitb14.engine data
# infer video
./depth-anything-tensorrt-simplified depth_anything_vitb14.engine test.mp4 # the video path
Windows:
# infer image
./depth-anything-tensorrt-simplified.exe depth_anything_vitb14.engine test.jpg
# infer folder(images)
./depth-anything-tensorrt-simplified.exe depth_anything_vitb14.engine data
# infer video
./depth-anything-tensorrt-simplified.exe depth_anything_vitb14.engine test.mp4 # the video path
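Since the same second argument can be an image, a folder of images, or a video, the program has to decide which kind of input it was given. One way this could be done is sketched below using C++17 `std::filesystem`. The `InputKind` enum, the `classifyInput` function, and the extension lists are assumptions made for illustration; the actual executable may detect input types differently.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical names for illustration; not taken from this repository.
enum class InputKind { Image, Folder, Video, Unknown };

// Classifies a command-line path as a folder, an image, or a video.
// Folders are detected on disk; files are classified by extension only.
InputKind classifyInput(const std::filesystem::path& p) {
    std::error_code ec;  // non-throwing overload: missing paths yield false
    if (std::filesystem::is_directory(p, ec)) {
        return InputKind::Folder;
    }

    // Lower-case the extension so "TEST.JPG" and "test.jpg" match.
    std::string ext = p.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Assumed extension lists; extend as needed.
    if (ext == ".jpg" || ext == ".jpeg" || ext == ".png" || ext == ".bmp") {
        return InputKind::Image;
    }
    if (ext == ".mp4" || ext == ".avi" || ext == ".mov" || ext == ".mkv") {
        return InputKind::Video;
    }
    return InputKind::Unknown;
}
```

With this sketch, `test.jpg` would be treated as an image, `test.mp4` as a video, and an existing `data` directory as a folder whose files are then processed one by one.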
This project is based on the following projects:
- Depth-Anything - Unleashing the Power of Large-Scale Unlabeled Data.
- TensorRT - TensorRT samples and api documentation.