Starred repositories
Solve Visual Understanding with Reinforced VLMs
[ECCV 2024] Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
[CVPR 2024] The official implementation for "SemCity: Semantic Scene Generation with Triplane Diffusion"
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
Official code for SGDet3D. SGDet3D: Semantics and Geometry Fusion for 3D Object Detection Using 4D Radar and Camera.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection (NeurIPS 2022)
😎 up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
HEDNet (NeurIPS 2023) & SAFDNet (CVPR 2024 Oral)
PointPWC-Net is a deep coarse-to-fine network designed for 3D scene flow estimation from 3D point clouds.
[RA-L & IROS'22] Self-Supervised Scene Flow Estimation with 4-D Automotive Radar
[CVPR 2023 Highlight 💡] Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision
Inpaint anything using Segment Anything and inpainting models.
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)
[ICRA 2024] RaTrack: Moving Object Detection and Tracking with 4D Radar Point Cloud
[ICRA 2024] Robust 3D Object Detection from LiDAR-Radar Point Clouds Via Cross-Modal Feature Augmentation
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
[ICCV 2023] OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
[CVPR 2020] VoVNet backbone networks for detectron2
Depth Estimation from Camera Image and mmWave Radar Point Cloud
Semantic Guided Depth Estimation with Transformers Using Monocular Camera and Sparse Radar