[3DV 2025] LSSInst: Improving Geometric Modeling in LSS-Based BEV Perception with Instance Representation
End2EndPerception deployment solution based on the vision sparse transformer paradigm is open-sourced.
This is a collective repository for all 3DGS-related progress in the research and industry worlds.
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Code for paper "MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping", ECCV 2024 (Oral)
OpenDAN is an open-source Personal AI OS, which consolidates various AI modules in one place for your personal use.
[ECCV 2024] RecurrentBEV: A Long-term Temporal Fusion Framework for Multi-view 3D Detection
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
This repository is a paper digest of Transformer-related approaches in visual tracking tasks.
Official implementation of the CVPR 2024 paper ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association.
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
A minimalist environment for decision-making in autonomous driving
Commonsense Prototype for Outdoor Unsupervised 3D Object Detection (CVPR 2024)
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Official PyTorch implementation of FB-BEV & FB-OCC - Forward-backward view transformation for vision-centric autonomous driving perception
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
[AAAI 2024] BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios
[IROS 2024 Oral Presentation] WidthFormer: Toward Efficient Transformer-based BEV View Transformation
[NeurIPS'23 Spotlight] Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention (ECCV 2024)
Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world
[ICCV 2023] Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
[NeurIPS 2023] Query-based Temporal Fusion with Explicit Motion for 3D Object Detection