Table of Contents
- Awesome Papers
  - 3D Scene Understanding
  - Open-Vocabulary Indoor Scene Understanding
  - 3D Vision Grounding
  - 3D Multimodal LLMs
- Awesome Datasets
  - Basic Indoor Scenes
  - Basic Outdoor Scenes
  - Language-assistant Tasks
  - Datasets of Multimodal Instruction Tuning
Title | Venue | Date | Code | Demo |
---|---|---|---|---|
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | ECCV | 2024-07-18 | Github | - |
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding | ECCV | 2024-07-13 | Github | - |
Title | Venue | Date | Code | Demo |
---|---|---|---|---|
A Unified Framework for 3D Scene Understanding | arXiv | 2024-07-03 | Github | - |
Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding | CVPR | 2023 | - | - |
Title | Venue | Date | Code | Demo |
---|---|---|---|---|
Multi-branch Collaborative Learning Network for 3D Visual Grounding | ECCV | 2024-07-10 | Github | - |
Title | Venue | Date | Code | Demo |
---|---|---|---|---|
Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding | ECCV | 2024-04-31 | Github | - |
- PointLLM: Empowering Large Language Models to Understand Point Clouds [Paper] [Homepage] [Github]
- Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following [Paper] [Demo] [Github]
- 3D-LLM: Injecting the 3D World into Large Language Models (NeurIPS 2023 Spotlight) (10TB object data) [Paper] [Homepage] [Github]
- LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning [Paper] [Homepage] [Github]
- An Embodied Generalist Agent in 3D World [Paper] [Homepage] [Github]
- M3DBench: Let’s Instruct Large Models with Multi-modal 3D Prompts [Paper] [Homepage]
- EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI [Paper] [Homepage]
- ODIN: A Single Model for 2D and 3D Perception [Paper] [Homepage]
- ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding [Paper] [Github]
- ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding [Paper] [Github]
- OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding [Paper] [Github] [Homepage]
- CLIP2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [Paper] [Github]
- CLIP Goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition [Paper] [Github]
- CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training [Paper] [Github]
- Uni3D: Exploring Unified 3D Representation at Scale [Paper] [Github]
- MixCon3D: Synergizing Multi-View and Cross-Modal Contrastive Learning for Enhancing 3D Representation [Paper] [Github]
- OmniObject3D (CVPR 2023 Award Candidate): real-scanned 3D objects (6K), 190 classes [Paper] [Homepage]
- Objaverse-XL: 3D objects (10M+) [Paper] [Homepage] [Dataset]
- Cap3D: 3D-text pairs (660K) [Paper] [Download]
- ULIP - Objaverse Triplets: 3D Point Clouds (800K) - Images (10M) - Language (100M) Triplets [Download]
- ULIP - ShapeNet Triplets: 3D Point Clouds (52.5K) - Images (3M) - Language (30M) Triplets [Download]
- ScanRefer: 3D object localization in RGB-D scans using natural language
- SQA3D: 650 scenes, 6.8K situations, 20.4K descriptions, and 33.4K diverse reasoning questions for these situations [Paper] [Homepage]