Table of Contents
- Awesome Papers
  - 3D Scene Understanding
  - Open-Vocabulary Indoor Scene Understanding
  - 3D Vision Grounding
  - 3D Multimodal LLMs
- Awesome Datasets
  - Basic Indoor Scenes
  - Basic Outdoor Scenes
  - Language-assistant Tasks
  - Datasets of Multimodal Instruction Tuning
Title | Venue | Date | Code | Demo |
---|---|---|---|---|
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | ECCV | 2024-07-18 | Github | - |
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding | ECCV | 2024-07-13 | Github | - |
Title | Venue | Date | Code | Demo |
---|---|---|---|---|
A Unified Framework for 3D Scene Understanding | arXiv | 2024-07-03 | Github | - |
Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding | CVPR | 2023 | - | - |
Title | Venue | Date | Code | Demo |
---|---|---|---|---|
Multi-branch Collaborative Learning Network for 3D Visual Grounding | ECCV | 2024-07-10 | Github | - |
Title | Venue | Date | Code | Demo |
---|---|---|---|---|
Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding | ECCV | 2024-04-31 | Github | - |
- PointLLM: Empowering Large Language Models to Understand Point Clouds [Paper] [Homepage] [Github]
- Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following [Paper] [Demo] [Github]
- 3D-LLM: Injecting the 3D World into Large Language Models (NeurIPS 2023 Spotlight) (10TB object data) [Paper] [Homepage] [Github]
- LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning [Paper] [Homepage] [Github]
- An Embodied Generalist Agent in 3D World [Paper] [Homepage] [Github]
- M3DBench: Let’s Instruct Large Models with Multi-modal 3D Prompts [Paper] [Homepage]
- EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI [Paper] [Homepage]
- ODIN: A Single Model for 2D and 3D Perception [Paper] [Homepage]
- ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding [Paper] [Github]
- ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding [Paper] [Github]
- OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding [Paper] [Github] [Homepage]
- CLIP2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [Paper] [Github]
- CLIP Goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition [Paper] [Github]
- CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training [Paper] [Github]
- Uni3D: Exploring Unified 3D Representation at Scale [Paper] [Github]
- MixCon3D: Synergizing Multi-View and Cross-Modal Contrastive Learning for Enhancing 3D Representation [Paper] [Github]
- OmniObject3D (CVPR 2023 Award Candidate): real-scanned 3D objects (6K), 190 classes [Paper] [Homepage]
- Objaverse-XL: 3D objects (10M+) [Paper] [Homepage] [Dataset]
- Cap3D: 3D-text pairs (660K) [Paper] [Download]
- ULIP - Objaverse Triplets: 3D Point Clouds (800K) - Images (10M) - Language (100M) Triplets [Download]
- ULIP - ShapeNet Triplets: 3D Point Clouds (52.5K) - Images (3M) - Language (30M) Triplets [Download]
- ScanRefer: 3D object localization in RGB-D scans using natural language
- SQA3D: 650 scenes, 6.8K situations, 20.4K descriptions, and 33.4K diverse reasoning questions for these situations [Paper] [Homepage]