Stars
[Preprint] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Code for the paper "PointAttN: You Only Need Attention for Point Cloud Completion"
[ICCV 2021 Oral] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
Papers and datasets about point clouds.
[MICCAI 2024] TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs
Video Object Segmentation using Space-Time Memory Networks
[ECCV 2024] VideoMamba: State Space Model for Efficient Video Understanding
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
Code for the paper "Learning to Predict Task Progress by Self-Supervised Video Alignment" by Gerard Donahue and Ehsan Elhamifar, published at CVPR 2024.
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
"Interaction-centric Spatio-Temporal Context Reasoning for Muti-Person Video HOI Recognition" ECCV 2024
Official Implementation of STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering, AAAI 2024
Official repository of the ECCV 2024 paper "HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization"
[CVPR 2021] Actor-Context-Actor Relation Network for Spatio-temporal Action Localization
Video Event Extraction via Tracking Visual States of Arguments (AAAI 2023)
[ECCV 2024 Oral] C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
[ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
Code release for Hu et al., "Language-Conditioned Graph Networks for Relational Reasoning," ICCV 2019
[ACL 2024 Findings] LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition
[CVPR 2023] Code for "Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations"
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
[TPAMI 2024] PyTorch code for our paper "Context Disentangling and Prototype Inheriting for Robust Visual Grounding".
[ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding
[TGRS 2024] Language-Guided Progressive Attention for Visual Grounding in Remote Sensing Images.
[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.