Stars
Efficient Text-to-3D Generation via Semantic-enhanced Sparse-view Prompting with Hybrid Reconstruction
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering
Adapter-Enhanced Hierarchical Cross-Modal Pre-training for Lightweight Medical Report Generation
Observation Driven Memory Synergistic Planning for Continuous Vision-Language Navigation
A consistent Med-VQA dataset, C-SLAKE, extended from Slake for further consistency assessment.
tenaflyyy / CoCoMeD
Forked from OpenMICG/CoCoMeD
Consistency Conditioned Memory Augmented Dynamic Diagnosis Model for Medical Visual Question Answering
Multigranularity Contrastive cross-modal collaborative Generation (MCG) model for Video QA
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering
[EMNLP 2018] PyTorch code for TVQA: Localized, Compositional Video Question Answering
[NeurIPS 2021] Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
PyTorch implementation of ICLR 2020 paper "CLEVRER: CoLlision Events for Video REpresentation and Reasoning"
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
VaLM: Visually-augmented Language Modeling. ICLR 2023.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
The code of IJCAI2022 paper, Declaration-based Prompt Tuning for Visual Question Answering
[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
CPL: Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning
Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!
The official pytorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
End-to-End Object Detection with Transformers
tenaflyyy / ClipBERT
Forked from jayleicn/ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
tenaflyyy / hcrn-videoqa
Forked from thaolmk54/hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)