- Seoul National University
- Seoul, South Korea
- https://minjoong507.github.io/
- @minjoon507
Stars
This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos".
Official Repository of the paper "On the Consistency of Video Large Language Models in Temporal Comprehension".
📎 + 🦾 CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
A collection of research papers on Self-Correcting Large Language Models with Automated Feedback.
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
SelecMix: Debiased Learning by Contradicting-pair Sampling (NeurIPS 2022)
Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning (ICML 2024)
A trend that started with "Chain of Thought Prompting Elicits Reasoning in Large Language Models".
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"
VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT
Code release for "Learning Video Representations from Large Language Models"
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
[EMNLP 2022] Official PyTorch code for "Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.
Code release for ActionFormer (ECCV 2022)
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
[WACV 2025] Official PyTorch code for "Background-aware Moment Detection for Video Moment Retrieval"
[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"