Huazhong University of Science and Technology
Stars
Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis
Code for KEBR: Knowledge Enhanced Self-Supervised Balanced Representation for Multimodal Sentiment Analysis
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Multilingual Voice Understanding Model
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
[CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[CVPR 2023 Highlight & TPAMI] Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)
Fusional approaches for temporal action localization in untrimmed videos
Span-based Localizing Network for Natural Language Video Localization (ACL 2020)
[NeurIPS 2021] Moment-DETR code and QVHighlights dataset
UMT is a unified and flexible framework that handles different combinations of input modalities and outputs video moment retrieval and/or highlight detection results.
Repository for the ACM MM 2023 paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding"
[ACM Multimedia 2023] Temporal Sentence Grounding in Streaming Videos
[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
[AAAI 2024] Prompt-based Distribution Alignment for Unsupervised Domain Adaptation
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Repository for the CVPR 2020 paper "Local-Global Video-Text Interactions for Temporal Grounding"