BUPT NIRC LAB
- No. 10 Xitucheng Road, Haidian District, Beijing; Beijing University of Posts and Telecommunications
- https://www.bupt.edu.cn/
Stars
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
Solve Visual Understanding with Reinforced VLMs
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
A library for advanced large language model reasoning
Frontier Multimodal Foundation Models for Image and Video Understanding
Implementation for paper "Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Model"
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Witness the aha moment of VLM with less than $3.
Janus-Series: Unified Multimodal Understanding and Generation Models
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Official implementation of "Harnessing Large Language Models for Training-free Video Anomaly Detection", CVPR 2024
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
✨✨✨Official implementation of "Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity"
Official implementation of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM"
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
[AAAI 2025] Code for paper: Enhancing Multimodal Large Language Models Complex Reasoning via Similarity Computation
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
🔥🔥🔥 Latest Papers, Codes and Datasets on Vid-LLMs.
Enhance Vision-Language Alignment with Noise (AAAI 2025)