  • BUPT NIRC LAB
  • No. 10 West Tucheng Road, Haidian District, Beijing (Beijing University of Posts and Telecommunications)

Highlights

  • Pro

Python 19 2 Updated Jul 22, 2024

✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Python 40 2 Updated Oct 17, 2024
HTML 2 Updated Nov 11, 2024
7 Updated Feb 16, 2025

Solve Visual Understanding with Reinforced VLMs

Python 3,550 210 Updated Feb 27, 2025

OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]

Python 1,237 45 Updated Dec 11, 2024

A library for advanced large language model reasoning

Python 1,989 176 Updated Feb 21, 2025

Frontier Multimodal Foundation Models for Image and Video Understanding

Jupyter Notebook 557 35 Updated Feb 24, 2025

Implementation for paper "Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Model"

Python 60 6 Updated Dec 16, 2024

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 42,280 5,174 Updated Feb 27, 2025

Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS

Python 719 54 Updated Feb 16, 2025

[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Python 245 10 Updated Dec 22, 2024
Python 14 Updated Feb 3, 2025

Witness the aha moment of VLM with less than $3.

Python 2,916 232 Updated Feb 25, 2025

Tools for checking ACL paper submissions

Python 669 48 Updated Oct 20, 2024

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,408 2,158 Updated Feb 1, 2025

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 3,582 534 Updated Apr 24, 2024

Official implementation of "Harnessing Large Language Models for Training-free Video Anomaly Detection", CVPR 2024

Python 75 4 Updated Jul 15, 2024

[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"

Python 120 11 Updated Dec 14, 2024

✨✨✨Official implementation of "Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity"

Python 22 2 Updated Jan 5, 2025

Official implementation of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM"

Python 105 4 Updated Jan 5, 2025

Official implementation of MIA-DPO

Python 49 1 Updated Jan 23, 2025

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 367 14 Updated Jan 4, 2025

[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Python 36 1 Updated Dec 10, 2024

https://arxiv.org/abs/2408.02032

Python 97 6 Updated Jan 16, 2025

[AAAI 2025] Code for paper: Enhancing Multimodal Large Language Models Complex Reasoning via Similarity Computation

Python 30 6 Updated Jan 14, 2025
Python 87 2 Updated Dec 30, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 1,087 72 Updated Jan 23, 2025

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1,982 95 Updated Jan 26, 2025

Enhance Vision-Language Alignment with Noise (AAAI 2025)

Python 11 1 Updated Dec 19, 2024