[ICLR 2025] This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"
This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use".
[CVPR 2025] The official code for "Olympus: A Universal Task Router for Computer Vision Tasks"
[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
A simple baseline achieving an over 90% success rate against the strong black-box models GPT-4.5/4o/o1. Paper: https://arxiv.org/abs/2503.10635
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
EventGPT: Event Stream Understanding with Multimodal Large Language Models
[CVPR 2025] IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
🎉 The code repository for "Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning" in PyTorch.
The official GitHub page for the survey paper "A Survey on Multimodal Retrieval-Augmented Generation".
Multimodal Multi-agent Organization and Benchmarking