Stars
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
(ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement
[preprint] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context"
AnchorAttention: Improved attention for LLMs long-context training
Fast inference from large language models via speculative decoding
[ICCV 2023, Official Code] for the paper "Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives". Official weights and demos provided.
A lightweight, flexible Video-MLLM developed by the TencentQQ Multimedia Research Team.
[NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
FocusLLM: Scaling LLM’s Context by Parallel Decoding
[EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"
Official codes for "Q-Ground: Image Quality Grounding with Large Multi-modality Models", ACM MM2024 (Oral)
[NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
DepictQA: Depicted Image Quality Assessment with Vision Language Models
E5-V: Universal Embeddings with Multimodal Large Language Models
Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
Official PyTorch implementation of "Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization" (ECCV 2024)
Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770).
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
[ACM MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"
④ [ECCV 2024 Oral, Comparison among Multiple Images!] A study on open-ended multi-image quality comparison: a dataset, a model, and a benchmark.
teowu / lmms-eval
Forked from EvolvingLMMs-Lab/lmms-eval. Q-Bench, Q-Bench+ and LongVideoBench for LMMs-Eval.