Stars
A paper list of recent works on token compression for ViTs and VLMs
A simple framework for experimenting with Reinforcement Learning in Python.
A fork to add multimodal model training to open-r1
[CVPR-2024] Official implementations of CLIP-KD: An Empirical Study of CLIP Model Distillation
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports the understanding of images, high-resolution images, and videos.
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
A framework enabling autonomous Android and computer use with any LLM (local or remote)
A bibliography and survey of the papers surrounding o1
A library for advanced large language model reasoning
Training Large Language Models to Reason in a Continuous Latent Space
Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.
A Simple Framework of Small-scale Large Multimodal Models for Video Understanding Based on TinyLLaVA_Factory.
A series of technical reports on Slow Thinking with LLMs
Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024)
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities.
RUCAIBox/Virgo, forked from Richar-Du/Virgo: official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
Everything you need to build state-of-the-art foundation models, end-to-end.
Fully open reproduction of DeepSeek-R1
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Clean, minimal, accessible reproduction of DeepSeek R1-Zero