

A paper list of recent works on token compression for ViT and VLM

314 16 Updated Feb 9, 2025
Python 17 1 Updated Dec 26, 2024

A simple framework for experimenting with Reinforcement Learning in Python.

Python 294 101 Updated Feb 27, 2024

A fork to add multimodal model training to open-r1

Python 632 34 Updated Feb 8, 2025

[CVPR-2024] Official implementations of CLIP-KD: An Empirical Study of CLIP Model Distillation

Python 96 3 Updated Jul 5, 2024

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

Python 357 16 Updated Jan 13, 2025

iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models

Python 17 3 Updated Jan 29, 2025

A framework to enable autonomous android and computer use using any LLM (local or remote)

Python 352 39 Updated Feb 10, 2025

A bibliography and survey of the papers surrounding o1

TeX 1,140 49 Updated Nov 16, 2024

A library for advanced large language model reasoning

Python 1,898 166 Updated Feb 14, 2025

Training Large Language Model to Reason in a Continuous Latent Space

Python 834 70 Updated Jan 24, 2025

Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch

Python 152 9 Updated Dec 31, 2024

Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.

Python 211 13 Updated Feb 9, 2025
Python 3,394 310 Updated Feb 13, 2025

s1: Simple test-time scaling

Python 5,278 595 Updated Feb 13, 2025

Code for Heima

Python 27 3 Updated Feb 11, 2025

LIMO: Less is More for Reasoning

Python 569 20 Updated Feb 14, 2025

Reproduce R1 Zero on Logic Puzzle

Python 1,465 87 Updated Feb 12, 2025

A Simple Framework of Small-scale Large Multimodal Models for Video Understanding Based on TinyLLaVA_Factory.

Python 32 3 Updated Jan 31, 2025

A series of technical reports on Slow Thinking with LLM

Python 393 20 Updated Feb 12, 2025

Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024)

Python 197 17 Updated Feb 14, 2025

GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding cap…

52 1 Updated Jan 24, 2025

Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*

Python 85 3 Updated Jan 14, 2025

Everything you need to build state-of-the-art foundation models, end-to-end.

Python 7,091 506 Updated Feb 15, 2025

Fully open reproduction of DeepSeek-R1

Python 19,843 1,695 Updated Feb 14, 2025

An Open Source Toolkit For LLM Distillation

Python 485 52 Updated Jan 7, 2025

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python 538 61 Updated Jun 7, 2024

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 10,003 1,295 Updated Feb 1, 2025