Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…

Python 6,275 538 Updated Mar 16, 2025

daixiangzi / Awesome-Token-Compress

A paper list of recent works on token compression for ViT and VLM

366 19 Updated Mar 10, 2025

VITA-MLLM / Long-VITA

✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Python 254 29 Updated Mar 12, 2025

FanqingM / R1-Multimodal-Journey

A journey to real multimodal R1! We are running large-scale experiments.

Python 274 5 Updated Mar 8, 2025

Deep-Agent / R1-V

Witness the aha moment of VLM with less than $3.

Python 3,262 257 Updated Mar 1, 2025

Jiayi-Pan / TinyZero

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,184 1,423 Updated Mar 10, 2025

Unakar / Logic-RL

Reproduce R1 Zero on Logic Puzzle

Python 2,133 140 Updated Mar 13, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 22,827 2,057 Updated Mar 16, 2025

hkust-nlp / simpleRL-reason

This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python 3,164 235 Updated Feb 19, 2025

huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,298 170 Updated Mar 4, 2025

guoxy25 / Ocean-OCR

Python 23 1 Updated Feb 7, 2025

baaivision / CapsFusion

[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale

Python 204 5 Updated Feb 27, 2024

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Python 1,311 73 Updated Jan 17, 2024

ttengwang / Caption-Anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…

Python 1,720 105 Updated Aug 29, 2023

opendatalab / WanJuan1.0

WanJuan 1.0 multimodal corpus

556 28 Updated Oct 20, 2023

opendatalab / laion5b-downloader

Python 108 10 Updated May 16, 2023

PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …

Python 552 186 Updated Mar 14, 2025

google / perfetto

Performance instrumentation and tracing for Android, Linux and Chrome (read-only mirror of https://android.googlesource.com/platform/external/perfetto/)

C++ 3,158 389 Updated Mar 15, 2025

FlagOpen / FlagGems

FlagGems is an operator library for large language models implemented in Triton Language.

Python 452 73 Updated Mar 13, 2025

deepseek-ai / DeepSeek-VL2

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 4,528 1,670 Updated Feb 26, 2025

showlab / ShowUI

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

Python 1,094 68 Updated Mar 13, 2025

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 4,650 282 Updated Mar 15, 2025

farewellthree / PPLLaVA

Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"

Python 126 6 Updated Nov 19, 2024

RUCAIBox / LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

hedes1992

Starred repositories

volcengine / verl

allenai / open-instruct

JDAI-CV / fast-reid

DefTruth / CUDA-Learn-Notes

DCDmllm / Momentor

modelscope / ms-swift