This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python · 1,737 stars · 134 forks · Updated Jan 28, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python · 6,605 stars · 810 forks · Updated Feb 1, 2025

A fork to add multimodal model training to open-r1

Python · 196 stars · 11 forks · Updated Jan 28, 2025

Fully open reproduction of DeepSeek-R1

Python · 14,548 stars · 1,139 forks · Updated Jan 31, 2025

RAGEN is the first open-source reproduction of DeepSeek-R1 for training agentic models via reinforcement learning.

Python · 582 stars · 36 forks · Updated Jan 30, 2025

Benchmarking LLMs' Gaming Ability in Multi-Agent Environments

Jupyter Notebook · 65 stars · Updated Jan 27, 2025

Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"

Jupyter Notebook · 183 stars · 20 forks · Updated Nov 26, 2024

Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.

Python · 2,045 stars · 193 forks · Updated Feb 3, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python · 39,210 stars · 4,807 forks · Updated Feb 1, 2025

Official inference repo for FLUX.1 models

Python · 19,873 stars · 1,391 forks · Updated Jan 31, 2025

Python · 1,849 stars · 130 forks · Updated Nov 8, 2024

GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)

Python · 311 stars · 30 forks · Updated Jan 8, 2024

Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.

Jupyter Notebook · 6,447 stars · 458 forks · Updated Feb 1, 2025

Python · 30 stars · 2 forks · Updated Apr 18, 2024

(NeurIPS 2024 Spotlight) TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment

Python · 26 stars · Updated Sep 27, 2024

A minimal and universal controller for FLUX.1.

Python · 1,143 stars · 75 forks · Updated Jan 23, 2025

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

Python · 563 stars · 30 forks · Updated Jan 26, 2025

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python · 2,853 stars · 229 forks · Updated Jan 24, 2025

🔥🔥🔥 Latest Papers, Code, and Datasets on Vid-LLMs.

1,876 stars · 91 forks · Updated Jan 26, 2025

Video-LLaVA fine-tune for CinePile evaluation

Jupyter Notebook · 46 stars · 5 forks · Updated Aug 8, 2024

[SIGGRAPH Asia 2024] Official implementation of "Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation"

Python · 358 stars · 28 forks · Updated Sep 11, 2024

Fast and memory-efficient exact attention

Python · 15,264 stars · 1,441 forks · Updated Feb 3, 2025

A personal investigative project to track the latest progress in the field of multi-modal object tracking.

Python · 144 stars · 14 forks · Updated Jan 21, 2025

(arXiv:2405.18406) RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives

Python · 32 stars · 1 fork · Updated Oct 31, 2024

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Jupyter Notebook · 769 stars · 50 forks · Updated Jul 30, 2024

Accepted as a [NeurIPS 2024] Spotlight Presentation paper

Jupyter Notebook · 6,146 stars · 620 forks · Updated Sep 26, 2024

SEED-Story: Multimodal Long Story Generation with Large Language Model

Python · 787 stars · 59 forks · Updated Oct 11, 2024

Official repository for the paper PLLaVA

Python · 636 stars · 48 forks · Updated Jul 28, 2024