shiml20
213 results for source starred repositories

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,449 76 Updated Feb 22, 2025

HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation

Python 46 3 Updated Feb 18, 2025

Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening

Python 42 2 Updated Feb 21, 2025

SkyReels V1: The first and most advanced open-source human-centric video foundation model

Python 1,432 123 Updated Feb 24, 2025
Python 377 9 Updated Dec 5, 2024
Python 203 10 Updated Feb 21, 2025

🚀 [Large Models] Train a 26M-parameter vision-language model (VLM) from scratch in just 1 hour!

Python 1,305 139 Updated Feb 23, 2025

🚀🚀 [Large Models] Train a small 26M-parameter GPT completely from scratch in just 2 hours!

Python 12,682 1,348 Updated Feb 23, 2025

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Python 380 15 Updated Jan 24, 2025
Python 428 29 Updated Nov 26, 2024

Various AI scripts. Mostly Stable Diffusion stuff.

Python 4,063 459 Updated Feb 24, 2025

s1: Simple test-time scaling

Python 5,658 642 Updated Feb 23, 2025

Fully open reproduction of DeepSeek-R1

Python 21,312 1,872 Updated Feb 24, 2025
JavaScript 2,931 1,084 Updated Jun 21, 2024
Python 2,223 153 Updated Feb 24, 2025

Ongoing research training transformer models at scale

Python 11,499 2,583 Updated Feb 24, 2025

[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Python 70 10 Updated Feb 7, 2025

[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

362 9 Updated Jan 17, 2025

The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".

250 16 Updated Jan 21, 2025
Python 416 29 Updated Mar 27, 2024

A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

Jupyter Notebook 972 67 Updated Mar 25, 2023

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Python 7,572 479 Updated Feb 12, 2025

“FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching” FlowAR employs the simplest scale design and is compatible with any VAE.

Python 90 2 Updated Dec 23, 2024

Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.

Python 58 Updated Dec 20, 2024

MoH: Multi-Head Attention as Mixture-of-Head Attention

Python 205 7 Updated Oct 29, 2024

A year has passed: where did all the money you spent in the Tsinghua ("Huazi") canteens go?

Python 459 80 Updated Dec 23, 2024

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

Python 33,260 4,850 Updated Feb 23, 2025

[NeurIPS 2024] DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model

Python 62 10 Updated Dec 5, 2024