MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,436 76 Updated Feb 22, 2025

HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation

Python 46 3 Updated Feb 18, 2025

Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening

Python 42 2 Updated Feb 21, 2025

SkyReels V1: The first and most advanced open-source human-centric video foundation model

Python 1,409 119 Updated Feb 24, 2025
Python 377 9 Updated Dec 5, 2024
Python 203 10 Updated Feb 21, 2025

🚀 [Large Models] Train a 26M-parameter vision multimodal VLM from scratch in just 1 hour!

Python 1,295 138 Updated Feb 23, 2025

🚀🚀 [Large Models] Train a small 26M-parameter GPT entirely from scratch in just 2 hours!

Python 12,628 1,340 Updated Feb 23, 2025

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Python 380 15 Updated Jan 24, 2025
Python 427 29 Updated Nov 26, 2024

Various AI scripts. Mostly Stable Diffusion stuff.

Python 4,061 459 Updated Feb 23, 2025

s1: Simple test-time scaling

Python 5,647 640 Updated Feb 23, 2025

Fully open reproduction of DeepSeek-R1

Python 21,292 1,870 Updated Feb 24, 2025
JavaScript 2,929 1,084 Updated Jun 21, 2024
Python 2,222 153 Updated Feb 24, 2025

Ongoing research training transformer models at scale

Python 11,492 2,581 Updated Feb 23, 2025

[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Python 70 10 Updated Feb 7, 2025

[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

361 9 Updated Jan 17, 2025

The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".

250 16 Updated Jan 21, 2025
Python 416 29 Updated Mar 27, 2024

A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

Jupyter Notebook 972 67 Updated Mar 25, 2023

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Python 7,570 479 Updated Feb 12, 2025

“FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching” FlowAR employs the simplest scale design and is compatible with any VAE.

Python 90 2 Updated Dec 23, 2024

Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.

Python 57 Updated Dec 20, 2024

MoH: Multi-Head Attention as Mixture-of-Head Attention

Python 205 7 Updated Oct 29, 2024

A year has gone by. Where did all the money you spent in the Huazi canteens go?

Python 459 80 Updated Dec 23, 2024

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

Python 33,260 4,850 Updated Feb 23, 2025

[NeurIPS 2024] DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model

Python 62 10 Updated Dec 5, 2024