Let's finetune video generation models!

Python 416 15 Updated Feb 24, 2025

Agent Laboratory is an end-to-end autonomous research workflow meant to assist you as the human researcher toward implementing your research ideas

Python 3,791 549 Updated Jan 26, 2025

VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE

Python 294 7 Updated Jan 19, 2025

Next-Token Prediction is All You Need

Python 2,020 78 Updated Oct 24, 2024

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.

Python 4,193 300 Updated Oct 5, 2024

High-resolution models for human tasks.

Python 4,858 289 Updated Nov 18, 2024

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 963 42 Updated Feb 1, 2025

Official code for "RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control"

Jupyter Notebook 363 27 Updated Sep 7, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 1,424 127 Updated Mar 3, 2025

[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Python 31 Updated Feb 6, 2025

Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…

Python 6,063 519 Updated Mar 7, 2025

CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

896 11 Updated Jun 21, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 40,530 6,100 Updated Mar 7, 2025

Creative Commons Licenses for GitHub

580 306 Updated Dec 10, 2024

Pythonic bindings for FFmpeg's libraries.

Cython 2,684 380 Updated Feb 25, 2025

Text-to-3D Generation within 5 Minutes

Python 696 50 Updated Mar 10, 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 3,646 543 Updated Apr 24, 2024

[WIP] Layer Diffusion for WebUI (via Forge)

Python 3,986 343 Updated Aug 30, 2024

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Python 659 30 Updated Dec 2, 2024

Transparent Image Layer Diffusion using Latent Transparency

2,089 30 Updated Jun 16, 2024

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Python 2,992 189 Updated Oct 31, 2024
Python 3,879 254 Updated Mar 15, 2024

Official Code for MotionCtrl [SIGGRAPH 2024]

Python 1,407 75 Updated Feb 19, 2025

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

HTML 436 25 Updated Feb 24, 2025

Code for the paper "Pix2Video: Video Editing using Image Diffusion"

Python 69 5 Updated Oct 2, 2023

Focus on prompting and generating

Python 43,617 6,570 Updated Jan 24, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 21,713 2,385 Updated Aug 12, 2024

A feature-rich command-line audio/video downloader

Python 103,009 8,080 Updated Mar 5, 2025