  • Fudan University
  • Shanghai, China

Jupyter Notebook 158 4 Updated Dec 13, 2024
Python 6 Updated Dec 14, 2024
Jupyter Notebook 4 Updated Dec 4, 2024

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

Python 1,498 209 Updated Dec 13, 2024

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Python 175 3 Updated Oct 24, 2024

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 1,229 57 Updated Nov 13, 2024

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

344 19 Updated Oct 19, 2024

LPIPS metric. pip install lpips

Python 3,731 502 Updated Jul 2, 2024

Ongoing research training transformer models at scale

Python 10,807 2,416 Updated Dec 14, 2024

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 9,710 918 Updated Dec 13, 2024

[CVPR2024 Highlight] VBench - We Evaluate Video Generation

Python 640 33 Updated Dec 6, 2024

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"

Python 312 29 Updated Apr 20, 2024
Python 3,089 265 Updated Oct 16, 2024

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2

Jupyter Notebook 1,325 118 Updated Dec 11, 2024

4M: Massively Multimodal Masked Modeling

Python 1,638 99 Updated Oct 7, 2024

Official inference repo for FLUX.1 models

Python 18,378 1,300 Updated Nov 21, 2024

[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".

Python 33 2 Updated Jun 17, 2024

Example models using DeepSpeed

Python 6,149 1,050 Updated Dec 14, 2024

Latte: Latent Diffusion Transformer for Video Generation.

Python 1,731 180 Updated Sep 28, 2024

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Python 26,627 5,488 Updated Dec 14, 2024

VideoSys: An easy and efficient system for video generation

Python 1,811 126 Updated Dec 11, 2024

Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

Python 383 19 Updated Jul 5, 2024

SEED-Voken: A Series of Powerful Visual Tokenizers

Python 771 30 Updated Dec 4, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

Python 11,717 1,034 Updated Dec 13, 2024

[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.

Python 272 7 Updated Jul 9, 2024

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python 2,108 88 Updated Aug 6, 2024

[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Python 2,669 213 Updated Sep 8, 2024

Official PyTorch implementation of TATS: A Long Video Generation Framework with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV 2022)

Python 270 17 Updated May 1, 2024

Accepted as [NeurIPS 2024] Spotlight Presentation Paper

Jupyter Notebook 6,030 602 Updated Sep 26, 2024

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…

Python 6,169 418 Updated Dec 6, 2024