Stars
[ICLR 2025 Oral] TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation
PantoMatrix: Generating Face and Body Animation from Speech
"SwinDepth: Unsupervised Depth Estimation using Monocular Sequences via Swin Transformer and Densely Cascaded Network" (ICRA 2023)
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
[CVPR 2025] Official implementation of the solvers and estimators proposed in the paper "Relative Pose Estimation through Affine Corrections of Monocular Depth Priors"
[CoRL 2022] SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Fully open reproduction of DeepSeek-R1
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
An open-source implementation of Large Reconstruction Models
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
⚡ InstaFlow! One-Step Stable Diffusion with Rectified Flow (ICLR 2024)
Self-reimplemented version of Long-LRM.
[ECCV 2024] Implementation of latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
[CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacch…
CUDA accelerated rasterization of gaussian splatting
Official Implementation of "PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting"
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
[CVPR 2025] Official implementation of the paper "Generative Inbetweening through Frame-wise Conditions-Driven Video Generation"
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling