- Seoul National University
- Seoul, Korea
- jjihwan.github.io
- in/jjihwan
Stars
Official PyTorch implementation of "Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think" (ICLR 2025)
Official PyTorch Implementation of "History-Guided Video Diffusion"
Fully open reproduction of DeepSeek-R1
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Unofficial PyTorch implementation of Titans, a state-of-the-art memory mechanism for transformers
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
[RSS 2024] Code for "Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals" for CALVIN experiments with pre-trained weights
Implementation of "Conditional Score Guidance for Text-Driven Image-to-Image Translation" (NeurIPS 2023).
Memory-optimized training scripts for video models based on Diffusers
[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models
SEED-Voken: A Series of Powerful Visual Tokenizers
The first open autoregressive foundational video AI model.
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models (CVPR 2024)
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.co/NX-AI/xLSTM-7b.
Official inference repo for FLUX.1 models
OmniGen: Unified Image Generation (https://arxiv.org/pdf/2409.11340)
Tips for Writing a Research Paper using LaTeX
Adaptive Caching for Faster Video Generation with Diffusion Transformers
Official code for "AutoVFX: Physically Realistic Video Editing from Natural Language Instructions"
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
[NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"
Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223