- KAIST
- Daejeon, South Korea
- https://phillipinseoul.github.io/
- @yuseungleee
- in/yuseung-lee-6b085223a
Highlights
- Pro
Stars
Official implementation of Occupancy-Based Dual Contouring (SIGGRAPH Asia 2024).
A simple demonstration of advanced, agentic patterns built on top of the Realtime API.
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
ImageNet3D: Towards General-Purpose Object-Level 3D Understanding
[CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
Papers, code and datasets about deep learning for 3D Object Detection.
Code for "Open Vocabulary Monocular 3D Object Detection"
Official repository for our work on micro-budget training of large-scale diffusion models.
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
[arXiv 2025] Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.
Cosmos is a world model development platform consisting of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at robotics & AV labs. C…
Official PyTorch repository for “Guidance with Spherical Gaussian Constraint for Conditional Diffusion”
Official repository for PERSE: Personalized 3D Generative Avatars from A Single Portrait
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step
ROOT: VLM-based System for Indoor Scene Understanding and Beyond
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Official repo and evaluation implementation of VSI-Bench
Code release for https://kovenyu.com/WonderWorld/
Code for FreeScale, a tuning-free method for higher-resolution visual generation
A generative world for general-purpose robotics & embodied AI learning.
A precise and stable CFG for negative prompts, derived via guided sampling with contrastive loss.