Stars
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
[NeurIPS'2024]: DiffGS: Functional Gaussian Splatting Diffusion
RSL-SQL: Robust Schema Linking in Text-to-SQL Generation
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Source code for "Large language models surpass human experts in predicting neuroscience results"
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Source code collection for 3D vision (a collection of source code in the field of visual 3D reconstruction)
Official implementation of the paper MagicQuill: An Intelligent Interactive Image Editing System
Train a 1B LLM on 1T tokens from scratch as an individual
An Open Large Reasoning Model for Real-World Solutions
Real-time interactive streaming digital human
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
Jittor implementation of DiffPoseTalk (SIGGRAPH 2024)
A tiny soft renderer built from scratch using C++11
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
Official code for paper: Scaling Mesh Generation via Compressive Tokenization
The Scene Language: Representing Scenes with Programs, Words, and Embeddings (arXiv preprint)
Generates audiobooks with chapters and ebook metadata using Calibre and XTTS from Coqui TTS, with optional voice cloning and support for multiple languages
A repository accompanying the PARTNR benchmark for using Large Planning Models (LPMs) to solve Human-Robot Collaboration or Robot Instruction Following tasks in the Habitat simulator.
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
A community-maintained Python framework for creating mathematical animations.
Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation