One minute of voice data can be used to train a good TTS model (few-shot voice cloning).
Instant voice cloning by MIT and MyShell. Audio foundation model.
Stable Diffusion built-in to Blender
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
YuE: Open Full-song Music Generation Foundation Model, an open alternative to Suno.ai
The official PyTorch implementation of the paper "Human Motion Diffusion Model"
GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
Real-time and accurate open-vocabulary end-to-end object detection
[CVPR2024] DisCo: Referring Human Dance Generation in Real World
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
Code of [CVPR 2024] "Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling"
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
High-accuracy NLP parser with models for 11 languages.
[CVPR 2024] Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models, a no lighting baked texture generative model
[ECCV 2024] Single Image to 3D Textured Mesh in 10 seconds with Convolutional Reconstruction Model.
[ICCV 2023] Text2Tex: Text-driven Texture Synthesis via Diffusion Models
Investigating CoT Reasoning in Autoregressive Image Generation
[CVPR2024] Official implementation of SplattingAvatar.
Create 3D rooms in Blender from floorplans.
[ECCV 2024] Official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model"
[CVPR 2023] Official repository for OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering
[ICLR 2024] Generalizable and Precise Head Avatar from Image(s)
DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models
Large Motion Model for Unified Multi-Modal Motion Generation
[CVPR'23] Learning Neural Parametric Head Models
The official code of our ICCV 2023 work: Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation