Lists (23)
3D
Audio/ASR
Audio/AudioSeparation
Audio/BaseModel
Audio/Data
Audio/TTS
Audio/VC
Image/BaseModel
Image/Detection
Image/reid
Image/Segmentation
ModelTraining
MultiModal/3DGen
MultiModal/BaseModel
MultiModal/ImageGen
MultiModal/TalkingHead
MultiModal/VideoGen
NeRF
NLP
Other
Python Tools
Video
Video/Data
Stars
This is the third party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
Python packaging and dependency management made easy
An extremely fast Python package and project manager, written in Rust.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
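stock: fetches stock data, computes stock indicators and chip distribution, recognizes stock patterns, and supports comprehensive stock screening, stock-picking strategies, strategy validation and backtesting, and automated stock trading, on PC and mobile devices.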
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
The dataset and code for "Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset"
Colab for making Wav2Lip high quality and easy to use
Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team
Project Page of 'GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction' [CVPR2019]
Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]
[ECCV 2022] CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
COLMAP - Structure-from-Motion and Multi-View Stereo
Instant neural graphics primitives: lightning fast NeRF and more
A book about Text-to-Speech (TTS) in Chinese.
GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code
Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
This repository contains a PyTorch implementation of "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".
[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis